• MonkderVierte@lemmy.zip · 3 months ago

    I just thought that a client-side proof-of-work (or even just a delay) bound to the IP might deter AI companies and push them to behave instead (because single-visit-per-IP crawlers get too expensive/slow, and ordinary abusive crawlers can simply be blocked). But they already have mind-blowing computing and money resources and only want your data.

    But what if there were a simple-to-use, integrated solution, and every single webpage used this approach?
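The scheme described here is essentially hashcash: the server hands out a challenge bound to the visitor's IP, and the client must find a nonce whose hash clears a difficulty target before being served the page. A minimal sketch (function names and the difficulty value are illustrative, not any particular implementation):

```python
import hashlib
import os

# Illustrative difficulty: ~65k hashes on average, well under a second
# for a real visitor, but costly at crawler scale.
DIFFICULTY_BITS = 16

def make_challenge(client_ip: str) -> str:
    # Bind the challenge to the IP so a solution can't be reused elsewhere.
    return hashlib.sha256(client_ip.encode() + os.urandom(16)).hexdigest()

def solve(challenge: str) -> int:
    # Client side: brute-force a nonce until the hash falls below the target.
    target = 1 << (256 - DIFFICULTY_BITS)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    # Server side: one hash to check the submitted nonce.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))

challenge = make_challenge("203.0.113.7")
nonce = solve(challenge)
```

The asymmetry is the point: solving costs thousands of hashes, verifying costs one.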

    • witten@lemmy.world · 3 months ago

      Believe me, these AI corporations have way too many IPs to make this feasible. I’ve tried per-IP rate limiting. It doesn’t work on these crawlers.
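For context, per-IP rate limiting of the kind described above is typically a token bucket keyed on the client address. The sketch below (illustrative names, not a real library) also shows why a crawler fleet that rotates IPs sidesteps it: every fresh address starts with a full bucket.

```python
import time
from collections import defaultdict

class IpRateLimiter:
    """Token-bucket limiter keyed on client IP (illustrative sketch)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # bucket capacity
        # Each unseen IP starts with a full bucket.
        self.buckets = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, ip: str) -> bool:
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False

limiter = IpRateLimiter(rate=1.0, burst=5)
# Ten rapid requests from ONE address: only the burst gets through...
results = [limiter.allow("198.51.100.1") for _ in range(10)]
# ...but ten requests from ten DIFFERENT addresses all pass.
rotated = [limiter.allow(f"198.51.100.{i}") for i in range(2, 12)]
```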

    • explodicle@sh.itjust.works · 3 months ago

      What if we had some protocol by which the proof-of-work is transferable? Then not only would there be a cost to using the website, but also the operator would receive that cost as payment.

      • Taldan@lemmy.world · 3 months ago

        It’s theoretically viable, but every time it has been tried, it has failed.

        There are a lot of practical issues, mainly that it’s functionally identical to crypto-miner malware.

    • Taldan@lemmy.world · 3 months ago

      Are you planning to just outright ban IPv6 (and thus half the world)?

      Any IP-based restriction is useless with IPv6.
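One common mitigation for the IPv6 point: key any per-client limit on the /64 prefix rather than the exact address, since a single host usually controls its entire /64 and can hop addresses within it at will. A sketch (the helper name is made up; nothing here is tied to a specific limiter):

```python
import ipaddress

def rate_limit_key(addr: str) -> str:
    """Collapse IPv6 addresses to their /64 network; keep IPv4 exact."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # strict=False lets us pass a host address rather than a network.
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(ip)

# Two addresses in the same /64 collapse to one bucket key.
key_a = rate_limit_key("2001:db8::1")
key_b = rate_limit_key("2001:db8::ffff")
```

This only raises the bar, though: a crawler holding many delegated /48s still gets 65,536 distinct /64 keys per /48.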

    • daniskarma@lemmy.dbzer0.com · 3 months ago

      The solution was invented long ago. It’s called a captcha.

      A little bother for legitimate users, but a good captcha is still hard to bypass, even using AI.

      And from the end user’s standpoint, I’d rather lose 5 seconds to a captcha than have my browser run an unsolicited, heavy crypto challenge.

        • daniskarma@lemmy.dbzer0.com · 3 months ago

          I tried, and not really.

          I had to scrape a site that had a captcha, and no AI was able to solve it consistently.

          To “crack it” I had to replicate the captcha generation algorithm as best I could and train a custom model to solve it. Only then could I crack it open. And I was lucky the generation algorithm wasn’t too complex and was easy to replicate.

          That amount of work is a far greater load than Anubis’s crypto challenges.

          Take into account that AI-driven OCR draws on existing examples; if your captcha is novel enough, they’re going to have a hard time solving it.

          It would also drain power, which is the only point of Anubis.

          • mholiv@lemmy.world · 3 months ago

            There is a difference between you (or me) sitting at home working on this and a team of highly motivated people with unlimited money.

            • daniskarma@lemmy.dbzer0.com · 3 months ago

              It’s not that it can’t be done; it’s that the cost is most likely higher than Anubis’s.