• MonkderVierte@lemmy.zip · 3 months ago

    I just thought that a client-side proof-of-work (or even just a delay) bound to the IP might deter AI companies and push them to behave instead (because single-visit-per-IP crawlers get too expensive/slow, and ordinary abusive crawlers can simply be blocked). But they already have mind-blowing computing and money resources and only want your data.

    But what if there were a simple-to-use, integrated solution, and every single webpage used this approach?
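The scheme described here is essentially hashcash: the server hands out a challenge bound to the visitor's IP, and the client must find a nonce whose hash clears a difficulty target before being served the page. A minimal sketch (function names and the difficulty value are illustrative, not any particular implementation):

```python
import hashlib
import os

# Illustrative difficulty: ~65k hashes on average, well under a second
# for a real visitor, but costly at crawler scale.
DIFFICULTY_BITS = 16

def make_challenge(client_ip: str) -> str:
    # Bind the challenge to the IP so a solution can't be reused elsewhere.
    return hashlib.sha256(client_ip.encode() + os.urandom(16)).hexdigest()

def solve(challenge: str) -> int:
    # Client side: brute-force a nonce until the hash falls below the target.
    target = 1 << (256 - DIFFICULTY_BITS)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    # Server side: one hash to check the submitted nonce.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))

challenge = make_challenge("203.0.113.7")
nonce = solve(challenge)
```

The asymmetry is the point: solving costs thousands of hashes, verifying costs one.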

    • witten@lemmy.world · 3 months ago

      Believe me, these AI corporations have way too many IPs to make this feasible. I’ve tried per-IP rate limiting. It doesn’t work on these crawlers.
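For context, per-IP rate limiting of the kind described above is typically a token bucket keyed on the client address. The sketch below (illustrative names, not a real library) also shows why a crawler fleet that rotates IPs sidesteps it: every fresh address starts with a full bucket.

```python
import time
from collections import defaultdict

class IpRateLimiter:
    """Token-bucket limiter keyed on client IP (illustrative sketch)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # bucket capacity
        # Each unseen IP starts with a full bucket.
        self.buckets = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, ip: str) -> bool:
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False

limiter = IpRateLimiter(rate=1.0, burst=5)
# Ten rapid requests from ONE address: only the burst gets through...
results = [limiter.allow("198.51.100.1") for _ in range(10)]
# ...but ten requests from ten DIFFERENT addresses all pass.
rotated = [limiter.allow(f"198.51.100.{i}") for i in range(2, 12)]
```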

    • explodicle@sh.itjust.works · 3 months ago

      What if we had some protocol by which the proof-of-work is transferable? Then not only would there be a cost to using the website, but also the operator would receive that cost as payment.

      • Taldan@lemmy.world · 3 months ago

        It’s theoretically viable, but every time it has been tried, it has failed.

        There are a lot of practical issues, mainly that it’s functionally identical to crypto-miner malware.

    • Taldan@lemmy.world · 3 months ago

      Are you planning to just outright ban IPv6 (and thus half the world)?

      Any IP-based restriction is useless with IPv6.
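One common mitigation for the IPv6 point: key any per-client limit on the /64 prefix rather than the exact address, since a single host usually controls its entire /64 and can hop addresses within it at will. A sketch (the helper name is made up; nothing here is tied to a specific limiter):

```python
import ipaddress

def rate_limit_key(addr: str) -> str:
    """Collapse IPv6 addresses to their /64 network; keep IPv4 exact."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # strict=False lets us pass a host address rather than a network.
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(ip)

# Two addresses in the same /64 collapse to one bucket key.
key_a = rate_limit_key("2001:db8::1")
key_b = rate_limit_key("2001:db8::ffff")
```

This only raises the bar, though: a crawler holding many delegated /48s still gets 65,536 distinct /64 keys per /48.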

    • daniskarma@lemmy.dbzer0.com · 3 months ago

      The solution was invented long ago. It’s called a captcha.

      A little bother for legitimate users, but a good captcha is still hard to bypass, even using AI.

      And from the end user’s standpoint, I’d rather lose 5 seconds to a captcha than have my browser run an unsolicited, heavy crypto challenge.

        • daniskarma@lemmy.dbzer0.com · 3 months ago

          I tried, and not really.

          I had to scrape a site that had a captcha, and no AI was able to solve it consistently.

          To “crack it” I had to replicate the captcha generation algorithm as best I could and train a custom model to solve it. Only then could I crack it open. And I was lucky the generation algorithm wasn’t too complex and was easy to replicate.

          That amount of work is a far greater load than Anubis’s crypto challenges.

          Take into account that AI-driven OCR draws on existing examples; if your captcha is novel enough, they’re going to have a hard time solving it.

          It would also drain power, which is the only point of Anubis.

          • mholiv@lemmy.world · 3 months ago

            There is a difference between you (or me) sitting at home working on this and a team of highly motivated people with unlimited money.

            • daniskarma@lemmy.dbzer0.com · 3 months ago

              It’s not that it can’t be done; it’s that the cost is most likely higher than Anubis’s.