Bots are currently scraping the internet for LLM training data at unprecedented rates[1][2][3], driving up costs and destabilizing public-facing websites. I want to talk about how this has been particularly difficult for wikis, and how it has gotten much worse in the last few months.
Run labyrinths and feed them bullshit
That would be great if they could handle the traffic. For a lot of smaller sites, the AI scrapers are effectively a DDoS. It’s pushing these folks into the arms of Cloudflare.
I think it’s one of the worst aspects of the AI bubble. I’m worried about Cloudflare’s outsized market power.