Then they’d have to bother understanding the content and downloading it as appropriate. And you’d think if anyone could understand and parse websites in real time to make download decisions, it would be giant AI companies. But ironically they’re only interested in hoovering up everything as plain web pages to feed into their raw training data.
The same morons scrape Wikipedia instead of downloading the archive files, which can trivially be rendered as web pages locally.