• Hirom@beehaw.org
    arrow-up
    2
    ·
    14 hours ago

    This could further accelerate the arms race between malicious scrapers and websites.

    My fear is that this would create collateral damage: blocking legitimate scrapers and visitors, and hassling people with an increasing number of CAPTCHAs.

  • AceFuzzLord@lemmy.zip
    arrow-up
    6
    ·
    edit-2
    18 hours ago

    The idea that Cloudflare could help by telling off AI crawlers sounds nice, but how long until it becomes a premium feature that costs loads of money to operate because AI companies lobby them to make it inaccessible to the masses? Or until something equally bad happens?

    • Zerush@lemmy.ml (OP)
      arrow-up
      31
      ·
      edit-2
      1 day ago

      Oh yes, they have, and more: they are one of the most powerful security and AI companies, with a ton of services. They are perfectly capable of pulling the plug on any service or website. Sadly, they come with privacy concerns similar to Google’s.

      https://www.cloudflare.com/

      • Ulrich@feddit.org
        English
        arrow-up
        3
        arrow-down
        16
        ·
        1 day ago

        Okay great. Go ahead and explain to me how they plan to fight an army of bots doing everything they can to be invisible?

        • sudo@programming.dev
          arrow-up
          1
          arrow-down
          1
          ·
          2 hours ago

          How about you just read up on Cloudflare Turnstile instead of acting like you know anything? Here are some notable methods:

          • Residential IP requirements
          • TLS Fingerprinting
          • Canvas Fingerprinting

          It’s still possible to get around these, but it’s not easy. You either need network engineers on staff as good as Cloudflare’s or have to pay some third-party service to unlock it for you. All Cloudflare needs to do is keep their prices lower than the third-party services.
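To make the TLS fingerprinting item above a bit more concrete, here is a minimal sketch in the style of the public JA3 scheme: it hashes the fields a client offers in its TLS ClientHello, so clients built on the same TLS stack tend to produce the same fingerprint. This is only an illustration and not a description of what Cloudflare Turnstile actually does internally; the function name and the sample values are hypothetical.

```python
import hashlib


def ja3_style_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style fingerprint from ClientHello fields.

    Each list is joined with '-', the five fields are joined with ',',
    and the resulting string is hashed with MD5 (as the JA3 scheme does).
    """
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


# Hypothetical ClientHello values, for illustration only.
browser_like = ja3_style_fingerprint(771, [4865, 4866, 4867], [0, 11, 10, 35], [29, 23, 24], [0])
script_like = ja3_style_fingerprint(771, [4866, 4865], [0, 10, 11], [23, 24], [0])

# A spoofed User-Agent header does not change the TLS handshake,
# so the two fingerprints still differ.
fingerprints_differ = browser_like != script_like
```

The point of the sketch is that the fingerprint comes from the handshake itself, which is why changing HTTP headers alone does not hide an automated client.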

        • HelloRoot@lemy.lol (Banned from community)
          English
          arrow-up
          34
          ·
          edit-2
          1 day ago

          1. they can already block VPN traffic (unless you use their VPN)

          2. their whole business model is based on them being a man in the middle that decrypts SSL and analyses the packets in plain text

          3. about a third of the world’s websites use Cloudflare, so they have a pretty good bird’s-eye view of the behaviour of any machine, datacenter or IP range that visits a lot of websites, which in turn trivially reveals whether it is normal user behaviour or a crawler.
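The third point can be sketched roughly as follows: if one party sees requests across many sites, it can count how many distinct sites each client touches and flag the outliers. This is a toy illustration under stated assumptions, not Cloudflare's actual detection logic; the function name, the input format, and the threshold are all hypothetical.

```python
from collections import defaultdict


def flag_wide_crawlers(requests, max_sites=50):
    """Return the client IPs that touched more than `max_sites` distinct sites.

    `requests` is an iterable of (client_ip, site) pairs observed within
    one time window. The threshold is an arbitrary illustrative value.
    """
    sites_by_ip = defaultdict(set)
    for client_ip, site in requests:
        sites_by_ip[client_ip].add(site)
    return {ip for ip, sites in sites_by_ip.items() if len(sites) > max_sites}


# Hypothetical traffic: one client hits 200 different sites,
# another reloads the same site 200 times.
observed = [("203.0.113.7", f"site{i}.example") for i in range(200)]
observed += [("198.51.100.4", "blog.example")] * 200

suspects = flag_wide_crawlers(observed)
```

A single website only sees its own slice of this traffic, so it cannot make the same comparison on its own.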

          • Zerush@lemmy.ml (OP)
            arrow-up
            12
            arrow-down
            3
            ·
            edit-2
            1 day ago

            This isn’t the first time that, with all my privacy measures on, instead of the page I wanted I see Cloudflare’s page analyzing whether I am a bot before it lets me through. Being invisible on the web is only a bad joke. Anybody is visible the moment they go online, regardless of whether they use a VPN, Tor or whatever; those times have passed. Believing otherwise is as hilarious as the scene in the movie Independence Day where they infect an alien mothership with a virus using a crappy laptop (I laughed a lot at that scene).

          • Ulrich@feddit.org
            English
            arrow-up
            5
            arrow-down
            12
            ·
            1 day ago

            they can already block VPN traffic unless it goes through their VPN

            Yeah that’s how most VPNs work.

            their whole business model is based on them being a man in the middle that decrypts ssl and analyses the requests plainly

            Okay? Analyze all you want. They can’t stop bots on any of the other sites they regulate either.

            about a third of the worldwide websites are using cloudflare so they have a pretty good bird’s-eye view on behaviour of any machine that will be visiting a lot of websites

            Great. Bots intentionally change up their behavior and identifying information so as to go undetected.

            • PowerCrazy@lemmy.ml
              English
              arrow-up
              8
              arrow-down
              1
              ·
              edit-2
              1 day ago

              They can’t stop bots on any of the other sites they regulate either.

              Why not? They are doing edge caching, they can literally just block the connection from visiting the site just like they do with their DDoS mitigation.

              • Ulrich@feddit.org
                English
                arrow-up
                2
                arrow-down
                4
                ·
                1 day ago

                they can literally just block the connection

                block which connection? Again, these AI companies know people don’t want them crawling their sites and they do everything they can to be invisible. This has been an issue for years at this point.

                just like they do with their DDoS mitigation

                blocking DDoS is trivial by comparison.

            • HelloRoot@lemy.lol (Banned from community)
              English
              arrow-up
              9
              arrow-down
              1
              ·
              1 day ago

              They can’t stop bots on any of the other sites they regulate either.

              They can and do. What is blocked depends on what the website owner sets as settings in cloudflare.

              Bots intentionally change up their behavior and identifying information as to be undetected.

              If they have to crawl the web while behaving like a normal human, it will be orders of magnitude slower and more costly.
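As a rough back-of-the-envelope sketch of the "slower and more costly" point: the page count, the per-page pacing, and the number of parallel identities below are all made-up assumptions, not measurements of any real crawler.

```python
def crawl_days(pages, seconds_per_page, workers):
    """Days needed to fetch `pages` when each worker fetches one page
    every `seconds_per_page` seconds and `workers` run in parallel."""
    total_seconds = pages * seconds_per_page / workers
    return total_seconds / 86_400  # seconds in a day


PAGES = 1_000_000_000  # hypothetical crawl size
WORKERS = 1_000        # hypothetical number of parallel identities/IPs

# Hypothetical pacing: an aggressive bot vs. a human-like reader.
fast_days = crawl_days(PAGES, seconds_per_page=0.01, workers=WORKERS)
human_like_days = crawl_days(PAGES, seconds_per_page=10, workers=WORKERS)

slowdown = human_like_days / fast_days
```

With these assumed numbers the human-paced crawl takes about 1,000 times longer, unless the crawler pays for proportionally more identities to keep the same throughput.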

              • Ulrich@feddit.org
                English
                arrow-up
                2
                arrow-down
                5
                ·
                1 day ago

                What is blocked depends on what the website owner sets as settings in cloudflare.

                And how does the owner know which connections are bots?

                If they have to crawl the web while behaving like a normal human, it will be orders of magnitude slower and more costly.

                They don’t care, they have trillions of dollars of VC money to power through.

                • HelloRoot@lemy.lol (Banned from community)
                  English
                  arrow-up
                  4
                  arrow-down
                  2
                  ·
                  1 day ago

                  The owner sets the level. If they set a strict level, all bots are blocked.

                  They do care. VC funding happens because the result is profitable. If it is less profitable, there will be less funding because of higher investment risk.

        • webghost0101@sopuli.xyz
          arrow-up
          1
          ·
          edit-2
          15 hours ago

          Based on the headline, this is not about blocking AI scrapers but about making them pay to do it.

          Based on the discussion below, which moved that goalpost, the most likely answer is making it cheaper to scrape “legally” than it costs to mimic millions of individual residential browsers with human users.

          I don’t know how many aces Cloudflare has up its sleeve to detect secret AI, but they definitely have the tools to make it pretty costly and difficult. There is also a broadband impact difference between a few capitalist megapigs scraping secretly versus loads of global basement dwellers and smaller companies scraping worry-free.

        • Steve@communick.news
          English
          arrow-up
          10
          arrow-down
          2
          ·
          edit-2
          21 hours ago

          It’s literally what their entire business is based on: filtering good traffic from bad.