It’s impossible. I set this instance up just to browse Lemmy from my own server, but no, it was slow as hell the whole week. I got new pods, put Postgres on its own pod, pictrs on another, etc.

But it was still slow as hell, and I didn’t know why until a few hours ago: 500 GETs in a MINUTE from ClaudeBot and GPTBot. Wth is this? Why? I blocked those user agents with a blocking extension on NGINX, and now it works.
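
For anyone who wants to do the same without an extension, this is roughly what the block boils down to in plain NGINX. The two user agents are just the ones I caught; add more as you see them:

    # In the http{} context: flag known AI crawler user agents.
    map $http_user_agent $is_ai_bot {
        default     0;
        ~*ClaudeBot 1;
        ~*GPTBot    1;
    }

    server {
        # ... your existing listen / server_name / proxy setup ...

        # Refuse flagged bots before they ever reach the app.
        if ($is_ai_bot) {
            return 403;
        }
    }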

WHY? So Google can tell you to eat glass?

Life is hell now. Before, at least anyone could put up a website; now even that is painful.

Sorry for the rant.

  • parpol@programming.dev · 6 months ago

    Use Anubis. That’s pretty much the only thing you can do against bots that they have no way of circumventing.
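
    If you go that route, the wiring is just NGINX proxying everything through Anubis, which makes suspicious clients solve a proof-of-work challenge before they reach the app. A rough sketch, with the port and hostname as placeholders (check the Anubis docs for your actual values):

      server {
          listen 443 ssl;
          server_name your.instance;  # placeholder

          location / {
              # Everything goes through Anubis first; it forwards
              # verified clients on to the real Lemmy backend.
              proxy_pass http://127.0.0.1:8923;  # wherever Anubis listens (placeholder)
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          }
      }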

  • flamingos-cant (hopepunk arc)@feddit.uk · 6 months ago

    You can enable Private Instance in your admin settings, which means only logged-in users can see content. That stops AI scrapers from slowing down your instance, since all they’ll get is an empty homepage and therefore no DB calls. As long as you’re on 0.19.11, federation will still work.

  • xep@fedia.io · 6 months ago

    At some point they’re going to try to evade detection so they can keep scraping the web. The cat-and-mouse game continues, except now the “pirates” are big tech.

  • jagged_circle@feddit.nl · 6 months ago

    Just cache. Read-only traffic should add negligible load to your server; otherwise you’re doing something horribly wrong.

    • potatoguy@potato-guy.space (OP) · 6 months ago

      The pods have 1 CPU and 1 GB of RAM each; Postgres goes to 100% CPU at 500 requests per minute. After I added the NGINX extension, it dropped to at most 10%. On weaker servers these bots make hell on earth. It’s not the config.

      • Jerkface (any/all)@lemmy.ca · 6 months ago

        If it’s hitting Postgres, it’s not hitting the cache. Do you have a caching reverse proxy in front of your web application?

          • Jerkface (any/all)@lemmy.ca · 6 months ago

            The NGINX instance you already have in front of your app can perform caching and avoid hitting the app at all. The advantage is that it improves performance even against the stealthiest bots, including ones that don’t exist yet. The disadvantage is that the AI scum get what they want.
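
            As a rough starting point, assuming Lemmy’s UI listens on port 1236 and its auth cookie is named jwt (check both against your deployment):

              # In the http{} context: define a small cache.
              proxy_cache_path /var/cache/nginx levels=1:2
                               keys_zone=lemmy_cache:10m max_size=1g inactive=10m;

              server {
                  # ... listen / server_name ...

                  location / {
                      proxy_cache lemmy_cache;
                      proxy_cache_valid 200 1m;        # anonymous pages served from cache for 1 min
                      proxy_cache_use_stale updating;  # hand out the old copy while refreshing
                      # Skip the cache for logged-in users (cookie name is an assumption).
                      proxy_cache_bypass $cookie_jwt;
                      proxy_no_cache     $cookie_jwt;
                      proxy_pass http://127.0.0.1:1236;
                  }
              }

            At 500 bot GETs a minute, a one-minute cache collapses that to roughly one backend hit per page per minute.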

  • carrylex@lemmy.world · 6 months ago

    So I just had a look at your robots.txt:

    User-Agent: *
      Disallow: /login
      Disallow: /login_reset
      Disallow: /settings
      Disallow: /create_community
      Disallow: /create_post
      Disallow: /create_private_message
      Disallow: /inbox
      Disallow: /setup
      Disallow: /admin
      Disallow: /password_change
      Disallow: /search/
      Disallow: /modlog
      Crawl-delay: 60
    

    Everything not on that list is explicitly allowed, for every user agent, so you’re inviting bots to crawl all of your actual content… That’s likely one of the reasons you get so much bot traffic.
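
    If you want to at least tell the AI crawlers to go away (whether they honour it is another matter), you can add per-bot rules using the user-agent tokens the vendors themselves document:

      User-agent: GPTBot
      Disallow: /

      User-agent: ClaudeBot
      Disallow: /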

    • plz1@lemmy.world · 6 months ago

      AI crawlers ignore robots.txt. The only way to get them to stop is active countermeasures.