AI web crawlers are becoming a serious headache for open-source developers. While these bots quietly mine data from across the web, many devs see them as relentless digital pests—comparing them to cockroaches that refuse to die. In response, a growing number of open-source contributors are fighting back with wit, creativity, and a bit of vengeance.
These bots, designed to scrape websites for training data, often ignore long-standing internet protocols and leave chaos in their wake. For open-source developers, whose projects rely on shared infrastructure and limited resources, the impact is especially severe.
Unlike commercial sites with guarded backend systems, free and open-source software (FOSS) platforms tend to keep their infrastructure publicly accessible. This openness is part of their ethos—but it also makes them an easy target for bots that don’t play by the rules.
AI crawlers, particularly those powering large language models, often disregard the robots.txt file, a decades-old convention that tells bots which parts of a site to avoid. But as developer Niccolò Venerandi points out, many of these AI systems simply don't care.
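For context, robots.txt is just a plain-text file at a site's root listing per-crawler rules, and compliance is entirely voluntary. A typical policy aimed at AI scrapers might look like the sketch below (the user-agent names are examples of well-known AI crawler identifiers, not an exhaustive list):

```
# Purely advisory: well-behaved crawlers honor these rules, others ignore them
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Nothing enforces any of this; a crawler that wants the data simply pretends the file isn't there.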
In January, FOSS developer Xe Iaso published a blog post described as a "cry for help," detailing how AmazonBot repeatedly hammered a Git server hosting open-source projects. The traffic mimicked a DDoS attack, slowing the server, eating up bandwidth, and making access nearly impossible.
Even worse, Iaso said the bot evaded detection by spoofing its identity, rotating IP addresses, and pretending to be real users. “It’s futile to block AI crawler bots because they lie,” the developer wrote. “They will scrape your site until it falls over—and then scrape it some more.”
Meet Anubis: The Bot-Busting Guardian of Git Servers
In retaliation, Iaso built a smart and slightly hilarious defense system called Anubis. Named after the Egyptian god of the dead, Anubis serves as a digital gatekeeper that tests every incoming request. Only human-operated browsers can pass through. Bots get the door slammed shut.
The system is a reverse proxy that challenges visitors with a proof-of-work test. If they pass, a cheerful anime depiction of Anubis appears. If not, they’re bounced before reaching the Git server.
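The sketch below is not Anubis's code, just a minimal Python illustration of how a proof-of-work gate works: the server hands out a random challenge, the client must find a nonce whose hash clears a difficulty target, and the server confirms the answer with a single hash. The cost is negligible for one human page load but compounds quickly for a bot requesting millions of pages.

```python
import hashlib
import secrets

def issue_challenge() -> str:
    """Server side: hand each visitor a random challenge string."""
    return secrets.token_hex(16)

def solve(challenge: str, difficulty: int = 4) -> int:
    """Client side: brute-force a nonce whose SHA-256 hash starts with
    `difficulty` zero hex digits. Cheap once, costly a million times over."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    """Server side: one hash is enough to confirm the visitor did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Example round trip: a proxy would only forward requests that verify.
challenge = issue_challenge()
nonce = solve(challenge)
assert verify(challenge, nonce)
```

A difficulty of 4 means four leading zero hex digits, or roughly 65,000 hash attempts on average; a deployment could tune that number per route to balance visitor friction against scraper cost.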
Iaso’s clever use of mythology and anime isn’t just for laughs—it resonated. Since being shared on GitHub on March 19, Anubis has racked up over 2,000 stars, attracted 20+ contributors, and been forked nearly 40 times. The community clearly relates to the struggle—and the solution.
Developers Are Tired—and Getting Creative
The speed at which Anubis spread reveals a deeper frustration. Open-source maintainers across the web are facing the same battles—and taking drastic action to protect their work.
- Drew DeVault, the founder of SourceHut, admitted he spends “20–100%” of his time just fending off aggressive AI scrapers. Outages, he said, are now part of his weekly routine.
- Jonathan Corbet, editor of LWN.net, revealed his news site frequently slows under traffic that mirrors DDoS attacks—caused entirely by AI scrapers.
- Kevin Fenzi, Fedora’s system admin, went as far as blocking all traffic from Brazil after repeated attacks. Others, like Venerandi, had to ban entire countries just to stay online.
The situation has become so extreme that the once-idealistic web of open-source collaboration is turning into a battleground.
Fighting Back With Traps, Tricks, and “Poison Content”
Beyond Anubis, other devs are building digital traps designed to waste AI bots’ time—or even pollute their datasets.
A tool called Nepenthes, named after a carnivorous plant, creates an endless maze of fake web pages filled with nonsense. AI bots stuck in the trap keep crawling through meaningless content, unable to escape. Its creator, “Aaron,” admitted to Ars Technica that the tool was intentionally hostile. The goal? Make bots pay for ignoring robots.txt.
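Nepenthes's own implementation isn't shown here, but the tarpit concept is simple enough to sketch: serve every URL with procedurally generated filler text plus links to more generated URLs, so a crawler that ignores robots.txt just keeps digging. A minimal Python version using only the standard library (the /maze/ path and all names are illustrative) might look like this:

```python
# Tarpit sketch in the spirit of Nepenthes (not its actual code): every URL
# returns stable-but-unique nonsense plus links to more generated URLs.
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing"]

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG from the path so each page is deterministic yet distinct.
        seed = int(hashlib.sha256(self.path.encode()).hexdigest(), 16)
        rng = random.Random(seed)
        text = " ".join(rng.choice(WORDS) for _ in range(200))
        links = "".join(
            f'<a href="/maze/{rng.getrandbits(64):x}">more</a> ' for _ in range(5)
        )
        body = f"<html><body><p>{text}</p>{links}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), TarpitHandler).serve_forever()
```

Anything that blindly follows the links sinks deeper into the maze; a human simply closes the tab.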
Cloudflare has joined the fight too. Last week, the company launched AI Labyrinth, a new tool aimed at confusing bots and feeding them irrelevant junk instead of real website data.
On forums like Hacker News, developers have even joked about creating traps filled with absurd content—like articles praising bleach or touting measles as a performance enhancer. While satirical, the idea underscores a real desire to make AI scrapers regret their crawling spree.
The Bigger Message: “Just Stop Using This Garbage”
While tools like Anubis and Nepenthes are effective short-term solutions, many developers are calling for a larger shift. SourceHut’s DeVault urged the public to stop using AI tools trained on stolen content.
“Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage,” he pleaded. “Just stop.”
But that plea is unlikely to be heard in a world increasingly driven by AI. For now, developers are arming themselves with creativity and turning their frustration into clever counterattacks—with some dark humor along the way.