Josh talks to Gergely Nagy (algernon) about his tool Iocaine. Iocaine creates a maze to trap scraping bots in a world of fake pages they cannot escape. algernon tells us how Iocaine effectively traps bots by serving them endless loops of nonsensical URLs and web pages. It’s an extremely clever tool that’s designed to be completely hidden from normal users, but not from the scrapers.
Episode Links
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Open Source Security, talking to algernon. He’s a serial drive-by contributor and the creator of Iocaine. I’m really excited to have algernon here because
the episode before this, we talked to Xe about Anubis, and algernon created Iocaine, which is, I don’t know if I’d say it’s similar necessarily, but it solves the same problem. So algernon, tell us who you are and we’ll kind of go from there, ’cause I’m super stoked to talk about Iocaine today.
Gergely Nagy (algernon) (00:24) I’m just your average random guy from the internet who’s obsessed with self-hosting stuff, and it turns out that self-hosting has its dangers: there are some very aggressive crawlers who want to take you offline as a side effect of eating everything you ever put online.
I got tired of that and decided to do something about it, and Iocaine was born. Since then I’ve been, well, how to put it nicely, I’ve been on kind of a crusade to make the life of crawler operators a living hell, and most of their Fridays an exercise in panicking.
Josh Bressers (01:23) I love it. Okay, so let’s just start with the name Iocaine. For probably the young people who might not know, Iocaine is a brilliant name, and you must tell us why you named it that.
Gergely Nagy (algernon) (01:25) Yeah.
So there’s this book and movie called The Princess Bride. Both the book and the movie are awesome, well worth watching, I highly recommend it. It’s not a modern movie, but watch it still, you won’t regret it. Iocaine in the movie is a poison, a colorless, odorless…
basically undetectable and deadly poison from Australia. I chose this name because the entire point of Iocaine is to be undetectable and poisonous to the crawlers, because the idea is that it will only be seen by the crawlers and it will not affect the legitimate visitors.
So colorless, odorless, undetectable, yet deadly for… not the right people, because crawlers are not people. Deadly for the right agents.
Josh Bressers (02:47) Nice, nice, I love that. And I’m gonna skip ahead just a tad, because one of the other things I found in your writing is you have the, you call it the Nam-Shub of Enki. Is that meant to be a Snow Crash reference, or did you come about it differently?
Gergely Nagy (algernon) (02:52) Yeah.
Yes.
That actually came up on the Fediverse. I was writing a… So, Iocaine is really just a runtime. It provides a lot of functionality, and the real brains of it, at least nowadays,
it didn’t start that way originally, so the brains of it is a custom script. You can write it; it comes with a built-in one, and Nam-Shub of Enki is the one I use in production in front of my servers. When I was writing it, one of the things I usually do is live-toot my development work and
ideas onto the Fediverse, and someone suggested Nam-Shub of Enki, which I believe is an old Sumerian legend or something. It was also used in Snow Crash, yes, but the reference is not to Snow Crash but the original Sumerian legend.
Josh Bressers (04:12) Yes.
Gergely Nagy (algernon) (04:27) a magical song that makes things work, kind of.
Josh Bressers (04:27) Okay, okay, that, I was just curious.
Right, right. Okay. Let’s back up to where we should have started: explain kind of what Iocaine is and how it works, because we’ll make the assumption that there’s a certain amount of foundational knowledge around, like, bots scrape things and they’re creating havoc. But you created a novel way to, I don’t know if trap the bots is the right way to think of it, but I’ll let you explain.
Gergely Nagy (algernon) (04:37) Yep.
Yeah.
So the original idea was actually very simple, and this was before Iocaine even started. The first time I noticed that I was being attacked by the crawlers, I just blocked a number of user agents
and served them the script of the Bee Movie as it is, because that was easy and that was fun. This worked for a while, but it turned out that it doesn’t really scale, because a lot of the bots don’t identify themselves, they try to hide, and detecting them from
the reverse proxy configuration, just matching on the user agent, wasn’t working anymore, because some of them are randomized. Even if I write a clever regular expression, I’m not going to catch all of them. So I figured I’d use something else. I wanted to take this whole logic into
an application which is a little bit more clever than the language the reverse proxies usually allow you to use. I wanted a full, real programming language, one that is also fast enough, because my reverse proxy is running on a very cheap virtual private server.
It has two virtual cores shared with everyone else on the physical box; it’s not very powerful. I needed something that is fast. So that’s kind of why I wrote Iocaine in the first place.
What I wanted to avoid is affecting real users. So I wanted a solution that is not proof of work. I don’t like things that make it hard for the humans, because what is hard for the humans is usually easy for a computer: just throw more money at it,
more compute, and it will get through, while a human will stand there with their ten-year-old laptops and mobile phones and just wait for things to complete. So I wanted something that is efficient, passive, entirely on the server side. If a user sees it, I consider that a bug.
So yeah, that’s the basic idea. How it works is that you put it between your reverse proxy and your back-end or static files. You pipe all the requests through it, and it will look at each request in isolation. It doesn’t keep state, nothing.
It decides what to do with the request: whether it’s a bot, and whether to serve it garbage or serve it a challenge. It can serve challenges; I’ll talk about that a little bit later. Or it tells the reverse proxy to go ahead and serve the real content. Iocaine itself doesn’t know what the real content is. It has no idea where it is.
It just sends back
a misdirected request response (HTTP 421) to the reverse proxy, and you configure the reverse proxy to catch that response and fall back to serving the real content instead.
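To make that flow concrete, here is a minimal sketch of the fallback logic in TypeScript. This is an illustration only, not Iocaine’s actual configuration: the addresses and ports are hypothetical, and real deployments do this inside the reverse proxy (Caddy, nginx, and so on) rather than in a hand-rolled Node server.

```typescript
// Hypothetical sketch: forward each request to iocaine first; a 421
// "Misdirected Request" reply means "this visitor looks legitimate",
// so we fall back to the real backend. Addresses are made up.
import http from "node:http";

const IOCAINE = "http://127.0.0.1:42069"; // hypothetical iocaine address
const BACKEND = "http://127.0.0.1:8080";  // the real site

http.createServer(async (req, res) => {
  const probe = await fetch(IOCAINE + req.url, {
    headers: { "user-agent": String(req.headers["user-agent"] ?? "") },
  });
  // 421 = let the real content through; anything else is iocaine's
  // garbage (or a challenge) and gets relayed verbatim.
  const upstream = probe.status === 421 ? await fetch(BACKEND + req.url) : probe;
  res.writeHead(upstream.status, {
    "content-type": upstream.headers.get("content-type") ?? "text/html",
  });
  res.end(await upstream.text());
}).listen(3000);
```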
This way it basically stays hidden, and like I said, the
brains of it all is the script that runs for every request. It doesn’t just decide what to do, it also generates the content, and by default it generates a lot of garbage. The reason it generates a lot of garbage, instead of cutting the connection or serving something static, is to trap the bots in an infinite maze. You have, like,
I don’t know, a dozen links on any page with randomized words, and they all lead to a different page of similar garbage. The trick, which Iocaine does, or well, Nam-Shub of Enki does, is that it also places a unique identifier into every link.
Not unique in a way that tracks the visitor; unique in the sense that it’s a substring which never, ever appears outside of the maze. Which means that if the same crawler comes back, or a different crawler comes back, and tries to access that URL, if it has the substring, I can ignore everything else
and directly route it back into the maze, because I know there’s no other way it could have known that URL unless it was in the maze before. This is very useful when you have crawlers which piggyback on real browsers through malware and viruses and other malicious
ways, because detecting a real browser and figuring out whether it’s a real browser or one
controlled by a robot is not easy, and you can’t really do it from a single request. If you look at the whole picture and observe the behavior, you can, but Iocaine does not keep that kind of state. It doesn’t remember what happened before, so it can’t observe. And the other problem is that there’s not much time to observe,
because these robots do, like, a dozen requests and then they disappear and come again from a different IP address. So even tracking them would be hard. But if you catch the robots which are easier to identify and serve them poisoned URLs, then the ones piggybacking on the real browsers
will access the poisoned URLs, and then you can identify them that way. And you can stop those too: even though you didn’t really identify the browser as a malicious agent, you identified the URL. And this is only possible if you serve them garbage. If you simply stop them, they will remember the URLs they…
Josh Bressers (13:23) Sure.
Gergely Nagy (algernon) (13:34) collected from some other source, like links from other sites or search results or whatever.
But, yeah.
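As a rough illustration of that poison-marker trick, here is a small TypeScript sketch. The marker string and URL shape are invented for the example; Nam-Shub of Enki’s actual format differs.

```typescript
// Sketch of the poisoned-URL idea. The marker is a substring that never
// occurs anywhere on the real site, so any request containing it must
// have been learned from inside the maze.
import { randomBytes } from "node:crypto";

const POISON_MARKER = "z9qx"; // hypothetical: never appears outside the maze

// Every garbage page links to more garbage; each link carries the marker.
function mazeLink(word: string): string {
  return `/${word}-${POISON_MARKER}-${randomBytes(4).toString("hex")}/`;
}

// On a later request, one substring check is enough: no user-agent
// sniffing, no behavioral tracking, no state.
function cameFromMaze(url: string): boolean {
  return url.includes(POISON_MARKER);
}
```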
Josh Bressers (13:45) Okay. That’s
a lot. That’s really cool. No, this is amazing. Okay. Let me briefly repeat what I think I heard and you can correct me, because I feel like there’s a lot going on here. Okay. So Iocaine, Iocaine is fundamentally a programming language at the end of the day, right? That you created. Sort of.
Gergely Nagy (algernon) (13:49) Yeah, sorry.
Yeah. Okay.
Yes, sorry.
Almost.
It’s like a runtime or a shell. There are actually three programming languages supported within Iocaine, so you can program Iocaine with three different languages. Roto, which is a bespoke programming language embeddable in Rust; it’s completely new, it was made by the
folks at NLnet Labs, who made the NSD, Unbound, and Rotonda software: name servers and routing stuff. You can program it in Lua, and you can program it in Fennel. Fennel is a Lisp which compiles to Lua.
Josh Bressers (15:05) Okay, okay.
Gergely Nagy (algernon) (15:07) It’s a runtime with three different languages you can program it in.
Josh Bressers (15:13) Okay,
okay, so then the idea is you have a script. I know it comes with one, you’ve got your Nam-Shub script, but you have the script, and the intent is that a bot makes a request that should be invalid, right? Where you, I assume, return some sort of 400 normally on a server, right?
Gergely Nagy (algernon) (15:20) Mm-hmm. Mm-hmm.
No. A bot makes a request, any kind of request. They usually start with trying to access URLs or pages which they collected from elsewhere. So if you have a site and someone linked to your site from elsewhere,
or if you appear in Google results or something like that, that’s the URL bots will visit first. Iocaine doesn’t handle only the missing-page requests. It filters all of them.
Josh Bressers (16:30) Okay, okay, I gotcha, right. Then the intent is that if it is a human, we traverse the website as expected, right? Where I go to normal webpages, I read normal content, everything is fine. If Iocane determines it’s a bot, which we’ll have to, I know you said it, there are many ways it does that. And I’m sure there’s even more than you’ve explained already. But if it determines it’s a bot,
Gergely Nagy (algernon) (16:31) Mm-hmm.
Mm-hmm.
Josh Bressers (17:00) then it sends it down a path of infinite garbage. And you have a demo site, I’ll make sure I put a link in the show notes specifically to the demo site, but it is, it’s brilliant. Cause it has a title that is obviously just a random collection of words. They always seem to have an image on them. ⁓ And it either looks like it’s a QR code or just like it, the first time I saw it, looks like an Atari 2600 game kind of is what the image looks like. And then,
Gergely Nagy (algernon) (17:04) Yes.
Yes.
You
Josh Bressers (17:28) you have just nonsensical text mixed with URLs and the URLs just keep going kind of deeper down the rat hole, right? And so tell me about this random text. Where do we get all this random text from?
Gergely Nagy (algernon) (17:35) Mm-hmm. Yep.
Anywhere you like. By default the random text is based on Iocaine’s own source code and half of the documentation.
That’s not very useful. It obviously looks like garbage, and it’s mostly Rust mixed with Markdown, and yeah, that’s not very pretty. It also doesn’t do images by default. That’s a feature of Nam-Shub of Enki.
You can configure where it takes the source from. Any kind of public-domain work is awesome. The more you have, the better. It works similarly to large language models, except it’s not an LLM. It’s a very simple Markov chain.
It basically looks at all the sources you give it, calculates the probability of one word following another, and for every request it rolls the dice, chooses a starting word, and then just goes from there, kind of randomizing which word comes next,
favoring ones with higher probability. So you get pretty much nonsensical text, but sometimes there’s some very little nugget
of sensible-looking text in it, sometimes.
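That word-by-word walk is easy to picture in code. Here is a toy word-level Markov chain in TypeScript, a sketch of the general technique rather than Iocaine’s actual generator:

```typescript
// Build bigram counts from the training text, then walk the chain,
// picking successors weighted by how often they follow the current word.
function train(text: string): Map<string, Map<string, number>> {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map<string, Map<string, number>>();
  for (let i = 0; i < words.length - 1; i++) {
    const next = chain.get(words[i]) ?? new Map<string, number>();
    next.set(words[i + 1], (next.get(words[i + 1]) ?? 0) + 1);
    chain.set(words[i], next);
  }
  return chain;
}

function generate(chain: Map<string, Map<string, number>>, length = 50): string {
  const keys = [...chain.keys()];
  let word = keys[Math.floor(Math.random() * keys.length)]; // random starting word
  const out = [word];
  for (let i = 0; i < length; i++) {
    const successors = chain.get(word);
    if (!successors) break;
    // Weighted dice roll: successors seen more often in the source win more often.
    let roll = Math.random() * [...successors.values()].reduce((a, b) => a + b, 0);
    for (const [candidate, count] of successors) {
      roll -= count;
      if (roll <= 0) { word = candidate; break; }
    }
    out.push(word);
  }
  return out.join(" ");
}
```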
Josh Bressers (19:37) I mean, I
would agree with that. I’m just on a random page right now, pulled up over here on the other screen, and one of the sentences is, “guard would suddenly appear from round the table.” That is a collection of words that obviously doesn’t mean anything, but it’s not like, if you just randomly picked words, it wouldn’t have a flow. I don’t know if it makes sense to say it has a flow to it, especially ’cause obviously you’re producing English, but it is.
Gergely Nagy (algernon) (19:43) Mm-hmm.
Uh-huh.
Yeah.
Josh Bressers (20:07) It is believable text versus clear nonsense. So, so let me ask you about that. So you are from Hungary and obviously English is not the predominant language of Hungary necessarily. So does this support, like, does it care what language you use as the input?
Gergely Nagy (algernon) (20:13) Yes.
Mm-hmm.
No, it doesn’t.
It doesn’t. You can input any language. It works best if you use the same language throughout. You can give it multiple sources. On my demo site, the sources are Orwell’s 1984, Aldous Huxley’s Brave New World, the script of the Bee Movie,
what else? Jonathan Swift’s A Modest Proposal, and a Hungarian poem. That one is very short, it appears very, very rarely, but I’ve seen it. Well, not the whole thing, but I’ve seen parts of it appear from time to time.
So it works best if you give it many sources, somewhat similar in style and in the same language, because then it can produce more believable text. Whether it’s English or Hungarian or French or German, it doesn’t really care. It’s purely statistical.
Josh Bressers (21:43) Okay, that’s pretty cool.
Gergely Nagy (algernon) (21:47) with a little bit of randomness.
Josh Bressers (21:52) That’s awesome. I love it. it’s so cool. And you have, explain this in your how-to of how to get this all set up. And in fact, you reference 1984 and brave new world as two of your data sources, which, and I love throwing in the Bee Movie script. That’s like, that’s so good. I love that.
Gergely Nagy (algernon) (21:57) Mm-hmm.
Yes.
Josh Bressers (22:10) Okay. ⁓ so, so let me ask algernon on is I feel like the, the, the problem and the idea kind of makes sense here, right? From just anyone paying attention these days, kind of on the internet. So I’m how I guess. I, I don’t know how to ask this question well, so I’m going to do my best to someone way, way through it, but you mentioned there’s kind of two angles, right? There’s the.
Gergely Nagy (algernon) (22:11) Yep, go ahead.
Mm-mm.
Josh Bressers (22:38) the kind of putting the onus on the, the other end where having them do a proof of work or something like that. And then what you are doing is kind of leading these scraper bots down these like paths of nonsense. And I’m curious, like, what is the load look like for that? Cause like I’ve heard of these scrapers in these bots sending, you know, just like absurd numbers of requests. And obviously
Gergely Nagy (algernon) (22:42) Mm-hmm.
Yes.
Josh Bressers (23:07) You’re creating load on your gear when you do this.
Gergely Nagy (algernon) (23:12) Yes, let me show you a few things. One moment, please.
Josh Bressers (23:16) Okay.
Gergely Nagy (algernon) (23:20) Okay, so this is the load on my Fronting server.
Josh Bressers (23:27) Okay, so I’m gonna describe this for the audio only listeners because algernon just pulled up ⁓ a top on a server and Iocane is currently using like 26 % of a CPU and I don’t know what caddy is. What is caddy? Caddy’s using, the proxy and that’s using the rest of the CPU basically. Okay, and then is vector the database that’s…
Gergely Nagy (algernon) (23:45) Kedi is my reverse proxy. It’s… yes.
Mm-hmm.
No, Vector is a log collector and transformer-shipper thing.
Josh Bressers (24:03) Okay.
Okay. So, I mean, a quarter of the CPU we’re talking about for Iocaine usage here. And do you know how many requests per second this is right now?
Gergely Nagy (algernon) (24:14) Yes, I can check.
Josh Bressers (24:15) And of course you have the
nice and, he just pulled up a dashboard, obviously waiting for this question, I assume.
Gergely Nagy (algernon) (24:22) No,
I used it for something else.
Josh Bressers (24:28) Okay.
Gergely Nagy (algernon) (24:31) So right now it’s processing about…
260 or so requests per second, which is not too bad. I mean, let me show you something.
Josh Bressers (24:49) And what percentage
is bots trapped in Iocaine? Do you have that number?
Gergely Nagy (algernon) (24:54) ⁓ Yes, 99%.
Josh Bressers (24:57) my goodness, holy cow.
Gergely Nagy (algernon) (25:00) So this is what the last seven days look like. For the audio only listeners, the week starts with small waves of about 600 requests per second. Then…
A day later it drops down to about 200 and stays that way for a pretty long time. Then it jumps back up to 600 requests per second for five hours or so. Then it disappears for a while and jumps back to 200. ⁓
Josh Bressers (25:43) Is the disappearing? No, did it disappear because the bots quit or did your logging system fail?
Gergely Nagy (algernon) (25:51) Neither. I turned off the logging system temporarily. ⁓ So what happened here where ⁓ this green spike is, the five hour wave, is that I got attacked by a lot of bots which were all trying to download various images. They were all hitting the poisoned URLs.
but they were coming at me at like 4000 requests per second. It kept sound at 600 plus the others simply because that’s how much my CPU was able to serve. So if I had more CPU power, this would be a way higher number.
Josh Bressers (26:27) Wow.
Gergely Nagy (algernon) (26:45) At one point I had 20,000 connections open waiting to be served. it was kind of harsh. It knocked me offline for a few hours. They disappeared here ⁓ because I made a small change to my system.
Josh Bressers (26:51) Wow.
Gergely Nagy (algernon) (27:08) Instead of ⁓ driving the mode through Iocane, I configured my reverse proxy to look at the URL and if it notices the poisoned substring, it would simply cut the connection without serving anything. That brought the load down considerably. ⁓
Mostly because the bottleneck in Iocane is the communication between the reverse proxy and Iocane, because you have to serialize the entire HTTP request to Iocane, which then parses it, does its thing, and then constructs an HTTP response.
to the reverse proxy which it parses and potentially serves it. So there is a lot of ⁓ memory allocation and CPU work going on. But in this case I was able to skip all that. Unfortunately by skipping Iocane I lost the metrics because I don’t have this kind of detailed metrics for cady
So there’s a little bit of lul there. The CPU usage was still similar ⁓ except ⁓ cady was using about one and a half cores, so about ⁓ 150 % CPU and Vector was using none because I turned logging off at the same time.
and Iocaine was using about 2%, because, well, the vast majority of things it didn’t need to serve. I turned it back on later, so it came back to normal. The funny thing is that if I changed the order of
rule sets Nam-Shub of Enki uses to process the request, if I checked for the poisoned URLs first, then this entire graph would be very different. It would pretty much all be this green thing, because by this point
almost every single robot that visits me visits these URLs. The reason I’m not doing that is because I want these fancy stats, and I want to collect more data about the robots to observe their behavior. But if I wanted to be efficient, I could reduce the load a lot.
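The short-circuit he describes is tiny in code. A sketch of the idea in TypeScript (his real setup does this inside Caddy; the marker here is the same hypothetical one as before):

```typescript
// Sketch: if the URL carries the poison marker, drop the connection
// before any serialization or round-trip to iocaine ever happens.
import http from "node:http";

const POISON_MARKER = "z9qx"; // hypothetical marker

http.createServer((req, res) => {
  if (req.url?.includes(POISON_MARKER)) {
    req.socket.destroy(); // cut the connection, serve nothing at all
    return;
  }
  // ...otherwise fall through to the normal iocaine/backend flow...
  res.end("ok");
}).listen(3000);
```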
Josh Bressers (30:26) Okay, so here’s how I want to end this show. And this is just asking you some questions about the kind of efficiency you mentioned.
Gergely Nagy (algernon) (30:40) Mm-hmm.
Josh Bressers (30:44) I, this is a, this is so hard to talk about cause it’s such a, it’s such a different way to think about all this, but like, so today you are serving up trash to a bot that you know is a bot. Like why not just return a 404 or something like that to the bots? Is the intent to trap them so they can’t bother other people or is the, I’m curious what the purpose of that is.
Gergely Nagy (algernon) (30:59) Mm-hmm. Yes.
Two
reasons. The original reason was that originally I served them a random choice of 404 or 401, I think, or an internal server error, some random error. What I observed is that if I did that, they tried to disguise themselves more, and they came back with more agents.
If I serve them a 200 response, they’re like, “yay, we got data,” and they will eat it up and be very happy about it, and behave a little bit better. So that’s the original reason. Another reason is that if I serve them
some kind of garbage, I can serve them the poisoned URLs. If I serve them a 404, I can still send them poisoned URLs, but they are much less likely to crawl those, because they’re coming from a basic page or an error page. If I serve them something that they believe is real, they will more likely… they will…
So they will be more likely to keep going. And the more URLs I send them, the more URLs they will have in their queue, the more poison they ingest. And that’s great, because it turns out that’s helpful, because I can identify them
Josh Bressers (33:01) Yeah, I…
Gergely Nagy (algernon) (33:07) via those identifiers. The more fake URLs I serve with each page, the more I increase the chance that they will visit those fake URLs a lot. And that’s very helpful for detection.
Josh Bressers (33:30) Yeah, yeah. And look, spite-driven development is the best development of all, so I’m A-okay with that. Okay. So this is not something I had in my brain as a possibility, so let me make sure I understand this. You’re saying when you return like a 400 error to a bot, they actually try harder to, like, pretend to be something else and find more stuff. They’re assuming you’re purposely giving them a 400 because they’re a bot, not because there’s nothing there.
Gergely Nagy (algernon) (33:34) Yes. Yeah.
Yes.
Yes, not all of the bots, mind you. There are some reasonably well-behaving ones, usually the ones that identify themselves, like ChatGPT or Anthropic’s Claude. They usually don’t come back with more crawlers.
They don’t really respect the 404s either. If I serve them a temporary overload, a 429, yeah, 429, they don’t respect that either. Even if I set a Retry-After, they’ll just come back the next second. Doesn’t matter. But they don’t punish me for the errors. But there are…
Josh Bressers (34:34) 29.
Gergely Nagy (algernon) (34:51) These other crawlers which try to disguise themselves, ⁓ most of perplexity’s crawlers by the way, they never identify themselves.
A lot of the Chinese ones, for example, they will punish me if I serve them error pages, so I try not to.
Josh Bressers (35:22) This is bananas. Talking to you and talking to Xe, it’s just like, I didn’t even understand this problem. I mean, I guess, what’s next? Let’s end on that. What do you think Iocaine needs, or what do you think is happening in this space? Is there any hope at all? I feel like this is just a lost cause some days.
Gergely Nagy (algernon) (35:24) Yes.
It is not. It’s most definitely not. I know this for two reasons. I used to be very pessimistic and all doomy and gloomy, because I spent a metric ton of time trying to find these bots.
It turns out that…
It actually works. Why? I think it was Anthropic who came out with an article a few weeks ago about research on how a small number of malicious pages can poison an entire model. Even if there are just a million pages,
200 poisonous ones can pretty much kill a model, kill the training completely, make it garbage. And I was sitting there, looking at my metrics, and at the time I was serving about 30 million garbage requests every single day.
Josh Bressers (37:13) Wow.
Gergely Nagy (algernon) (37:16) By the way, my record so far is 70 million in a single day. That’s absolutely bananas. ⁓ But yeah, so imagine that if 200 garbage pages can ruin a model, what does 30 million do? And I’m not the only person who does this. There’s a lot of us.
and it’s an infinite maze, so if you trap them there once that entire training model is completely ruined
On top of that, and this is the very funny thing, you can sometimes actually exploit their browsers. There’s a known Chrome bug called Brash. It’s a denial-of-service attack. You can serve a very tiny amount of JavaScript
which will try to update the document title like a million times a second. And if it’s run on a Chromium-derived browser, the tab will freeze, and soon after the entire browser freezes, and the only way to stop that is to kill the browser and start again. So I figured, some of the bots pretend to be Chrome.
What if I serve them this? And what I noticed is that a large number of them suddenly almost disappeared. Previously they made about 100 requests per IP address within a few seconds.
About a minute after I deployed this exploit, they made about five requests for every page, which is about the number of assets you need to load for a page. You can do about five requests before the JavaScript runs.
Yeah, I crashed their browsers and they disappeared.
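For a sense of scale, the technique he is describing amounts to only a few lines of client-side script. This is a sketch of the general idea, not the published Brash proof of concept:

```typescript
// Sketch of the document.title trick described above. Tight loops of
// title updates force constant tab re-rendering; Chromium-derived
// browsers eventually freeze, while text-only scrapers ignore it.
const junk: string[] = [];
for (let i = 0; i < 100; i++) junk.push(Math.random().toString(36).slice(2));

setInterval(() => {
  for (let i = 0; i < 10_000; i++) {
    document.title = junk[i % junk.length]; // each write hits the UI thread
  }
}, 0);
```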
Josh Bressers (39:51) You know they’re running Chrome.
Gergely Nagy (algernon) (39:53) Yes, they run crow, I was able to trap them because they visited the poisoned URLs. And yeah, well, sorry guys, you’re going to crash. And if I crash them, they can’t crawl anyone else. It also takes up a whole lot of memory, so there’s a small chance that I may have exhausted the memory of the entire computer.
and that would be beautiful, but unfortunately I don’t have confirmation for that. I only know that they pretty much stopped crawling me, and yeah, that’s beautiful. The other reason I’m not doomy anymore is precisely the
Josh Bressers (40:36) my goodness.
Gergely Nagy (algernon) (40:48) ⁓ I I showed the ⁓ dashboard for the big spike where they ended up
So the big spike where the…
generated URLS the poisoned URLS were the top thing. ⁓ That meant that every other ⁓ defense I had has been defeated, because this is the last one.
but they still get caught. This means that if you catch the primitive crawlers and serve them garbage URLs, you can catch the more complicated ones too. Which means they never make it through, because they simply do not have
the URLs necessary. Driving a real browser is expensive, so they are not going to do that all the time. They have a lot of money to burn, yes, but not that much, to run real browsers all the time. They would have to build a lot of data centers for that first.
The other, third reason I’m not worried is that detecting the primitive bots is basically three ifs in a trench coat.
And if you serve them poisoned URLs, then the whole thing is pretty much four very, very easy checks. The first one: there’s this project called ai.robots.txt, which collects a list of known AI crawlers which identify themselves.
So if the user agent is in that list, send it into the maze. Easy. It’s a simple regular expression, very cheap to perform, very easy, catches a lot of crawlers. The second check is: look at the user agent. If it contains Chrome or Firefox, check the Sec-…
what’s it called? Sec-Fetch-Mode. Whatever. Another header. It’s documented somewhere in the Iocaine source code, and I tooted about it on the Fediverse a lot of times. So anyway, you check this other header. If it doesn’t exist, it’s pretty much guaranteed to be a bot. You drive it into the maze. And that’s it.
You pretty much caught all of them, except the real Chromes, which you catch by the poisoned URLs, and that’s the fourth check. That’s it. This is all it takes to get rid of 99% of the bots. Four very simple checks.
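Put together, those checks fit in a handful of lines. A TypeScript sketch, where the regex is a tiny excerpt of the ai.robots.txt list, the marker is the same hypothetical one as earlier, and Sec-Fetch-Mode (a header every modern Chrome and Firefox sends) stands in for the header he half-remembers:

```typescript
// Sketch of the "ifs in a trench coat": cheap per-request checks.
// Verdicts: send to the maze, or let the request through to the real site.
type Verdict = "maze" | "real";

const AI_CRAWLERS = /GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot/i;
const POISON_MARKER = "z9qx"; // hypothetical maze-only substring

function classify(userAgent: string, headers: Map<string, string>, url: string): Verdict {
  // 1. Self-identified AI crawler? Straight into the maze.
  if (AI_CRAWLERS.test(userAgent)) return "maze";
  // 2. Claims to be Chrome or Firefox but lacks a header every real
  //    modern browser sends? Almost certainly a bot.
  if (/Chrome|Firefox/.test(userAgent) && !headers.has("sec-fetch-mode")) return "maze";
  // 3. URL carries the poison marker? It has been in the maze before;
  //    this catches even real browsers driven by bots.
  if (url.includes(POISON_MARKER)) return "maze";
  return "real";
}
```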
Josh Bressers (44:22) I mean, it’s…
It’s pretty complicated and you’re clearly a pretty smart guy to put all this together.
Gergely Nagy (algernon) (44:30) It sounds complicated, most of the complication is discovering the pattern. Once the pattern is discovered, putting it in your reverse proxy configuration is like four lines.
Josh Bressers (44:45) Yeah, oh yeah, it’s very easy to set up. All right, so let’s end on that. There I will have links to Iocaine and Elginon and all that fun stuff in the show notes. The how-to, the getting started is very easy, very well written. It’s fun to read. Even if you don’t want to set it up, read the how-to. It’s very amusing But Elginon, I mean, thank you so much. Thank you for the work you’re doing. Thank you for educating us on this topic, because I think it is fascinating and very important.
Gergely Nagy (algernon) (44:50) Mm-hmm.
Josh Bressers (45:13) And just thank you for being here. This has been a treat. It’s been awesome.
Gergely Nagy (algernon) (45:17) It’s been my pleasure. Thank you for having me.