Josh chats with Xe Iaso, the creator of Anubis, the web AI firewall. We discuss how Anubis is tackling bots and scrapers. The discussion around the scrapers is fascinating and challenging: these things are everywhere and don’t behave very nicely. There’s also discussion about running a successful open source project. Xe has a lot of experience to share with us, and you’re going to learn something new with this one.
Episode Links
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Today, Open Source Security is talking to Xe Iaso, the lead developer of Anubis and a prolific tech blogger. I am extremely excited to talk to Xe today because Anubis is an amazing tool taking over the internet. So welcome to the show, Xe.
Xe Iaso (00:15) Thanks for having me.
Josh Bressers (00:16) I am ecstatic to have you here. So let’s just start with, tell us what Anubis is for anyone who might not know.
Xe Iaso (00:23) Well, Anubis started out as a thing that I use to make my Git server not get bullied off the internet by Amazon’s scraper. And it’s sort of developed into a generic web application firewall. And it’s been deployed by the United Nations. I’m in talks with educational institutions. Last week,
Josh Bressers (00:32) Right.
Xe Iaso (00:46) at the time of recording, I gave a talk to a bunch of SRE librarians about how it works and the problems that I had building it, and it’s surreal to see this happen.
Josh Bressers (01:01) I mean, that’s awesome. I guess it’s a double-edged blade, right? You create a thing and you hope it succeeds, but then when it succeeds, it’s like, my goodness, this is so much work.
Xe Iaso (01:10) Actually, it’s even more hilarious because I created a thing, put it in my throwaway code repo, and then blogged about it, and then it took off.
Josh Bressers (01:20) That’s even better. So, okay. So explain to us what this is, because right now, if you run a web server on the internet, you are getting hammered by bots. And Anubis, I guess, I don’t even know what to say. It doesn’t solve it, but maybe it helps. What is it doing to prevent the bot abuse? Let’s say that.
Xe Iaso (01:42) Well, Anubis is mostly doing rate limiting. Like at first the idea was, you know, I’ll throw up some proof-of-work challenge, because a proof-of-work function is something that’s really mathematically convenient for rate limiting things because it is hard to solve, trivial to verify. And, you know, I slapped that together in an afternoon of rage while my Git server was being hammered because I made a Kubernetes misconfiguration.
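The "hard to solve, trivial to verify" property can be illustrated with a minimal hashcash-style sketch. The transcript does not spell out the exact scheme Anubis uses, so the SHA-256 leading-zeros rule, the function names `solve` and `verify`, and the challenge string are all assumptions made for illustration.

```python
import hashlib


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the first nonce whose hash starts with `difficulty` zero hex digits.

    Expected work grows roughly 16x for every extra digit of difficulty.
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking an answer costs a single hash, however long solving took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical challenge string; a real server would issue a random one per visitor.
nonce = solve("example-challenge", 3)
print(verify("example-challenge", nonce, 3))  # True
```

The asymmetry is the point: the client loops thousands of times on average, while the server does one hash per submitted answer.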
Josh Bressers (02:12) Nice.
Xe Iaso (02:12) That’s a different story. Pro tip, don’t put high traffic stuff on rotational storage.
Josh Bressers (02:20) Right. Oh man.
Xe Iaso (02:21) Yeah, and over time, I thought that the proof-of-work validation was actually doing something for security things, but no, it’s actually that any barrier to the website makes the low-effort scrapers very confused, and they give up.
Josh Bressers (02:41) Okay, okay. Right, so I’m going to tell you what I think is going on and you can tell me if I’m right, because I’ve read a bunch of your blog posts and I’ve looked at your GitHub page. I’m not an expert, but when I go to a number of websites now, I get the little Anubis pop-up, and I love your Anubis cat girl logo, which is awesome. And what’s happening is the web server is sending me some sort of computationally expensive challenge,
which my browser is then solving. And since I’m just going to one site, it’s a couple of seconds, whatever, I get to look at the funny cat. And then I send that challenge back to the server. The server says, this is legit, and it lets me in, right? But when I am one of the scraper bots that misbehave, I don’t like being presented with a whole bunch of computationally expensive challenges because,
obviously, my goal is to go fast, not to spend time solving math problems.
Xe Iaso (03:43) Yeah, most of the time those scrapers don’t actually run JavaScript.
Josh Bressers (03:49) So what happens then?
Xe Iaso (03:51) They get presented with a challenge page and cannot continue.
Josh Bressers (03:56) That’s awesome.
Xe Iaso (03:58) That’s basically it. Like, yeah, that’s the entire thing. They try to load a challenge and they just can’t.
Josh Bressers (04:06) Excellent. I had no idea that happened. Wow, very cool. Okay, so I guess that was one of the questions I had: what happens if there’s no JavaScript? So what a brilliantly simple solution to a stupid web scraper. I love that.
Xe Iaso (04:21) It is annoyingly simple.
Josh Bressers (04:24) Cool, cool. Okay. So I don’t even know where to start with this conversation. So I’m going to start with one of your blog posts that really piqued my interest. And by the way, I’ll put a link in the show notes for anyone. Xe is a prolific blogger and is a very good writer. So it’s a delight to read this stuff. But the thing that really piqued my interest in your writing was the odd number of cores in the bug you found.
Xe Iaso (04:50) That one.
Josh Bressers (04:51) This one is, it’s such a fun,
Xe Iaso (04:51) Gods, that one.
Josh Bressers (04:52) like, why don’t you explain it? ’Cause I can’t possibly do it justice, but it’s such a fun read and it’s such a weird bug.
Xe Iaso (04:58) Okay, so Anubis’ challenge page exists in a weird middle ground where you want it to be just annoying enough that it’s a deterrent for automated abuse, but not so annoying that it’s a deterrent for users. So as a result, simultaneously, it has to be mathematically hard, but go away quickly. And that’s a really annoying middle ground that I absolutely hate trying to hit, but
One of the ways that I do this is by trying to spin up as many workers as your CPU has cores, but I divide the core count by two in order to avoid too much impact on the system and to try to keep phones from overheating.
I found out the hard way that there are some widely deployed phones like the Pixel 8 Pro that have nine cores. And it is. And as a result, each worker gets like a .5 in its identifier and then the JavaScript just adds the number to the string because we don’t have
Josh Bressers (06:02) It’s just so weird, like.
Xe Iaso (06:17) number types in JavaScript. What are you thinking? Everything’s IEEE 754 floating point numbers. What else do you need? And that caused the computer to send something to the server and say that, hey, the nonce that I figured out is 15.5. This is what solves it. And the server tries to parse that as an integer, and it’s like, what the heck?
Josh Bressers (06:42) Nice.
Xe Iaso (06:47) I don’t know how to handle this and ⁓ rejects it.
Josh Bressers (06:52) So that means that person using that phone can’t access the webpage, right?
Xe Iaso (06:52) Yes. And I looked through my entire house, like literally every single computer that I have. And by sheer coincidence, of the MacBooks that I have, none of them are the SKU that has an odd number of cores. All of the server CPUs I have obviously have an even number of cores. Like my tower, that’s a Ryzen 7950X3D that is 16 cores, 32 threads. So, you know, even numbers. Like
even my phone, which, you know, iPhones are more likely to have odd numbers of cores. Nope. The iPhone 7 Plus that we have laying around, which is one of the oldest CPUs I have laying around in the house, still has an even number of cores. Like there is legitimately no way I could have encountered this. Like even this crappy Android phone I use for testing things still has an even number of cores. So yeah, I
And with the fact that hyperthreading has basically existed my entire career and has been on by default with basically every major desktop CPU, I didn’t even think it was possible for someone to design a CPU with an odd number of cores in its geometry. Yeah, it’s great. I hate it.
Josh Bressers (08:17) I mean, this is what happens when you have a successful project though, is every ridiculous corner case no one can imagine is going to pop up at some point.
Xe Iaso (08:21) Yeah.
Yeah, there was somebody that submitted a fix to make one of Anubis’ challenge methods work on the 3DS browser.
Josh Bressers (08:33) Excellent. Excellent.
Xe Iaso (08:36) And to my shock and horror, it actually works. I couldn’t find my 2DS to confirm it, but I know it’s in my house somewhere.
Josh Bressers (08:44) My goodness. That’s awesome. I love that. This is what makes open source so cool though, right?
Xe Iaso (08:47) Yeah, it’s…
Yeah, it’s… The versatility also sometimes does have downsides, like sometimes you end up merging features that you end up regretting. There’s a couple of those, I’m gonna have to figure out what to do with them at some point.
Josh Bressers (09:02) Yeah, well, you can never remove features, right?
Xe Iaso (09:05) I mean you can.
But can and should are different words in English. Not in other languages. I don’t know which ones, but at least in English they’re different.
Josh Bressers (09:10) That’s fair, that’s fair.
My goodness, I love it. That’s so good. I still think of your blog post all the time, whenever I’m talking to someone about weird bugs, ’cause I’m like, let me tell you about the weirdest bug I’ve ever read about. ’Cause it’s so good.
Xe Iaso (09:29) There’s another one when I finally find out what’s wrong with iOS Safari vs. iPadOS Safari vs. macOS Safari with one of the corner cases. Sometimes Safari doesn’t store cookies and I can’t figure out why. I have emailed people. If you’re at Apple and you have any insight into what’s going on, please email me. I just want to fix this.
Josh Bressers (09:44) No.
Xe Iaso (09:57) At the very least, could you make iOS, iPadOS, and macOS Safari have the same damn cookie behavior? Because like I… Or at least some way to run this in CI because…
Josh Bressers (09:58) You know it’s like three different teams that don’t talk to each other. That’s how it always works. Yeah, yeah.
Xe Iaso (10:12) I do know it’s three different teams that don’t talk to each other,
but I hope that I can probably nerd-snipe someone into being like, okay, we’re just going to fix this because this is ridiculous.
Josh Bressers (10:21) That’s awesome. That’s awesome. Okay. So I want to reel us back into maybe the more general topic. You mentioned a moment ago that you’ve been talking to people like at the UN about Anubis. So I’m curious, what does that look like? Like, why does the UN care about this?
Xe Iaso (10:35) So I don’t actually talk to people at the UN. A couple of months ago I searched for the “Making sure you’re not a bot” text, specifically the noscript text that Anubis puts into every page. And I got a hit for one of the United Nations websites. I think it was UNESCO. And of course, what I immediately did was DM the artist friend that I commissioned
Josh Bressers (10:57) Nice.
Xe Iaso (11:04) with a link to the art that she drew for me, saying, “By the way, you’ve made UNESCO.” And the reply I got back was “what” with a period after it. All lowercase, with a period.
Josh Bressers (11:14) Nice
Fantastic.
Xe Iaso (11:19) And I didn’t hear back from her for like 45 minutes because she was laughing.
Josh Bressers (11:25) I love that. It’s so good.
Xe Iaso (11:29) But yeah, I have been trying to get in contact with the United Nations for months. I have given up. I made a press inquiry asking if anybody at the policy toolbox team for UNESCO could get in touch with me because like I want to know their story.
Josh Bressers (11:45) Well, it’s probably a similar story to everyone, right? Their site is no doubt getting hammered to death by crap bots. And okay, so I do want to talk about that a little bit. So I was reading, I don’t remember where I even found this, but you mentioned in one of your writings that a lot of the bots look like just Google Chrome, right? They’re clearly being deceptive on purpose. And so like explain that to us. Like, what does that kind of mean? And like, why is this a problem for websites?
Xe Iaso (12:18) So.
Basically, because of the economic system we live under, we want to have continuous growth. And one of the ways that we measure growth is via new users. And statistically, the most common browser on the internet is unmodified vanilla Google Chrome. And so a new user from a residential IP address using unmodified vanilla Google Chrome looks like the kind of user they want around.
Josh Bressers (12:28) Yes.
What do you mean by they want around?
Xe Iaso (12:56) If you’re an e-commerce website, that is a signal that somebody’s probably going to want to buy something. If you are a SaaS company, that is a signal that someone is interested in your product. And like the list continues. Like it is ridiculous. Like from…
Josh Bressers (12:58) Right.
Sure, okay, okay.
Xe Iaso (13:22) So I have an @techaro.lol email address. It’s basically useless because of all the spam, but from looking at the patterns of spam that I see, a lot of those automated customer outreach things are using crappy scrapers like that to find email addresses to spam with, you know, “Hey, I’m the CEO.
I need 37 gas station hot dogs in 20 minutes for an important business meeting. I will not clarify more in person.”
Josh Bressers (13:50) Yeah.
Right.
Right, right, exactly. Okay, so that’s interesting. So you’re talking about something even bigger than just AI, because my perspective on this, and I’m not nearly as plugged in as you are, is that the worst offenders are the AI companies. But I guess they’re scraping for millions of reasons at this point, right?
Xe Iaso (14:15) Yeah, like at first I thought it was just the generative AI companies trying to DDoS things so other people can’t get training data. But then I realized that that’s conspiratorially minded and it doesn’t make sense when held up to the scrutiny of reality and like the why bother question. So after doing some more pattern recognition, doing some more like logging and pattern recognition, I have like
20 gigabytes of HTTP request metadata in an S3 bucket somewhere.
Josh Bressers (14:46) Wow.
Xe Iaso (14:48) It’s not enough, I’m going to need to get more at some point.
The main thing that I have figured out is that
These things seem to be everywhere and there doesn’t seem to be a clear pattern as to like who is doing it. Like…
early on when I was consulting with some people, our best idea as to what was going on is that there was some kind of data set arbitrage thing going on where somebody was trying to continuously scrape things and then sell it to companies as more up to date data because the scrape date is technically newer. And that’s like the least conspiratorial thing that we could think of that would possibly make sense. But like at some level there’s just…
Ever since the rise of ChatGPT, there’s just more background noise on the internet from crawlers clicking every link of every link of every link and not caring about the impact that they have on the servers. They don’t request robots.txt. They don’t identify themselves in their user agent header. Some of them actually use headless Google Chrome to do their scraping. Yeah.
Josh Bressers (16:04) Wow. Which I guess
runs the JavaScript then. Although I guess in that context, Anubis would just slow them down, right?
Xe Iaso (16:13) Yes, they also don’t save cookies, so they have to do a new challenge for every page.
Josh Bressers (16:20) Okay. Right. And we should clarify that as well, because you only get challenged by Anubis the first time. Right? Well, that’s the goal. Right. Yeah. Okay. I didn’t know that. That’s really interesting.
Xe Iaso (16:27) That’s the goal, yeah.
Yeah. And specifically, a new user from a residential IP space using Google Chrome without an existing tracker cookie means that it shows up as a new visitor on analytics dashboards. So there’s also a really interesting incentive in companies to not make the analytics numbers go down because that puts people’s jobs at risk.
Josh Bressers (16:55) Right,
interesting. And now I…
Xe Iaso (17:00) Yeah,
there is so much background radiation that if like a lot of these bots were stopped for 24 hours, a lot of marketing teams would be very concerned all simultaneously. I mean, I work in marketing. I know how this would be perceived.
Josh Bressers (17:20) Yeah. Right. Right. Because, I mean, this is your typical drunk looking for their keys under the light, right? This is the metric marketing has. This is the metric they’re going to measure. So I get it. I’m not saying it’s wrong, but wow. And now we should also clarify one of the things. I have like 30 tabs open in this other browser over here, so I can’t even keep track of all the crap I’ve read of yours. So I’m going to do my best to just remember a whole bunch of it.
Xe Iaso (17:30) Yeah. Yeah.
Josh Bressers (17:49) In one of the things you’ve written, you did say things like the Internet Archive bot is blocked by Anubis, right? Which is obviously not ideal for the archival purposes of the Internet Archive.
Xe Iaso (18:03) I ended up exempting them by both the contents of their Via header and the fact that I know somebody who works at the Internet Archive. Love you, Foon. But yeah, I just exempted their whole ASN after confirming that it comes from some part of their ASN, and they weren’t exactly sure where because they didn’t have a list of IP addresses that were consistently assigned to it because they do some dynamic stuff. I don’t care.
Josh Bressers (18:07) Nice.
Nice.
Xe Iaso (18:31) I just exempted their ASN and called it good.
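Exempting an ASN in practice means exempting the IP prefixes that the ASN announces. A minimal sketch of that check follows; the prefixes are reserved documentation ranges standing in for a real prefix list, and `EXEMPT_PREFIXES` and `is_exempt` are hypothetical names, not Anubis configuration.

```python
import ipaddress

# Hypothetical prefixes standing in for the networks announced by an exempted ASN.
# These are documentation ranges, not the Internet Archive's real address space.
EXEMPT_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_exempt(remote_addr: str) -> bool:
    """Return True when the client address falls inside any exempted prefix."""
    addr = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network and vice versa, so mixed
    # lists are safe to scan in one pass.
    return any(addr in network for network in EXEMPT_PREFIXES)


print(is_exempt("192.0.2.55"))    # True
print(is_exempt("198.51.100.1"))  # False
```

Matching on the whole prefix list sidesteps the problem described above, where the crawler's individual addresses are assigned dynamically and cannot be listed one by one.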
Josh Bressers (18:34) Okay, good. That’s great. That’s perfect because we like them.
Xe Iaso (18:37) Yeah, I have also been
talking with one of the people behind Common Crawl to make sure that the automated robots.txt importer thing doesn’t block Common Crawl.
Josh Bressers (18:48) I have no idea what Common Crawl is.
Xe Iaso (18:50) Common Crawl is a digital public good where they have a bunch of researchers doing a read-only snapshot of the internet, kind of like the Internet Archive, but they distribute the content of that snapshot openly. Yes, a lot of AI companies do use that for training, but any page that is loaded from Common Crawl is not a page that’s loaded from the origin server.
Josh Bressers (19:15) Right, right. And I assume their scraper doesn’t just hammer the living crap out of you as fast as it can.
Xe Iaso (19:20) It doesn’t.
Some administrators have blocked it because it does too much traffic for their liking. My advice to administrators is to let Common Crawl through. And that may sound weird, but, you know, if you let Common Crawl through, then people have a way to get it regardless, you know,
Josh Bressers (19:27) Okay.
Sure, sure. I mean, that makes sense. Well, I think that makes sense, but I don’t know what’s going on. I’m not an expert in this space at all. Interesting. Okay. So, all right. I also am curious just about your thoughts and lessons on this: you are now running a large, popular open source project. What are the surprising stories you have just from that universe?
Xe Iaso (20:08) There are a large number of people running privacy browser configurations that seem to have the empathy flag compiled out of them. Like, I don’t mean to sound conceited or whatever, but like, the number of people that have said like absolutely incorrigible things to me over email is way too high.
Josh Bressers (20:14) Hahaha!
I have no doubt.
Xe Iaso (20:30) Yeah, I am starting a policy of if you send more than three of those emails in the span of 24 hours, you get an invoice for my time. I haven’t had to issue any invoices yet, but I will send them to collections.
Josh Bressers (20:41) Nice. Perfect.
That’s fantastic. Yeah, yeah. I mean, for anyone listening, just be nice to people. It’s not that hard.
Xe Iaso (20:56) And also, the ruder you are on the issue tracker, the less priority I give your issue.
Josh Bressers (21:01) That’s perfect. Yeah. That’s a given, I think.
Xe Iaso (21:04) Being nice is free.
Josh Bressers (21:08) Yes, yes it is.
I mean, that feels like a common problem in open source, I think sometimes. It’s like, when you do a thing, you give it away for free. And I don’t know why it is, but there’s just like, there are people who have unrealistic expectations of basically like, why aren’t you doing my homework for me for free? Like I’m giving you exposure, the most valuable currency of all. It is frustrating, yes.
Xe Iaso (21:16) Yeah.
It is quite frustrating.
Josh Bressers (21:39) I do love the idea of sending an invoice though. That is very nice.
Xe Iaso (21:45) Yeah.
Josh Bressers (21:47) I mean, although then you’re just going to get them complaining to you about the invoice. We’ll see. We’ll see. Anyway. Okay. So.
Xe Iaso (21:53) That’s when you set up an auto-reply to that with like, I’m sorry, in order for a human to view this conversation, you must pay this invoice.
Josh Bressers (21:59) Yeah, right. Right. I mean, let’s talk about that. Right. Not the invoice specifically, but the sustainability aspect of this, because I know you mentioned, and it might’ve been before we hit record, I can’t remember anymore, it’s been a while, but this is not your job. Right. Anubis is like a side project you have.
Xe Iaso (22:20) Yes, it is not my day job. It is absolutely the kind of thing that needs to be a day job, but like
I am the only income earner for my household right now. My day job is on that desk behind me where I do like marketing for an object storage company.
I am basically building this up from the ground on nights and weekends, and
you know the XKCD dependency comic? I’m that peg. And the amount of money I’m making right now off of it is, let me just look at the backend on GitHub Sponsors real quick. This only counts GitHub Sponsors; it doesn’t count Liberapay or Patreon. I believe my Patreon’s at the kilodollar threshold now, but.
Josh Bressers (22:58) Yeah, right.
Xe Iaso (23:20) Yeah, I am making the equivalent of about 60% of a junior person’s salary working on Anubis.
Josh Bressers (23:30) Yeah!
Xe Iaso (23:34) Yeah.
It is a lot. It is not a lot. Sometimes it makes me wonder what if I made this purely paid access and the like, but I don’t know.
Josh Bressers (23:57) Yeah, no, I get that. I mean, this is one of the challenges we have in open source right now: we have the sustainability problem where an enormous amount of infrastructure on the planet is held up on the backs of unpaid volunteers. The work you do is amazing and it’s obviously appreciated and it’s literally changing the planet for the better. And yeah, you got to eat, you know, I get that.
Xe Iaso (24:10) Yep.
Yeah, I’ve applied for government grants several times and have been turned down.
Josh Bressers (24:26) Yeah, I believe that. Those grants are always super narrow and they’re really hard to get justified. And I have a suspicion they don’t even have a category for the thing you’re doing right now.
Xe Iaso (24:36) No, they barely do. I need to contact my MP (for the Americans in the room, an MP is the Canadian equivalent of a member of the House of Representatives) and ask what my options are for government grants, but like…
Josh Bressers (24:51) Yeah. Yeah.
Hey UN!
Xe Iaso (25:01) Like,
I would… I would… This is also one of those really annoying things where, like, if my husband and I had universal basic income, like, I wouldn’t need to charge for it. Because, like, I could eat.
Josh Bressers (25:13) Yeah. Yeah.
Right, right, no, seriously, I mean, I do wonder that sometimes, like with something like universal basic income, there are an enormous number of people that would do amazing work, work that has no like economic benefit, we’ll say necessarily, but it has societal benefit. And I think it would be amazing, but that’s, I mean, that’s a whole other conversation, obviously.
Xe Iaso (25:34) Yeah.
Yeah. But for now, I hate the sales process for educational institutions so much.
Josh Bressers (25:56) Yeah, I mean, look, the lesson here is if you’re listening and you can like go donate to Xe for Anubis, because it’s freaking awesome. Like if you use it, help pay for it.
Xe Iaso (26:06) Yeah, I have an unbranded version that I’ve been working on where basically the main advantage is that it lets you do a waifuectomy of the software and put your own logos or put no logo or customize the HTML templates or whatever. I arbitrarily chose $50 a month because I wasn’t sure anyone would pay it. And now it’s at least enough to pay my rent before tax.
Josh Bressers (26:35) Nice.
Taxes, yes.
Xe Iaso (26:40) So that’s something.
Josh Bressers (26:42) And I will say the logo is one of my favorite parts of Anubis. So it almost makes me sad that people would want to take it out, but.
Xe Iaso (26:46) Yeah. Yeah, the… About half the reason the logo is there is to guilt-trip corpos into paying me.
Josh Bressers (27:00) Nice.
Xe Iaso (27:01) because like…
If they care enough to change a picture of a cartoon jackal, they care enough to pay me to make sure that the software that they rely on is sustainable. I… I hate that people are so peeved about it. Like I have gotten a number of angry comments and the like because of it and they’re like, I wish there was a way to turn that damn waifu off and I’m like…
Josh Bressers (27:15) Yes.
There is. Pay me.
Yeah, right. Xe just held up a credit card for those of you listening to that video.
Xe Iaso (27:40) Yeah, a dead debit card for the record, but yeah. I have a dead one right by my desk just for stunts like that.
Josh Bressers (27:44) Okay, okay.
Nice. That’s a good, yeah, I suppose. Cause if you held it up the wrong way and it was live, now you’ve just doxed your credit card, which.
Xe Iaso (27:56) Yeah, that would be, what would the kids say? Unideal?
Josh Bressers (28:00) Yeah, yeah, right, right. Okay, all right, we’re coming to the end here. And also, I guess I didn’t realize it was a jackal, not a cat. I don’t care, I’m still trying to cat. But it’s still a great, great logo. Okay, now that you say that, it makes perfect sense. And yes, I should have known that. And what is your tagline on GitHub? It’s something like weighing the soul of your HTTP request, or… it’s very good.
Xe Iaso (28:11) Well yeah, Anubis is a jackal.
Yeah, something like that.
When I was rage-hacking out Anubis earlier this year, for some reason I was searching for a metaphor because like any good hacker, I need a name for a project before I develop it because I do a practice called name-driven development. And I was looking for a good metaphor and then somehow I got to the Wikipedia page for Weighing of Souls and that seemed like a good metaphor.
Josh Bressers (28:41) Yeah, obviously.
Xe Iaso (28:53) And then later on, the fact that it’s Anubis and Anubis Weighs Your Soul actually inspired the name and implementation of the Suspicion Point system, which I call Request Weight, where you remove Request Weight if things look less suspicious and you add it if things look more suspicious. Like if you’re Google Chrome, but you don’t have the Sec-CH-UA header, you get more Request Weight added because that is suspicious.
Josh Bressers (29:12) That’s cool.
And what does request weight mean in this context?
Xe Iaso (29:24) It is a linear scale from negative 4.2 billion to positive 4.2 billion, where it is like the level of suspicion, and you can add and remove weight with rule matches. And at the end, Anubis takes the weight and matches it to a list of thresholds. And if it matches a threshold, it applies that threshold’s action, but if it doesn’t match any thresholds, the request is just allowed through.
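The weight-then-threshold flow can be sketched roughly as follows. This is an illustration under stated assumptions, not Anubis's implementation or configuration format: the rule names, the weight adjustments, the threshold ranges, and the action names `CHALLENGE` and `DENY` are all hypothetical. Only the Sec-CH-UA example, the add/remove mechanic, and the allow-by-default fallthrough come from the conversation.

```python
from dataclasses import dataclass
from typing import Callable

Headers = dict[str, str]  # assumes canonical header names as keys


@dataclass
class Rule:
    name: str
    matches: Callable[[Headers], bool]
    adjust: int  # positive adds suspicion, negative removes it


@dataclass
class Threshold:
    minimum: int
    maximum: int
    action: str


def claims_chrome_without_client_hints(headers: Headers) -> bool:
    # Says it is Chrome, but lacks the Sec-CH-UA header real Chrome sends.
    return "Chrome" in headers.get("User-Agent", "") and "Sec-CH-UA" not in headers


# Hypothetical rules and weights, chosen only to make the example concrete.
RULES = [
    Rule("chrome-missing-sec-ch-ua", claims_chrome_without_client_hints, +10),
    Rule("has-sec-ch-ua", lambda headers: "Sec-CH-UA" in headers, -5),
]

# Hypothetical thresholds mapping weight ranges to actions.
THRESHOLDS = [
    Threshold(10, 19, "CHALLENGE"),
    Threshold(20, 2**31 - 1, "DENY"),
]


def decide(headers: Headers) -> str:
    weight = sum(rule.adjust for rule in RULES if rule.matches(headers))
    for threshold in THRESHOLDS:
        if threshold.minimum <= weight <= threshold.maximum:
            return threshold.action
    return "ALLOW"  # no threshold matched: the request is just allowed through


print(decide({"User-Agent": "Mozilla/5.0 Chrome/120.0"}))  # CHALLENGE
```

Keeping the rules additive means a single odd signal only nudges the score, while several together push a request over a threshold.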
It’s the equivalent of the Apache deny allow. Yeah. And that also leads to one of the interesting tactical decisions I made early on with Anubis. I’ve worked as a site reliability person. I’ve had a number of bad ideas over the years, like
Josh Bressers (29:53) okay, okay.
Okay, okay, that makes sense. Cool.
Xe Iaso (30:21) You know, those types of bad ideas where you have them and you’re like, I should never implement this because it would be an absolute disaster. But one of them is that I came up with this idea of Machiavellian security at one of my previous jobs. So in Machiavellianism, you do way more violence than you need to as a baseline in order to make sure you don’t have to do it again. And the idea with that in Anubis is that out of the gate, Anubis is
Josh Bressers (30:28) Whatever.
Yeah, yeah.
Xe Iaso (30:47) paranoid as hell. It is a Machiavellian nuke of a response that is very visible, very immediately obvious, impossible to ignore. But in the process, I didn’t want to break automation, so I made sure that Anubis only fired for things that looked like browsers.
Josh Bressers (31:08) I see.
Xe Iaso (31:11) This is something that has come back to bite me on some levels because people use, like, NetSurf, and NetSurf doesn’t have JavaScript support. And the people that use NetSurf are very insistent about using a browser that does not have JavaScript support and want to access a website that they believe is static, but actually does have heavy JavaScript.
Josh Bressers (31:33) I see, I see. Just give them a hand math problem to solve prior to getting in.
Xe Iaso (31:38) Okay, so one of the early ideas that I had is I have a friend that’s going through the CCNP process and we were thinking about having a list of like 32,000 AI generated CCNP multiple choice questions and making people that don’t use JavaScript fill that out. But we ended up deciding that would be a bad idea because one of the rules I have for Anubis challenges is that they should be completed without human intervention.
Josh Bressers (31:56) Nice.
That’s fair. Yeah.
Xe Iaso (32:09) And the fact that it’s completed without human intervention has led to some interesting conversations with educational institutions in terms of accessibility. Like, I was in a call with someone from an educational institution. They’re like, how is your product accessible? And I’m like, okay, can I just demonstrate it? Because I think that it will be more instructive for me to show you that it is difficult to give this an accessibility rating.
So then I screen shared, had it do it, and then said, and it’s gone. How would you rate that in terms of accessibility? And then the person just sat there for like a minute and was like, I can see how that would be hard.
Josh Bressers (32:43) Yeah, yeah.
Yeah, yeah, right. Okay, Xe. We’re kind of coming to the end. So I’m going to give you the last word. What are the things you want people to know? What are some actions they can take? Finish this one up for us and let us know what’s next.
Xe Iaso (33:15) So one of the annoying things about securing web applications is that you really have to know what your application is doing. One of the really annoying things about writing Anubis is that I can’t know what every web application is doing.
Josh Bressers (33:28) Yes.
Xe Iaso (33:29) Like, a lot of the stuff that I do is based on research, but ultimately it is a best guess by reverse engineering things like
There is so much that goes on in the backend that you don’t realize. Like right now on my computer, I have like 30 different copies of various versions and forks of Chrome and Firefox that I use for testing and validation. Like.
At one point when I was developing a new rule set to make Anubis less prominent by doing some comparisons of what normal browsers do, I had 300 different versions of Firefox and Chrome installed on my computer in various containers and download folders and archives. Yeah. I was really glad to delete that folder.
Josh Bressers (34:16) Whoa!
Xe Iaso (34:28) But there is a lot of stuff that goes on behind the scenes and there are also a lot of experiments that I try and fail because I try my best but I can’t predict everything.
My white whale though is being able to do like full end to end browser testing in CI. And the last time that I did the math for what I need to do for that, I would need approximately 200 gigabytes of RAM per CI run across seven machines.
Josh Bressers (35:06) Wow.
Xe Iaso (35:07) That would involve Windows 10 virtual machines, Windows 11 virtual machines, macOS Tahoe, macOS Sierra, macOS Ventura, like.
There is a shocking amount of stuff that happens behind the scenes that is also basically impossible to be public about because, like, a lot of my browser reverse engineering. Yeah.
I keep the notes for that on paper.
Josh Bressers (35:40) Nice. Nice. I get that 100%. Wow.
Xe Iaso (35:41) Just in case.
Yeah, it is a really impossible nut to crack and I’m going to mess things up. And when I mess things up, please be polite about it on the issue tracker, because I do keep track of names, and sometimes I have forwarded some rude comments to human resources.
Josh Bressers (36:12) Awesome, good, and you should, because there’s no reason for that. Wow. I love this conversation. This has been so much fun. Thank you for the work you’re doing. It is phenomenal and world-changing, so I appreciate it.
Xe Iaso (36:23) I try.
Hopefully I’ll be able to get the WebAssembly stuff working. Gosh. When I finally get the WebAssembly stuff working, there’s going to be another article about compiling WebAssembly to JavaScript, in the style of… What was that talk by Destroy All Software? It was about metal and they called it JavaScript.
Josh Bressers (36:30) Yeah, good luck with that one.
Nice.
I’m not familiar with this, but I’m going to go look for that because that sounds really cool. So for sure, for sure. All right, Xe. Thank you so much for the time. I truly appreciate it.
Xe Iaso (36:52) Put that in the show notes, that’ll be great.
No problem, happy to be here.