In this episode Seth Larson gives us a cornucopia of topics relating to Python security. Seth discusses the Python Software Foundation’s decision to withdraw its proposal for a significant NSF grant. Diversity is a big deal to Python, so this was a no-brainer. We discuss the upcoming PyCon US conference, featuring a new security track that fosters collaboration between developers and security experts. Josh is a huge fan of having a security track at developer conferences. And we close on a paper Seth wrote about zip and tar archives. It seems like we should have zip and tar security figured out by now, but we don’t. Thankfully Seth is working on it.
Episode Links
- Seth Larson
- Seth’s Blog
- The PSF has withdrawn a $1.5 million proposal to US government grant program
- PyCon US CFP
- Join us in “Trailblazing Python Security” at PyCon US 2026
- SLIPPERY ZIPS AND STICKY TAR-PITS: SECURITY AND ARCHIVES
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Today, Open Source Security is talking to Seth Larson, the Security Developer-in-Residence at the Python Software Foundation. Seth, I want to welcome you back to the show, man.
Seth Larson (00:08) Yeah, thanks for having me. Always lovely to be here, Josh.
Josh Bressers (00:09) I’m excited. Seth is awesome. I love talking to Seth. All right. Why don’t you explain kind of who you are, what you’re doing, and we’ll go from there. Because we have a packed agenda. I don’t think we’re going to get to it all, but we’ll see.
Seth Larson (00:23) Yeah, so I’m Seth. And I do security work at the Python Software Foundation. I mostly focus on kind of like standards work or tooling work or things related to CPython, the actual programming language and the implementation. And then I have a counterpart at the PSF, Mike Fiedler, who focuses on PyPI, which is how a lot of people interact with the Python ecosystem. So I’m everything besides PyPI.
Josh Bressers (00:50) Which is, I know the work you do, but still, when I think about trying to manage security for a package repository that’s well used, my brain hurts, man. You know, the things I know terrify me, and among the things Mike must work on, there are things I can’t even imagine going on. You know, it’s one of those.
Seth Larson (00:52) Which is a lot of things, but it’s still plenty.
Mike also has his work cut out for him.
Josh Bressers (01:18) No doubt, no doubt. Okay, so I want to start us out with the Python Software Foundation. Recently, I don’t know if they gave it back or they just declined it or how it works, but there was a grant from the National Science Foundation in the United States and the PSF basically said, no, we’re not going to take your grant because we don’t like the strings you’ve attached. So why don’t you explain this to us? Because I have an enormous amount of respect for, I think, just the courage it took for this move.
Seth Larson (01:45) Yeah.
So the short of it is that there’s this opening called, it’s like Safe-OSE, safe open source ecosystems, something like that acronym. That proposal opened up sometime last year and the PSF basically saw it and was like, this is a great opportunity for us, we’re going to apply. So I was the principal investigator and Loren was my co-principal investigator. We both applied, and over the course of a year we have been working on this proposal and consulting with a bunch of people and getting recommendations and kind of learning how to do it while we’re doing it too, because it’s the very first proposal to the NSF that the PSF has ever submitted. And then we basically got informed by the NSF that the terms that were going to be applied to the grant were changing, with a very specific line that basically said we would not be able to run any programs related to DEI at all during the term of the grant.
And so the interesting part would be that if you’re found doing that, then they take back the money, but a lot of the money is meant to go towards staffing. And so it’ll be money that we’ve already spent, and that’s a huge risk for us. And so we basically withdrew our proposal, and this was after we had been recommended for funding. So it meant that if things had proceeded, we probably would have gotten the money and been able to do the work.
Josh Bressers (03:18) Yeah. Yeah.
Seth Larson (03:19) Yeah, and then, because we had put so much effort into this, we announced that to the community. And really, it’s been kind of great seeing all the lovely messages from the community. Also, the last number before our actual end-of-year fundraiser opened was $160,000 raised from community donations just because of this announcement. So the response has been super overwhelmingly positive and, you know,
Josh Bressers (03:41) nice.
Seth Larson (03:49) And I think just to kind of quote someone, I can’t remember who said this: it was a hard decision, but it was an easy decision, right? Because our mission statement literally says promoting an equitable and diverse Python ecosystem. So if you look at that, it’s like, yeah, well, of course, we’re not going to take these strings.
Josh Bressers (03:57) Yeah.
Yeah.
Seth Larson (04:09) Yeah, turning down 1.5 million. I think for the PSF, that would be a very significant percentage of our revenue or of our operating budget for the upcoming year. So, yeah.
Josh Bressers (04:20) Yeah, that’s tough, man. I mean, the first thing I thought of is, so I talked to Deb Nicholson, it’s been a couple episodes now, but one of the things she talked about at length was just all of the various outreach campaigns the PSF does. And obviously diversity is a huge part of that. And she even talked about how amazing the diversity is and how much you learn and how advantageous it is to the community. It’s just…
I hear you. Like, I get it was hard. It’s hard to turn down a million and a half bucks, but it’s pretty easy, I think, when it’s you folks.
Seth Larson (04:54) It’s hard and easy, right? It captures the sentiment very well. Yeah, no, the response has been really, really great. All it does is it tells me that I’m serving the right community.
Josh Bressers (05:09) Awesome. Yeah, for sure. I mean, that’s great. And I guess I’m not totally surprised. I feel like I’ve been around the open source nerd community long enough that, I mean, there are some, we’ll say, that maybe wouldn’t agree, but there are a lot that certainly push the diversity angle. So I love it. That’s cool. That’s cool. Okay. I want to move on to PyCon US because I was so excited when you told me there is a security track.
Seth Larson (05:33) Mmm.
Josh Bressers (05:39) And this is something, I can’t remember if I talked to Deb about this on the show or not. It’s been long enough, it’s fallen out of my brain. But when you have security conferences, right, we get all these security people that attend them and I feel like it turns into a big echo chamber more often than not. And so I think having security tracks at non-security conferences is amazing for two reasons. Number one is security people get to interact, I’ll put “get to” in quotes, with non-security developers, because I think it’s often easy as security people to assume we know best or assume we know what developers want or should be doing or whatever. And I think it’s also good because on the other side of the coin, obviously you have developers who get to interact with the security people, and security people are legendary for being hard to work with, and I think a lot of developers just don’t like security people for a variety of reasons.
Okay, let me back up. I jumped the gun a little bit. First, tell us about PyCon, right? And then tell us a little bit about the security track you’re running and then we’ll go from there. Because this is so cool.
Seth Larson (06:53) Yeah, so PyCon US, it’s the biggest conference in North America about Python. It’s the PSF’s, you know, premier conference, basically the biggest one that we put on every year. And there’s just a whole lot of Pythonistas that come to PyCon US, a lot of the core team, a lot of people that are doing, you know, maintenance of packaging tools.
All of these people come to the conference, a lot of experts, and we all either put on talks or we deliberate about new features. There’s the Language Summit and the Packaging Summit, so those are places where what’s coming next for Python gets discussed. There are tutorials, there’s just everything. There are lots of open spaces, so you get to show off what cool things you’re doing with Python.
So it’s just kind of a place for experts, enthusiasts, and people that are just starting or people that have been doing it since the beginning to all intermix and basically spread ideas of what Python should be and what’s happening now. So it is really an amazing conference, I will say. Yeah.
Josh Bressers (07:58) All right, so let me ask, you said there’s space for people to show off what they’re doing with Python. So I know in the embedded space now, like hobbyist hardware hacking type things, it’s all MicroPython everywhere I look, right? So are there a bunch of weird hardware projects going on in the spaces then?
Seth Larson (08:16) Yeah, so last year, just last year, I watched a talk about basically taking old keyboards, extremely old keyboards, and making them work with MicroPython. And I was like, this is an amazing talk. And I think they use like a Raspberry Pi and everything too to kind of like translate the communication between the two computers. And I’m like, this is so cool. And I was using Python on it. So that was really incredible. There’s tons of hardware stuff.
Josh Bressers (08:45) So I have a funny story I’ll tell you. There’s a conference I attend every year called CypherCon in Milwaukee, Wisconsin. The Hacker History podcast I do is actually part of the CypherCon umbrella. And they always have this very hackable badge, and hacking the badge is part of the fun, you know, obviously. And this year, they just had a Raspberry Pi Pico in the badge with the USB port reachable, running MicroPython.
Seth Larson (08:47) Mm.
Mm-hmm.
Josh Bressers (09:11) And so obviously the first thing I did was just download all the code and stick it all in GitHub. And then, you know, everyone started going to town on it. And I remember the badge creator was like, how’d you find it so fast? I’m like, it’s MicroPython. This is the easiest thing in the world to extract. Like, come on.
Seth Larson (09:25) Yeah, that’s lovely. No, that’s super awesome. No, I have not dipped my toes a ton into MicroPython and microprocessing, but it is a really, really fun place. The physicality of the thing, it’s something that you don’t always get with software, right? Where you’re able to control a physical device.
Josh Bressers (09:34) Nice. It’s fun.
Yeah, yeah.
Yes, yes. And it amuses me because you have the richness of the Python ecosystem then to use. I mean, I just had this where I’ve got some projects I was doing and I’m like, I wonder if someone has a library for that. I’m like, oh, there is, I’m done. But now it took me 20 minutes instead of four days.
Seth Larson (09:59) always.
Right. There’s always a library in Python. That’s one of the magical things.
Josh Bressers (10:08) That’s right. That’s right. Cool. Okay. So now tell me about the security track at PyCon, because that’s brand new, you said.
Seth Larson (10:12) Yes.
That’s brand new. In previous years, I mean, there’s always security content at PyCon. But new this year, there are actual talk tracks. And so the security track that I’m chairing along with Juanita is called Trailblazing Python Security. And there’s also an AI track. And this security track is basically, well, of course, right, I see you smiling.
Josh Bressers (10:20) For sure. Yeah.
Nice.
Of course there is.
Seth Larson (10:41) So this security track is basically going to be a whole bunch of security talks all end to end to end in the same room. So if you’re interested in Python security, you come to that room and then you just get all the security stuff. All the security experts at PyCon will likely be there too. So there’ll be a lot of, you know, cross contamination of ideas. We’re really excited about it. I think it’s going to be great, because in previous years we’ve done things like open spaces that are themed about security, or we’ll try to do an unofficial talk track where you’re running in an open space. I’ve been involved in a couple of those and now we’ve finally upgraded it to an official thing. So if you go to the CFP of PyCon US, which closes on December 19th, and you have a security talk idea that has any amount of Python in it, we would really love to hear about it. Even if it’s like 95% security. Just like Josh said, it’s really great to have that, like, rubber meets the road, where you get to talk to developers and talk to security experts and you kind of get to meet in the middle. I think that’s where the most interesting discussions happen. Yeah, so if you have one of those talks, submit to the CFP by December 19th. It’ll be really exciting.
Josh Bressers (11:49) All right, so let me ask a clarifying question, Seth. If I was to ask a member of the review board what sort of topics they feel are especially interesting from a security perspective right now, what would such a person say, do you think?
Seth Larson (11:54) Mm-hmm.
Hmm.
Yeah, so stuff involving vulnerability management or package management or how you’re deploying Python applications securely to the cloud or to bare metal, how you’re doing security work with Python, like how to use Python as a tool for security work, like vulnerability scanning or malware scanning.
Josh Bressers (12:19) Uh-huh.
Seth Larson (12:33) There’s a lot of different angles to security, like supply chain security in general. So if you have a specific security tool that you’re using, how does that interact with Python? How does it work well with Python? How does it not work well with Python? Because even though Python is a huge ecosystem, there are all sorts of questions like, how well does it handle all these little edge cases? There are a lot of different ways you can go with it. I think it’s pretty open in terms of theming as long as it’s security related.
Josh Bressers (13:00) Yeah, okay. I mean, that seems fair. That doesn’t surprise me. Yeah, Python’s everywhere, from the, we’ll say, I don’t know, the consumer side of security, right? Where people like me write all kinds of wacky stuff. You’ve got the more, I guess, distributor side of things where we get all these various ecosystems full of data. There’s literally ecosyste.ms from Andrew Nesbitt. There’s, what, Google’s OSV, there’s stuff you guys work on at the OpenSSF. And it’s amazing.
I feel like everywhere I look, Python. Python is there all the time. It’s awesome.
Seth Larson (13:34) There’s a lot of Python.
Josh Bressers (13:36) That’s cool though. Okay. All right. So for anyone interested, I’m going to put a link in the show notes. The CFP closes on December 19th, which is going to be, what, three or four weeks from when the show goes out. I can’t remember exactly what the calendar looks like, but anyone interested, don’t dawdle on this one. This is not like those CFPs that are open for six months and then get a million entries. So that’s really cool. I love that you’re doing this so much. I wish, I shouldn’t say wish, I hope every conference, every tech conference, especially developer-focused ones, has a security track in the future, because I think it is so important for security people and developers to actually talk to each other. That is one of my biggest complaints I have just in general about security stuff.
All right. All right, Seth. Let’s check what’s next on this list here. There’s a white paper that, did you write it? It was written with Alpha-Omega. You are the primary author. Okay. I have it in this tab here. I suppose I could have looked. What does it say? It does say author Seth Larson, Python Software Foundation. Okay. Okay. I was on page three anyway. Okay. So you wrote a paper. There’s so much cool stuff to talk about. All right.
Seth Larson (14:32) Yeah, so I was the primary author of this white paper. Yeah, yeah.
You’re just too eager, Josh.
Josh Bressers (14:52) Let me see, you named it “Slippery Zips and Sticky Tar-Pits: Security and Archives.” Tell us about your white paper, Seth, because I feel like this is a hip topic these days.
Seth Larson (15:02) Yeah, so this white paper is about a handful of software vulnerabilities affecting tar archives and zip archives. And those archive formats are used just all over the place, both in Python and in all these other open source packaging ecosystems, right? Because we need some way to distribute all of these files for software projects to
the end user in a package, right? And so a package actually is like a zip file. So if you’ve ever heard of the Python wheel format, it’s a zip.
Josh Bressers (15:38) Yeah, you can unzip them. Jar files are the same thing. Just unzip a jar file. What’s inside?
Seth Larson (15:43) Same thing with jars, same thing with, you know, what is it, like, OpenDocX. Tons of formats are just, like, secretly zips in disguise. So I wanted to quickly just thank the security researchers that actually found and reported these vulnerabilities. So Caleb Brown and Tim Hatch were both the reporters. They worked a whole bunch with us, on the CPython team and the PyPI team, to actually get this from reported to fixed and tested and all that. So thank you to them.
But yeah, these vulnerabilities are very wide ranging. They affect people that are installing packages, people that are publishing packages, and people that are analyzing packages, because these vulnerabilities make it really difficult to know what is authoritative, like what should the package have inside of it, for example, right? And then there are also aspects of, when you extract a package, when you actually install it on your system,
you want it to go into a certain directory, the place where you’re installing it to. But if you are using an implementation of tar or zip that does not make sure the files actually end up in that directory, what can happen is it will, with the privileges of whatever is doing the installing, place files anywhere on the file system. And so those are things like, you’ve maybe heard of Zip Slip. It’s something like that, right? Where you have
a whole bunch of parent directories inside of a name, inside of an archive, and then it just places it. And then if that file ends up replacing another file that was important for something, that could be bad for a system. So that’s kind of the gist of it. This paper is talking about remediating the vulnerabilities and how we were able to work with these reporters and basically the implementations, like installers, Python installers,
to try to make it so that the ecosystem is just safer in general, even if we don’t know how many vulnerable implementations are out there. So that’s the topic.
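As a rough illustration of the path traversal idea Seth describes, the sketch below checks whether any member name in a zip would resolve outside the install directory if it were joined to that directory naively. This is not the code from the white paper or from any installer; `escaping_members`, `build_slip_demo`, the member names, and the `/tmp/site-packages` destination are all hypothetical, and only Python's standard `zipfile` and `os.path` modules are used.

```python
import io
import os
import zipfile


def escaping_members(archive: zipfile.ZipFile, dest: str) -> list[str]:
    """Return member names that would land outside dest if joined naively."""
    dest_root = os.path.realpath(dest)
    escaped = []
    for name in archive.namelist():
        target = os.path.realpath(os.path.join(dest_root, name))
        # commonpath equals dest_root only when target sits inside dest_root.
        if os.path.commonpath([dest_root, target]) != dest_root:
            escaped.append(name)
    return escaped


def build_slip_demo() -> bytes:
    """Build an in-memory zip with one normal and one traversal-style name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/__init__.py", "")
        zf.writestr("../../etc/evil.conf", "oops")  # hypothetical hostile entry
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_slip_demo())) as zf:
    print(escaping_members(zf, "/tmp/site-packages"))
```

The check only inspects names; it never extracts anything, which is the safer order of operations for untrusted archives. A real installer would also need to think about symlinks and absolute paths on other platforms, which this sketch does not cover.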
Josh Bressers (17:51) Yeah, for sure.
For sure. I mean, Python’s a great use case where I think a lot of the weird features aren’t needed. Because fundamentally, you’re just taking a thing and you’re unpacking whatever is inside beneath some directory. Well, you correct me, but I have a suspicion that in the Python ecosystem, there should never be a reason for a package to try to unpack something outside of the directory it’s operating on. Yes?
Seth Larson (18:06) Mm-hmm.
Right, yeah, that’s correct.
Josh Bressers (18:20) Okay, yeah, yeah.
And the number of times that’s needed is like exceedingly low and strange corner cases, right? And this is one of the things Seth that like drives me crazy sometimes is when a lot of these standards and formats were created back in like the 70s and 80s, generally speaking, there was this idea of like, what was it, be strict on what you send and loose on what you accept. I can’t remember the exact terminology, right?
Seth Larson (18:29) Mm-hmm.
Yeah, it’s like, what is it, strict in what you emit and liberal in what you accept, something to that effect, yeah.
Josh Bressers (18:53) Yeah, yeah, right. Exactly.
And so it made sense that, we have this weird tar file. We’ll do our best to, to un-archive it in the way that maybe makes some sense, right? And now we look at this stuff and it’s like, that was so dumb. We should have never done this.
Seth Larson (19:10) Right, yeah, I think that thinking is definitely changing, especially with internet standards, where people are seeing all of these vulnerabilities in, like, differential attacks, where you’re exploiting the differences between two implementations. Because nowadays there’s actually a pretty good chance that multiple implementations exist
Josh Bressers (19:18) Yeah. Yeah.
Yep. Yep.
Seth Larson (19:35) within a single application, as opposed to maybe before, if applications were simpler or not using open source software with tons of different dependency trees, maybe that was less likely. But now there are just so many systems that are all interconnected and they all probably have a different implementation of whatever. And so if they see two different things from the same data, that’s not good, right? It’s exploitable.
Josh Bressers (19:59) Yep, yep, exactly. And in fact, I have a chat coming up with Edera about Tarmageddon, and that’s exactly the attack scenario they reference in their advisory: using this certain Rust library will unarchive the tar file in one way, but then if you, say, inspect the tar archive with your endpoint detection or virus scanner, whatever the thing is, it’s going to interpret it completely differently because it’s…
Seth Larson (20:22) Mm-hmm.
Josh Bressers (20:27) some weird corner case. I don’t know the details. We’ll figure that out next time. But yeah. Now, I also love in your paper, down towards the end, you have, what is it, “Recommendations and future work” is the name of the section. And you just say restrict uncommon archive features. Like, yes. I feel like if I was writing an archive library now in 2025, I would be like, we are doing the 80% everyone needs and screw the rest of that stuff. Because like,
Seth Larson (20:30) Yeah.
Josh Bressers (20:55) There’s just dragons there, right?
Seth Larson (20:57) Yeah, definitely. So one of the features that is, like you mentioned, a complete relic of the past is this idea of not wanting to overwrite and restart a zip, right? Because one of the things in the past was, if you had a zip that was too big, you’d have it spread across multiple drives, like multiple floppy drives. And so there’s this feature in zip, and I believe in tar as well, that basically instead of like,
Josh Bressers (21:18) Yes.
Seth Larson (21:26) deleting a file or modifying a file, you would just write the file again to the archive, a new time, and then say this is the authoritative one. And that’s not necessary anymore. We don’t need to do that. And so PyPI now rejects archives that try to take advantage of this feature. If you have multiple sets of content with the same name, it just rejects it. It says you don’t need to do that. Because we already reject
Josh Bressers (21:33) Yes.
Nice.
Seth Larson (21:55) zip files that have more than one drive specified, we already reject that. And so that feature makes no sense to even support. And so there’s a whole bunch of features that we now reject. But the thing is, you can’t just push this up and then, like, you just broke, you know, 10% of the ecosystem. You have to be kind of sure that what you’re going to do is not just going to break everything. And so part of this white paper is also how I was able to run some testing
beforehand, some sampling of files, to see that if I were to push this, it wouldn’t just immediately break the whole world for Python.
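To make the duplicate-name feature concrete, here is a minimal sketch that flags archives containing more than one entry with the same file name. It is an assumption-laden illustration, not PyPI's actual validation code: `duplicate_names` and `build_duplicate_demo` are hypothetical helpers, and the demo archive is built in memory with the standard `zipfile` module, which itself warns when a duplicate name is written.

```python
import io
import warnings
import zipfile
from collections import Counter


def duplicate_names(data: bytes) -> list[str]:
    """Return file names that appear more than once in the central directory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        counts = Counter(info.filename for info in zf.infolist())
    return sorted(name for name, n in counts.items() if n > 1)


def build_duplicate_demo() -> bytes:
    """Build an in-memory zip where pkg/module.py is written twice."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        # zipfile emits a UserWarning ("Duplicate name") but still writes the entry.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("pkg/module.py", "print('first')")
            zf.writestr("pkg/module.py", "print('second')")
            zf.writestr("pkg/other.py", "")
    return buf.getvalue()


print(duplicate_names(build_duplicate_demo()))
```

A checker along these lines could reject an upload when the returned list is non-empty. Which of the two `pkg/module.py` entries a given extractor treats as authoritative is exactly the kind of ambiguity the conversation is about.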
Josh Bressers (22:32) Yeah, and it wasn’t that bad. You have a table. I’m trying to find it somewhere in here. Your paper’s too long, Seth.
Seth Larson (22:36) Yeah, it’s towards the bottom. It’s in the…
I’m so sorry. Package as an enforcement mechanism.
Josh Bressers (22:43) Uh, there it is. Yeah. It looks like what, um, eight, eight packages. It looks like out of the 13,460 have weirdness in them. Like that seems pretty acceptable.
Seth Larson (22:58) Pretty good. Yeah, so this is evaluating the top 15,000 projects that had zips at all. And there were some that didn’t have zips at all. And most of them that did have zips had no issues. And so there were only eight that would have tripped the… And it was due to issues that were not malicious, obviously. It’s just mistakes in the tools, or some custom tool that was being used to build the packages, that we actually do want to reject because it was doing something incorrect.
Josh Bressers (23:18) Yeah.
Yeah, but I mean, okay, so let me ask this. You mentioned a little while ago that Python wheels are just zip files. I know pip handles them, and there are other tools, I guess, too. Would those correctly bail on a weirdly formed wheel, or was wheel a little loosely defined?
Seth Larson (23:32) Mm-hmm.
So wheel is defined as a zip with certain files in it. And so it’s pretty loosely defined. Like, I think the definition of a wheel installer is basically if you can unzip it. Because that is not super defined, I would say. The unzip tool is actually referenced there. So if unzip works, it’s a wheel.
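A small sketch of "a wheel is a zip with certain files in it": the code below builds a wheel-shaped archive in memory and lists its `.dist-info` entries with the standard `zipfile` module. The `demo` project name, the version, and both helper functions are hypothetical, and the archive is only wheel-shaped (its `RECORD` is empty), so it is not meant to be installable.

```python
import io
import zipfile


def build_demo_wheel() -> bytes:
    """Build a wheel-shaped zip in memory (illustrative, not installable)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
        )
        zf.writestr(
            "demo-1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
        )
        zf.writestr("demo-1.0.dist-info/RECORD", "")
    return buf.getvalue()


def dist_info_files(data: bytes) -> list[str]:
    """List the metadata files any zip reader can see inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(n for n in zf.namelist() if ".dist-info/" in n)


print(zipfile.is_zipfile(io.BytesIO(build_demo_wheel())))
print(dist_info_files(build_demo_wheel()))
```

Nothing here is wheel-specific tooling, which is the point: any generic zip reader can open the archive, so any disagreement between zip readers carries over to wheels.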
Josh Bressers (23:55) Okay.
Nice.
Seth Larson (24:17) So maybe there is some future work to revisit that definition and be a little bit more strict. I definitely want to do that as a follow-up. And it should be a lot easier now that we’ve kind of shown that you don’t need all of these exotic zip features to have wheels working in the Python ecosystem. I don’t think anyone believed that either, and so it’s fine either way, right? People weren’t exactly saying, you’re taking away my wheel features. It’s like, no, no, no, no. Everyone’s happy.
Josh Bressers (24:37) Who does any… I have never built a wheel in a way that I would say wasn’t just letting the tooling do whatever the tooling wants to do. I mean, I’m sure someone has created this artisanal wheel at some point, but I feel like for 99% of the users who’ve created wheels, you could change the package format and no one would know because the tooling does the work.
Seth Larson (25:12) It’s the law of big numbers though, Josh, right? Like there’s…
Josh Bressers (25:14) I know, I know, man, like every crazy thing happens at least once. I know, I know.
Seth Larson (25:20) Yeah, and we’re a very, very big number. So there’s plenty of one-offs out there. And the other side of it is the installer side, right? If you have all of these installers that are using old versions of pip or whatever, if there’s anything that’s not backwards compatible, those won’t be able to do the right thing by default.
Josh Bressers (25:25) Yes.
Yep. Yep.
Yeah. Yeah. The backwards compatibility is huge for sure. Because, I mean, I’ve never existed in an ecosystem as large as Python, but with the things I have worked on, it always seems funny. You think, this won’t matter, no one’s using this feature, and inevitably someone’s using it. And then they’re very loud.
Seth Larson (25:46) Mm-hmm.
There’s an XKCD about this, right? Like the space bar no longer overheats the CPU.
Josh Bressers (26:10) I don’t remember that. I’m sure I’ll look it up. There’s an XKCD for everything. “My Workflow.” Yes. Yes. That’s everything. All right. I also want to point out something in your paper, and I’m curious how you plan to do this. The last one I talked about, what was the name of that title, “Restrict uncommon archive features,” and then there’s “Fuzzing archive and differential testing.” What does that mean? Because I…
Seth Larson (26:13) I think it’s called My Workflow. You broke my workflow.
Josh Bressers (26:39) I’m excited for this one.
Seth Larson (26:41) Yeah, so there’s this concept in software quality called fuzzing, which is basically a tool that takes some inputs. Maybe you give it inputs of, here’s what a zip file looks like, right? And then it will mutate little pieces of that file and then just try every single mutation, like run some code that takes that zip file in and tries to parse it. And if it fails, however it blows up, it will record that. And then it’ll just try the next one.
And it just does that a million times. It’s really, really good for parsers. Parsers are a really good use case for fuzzing because you get to test every single branch quite quickly, usually. And there are some other concepts there, like seed corpora, where you give it weird stuff that has broken in the past and then you mutate on that and hopefully you find more issues that way. So there’s also this paper that I reference called, like,
“My ZIP isn’t your ZIP,” that basically takes a huge table of every single zip implementation that this author was able to find and then cross-references it with every other zip implementation and sees what the differences are, how it handles different zip archives. And so that data set and that set of implementations was really interesting to me because I was like, okay, well, you could technically take a zip fuzzer
Josh Bressers (27:51) Yeah, yeah.
Seth Larson (28:04) and then run the CPython zip file implementation and then maybe some other zip file implementation and feed it the same exact zip and then just kind of see what each project sees, like how did it parse that zip and then compare. And if there’s differences, that’s interesting. That’s something to note down. Is it an issue or is it something that’s expected? Is this a backwards compatibility thing? The other side of fuzzing that I think is a lot
So that’s like really special, interesting new thing that I think hasn’t been done a lot yet. The stuff that I’m also interested in is what we were talking about earlier where you would never expect a zip to extract outside of a directory, ⁓ especially if you have like certain protection modes on. So CPython’s zip file and tar file implementations both have options that basically say like, okay, this is a secure extraction, don’t allow it to go outside. So because it’s explicit there,
what you can do is you can run a fuzzer and then say like, I’ve got the security feature on, it should never extract outside. And then you just like watch the file system or watch the standard library modules that would be involved in like opening a file or writing a file, right? And you say, you can assert that no matter what the fuzzing input is, the protections are working correctly. And so like, that’s another interesting angle for this. And you can like run protections based on that.
Josh Bressers (29:30) Yeah, yeah. And I’m really interested in the differential testing aspect of this, because you briefly touched on it a moment ago, but this is where you’re going to take different implementations and look at what happens, especially with known bad, you know, purposefully malicious zip files. And this I think is extra interesting because, especially in the world of Python, I hesitate to guess how many zip parsers exist in PyPI. I’m sure the number would terrify all of us, but…
More importantly, I think, in the world of Python now, the two big package installers are pip and uv, right? And I know there are even more being worked on, but the literal code pip uses to unzip things is not the code uv uses, because they’re just fundamentally different language technologies, right? So the differential aspect I feel like is more intriguing than ever before.
Seth Larson (30:28) Yeah, definitely. And some of the vulnerabilities that are a part of this white paper were discovered in uv, right? And there was a differential between pip, which uses CPython’s zipfile module, and uv, which uses its own zip implementation. And so it was discovered that way.
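One way to picture a parser differential, as a hedged sketch only: the code below reads the same bytes two ways and compares the file names each view reports. The first view uses CPython's `zipfile`, which trusts the central directory at the end of the file. The second is a deliberately naive scanner of local file headers from the front of the file, standing in for a streaming-style reader. This is not the pip and uv bug from the white paper; the helper names and the concatenated demo archive are hypothetical.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte local file header
LOCAL_SIG = b"PK\x03\x04"


def names_from_central_directory(data: bytes) -> list[str]:
    """What CPython's zipfile sees: the central directory at the end of the file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def names_from_local_headers(data: bytes) -> list[str]:
    """What a naive front-to-back scanner sees (simplified: assumes sizes are in the header)."""
    names, pos = [], 0
    while True:
        pos = data.find(LOCAL_SIG, pos)
        if pos == -1 or pos + LOCAL_HEADER.size > len(data):
            return names
        fields = LOCAL_HEADER.unpack_from(data, pos)
        comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
        start = pos + LOCAL_HEADER.size
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        pos = start + name_len + extra_len + comp_size


def make_zip(name: str, content: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return buf.getvalue()


# Two complete zips glued together: only the second one's central directory is at the end.
demo = make_zip("hidden.py", "print('hidden')") + make_zip("visible.py", "print('visible')")

central = names_from_central_directory(demo)
local = names_from_local_headers(demo)
print(central, local, "differential!" if central != local else "same")
```

A differential test harness would feed many such inputs to both readers and record every case where the two lists disagree, then decide case by case whether the disagreement is an expected compatibility behavior or something exploitable.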
Josh Bressers (30:42) But wait, Seth, uv is written in Rust. Rust doesn’t have security bugs.
Seth Larson (30:49) It avoids memory bugs, but you can always write bad logic bugs. I mean, that’s always an option.
Josh Bressers (30:51) There you go. Yeah. For sure. For sure. Yeah. Yeah. That’s cool. I guess I didn’t click the links in your paper, so I didn’t realize they came from uv. That’s pretty cool. Nice. And I suppose that’s one of the other ironies, I guess: I bet the uv folks started writing uv and then once they started dealing with the massive sprawl of PyPI, I can only imagine the ridiculous corner cases that started showing up in the bug tracker.
Seth Larson (31:20) Yeah, yeah, there’s a lot of Python packages, it turns out.
Josh Bressers (31:26) Yes, yes. I think about NPM all the time with this, because NPM is like 10 times the size of everyone else. It’s so big. It’s so big, and every nutty thing you can imagine has happened. Anyway. Okay. All right, Seth. I think it’s time to go. I want to give you the floor. Let us know what you want us to do. What should we be paying attention to? What are you working on that we should be waiting to see? So.
Seth Larson (31:28) Yeah
Josh Bressers (31:56) Take us home.
Seth Larson (31:58) Yeah, so I really want to invite everyone again to apply to that CFP, submit your talks. You are the audience for the CFP. If you’re listening to this podcast, you are the audience. So please submit talks to PyCon, come to Long Beach, California, and spend some time with us and learn about Python and teach us about security and it’s going to be a great time.
Josh Bressers (32:03) Yes.
For sure. And again, the link will be in the show notes, so go click it and submit your paper. It should be exciting. All right, Seth, it’s been a treat.
Seth Larson (32:30) Thanks for having me, Josh.
Josh Bressers (32:31) Yeah, thank you. I can’t wait to see what crazy thing you’ve got for us next time. So until then, have a good one.
Seth Larson (32:37) Bye.