In this episode we discuss crates.io trusted publishing with Tobias Bieniek. We cover the steps crates.io is taking to enhance supply chain security through trusted publishing, a method that uses short-lived tokens and GitHub Actions to guard against unauthorized access. Tobias shares insights into the challenges of managing a large-scale open-source repository and offers a glimpse into the future of secure software distribution. Tune in to learn how these advancements are shaping the landscape of open-source development.
Episode Links
- Tobias’ GitHub
- Tobias’ Mastodon
- Tobias’ Bluesky
- crates.io trusted publishing blog
- crates.io GitHub
- trusted publishing docs
- William Woodruff PyPI trusted publishing announcement
This episode is also available as a podcast. Search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Today, Open Source Security is talking to Tobias Bieniek, crates.io team co-lead and software developer at the Rust Foundation. Tobias, or Tobi, I’m gonna call you Tobi from this point on, I think. Welcome to the show. I am ecstatic to have you here. This is so exciting.
Tobias Bieniek (00:13) Yeah, thanks for having me. Hi.
Josh Bressers (00:15) So I invited Tobi on the show because of a very specific thing that the crates.io folks have done. You’re not the first, which you informed me of before you hit record, but you have added, let me, let me pull up the name to get this right over on the other. You’re calling it trusted publishing for the crates.io repository. So let’s just kind of start at the beginning. Tell us about yourself. Tell us about crates.io. Like you’re part of the Rust foundation, like just kind of lay the foundation and we’ll go from there.
Tobias Bieniek (00:44) All right, cool. Yeah, so as I said, I’m Tobi. I’ve been using Rust for about 10 years at this point. It took me a couple of tries to do that, like three or four times until it actually clicked. So back then I was doing a lot of front-end work and I discovered that crates.io was using a front-end framework that I was familiar with. So I started contributing a little bit, little bits here and there, just small stuff basically.
Josh Bressers (00:54) Nice.
Tobias Bieniek (01:13) And then at some point in 2019, I was asked, do you want to join the crates.io team? I’m like, sure, why not?
So I did that, started increasing my contributions. In 2021, I was asked, hey, do you wanna lead the crates.io team? I’m like, sure, why not? That’s basically how it goes. Yeah, so the biggest change for me was in early 2023, when the Rust Foundation approached me and asked me, hey, do you know anyone from your crates.io team who would be interested in working on this full-time?
Josh Bressers (01:32) That’s how they always get ya. Yup, yup.
Tobias Bieniek (01:47) And that was the point at which I was pointing at myself. And well, I joined the Foundation in 2023. And now I’m actually being paid to work on this, which is great. And yeah, as you mentioned, crates.io obviously is the Rust package registry. So basically when you want to publish your open source library for Rust, you can upload that to crates.io so that others can basically download it from there and use it.
Josh Bressers (01:58) Very cool.
Right. And that’s kind of the expected workflow where Cargo automatically downloads your dependencies from crates.io and it’s all just magic.
Tobias Bieniek (02:19) Yeah, exactly.
Exactly, you basically put the dependency declaration in your Cargo.toml file, hit cargo build or whatever. It just downloads it in the background. It compiles it and done basically.
Josh Bressers (02:36) Right, right. But you’re here to talk about the change to how you publish. So one of the topics that has been very top of mind for many of us for more years than I can remember at this point, and I feel like right around when COVID started is when it got big, is the whole supply chain security thing. There have been a number of attacks against package repositories where you hear about stolen API keys, stolen passwords, stolen tokens, things like that. And this, I think, is where this trusted publishing concept intrigues me. And you don’t solve all the problems, but you solve some of them. So explain to us what trusted publishing is now.
Tobias Bieniek (03:19) Yeah, so the short version is, the way you usually published was you created an API key bound to your user account on crates.io, and then you basically configured Cargo to use that API key to authenticate you with our API, right?
And when I started working full-time on crates.io in 2023, you just created the API key, that’s it. No way to configure it or anything. You couldn’t set an expiry date. You couldn’t scope it to specific crates. We retrofitted that, but it’s still usually a long-lived API token. It used to have infinite expiry dates. Now we default to 90 days at least, but it’s still 90 days. Like in those 90 days, an attacker could be very malicious obviously. And with this new way of… yeah.
Josh Bressers (04:09) Yeah, yeah. And now I want to clarify. When you say it’s unscoped, that means if an attacker gets access to your API key, they have access to everything you’re doing publishing on crates.io, correct?
Tobias Bieniek (04:23) Exactly, yeah. So that was the case, right? You now have the option to scope it down. So you can say like everything that starts with XYZ wildcard, for example, or you can scope it to only be able to publish updates to existing crates, stuff like that, right? So it’s gotten a lot better already. And this is basically the next step now with trusted publishing. So the way this works is, instead of having this long-lived API token, when you use your CI system to publish something, you get a JSON Web Token from the GitHub Actions runner, right? And that is a signed token. And you send that to crates.io. And now crates.io can verify that this actually came from GitHub. And it also says this came from repository, I don’t know, joshbressers/xyz.
And now if you have configured crates.io to trust that repository on GitHub, then crates.io knows this is supposed to happen, like this is legitimate, and we can issue you a short-lived token now. And with this token, you can go back to cargo publish and use that as your authentication method. So that basically gives you access to just this specific crate which has the configuration, I trust this GitHub repository, right? Does that make sense?
Josh Bressers (05:49) Yes. Now the default token lifetime is 30 minutes, versus 90 days. So if an attacker does get a hold of this token, they have 30 minutes to do their business.
Tobias Bieniek (05:53) Exactly. Yeah. Also, the way the GitHub action is implemented, it automatically revokes the token as soon as the CI workflow is done. So in practice, it’s even less than that usually.
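Editor’s note: the exchange and token lifetime described above can be sketched roughly as follows. This is a minimal Python illustration, not crates.io’s implementation. The `TRUSTED_REPOSITORIES` table, the `exchange` function, and the `ShortLivedToken` class are hypothetical names, and the sketch assumes the JSON Web Token’s signature has already been verified, so only its `repository` claim remains to be checked.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default lifetime mentioned in the episode.
TOKEN_LIFETIME = timedelta(minutes=30)

# Hypothetical registry-side configuration: crate name -> trusted GitHub repository.
TRUSTED_REPOSITORIES = {"example-crate": "example-owner/example-crate"}


@dataclass
class ShortLivedToken:
    value: str
    crate: str  # the only crate this token may publish
    expires_at: datetime
    revoked: bool = False

    def is_valid(self, now: datetime) -> bool:
        # A token stops working once it is revoked or its lifetime has passed.
        return not self.revoked and now < self.expires_at


def exchange(claims: dict, crate: str, now: datetime) -> Optional[ShortLivedToken]:
    """Issue a token for `crate` if the (already verified) claims name its trusted repository."""
    trusted = TRUSTED_REPOSITORIES.get(crate)
    if trusted is None or claims.get("repository") != trusted:
        return None
    return ShortLivedToken(secrets.token_urlsafe(32), crate, now + TOKEN_LIFETIME)


# Example: a workflow in the trusted repository asks for a token, then revokes it when done.
issued_at = datetime.now(timezone.utc)
token = exchange({"repository": "example-owner/example-crate"}, "example-crate", issued_at)
if token is not None:
    token.revoked = True  # mirrors the action revoking the token at the end of the workflow
```

Under these assumptions, a stolen token is useful only until it expires or is revoked, whichever comes first, and only for the one crate it was issued for.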
Josh Bressers (06:07) Oh, nice! That’s cool. That’s perfect. Yeah. Good. Good. And so it’s probably worth explaining at this point as well, that I think the target attack that this truly prevents is people checking their secrets into a repository, right?
Tobias Bieniek (06:32) Yes and no. So if you check it in, there is a thing called GitHub secret scanning, a program from GitHub, where they have a bunch of, I assume, regular expressions that just run whenever someone pushes something, and they check for known token formats basically. And we are participating in that. So if you would actually try to push a…
Josh Bressers (06:38) Yes.
Tobias Bieniek (06:57) known crates.io token to your GitHub repository, GitHub would tell you and would even tell us that, hey, this leaked, please revoke it. And we do that. So it’s not quite like that. Yeah, so we can automatically revoke it. So that works quite well. I mean, it doesn’t happen that often, obviously, but yeah, it does happen.
Josh Bressers (07:01) Okay.
Wow. So GitHub tells you that someone checked in a crates.io token. Wow. That’s cool. Holy cow, I didn’t know they did that.
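Editor’s note: Tobi guesses that secret scanning relies on regular expressions for known token formats. A minimal Python sketch of that idea is below. The token format (a `demo_` prefix followed by 32 alphanumeric characters) and the function name `find_leaked_tokens` are made up for illustration and are not the format crates.io or GitHub actually use.

```python
import re

# Hypothetical token format for illustration only: the prefix "demo_" followed by
# exactly 32 alphanumeric characters. Real registries define their own formats.
TOKEN_PATTERN = re.compile(r"\bdemo_[A-Za-z0-9]{32}\b")


def find_leaked_tokens(text: str) -> list:
    """Return every substring of `text` that looks like a token in the hypothetical format."""
    return TOKEN_PATTERN.findall(text)


# Example: scanning a pushed file's contents.
pushed_file = 'CARGO_REGISTRY_TOKEN = "demo_' + "a1B2" * 8 + '"'
leaks = find_leaked_tokens(pushed_file)
```

A distinctive prefix is what makes this kind of matching practical: without one, a token is hard to tell apart from any other random string.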
Tobias Bieniek (07:24) So yeah, that’s the thing that already is happening. But you could obviously leak it in other ways. Say, I don’t know, you have it in your copy-paste buffer and you paste it into, I don’t know, Discord or whatever, stuff like that, right? Or the way you configure cargo publish, by default, it’s being saved to a file on your disk, right?
Josh Bressers (07:24) Yeah, yeah.
Right, right.
Tobias Bieniek (07:50) If you have a malicious program on your machine that reads that file, you’re screwed. There are other ways to configure Cargo as well. Like you could put it in 1Password, for example. But like the default way is just a file on your disk. If that gets leaked, yeah, that happens. So that’s what this is preventing.
Josh Bressers (08:07) Sure. Well, and there’s also the use case where people store the secret in GitHub and then you have a malicious action or somehow a bad actor manages to run your action and gain access to your secret. I mean, we’ve heard about this with a ton of GitHub actions recently where the attacker changed an action.
An unrelated action, this would be where you have multiple actions running in your pipeline. They changed one of the actions, which then snarfed all the secrets out of the environment. Cause when GitHub is passing these secrets in, it’s often like an environment variable or something like that. It snarfs all the environment variables, ships them off to who knows where, and now you’re cooked because your 90-day API key is in the hands of an attacker, right?
And now they have 30 minutes.
Tobias Bieniek (09:01) Yeah, if at all, right?
Josh Bressers (09:04) Right, right. Okay. So let’s kind of talk about that point. So I want to explain the OpenID Connect piece, and I don’t want to dwell on it too much because I know OpenID Connect is super complicated, but fundamentally this feels bizarre to talk and think about. So I’m going to try to explain this and hopefully I do it right. So you have an action running on GitHub. GitHub then basically knows who you are and what repository you have. Like it has all this knowledge, right? Because just by definition, like, Tobi pushed something into a repository. GitHub knows Tobi did this. GitHub knows the repository it’s running in. And then GitHub generates the, you know, the JSON Web Token. And then it’s going to hand that to crates.io as part of the process, which you have to use. You have a runner, or no, it’s like, is it a runner? It’s a GitHub action, right? That does this.
Tobias Bieniek (10:00) A GitHub action, yeah.
Josh Bressers (10:02) Right. It takes, so your GitHub action takes the authentication details from GitHub, gives it to your service, and then your service validates that yes, this is GitHub, this is Tobi, this is the repository we think it is, like all the things line up, so I’m going to allow this authentication to happen. Did I get that right? Okay.
Tobias Bieniek (10:20) Exactly, yeah. Well, basically it’s a token exchange, right? So when we say this is allowed to happen, we issue that short-lived token, we give it back to the action, and then that is being assigned as an output of the action, which can then be used in the cargo publish.
Josh Bressers (10:39) Right. Okay. Cool. Yes. And then magic, right? You publish your crate or whatever. And then, so let me ask this.
Tobias Bieniek (10:46) Yeah, I mean, from then on, it’s just the regular publish flow, basically.
Josh Bressers (10:50) Okay. And what do I, as a, as a developer, what do I need to do? I assume I have to set something up on crates.io to make this happen. Yes.
Tobias Bieniek (10:57) Yeah, so if you already are using CI for publishing, then it’s relatively straightforward. If you’re using cargo publish on your local machine, then there’s not really an advantage for you because like the workflow is entirely different, right? But if you’re using CI, the way it works is you probably already have a publish workflow in your repository. And there, instead of having
Josh Bressers (11:13) Sure.
Tobias Bieniek (11:25) a repository secret with your API key, you now use that specific action in front of your cargo publish call, you take the output from that action call, and you use that as the API key that you pass into cargo publish. So it’s basically a diff of like five lines, and it relieves you from the hassle of having to manually maintain the API token secret in GitHub Actions.
Josh Bressers (11:55) That’s true, actually.
Tobias Bieniek (11:55) So it’s not just a security benefit, it’s also a usability benefit.
Josh Bressers (11:59) That’s fair, because if your token expires after 90 days, now every 90 days you have to remember. I do this all the time, right? Where there’s some service, I’m notorious for this for GitHub tokens, where I think the default is 30 days. And I get my token, I stick it on the machine I need it on, in the environment or whatever. I’m like, okay, I gotta remember to do this again in 30 days. And 30 days comes, and I’m like, I have no memory of how to do this.
Tobias Bieniek (12:05) Yeah.
Yeah. Also, I mean, if it’s just one repository, it’s probably fine, but imagine having like 50. It’s amazing how much work that is.
Josh Bressers (12:28) Yeah, yeah.
Yes, 100%. And from talking to a lot of open source developers, it is very, very common for an open source developer to have many, many open source packages that they maintain. Like I’ve heard of people, especially in the npm universe where they, you know, focus on being super tiny, having literally thousands of repositories that they have to take care of. I can’t even imagine. Like I don’t even know what to say to that.
Tobias Bieniek (12:54) I can’t say I maintain that many, but I have like 700-ish repositories on GitHub at this point. I mean, including forks and everything, but still, it’s like, I’ve been doing this for a while.
Josh Bressers (12:58) What? That’s-
Wow. Wow.
Yeah. You’re not maintaining API. Well, no, you know what happens? You will have one API key that has just permission to do whatever it wants because that’s less hassle than trying to maintain properly scoped keys with proper expiration dates. That’s what’s going to happen in that environment, right?
Tobias Bieniek (13:23) I am somewhat thorough, but yes, not everything has the same one. Not every one has a different one, yeah.
Josh Bressers (13:27) Okay, I suspect you’re special.
Right, right. I mean, humans, right? Humans make silly mistakes. So, okay, I get that. This is really cool. And now you also told me at the beginning that there are a bunch of other open source repositories doing this. And here’s the thing I told you as well: I’ve only heard of crates.io doing this. And I don’t know why that is, because I feel like I try to pay attention and…
It’s probably just not getting the attention it needs. So if you’re a developer with other package repositories, like go look for this ability, like it’s not just a crates.io thing. Like everyone should be doing this all the time because it’s way more secure. Now, I also want to talk about what this feature doesn’t do, because I think one of our favorite things, especially in the world of security, is we’re like, use this new action from crates.io and your package can’t be hacked anymore, right? It’s safe from attackers. And it’s like,
Tobias Bieniek (14:21) Yeah… No.
Josh Bressers (14:23) No, this solves a very narrow set of problems for attacking a supply chain. It does not solve all of the problems. So let’s, I will let you start explaining this cause this is where I get all excited and I don’t want to, I don’t want to get too far ahead, but like tell us what it doesn’t do. Cause I think this is also an important part of the conversation.
Tobias Bieniek (14:29) Absolutely.
Yeah, I mean, as you already said, like if you are using other GitHub actions in your workflows, and those are compromised, we can’t do anything about that, right? So the advice we usually give is have a specific release workflow and have only the GitHub checkout action in there, have our authentication action in there, and then use cargo publish and be done with it.
Josh Bressers (14:51) Yes.
Tobias Bieniek (15:07) Nothing else, ideally. As soon as you use any other third-party action, please at least look at the action and verify that it doesn’t do anything malicious, because that’s where it’s important to look. If your CI breaks, CI is usually read-only, so it’s not that bad, but with something like publishing from CI, or is it CD at that point? I don’t know. Then it becomes read-write access, right? And that’s the…
Josh Bressers (15:42) Right, right. And this also doesn’t prevent an attacker sneaking a malicious commit into your project, right? There’s like, you can, no? Okay, explain this one.
Tobias Bieniek (15:51) Yes and no.
Yeah, so the way it works is we have support for GitHub environments in this specific case. GitHub has deployment environments. So this is basically when you wanna deploy from your repository to, say, a server, right? You wanna deploy your website, your server code. But you can also use these environments to, for example, release a package, right?
Environments can have restrictions to some degree. Like you can say, before this workflow is allowed to run for this specific environment, there needs to be an approval from, I don’t know, person A, person B, whatever, something like that. So you can basically restrict who can actually publish. Well, you can push the tag, but the workflow won’t necessarily run until it’s actually approved by someone.
Josh Bressers (16:37) yes.
Right, right, okay, yes.
Tobias Bieniek (16:50) So there is a certain degree of safety built in because we support these environments, and these environments you can configure. When you say, my crate trusts this repository, you don’t just say this repository, but also the workflow name and optionally the environment. So this is saved on the crates.io side. And if someone then tries to publish from no environment or a different environment, those won’t get through.
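Editor’s note: the matching rule Tobi describes (repository, workflow name, and optionally an environment) could look something like the Python sketch below. The `TrustedPublisher` class, the `claims_match` function, and the claim keys `workflow_filename` and `environment` are simplified placeholders chosen for illustration. They are not necessarily the fields crates.io reads from the GitHub token, and the behavior when no environment is configured (any environment is accepted) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrustedPublisher:
    repository: str
    workflow_filename: str
    environment: Optional[str] = None  # None: no environment restriction configured


def claims_match(config: TrustedPublisher, claims: dict) -> bool:
    """Return True only if the verified claims satisfy every configured field."""
    if claims.get("repository") != config.repository:
        return False
    if claims.get("workflow_filename") != config.workflow_filename:
        return False
    if config.environment is not None:
        # A missing or different environment is rejected once one is configured.
        return claims.get("environment") == config.environment
    return True


# Example: a crate that trusts only the "release" environment of one workflow.
config = TrustedPublisher("example-owner/example-crate", "release.yml", "release")
release_run = {
    "repository": "example-owner/example-crate",
    "workflow_filename": "release.yml",
    "environment": "release",
}
allowed = claims_match(config, release_run)
```

Tying the configuration to an environment is what lets the approval rules on the GitHub side carry over: a workflow run that skips the protected environment no longer matches.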
Josh Bressers (17:07) Huh.
Okay, that’s pretty cool. I mean, that makes sense. But like XZ, right? The XZ attack. This has, there’s nothing you can do about this on your side.
Tobias Bieniek (17:24) Yeah, I mean.
No, no. Also, just a fun fact, trusted publishing is somewhat contentious because, like the name is contentious because trusted can be easily misunderstood as this is the trusted way, which is not how it’s supposed to be interpreted. The trusted in this case comes from…
Josh Bressers (17:40) Yeah.
Tobias Bieniek (17:47) crates.io trusting that other third-party service like GitHub, that this is a trusted entity that can publish for you. But it’s not like, if you currently publish from your local machine, it’s fine. Like this is not less trusted than what we’re introducing here necessarily, right?
Josh Bressers (18:05) Right, right. It’s machine to machine trust. Right. Okay. That’s cool. Now you also in your, your post, you support GitHub today, but you’re working on various other ecosystems. How is that coming?
Tobias Bieniek (18:08) Yeah, exactly.
Yeah. Yeah. So we needed a minimum viable product, I guess. We needed to start somewhere, right? So GitHub obviously being the biggest platform made sense. We currently unfortunately only support GitHub for authentication as well. So it makes sense to start there, right? But it’s still built in a way that we can easily support stuff like GitLab CI/CD, for example. I know PyPI, the Python Package Index, supports, I wanna say, Google as well.
Josh Bressers (18:23) For sure, yes. Yes.
Tobias Bieniek (18:47) So they have support for four different platforms at this point. So we’re slowly building support for the other platforms as well. Probably starting with GitLab next in a couple of weeks.
Josh Bressers (18:47) Nice.
Okay. Fantastic. Good. Good. Yeah. No, I agree. GitHub is right now the largest source of open source. That’s a silly thing to say, but yeah, I do the same. Okay. All right. I don’t feel like, or rather I should say, do you think there’s anything we’ve missed in this particular topic?
Tobias Bieniek (19:21) I’m not sure.
Josh Bressers (19:25) I mean, I’m sure we did, but, okay. So here’s how I want to close this one out. We got a little bit more time I want to spend with you. You are, well, the co-lead on crates.io. I bet you’ve got some stories and you’ve seen some things in this universe. So I would love to hear a little bit about what it’s like running a massively successful, hugely used public repository, because I have a suspicion that it is just a crazy, crazy time where nothing ever goes right. You know, there’s always weird things.
Tobias Bieniek (19:59) The short version is it’s crazy and scary. I mean, honestly, if you make a mistake, you could hurt a lot of people obviously. Right. So yeah, it’s important to know that, as I said, crates.io has been around for I think at this point 10 years.
Josh Bressers (20:11) Yeah, yeah.
Tobias Bieniek (20:19) And it was run by volunteers for the longest time. Like the first full-time employee, me, only started about two years ago, two and a half at this point. And until then it was just volunteers doing this in their spare time. So yeah, it’s gotten a lot better thanks to the support of various companies that supported the Rust Foundation. But yeah, when crates.io started, it obviously wasn’t designed for the scale that we have now. So let’s just say there are certain scaling challenges. Like two years ago, every download request for a crate file would go through our API servers. Why? Because we wanted to count the downloads, which is fine, but it needed a lot of API server capacity, obviously. So yeah, also when the API servers went down or we restarted them or something, nobody could download anything anymore, right? So one of the first things I did when I got to work on this full-time, when I actually had the capacity to think about this: we moved the download traffic to actually just go through the CDNs, and for download counting, we now just look at the CDN logs and count it that way.
Josh Bressers (21:14) Yeah, yeah.
All right.
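Editor’s note: counting downloads from CDN logs instead of in the API servers can be illustrated with the short Python sketch below. The log line format (`METHOD PATH STATUS`), the `/crates/<name>/<file>.crate` path layout, and the `count_downloads` function are assumptions made for the example. The episode does not describe the real log format or pipeline.

```python
from collections import Counter


def count_downloads(log_lines) -> Counter:
    """Count successful crate downloads per crate name from simplified CDN log lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        method, path, status = parts
        if method != "GET" or status != "200":
            continue  # only successful downloads count
        segments = path.strip("/").split("/")
        # Assumed layout: crates/<name>/<name>-<version>.crate
        if len(segments) == 3 and segments[0] == "crates" and segments[2].endswith(".crate"):
            counts[segments[1]] += 1
    return counts


# Example with made-up log lines.
sample_logs = [
    "GET /crates/serde/serde-1.0.0.crate 200",
    "GET /crates/serde/serde-1.0.1.crate 200",
    "GET /crates/rand/rand-0.9.0.crate 404",
]
totals = count_downloads(sample_logs)
```

The design point from the conversation is that the download path no longer depends on the API servers being up. Counting becomes an offline job over logs rather than work done on every request.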
Tobias Bieniek (21:39) But that was like a couple of months of work to do it properly, because people were using it and just flipping the switch isn’t quite as easy in this case, right? So yeah, the scaling challenges are the hardest part right now. Since I started working on crates.io, like since I joined the team at least in 2019, we’ve been basically doubling our traffic every year, and I think this year we might actually triple it due to AI and everything.
Josh Bressers (22:09) Wow.
Tobias Bieniek (22:10) So scaling like that is crazy.
Josh Bressers (22:15) I can’t fathom. I mean, that’s, congratulations, obviously. Like it’s a big deal and Rust is super awesome. And the more Rust in the world, the better is my opinion. So I’m all for that, but man, I can’t. That is bananas. Like doubling your traffic every year. I mean, that’s sustainable forever. It’s fine.
Tobias Bieniek (22:20) Yeah. It’s a blessing and a curse, I guess.
Yeah, I mean, luckily we are getting sponsored by Fastly with all the traffic. So that helps a bunch. We are still, like compared to say the Python Package Index, we are still small peanuts. The big difference there is with Python, you can upload your compiled files basically, right? While we only support the source files. So usually the packages you upload to the Python Package Index are much larger than in our case. So our total traffic is still relatively small, while our requests per second are
Josh Bressers (22:56) Yes. Yes.
That’s fair.
Tobias Bieniek (23:13) Well, not approaching them yet, but like they’re more in that direction, I guess.
Josh Bressers (23:18) Right, right. And for anyone who doesn’t know, this is one of the few things I do know about Rust: unlike most libraries we use, like in the C world or the Python world or something, with everything in Rust, because of how Rust works, like the borrow checker and all the other things, you have to build all of the source code at once, which obviously makes compile times slower, but you get all those added benefits. So yes, you do not download a built Rust binary. You download the source code for a crate and then you build it as part of your project, I guess. So, yeah.
Tobias Bieniek (23:49) Yeah, although to be fair, they are cached. So once you’ve built a library version, like you don’t have to rebuild it. Like mostly you’re rebuilding your own code and then it’s
Josh Bressers (23:52) Yes.
Right, right, yes, for sure, for sure. That’s right, that’s right. Okay, I had a thought and then we got off track, but that’s fine. So, okay, that is…
Josh Bressers (24:09) That is, I, all right. I remember what I was going to say now. This is just for the audience. Fastly is like the unsung hero of open source, I think. Quite often they do CDN work for a lot of projects and a lot of repositories. So like, by all means, if you’re listening, go check out Fastly cause they’re super cool. I’m not being paid by them or working for them or anything. I just know they’re like the roots buried in the earth of open source, which is awesome.
Tobias Bieniek (24:16) Absolutely. Can’t agree more. And it’s not just the crates.io download traffic that they support. When you download a new compiler, which obviously CI systems do all the time, that is a couple of hundred megabytes, I guess, and those are also going through Fastly. And we wouldn’t be able to sustain it without them basically.
Josh Bressers (24:35) Yeah. Yeah.
Yeah, yeah.
That’s a good point, actually. Right, because, yeah, CI systems love just downloading everything a million times a day because why cache it when you can download it because my internet’s faster than my disk.
Tobias Bieniek (25:03) I mean, to be fair, we are in talks with some of the CI providers to actually put caching machines into the data centers so that at least the route gets shorter, I guess. But yeah, that’s just early days. So we’ll see what happens.
Josh Bressers (25:14) Nice. Yeah, yeah, yeah. I mean, that makes sense. And yes, the Rust compiler is not small, because every time I run rustup, it takes, I mean, it’s fast, but like you notice it. Okay. All right, Tobias. We’re basically at the end here and I’m just going to give you the floor. Like tell us if there’s something people want to do to get involved. If there’s something you think people should know about like Rust or crates or trusted publishing or whatever you’re up to, like, land this plane for us.
Tobias Bieniek (25:27) Cool. So let me first start with thanking everyone that was involved, aside from me, obviously. So that includes William Woodruff, who did the initial implementation for the Python Package Index. Without that reference implementation, this would have taken a lot longer, obviously. Then Matthew Trostle, who wrote the RFC for crates.io. So basically the proposal to actually do this with all of the details. And then the rest of the crates.io team, because without reviews,
Josh Bressers (25:56) Right.
Awesome.
Tobias Bieniek (26:19) we wouldn’t have landed it here either. Yeah, so thanks to all of you, and if you want to get involved, go to the crates.io repository on GitHub, go through the discussions tab if you have questions, open issues if you find bugs, I mean, open pull requests if you want to work on stuff, and you can find all of the contact details there as well.
Josh Bressers (26:42) Awesome. Yeah, and like patches welcome, right? That’s always the dream. awesome. All right, this has been a super awesome chat. I wanna thank you so much. I’ve learned a ton and I suspect everyone else did too. So awesome, thank you.
Tobias Bieniek (26:45) Yeah, absolutely.