Updating open source dependencies with Jamie Tanna

Josh discusses updating open source dependencies with Jamie Tanna. Jamie works on Renovate which gives them a lot of insight into the challenges of keeping your open source updated. We discuss the challenges of semantic versioning, supply chain security, and AI-generated code. If you’re new or old to the world of open source dependencies, there’s something to learn from this chat.

Episode Links

This episode is also available as a podcast; search for “Open Source Security” on your favorite podcast player.

Episode Transcript

Josh Bressers (00:00) Today, Open Source Security is talking to Jamie Tanna, maintainer and community manager of Renovate, which is a project to do kind of automatic dependency updates, we’ll say, but I’ll let Jamie explain more. So Jamie, tell us who you are, tell us kind of a little bit about like what Renovate is, and we’ll go from there, because I think this is gonna be a super interesting topic.

Jamie Tanna (00:19) Yeah, hey Josh, thanks for having me. Excited to be here. Yeah, so I, about six weeks ago, joined Mend, the company behind Renovate, to work on primarily the open source project. So Renovate is, as you say, an open source project for doing dependency management and keeping everything as up to date as possible. I probably should have double checked just how many package managers we support, but it is literally dozens we support.

In the last week, I’ve learned about like three or four different package managers that I’m like, where did that come from? Like, yeah. So we do a lot of dependency updates and try and make it so people don’t have to do the boring “what’s the latest version of this thing? How do I check the changelog?” We do that all for you.

Josh Bressers (01:10) Which is awesome. And the real reason I was so excited to talk to Jamie about this is this is one of those problems that feels really easy. Like, just install the updates, run npm update or whatever, but it’s actually incredibly difficult. And so why don’t you set the stage for us? Explain like when we talk about like updating dependencies, like what does that even mean? And then we’ll kind of go from there into the weird difficulties.

Jamie Tanna (01:39) Yeah, so I guess if we take a step back and even talk about what is a dependency. So some people have a JavaScript or TypeScript application, which has a package.json, which lists all the packages that they depend on, just dependencies. But then even if you just look at that, there’s a few other things in there. So it’s not just, I depend on this version of the GitHub SDK; I actually also require this version of Node.js to build. And that is its own dependency.

Then you actually say, oh, whoa, whoa, whoa, whoa, whoa. OK, we’re talking about the dependencies. But what version of Node or what version of npm do I actually need to install this package? And then you also go to, we have your package.json, but people are generally building things like containers. So, OK, you have a version of Node that you want your container’s build to use, but you also probably want that to stay in sync with the version in your package.json.

And then you actually go and say, well, okay, to actually build the Dockerfile in CI, I need some CI running somewhere. So I have some GitHub Actions or I have some Buildkite configuration and each of those things have their own dependencies and different things. And then you go one step further. And one of the things that I absolutely love about Renovate is you can go down the horrible XKCD angle of regexes and you can say, OK, here is my README file. And it actually references a Docker image. Now I want Renovate to go and update that because that is its own dependency.

And you suddenly realize that there are dependencies everywhere in all of your things. And it’s way more than just like a package.json. You realize the explosion in different places that there’s all these things that you just aren’t updating.
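To make the “dependencies everywhere” point concrete, here is a minimal Python sketch that gathers version references from two of the places Jamie mentions: the sections of a package.json and a container image mentioned in a README. The sample data, the `collect_dependencies` helper, and the regular expression are all assumptions made up for illustration; this is not how Renovate itself is implemented.

```python
import re

# Hypothetical package.json content, already parsed into a dict.
PACKAGE_JSON = {
    "dependencies": {"axios": "^1.6.0"},
    "devDependencies": {"typescript": "~5.4.0"},
    "engines": {"node": ">=20", "npm": ">=10"},
}

# Hypothetical README text that mentions a container image.
README = "Run it with `docker run node:20.11.1-alpine` or build locally."

# Deliberately simple pattern: a lowercase name, a colon, then a tag
# that starts with a digit. Real image references are more varied.
IMAGE_PATTERN = re.compile(r"\b([a-z0-9][a-z0-9._/-]*):(\d[\w.-]*)")


def collect_dependencies(package_json, readme_text):
    """Return (source, name, version) tuples from every place we look."""
    found = []
    # Libraries and the runtime/tooling constraints live in separate sections.
    for section in ("dependencies", "devDependencies", "engines"):
        for name, constraint in package_json.get(section, {}).items():
            found.append((section, name, constraint))
    # Free-text files can also pin versions, for example an image tag.
    for match in IMAGE_PATTERN.finditer(readme_text):
        found.append(("readme", match.group(1), match.group(2)))
    return found


for entry in collect_dependencies(PACKAGE_JSON, README):
    print(entry)
```

Even in this toy case the runtime constraint in `engines` and the image tag in the README are separate places that can drift apart, which is the sync problem described above.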

Josh Bressers (03:41) Right. I have not considered like the README dependency problem actually prior to you saying this right now, but that is absolutely correct. And this happens all the time where I look at a README and there’s instructions about, you need this version of Docker or use this version of this runner or whatever. And then it’s wrong. And then of course I’d usually just figure it out and go, but yes, that is an excellent point. I feel like you’ve already defined the complexity quite well, but it is. I mean, just saying what is a dependency is not a simple problem.

And this feels like in the industry, it’s just, just update your dependencies. But now that we don’t even know what they are, that becomes even more ridiculous. So for the purpose of this conversation, just to keep ourselves, we’ll say under control, let’s focus on npm for the moment. Or if you want to pick on a different one, Python or Rust or Go or whatever, I don’t care.

But I’ll set some of the stage for you and explain to you like where my head space is in this discussion. So like Node is one of my favorite examples because Node has like gazillions of dependencies where when you install one thing, you usually end up with like a hundred things, whether you want them or not. And the way these dependencies are defined is not like if you install like this version of, I’ll pick on Axios because I was just working with it the other day. Like if you install Axios, generally speaking, I just type npm install axios. And then Axios says, I need like this dependency, but version like 2.0 or above. Right. And so I like basically have no idea what I’m going to get until npm install is done running.

But then the arrow of time goes on. Now I’ve got security vulnerabilities. I’ve got bug fixes. I’ve got whatever. And this is kind of where Renovate comes in now and the work you’re doing of like, how do we get from the thing I installed last week to the thing that needs updating and even understanding what that even means for me, right, as a developer.

Jamie Tanna (05:35) Yeah, yeah, exactly. So in this case, when you, say, run Renovate or Dependabot or any of these dependency update tools against a project, first of all, they need to say, what even is Axios? Let me go and look up somewhere what that definition is. And so like in the Node ecosystem, you have npmjs.org, the npm registry, which will actually have all that available. So we’ll then go to the registry and say, does this package exist? Because it may be something internal that actually doesn’t exist in the public world. First you need to work out, okay, is this a thing that is available here? And the default is the npm registry.

And then from there, you’re like, so I’m currently using version 2.0 something. But actually, what is the version string saying? Am I saying I want exactly this version? Or am I saying, I want a semantic version pinning for 2.x or 2.0.x? And there’s little things like that that you then, A, have to know, OK, what is the current version constraint? What does that resolve to? And sometimes that may be available in your lock file. Sometimes you may not have a lock file. And so again, it’s that. And then you need to go to the registry and say, OK, what other versions do you have?

And then from there, you can maybe work out, we’re currently on 2.x. There is now 5.12. So we need to go and update to that. And that will probably be painful. So let’s inform the user that it’s probably going to be a major version. Or it is a major version. And the npm ecosystem follows semantic versioning. So there will be some breaking changes. And this will probably not be great. Actually, maybe there is also another 2.x version that we could also update to. So let’s give people the choice of the hard thing, which is the most up to date, and the less hard thing, which is the most up to date thing for the version that they’re currently using.
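The choice described here, the newest release overall versus the newest release on the major line you already use, can be sketched in a few lines of Python. The `update_candidates` helper and the version list are hypothetical, and the sketch assumes plain `X.Y.Z` strings with no pre-release or build suffixes, which real registries do contain.

```python
def parse(version):
    """Split 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def update_candidates(current, available):
    """Return the newest release on the current major line and the newest overall."""
    cur = parse(current)
    # Only consider releases that are strictly newer than what is installed.
    newer = sorted((v for v in available if parse(v) > cur), key=parse)
    same_major = [v for v in newer if parse(v)[0] == cur[0]]
    return {
        "within_current_major": same_major[-1] if same_major else None,
        "latest": newer[-1] if newer else None,
    }


# Hypothetical list of versions a registry might report.
versions = ["2.0.3", "2.0.10", "2.4.1", "3.0.0", "5.12.0"]
print(update_candidates("2.0.3", versions))
```

Comparing tuples of integers rather than raw strings matters: as strings, "2.0.10" would sort before "2.0.3".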

Josh Bressers (07:45) Right, right. And we should ask, what is semantic versioning, for anyone who doesn’t know? I will say, as an old school open source nerd, like semantic versioning just is like in my bones, but there’s a lot of people who don’t know what that means anymore.

Jamie Tanna (08:01) So I’m probably going to do a bit of a cop out. Earlier this year, I was on one of the Fallthrough podcast episodes. And Steve Klabnik was one of the other co-hosts. And Steve, for those that aren’t aware, is an actual co-editor on the spec for SemVer. So we all got schooled on what SemVer is. So I’ll share a link for that. But effectively, in

Josh Bressers (08:05) Excellent.

Jamie Tanna (08:28) most practical cases semantic versioning is three version numbers. So it’s x.y.z, which is major, minor, and patch. Generally, well, generally, the intent of it is that if you are bumping the major version, that first number, you are saying this is going to break something. It may not be something that you directly rely on, but it should be breaking.

The minor, which is the second number, is generally like there are functional changes, but it shouldn’t break anything. And then the last one is patch releases, which, yeah, exactly. And as you say, shouldn’t is, yeah. And it does a lot of that heavy lifting. And then, yes, the patch release, which is the last one, is there’s a bug fix. There’s something not really user-facing. Maybe it’s a security issue. But you then need to go and check the release notes for that.

Josh Bressers (09:05) Put “shouldn’t” in air quotes.

Yeah, yeah. And the intent is that I should be able to, generally speaking, install a patch version with minimal concern, the feature version with, we’ll say, a little maybe a side eye, and then the major version is like clear the calendar, because there’s some work coming.
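The major, minor, and patch distinction is mechanical enough to show in code. The sketch below labels an update by the first component that differs. The `bump_type` name is made up for this example, and it assumes plain three-part versions; it says nothing about whether a release really is safe, which is the “shouldn’t” caveat from the conversation.

```python
def bump_type(old, new):
    """Label an update by the first version component that changed."""
    old_parts = [int(part) for part in old.split(".")]
    new_parts = [int(part) for part in new.split(".")]
    labels = ("major", "minor", "patch")
    for label, before, after in zip(labels, old_parts, new_parts):
        if after != before:
            return label
    return "none"


print(bump_type("2.0.3", "2.0.4"))
print(bump_type("2.0.3", "2.1.0"))
print(bump_type("2.0.3", "3.0.0"))
```

A tool can use a label like this to decide how much caution to ask for: little for a patch, a side eye for a minor, a cleared calendar for a major.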

Jamie Tanna (09:42) Exactly. One of the interesting things that Steve said is that SemVer was originally and generally intended for machines to understand version numbers. And although humans do get a lot of semantic information from them, it’s primarily for machines to be able to say, OK, this thing is going to be breaking on us. And so then when you add in the human aspect of someone having to work out, OK, is this change that I’m making going to break someone? That is very difficult.

There’s an XKCD called Workflow, which is one of my favorites, where it’s someone who’s like, your update broke stuff. Well, in fact, I won’t go and explain it all, I’ll pop it in the show notes. But yeah, it’s one of those things that you as a human can’t for certain know, how am I going to break something for someone?

Josh Bressers (10:30) Yeah, yeah.

Jamie Tanna (10:39) because there’s both documented and undocumented things that people rely upon. One of the things is with semantic versioning, you’re meant to actually describe what things are and are not covered by your actual versioning scheme. But sometimes things fall through the cracks. People are relying on things that you don’t want them to. And therefore, a thing that you don’t think is breaking is breaking. And also, once you’ve released that version and said it’s non-breaking, there’s no way, if somebody says, whoa, whoa, actually this was breaking, to correct that without you going through extra work to re-release it and everything.

Josh Bressers (11:17) Right, Okay. So now explain to us what a tool like Renovate, like these automatic updaters, like what happens, right? I’ve got my new version and obviously we have kind of three kinds of updates, right? There’s the minor, the, well, so the patch, the what? The minor, the major, yeah, minor. I always get the words mixed up. Anyway, what does that mean? Like, what are we looking at?

Jamie Tanna (11:44) So unfortunately, it’s not just major minor patch. But for simplicity’s sake, let’s start with that. So yeah, for instance, say with Axios, you’ve got a major minor patch version update. What Renovate will do is you will run Renovate either via the CLI, because it’s an open source command line tool, or you can use one of the hosted versions. You can, or when you run Renovate, Renovate will

Josh Bressers (11:49) That’s fair.

Jamie Tanna (12:14) go and parse your package.json. It will say, OK, cool, I’ve got a version of Axios. It will then hit the npm registry or any other private registry you use, like Artifactory. And it will say, I have a new patch version. I have 17 minor versions and three major versions. And then from there, there are a number of choices that depend on how configurable or how configured you have made your repository.

One of the things that, again, is nice about Renovate that can get a bit much at times is there’s just loads of configuration you can do. So you can add a lot of custom logic in around, actually, if there are three major releases, I want each one raised separately. So don’t try and immediately raise the V5. Get me to V3 next. So give me just one major at a time, and then I can incrementally get there.

Josh Bressers (13:02) Hmm.

Jamie Tanna (13:13) instead of trying to do big bang multiple releases at once. So by default, Renovate will, say, raise a major, a minor, and a patch. It will create three branches. If you’re on GitHub, it will create a pull request for you. And then what it will also do is it will populate that PR with, here are the changelogs for Axios of the versions that you’ve got. And it will provide you some additional information like

So Mend, one of the things we do is based on people using the Mend developer platform, which is a public cloud, we’ll look at, okay, in the last week, there’s a new version of Axios, great. How many people have accepted that PR straight away? Or how many people have actually had to make changes? Or how many people have made those changes, merged it, and then three days later are like,

Josh Bressers (14:07) Yeah.

Jamie Tanna (14:13) production’s dead, we need to roll that back. And so we take that sort of information and we provide additional context to people in the PR that they can then also see. Not only this is a major version, but it was out three months ago and there’s a 95% chance that everything’s fine. And that gives you that additional merge confidence in terms of, is it worthwhile me accepting this PR, or do I need to be a little bit more careful? Which generally with things like major versions, you should be a little bit more cautious, do a bit more thinking.

It’s one of those things that also highlights just how good is your test suite. Yeah, especially say with things like Axios, as an HTTP library, you may not be testing it extremely well. You may not be trying to make sure that it does HTTP requests exactly as you want it to, because you treat it as a box that will just go away and do that. And so you mock it out. But then when it comes to a major version where they may have changed some underlying behavior, you may not have as many tests for that as well. So yeah, makes it tough.

Josh Bressers (15:26) Yeah, I mean, for sure, for sure. Okay. So I love the aspect of paying attention to how successful or unsuccessful a certain update was that was sent out because that has like a very herd immunity feeling to it.

You talked about having the ability to have the tool kind of inspect what’s going on, automatically install updates. And then there’s obviously run the test suite, see if that passes, see if that fails. But I have a suspicion there’s like gnarly corner cases that we don’t think about in this universe, right? That it’s not just as easy as saying like, let’s just install this new version. Like what happens when, I don’t know, maybe sometimes there aren’t new versions, things change names, like there’s so many weird things that happen in the open source universe, like how is that stuff all dealt with?

Jamie Tanna (16:16) Yeah, it’s a really good challenge. So for instance, say in the case that there are no updates for a while, there’s a concept within Renovate called abandonment, which is that maybe the upstream has just stopped updating it. Maybe it is feature complete, and that is actually a good thing. But we surface to users in a couple of means to say, hey, this package hasn’t been updated in over a year. May be fine, may not be. We will let you go and make that choice, because a lot of times it is fine. Sometimes it is that the maintainer has just burned out. No one’s given them support. They’re not working on the project anymore.

But as you say, it could have also changed the package name. So one of the things, again, with Renovate we have is the ability to do package replacements. So we have a number of community-sourced means where you can come in and say, yeah, so for example, one thing recently we had was Release Please, which is built by Google to do automated releases. They recently moved their GitHub Action between organizations on GitHub. And I say recently. It was like over a year ago. But we had someone contribute to the project to say, hey, please create a replacement for it. And now what will happen is anyone who’s using that old version with the old name will get automatically replaced. They’ll get a PR to say, hey, this thing’s now moved. Please go and use it.

And so again, there’s things like that that you can add some additional logic. And it gives you that control to go and replace things. Because, yeah, things get deprecated. Things just stop being maintained. There’s a new spiritual successor. It’s nice to have a community-sourced way of managing that, as well as internally in your own company.
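Both ideas from this answer, flagging a package with no release in over a year and pointing users at a renamed package, reduce to simple lookups. The Python sketch below shows the shape of that logic. The `REPLACEMENTS` table, the package names, and the `review_package` helper are invented for illustration; Renovate’s own replacement data and abandonment handling are not reproduced here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical replacement table; the names are made up for illustration.
REPLACEMENTS = {"old-org/some-action": "new-org/some-action"}


def review_package(name, last_release, now, threshold=timedelta(days=365)):
    """Return human-readable notes about a package, or an empty list."""
    notes = []
    # A known rename or move: suggest the successor.
    if name in REPLACEMENTS:
        notes.append(f"replace with {REPLACEMENTS[name]}")
    # Long silence upstream: could be abandoned, could be feature complete.
    if now - last_release > threshold:
        notes.append("no release in over a year; check whether it is abandoned or just finished")
    return notes


now = datetime(2025, 11, 1, tzinfo=timezone.utc)
print(review_package("old-org/some-action", datetime(2024, 1, 1, tzinfo=timezone.utc), now))
print(review_package("axios", datetime(2025, 10, 1, tzinfo=timezone.utc), now))
```

The second note is worded as a prompt for a human decision, matching the point that a quiet package may be perfectly fine.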

Josh Bressers (18:11) that’s cool. So you’re saying this is literally open source contributors sending you this particular information about what’s replacing what.

Jamie Tanna (18:18) Yeah, exactly. Yeah. Yeah.

Josh Bressers (18:20) That’s pretty cool. Have you had the situation come up yet where you have like two potential forks of one thing that has ceased to function?

Jamie Tanna (18:30) So as far as I’m aware, no. Not that I’ve seen at least. I think we’ve been fortunate that a lot of the time there is like a blessed fork. Either like the maintainer says, OK, yeah, it’s going over there. Or, yeah, the second fork just hasn’t got in touch to be like, please use us. Yeah.

Josh Bressers (18:57) Yeah, yeah, that’s fair. That’s right. I mean, that’s a tricky problem though, right? Cause sometimes in the open source universe, you don’t necessarily know what is going to, actually, the episode that’s going to come out before yours was about the Edera Tarmageddon bug, where it was basically like three layers of forks of all abandoned tar libraries for Rust. And it’s like, what do we even do with this now? Like it’s so complicated.

Jamie Tanna (19:21) Yeah.

Josh Bressers (19:23) Okay. Okay. So I feel like you have explained that there’s two pieces to this, I guess. The way you’ve explained this up to this point, it feels straightforward and kind of obvious. And anyone who’s used these automatic dependency tools, I think quite often it’s uneventful, right? Because many of the updates just kind of work. The ones that don’t need some level of, I guess, understanding and expertise to fix whatever the thing is that broke.

But there’s got to be some gnarly stories you have around like how your crew is, you know, finding these things, testing what’s going on, understanding, cause I know how hard this can really be and the experience. And this is a good thing, right? Because the tooling has removed a bunch of the pain and suffering users normally would have to deal with.

Jamie Tanna (20:18) Yeah, and so I guess as a slight tangent, so one of the things in the last month is we’ve had a couple of very bad NPM supply chain security attacks. And dear reader, well, dear listener, you will be like, which ones are you talking about? There have been many. And yes, it is really tough at the moment. And it’s only getting harder. And

Josh Bressers (20:30) Yes.

Jamie Tanna (20:45) So for instance, one of the things we’re gearing up for is this coming week, Renovate 42 is coming out, which is our next major release. And as part of that, we are changing some default behavior and modifying some stuff around minimum release age. So one of the things that these supply chain attacks have been having in common is a malicious package is released, people update to it very quickly, get pwned.

And we saw this with things like tj-actions, where people either weren’t pinning or they were updating very quickly and then were maliciously impacted. And so one of the different things that you can do is you can add like a release gate. And you can say, whoa, wait at least like three days, seven days, 14 days, until security researchers have had a chance to say, no, no, no, let’s not do this yet. Or come out and say, no, no, there’s definitely something wrong with this. And then it gets yanked by npm.

But one of the things around that is that it’s actually a little bit gnarly to actually understand where you can do things like that. So there’s things like pnpm and Yarn, who’ve recently added this functionality in the actual toolchain. My friends over at GitHub recently added it to Dependabot. We’ve had this since 2019.

And it’s something that people have been using on and off. Like, it does have a fair bit of use, but especially since these attacks, we’ve had a significant rise in the number of people using it. And one of the interesting things here is it’s helped tease out some edge cases that we previously weren’t able to at the scale that people were switching it on. Contributors would get in touch to actually say, I’ve spotted this thing, as well as also additional things that we’ve been doing as a maintainer team to improve both the visibility and the secure by default configuration to actually make sure. Yeah.
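The release gate Jamie describes is, at its core, a filter on publish timestamps. Below is a minimal Python sketch of that idea, assuming the registry tells you when each version was published. The `old_enough` helper and the sample releases are hypothetical. In Renovate this behavior is, to my knowledge, controlled by the `minimumReleaseAge` option, but the code here is only an illustration and not Renovate’s implementation.

```python
from datetime import datetime, timedelta, timezone


def old_enough(releases, now, minimum_age=timedelta(days=3)):
    """Keep only versions published at least `minimum_age` before `now`."""
    return [
        version
        for version, published in releases
        if now - published >= minimum_age
    ]


# Hypothetical (version, publish time) pairs as a registry might report them.
releases = [
    ("1.0.0", datetime(2025, 10, 1, tzinfo=timezone.utc)),
    ("1.0.1", datetime(2025, 11, 9, tzinfo=timezone.utc)),
]
now = datetime(2025, 11, 10, tzinfo=timezone.utc)
print(old_enough(releases, now))
```

One gnarly part hinted at in the conversation is that this only works when a trustworthy publish timestamp is available for the package source in question; a version with no known timestamp needs its own policy.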

Josh Bressers (23:03) And I’m a huge proponent of secure by default because I think expecting people to enable features like they never do, right? The number of people who ever change the default is a hilariously low number. So that’s pretty cool. I dig that. So, okay. So here’s my question for you then. And this is, I always ask people like, this is a topic that has come up more than once recently on the show with various guests. If everyone starts waiting three days to install the updates,

Like how do we know there’s something wrong? Like part of the reason we figured this out is like, it’s the, you know, it’s the penguins on the edge of the cliff and they’re getting pushed off the side and we’re like, whoa, those guys got eaten by a whale. Let’s all not do this now, you know? And if everyone’s taking a step back, like who’s jumping off the cliff first?

Jamie Tanna (23:53) And yeah, that is a problem. So there are a number of really great security researchers who are catching these things. Some of the time that is because someone’s been like, this looks weird. I’ve just seen this on my machine. Or as you say, maybe malicious packages will be a little bit more cautious, a little bit cleverer around waiting a little bit longer. And for instance, I think it was the Nx build tools supply chain attack, where the malware used things like LLM tools on your local machine to actually generate the code to exfiltrate the data. And so again, they may be doing more things to be more pervasive and more cautious. And so yeah, like it is a difficult thing.

Maybe we can try and get some AI agents to be going off and installing all the things and seeing how many of them get broken. Yeah, I think we do need like a mix of some people who are very happy being the guinea pigs here and testing it all out. But yeah, like as you say, it will make it harder and the malware will take that into account.

Josh Bressers (25:20) For sure, for sure. I mean, it’s always a cat and mouse game, right? Where defenders do something, the malware people do something new, and then we get to catch up to that and it’s, yeah, yeah, I get it. Okay. Right, right. Exactly. All right, Jamie. So what else do you want us to know about this topic? We’re kind of coming to the end. So, so fill us in, like, what are some good takeaways we want here?

Jamie Tanna (25:32) Never ending.

Dependency management is hard. I would love to say we have solved it for you. Come and use the project. Come and buy us as a product. But I can’t. I can say we get you a large amount of the way. There are lots of tools on the market for trying to get you there. But at the same time, don’t just try and get tooling in to solve the problem. Because one of the things that I have definitely seen across companies doing this is some people see the dependency updates as just tick-box. And for a lot of people, it is, I just want to get them done. So they will do whatever they can to bypass it. They’ll create GitHub Actions bots that just auto-approve PRs. They will just say, no, it’s probably fine.

So some of it is trying to put organization policies in place and things that can make things easier. So as I say, there’s things like merge confidence that Mend provides on top of Renovate, where you can use that sort of data. We have a thing called workflows. We can actually say, once a week, raise me the PRs or give me a PR with all the dependency updates that you have the highest confidence in. And those I can probably auto-merge. No human needs to review them, because you’ve got a week of data from lots of people. There are some things you maybe do need to be a bit more cautious about.

Yeah, so I’d say some of it is getting some tools in that do some of the work for you, but also improving the overall understanding within your organization, within your team, to be like, there’s a lot of stuff going on behind the scenes.

We haven’t even talked about the risks of AI-generated code. How much that depends on outdated software, not the best practices stuff. And that is another risk that you have coming into your organization. Yeah.

Josh Bressers (27:58) I mean, that’s a good point because a lot of the AI code is trained on older data. So it’s going to install older dependencies. Then you, I mean, I guess, ironically, then you end up in a weird situation where you might have the AI-generated code install an older dependency. Then you run something like Renovate, which tells you to update the dependency. But now you break everything and you got your AI code trying to fix that.

Jamie Tanna (28:22) to go and fix it, yeah.

Josh Bressers (28:24) The solution, just downgrade the dependency and it’s like a wheel of pain. But that’s a good point, actually. I’ve not put a lot of thought into the AI code generation aspect of what that means for like some of our dependency and supply chain things, because it’s just so weird and it’s so unpredictable.

Jamie Tanna (28:43) And yeah, 100%. And we still don’t know enough about it as an industry. But we are already seeing things like, as you say, trained on old code, so it’s producing old versions of code. Or, in effect, a lot of people are trying to vendor versions of open source code in their projects. And then that doesn’t get updated when there is a security issue. And so if you have 100 people who are like, only these few functions from this project, then it’s like, OK, we now have all of these people who are affected by a CVE, but they don’t even know because it’s custom code. Yeah.

Josh Bressers (29:25) Yeah. Yeah.

So does Renovate even try to unwind that when someone vendors their own thing?

It’s okay, you can say no, it’s a really hard problem.

Jamie Tanna (29:36) Yeah, I think short answer no. There are, depending on the tooling you’ve got and things like that. For instance, like pnpm, you can apply your own patches to packages. So things like that, there may be a little bit more insight. Yeah.

Josh Bressers (29:54) That’s kind of not vendoring. For what we’re talking about, for anyone who doesn’t know, vendoring is where we take an open source project and let’s say, I’ll just pick on Axios, because we’ve been talking about it, and I create Axios-Josh. And then that’s what I’m using. And obviously no code scanner is going to realize Axios-Josh is just Axios with some stupid patch Josh installed. But now when I look at vulnerability data and update data, it’s never going to have updates.

It’s never gonna have vulnerabilities against it because it’s just some random thing I did. And it creates a huge blind spot and it’s actually quite problematic for a lot of organizations.

It’s, what industry is this? All right, Jamie. I mean, this has been, this is like such a cool problem and it’s so hard. Like, okay, so I always love telling people that there are times I have a podcast that it feels like I’ve been talking to someone for like two hours, cause it just like drags on. I could swear we hit record like five minutes ago, maybe. And it’s been half an hour.

Jamie Tanna (30:31) Yeah.

Josh Bressers (30:55) And so rather than, rather than like bore the audience with me asking a bunch of more ridiculous questions, like this has been a really cool conversation. I want to thank you a lot for, just the insight and the interesting problems. Like this is a cool space and I’m excited to see kind of where it goes from here. So thank you so much.

Jamie Tanna (31:11) Thanks for having me. It’s been great to chat. I mean, I’ve learned the clock change Maybe that was a mess at the time.

Josh Bressers (31:18) Yes. Yes. We are speaking shortly after daylight savings, and I hate daylight savings. I hate it so much, but all right. Awesome. Anyway, Jamie, thank you. Thank you so much.

Jamie Tanna (31:30) Thanks, folks.
