Josh chats with Brian Fox from Sonatype about their 2026 State of the Software Supply Chain report. Most of the numbers continue to grow at alarming rates, but there are some interesting new findings in this one. We discuss end of life in open source, which is tough to define. We touch on what using AI with open source dependencies looks like (and why it’s broken), and we discuss the challenge of upgrading your open source dependencies in a way that doesn’t break everything. It’s a great report and a great discussion.
Episode Links
- Brian
- State of the Software Supply Chain
- Sonatype
- Sustaining Package Repositories Interview
- Michael’s FOSDEM talk
- Report covering the herd
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Today, Open Source Security is talking to Brian Fox,
co-founder and CTO of Sonatype and an absolutely lovely person I love talking to. So Brian, welcome back to the show, man. I’m really excited. It is that time of year when I get to talk to Brian about Sonatype’s 2026 open source… what is the name of your report, technically?
Brian Fox (00:09) Hi Josh, good to see you again.
State of the Software Supply Chain.
Josh Bressers (00:22) Okay. And you’ve been doing this every year for quite a few years now. Whoa, 11 years. Holy cow. I feel old. Right? Yeah. We were, we were doing supply chain before it was cool. I love it. So I guess I’ll just start the show up by saying, I’ll give you the floor here, Brian. Tell us about like, what is the report? And you can touch on any points you have. I’ve got a bunch of notes and some things I’d like us to talk about, but I’ll just let you set the stage for us.
Brian Fox (00:26) 11 years now, yeah.
It’s way before it was in vogue, yes.
Yeah, sure. So we started the report forever ago, like I said, 11 years ago, trying to shine a light on these challenges that we could see in the ecosystems. So just as a reminder, Sonatype runs Maven Central. It’s not just Maven. It’s sort of the old name, right? But it’s basically the open source Java component repository. It’s like NPM for JavaScript,
NuGet for .NET, and PyPI for Python, right? We run the repository for Java. It’s kind of the OG of the repositories. It predates most of those other ones. And in the early days, we saw weird patterns. We saw things like the most vulnerable versions of crypto libraries were the ones that were also the most popular, right? And so that kind of caused us to start
creating this report to shine a light on it. And so every year we take a look at different aspects of it. You know, of course the problem space has shifted over the years. In the early days, people didn’t know they were using open source. Leaders didn’t know they were using open source. They were like, we’re not doing that. They’d think, we’re not using OpenOffice and Firefox and Linux, but really all of their applications were built on all these components.
So that was sort of the problem in the early years, and then it became, you know, about better management, and then later, you know, SBOM kind of capabilities, which were just different words for basically the same thing: knowing what’s in your software and being smart about choices. And of course, then around 2017, it started including the malware stuff, which has become a prominent feature every single year. I think in the last couple of years, I would say the industry is kind of finally waking up, like,
this is general knowledge now. But for years I was talking to people and they had no idea that this was a thing. So we kind of use the SSCR as sort of this document to help track these things over time. And we also do in-depth research on different areas over the years. We don’t look at the same thing every single time; we try to take a different take on it. And you might’ve heard of AI. It’s this new thing. So this year we did an interesting deep dive on…
how AI agents and AI coding tools interact and deal with dependencies. And I think that’s really interesting. But, you know, we were chatting before recording: some of the things that we mention in here, like end of life and some of the other things, are also things we talked about years ago, but they didn’t get traction because I guess the market wasn’t ready, right? So it’s kind of interesting to have this long view on it and be able to refer back to the trends
pretty consistently, so that we can kind of tell this story as the market has evolved.
Josh Bressers (03:45) Yeah, yeah, for sure. And I guess if I was going to summarize your report, it’s, you know, graph goes up and to the right is kind of the theme here, right? Where there’s more malware, more dependencies, more open source, more kind of everything, which I guess to anyone paying attention shouldn’t really surprise them, right?
Brian Fox (04:05) No, it shouldn’t. You know, the high-level numbers that we’re kind of reporting on: 9.8 trillion, basically 10 trillion downloads across the top five, we call them kind of the big five, Maven, PyPI, NPM, and NuGet. 10 trillion downloads last year.
And when you look at the graph, it’s the same graph every year. Basically the growth continues. It’s like a fractal: every year you zoom out, but it’s the same shape. The numbers are just bigger and bigger. And you know, Michael Winser gave a talk at FOSDEM and he kind of compared 10 trillion to Google searches. You want to take a guess at how that compares to Google searches in the same year?
Josh Bressers (04:55) You know what Brian I literally watched that talk yesterday and and I have Brian no this is good
Brian Fox (05:00) I didn’t mean to put you on the spot then. Okay.
It’s two times. It’s two times the Google scale of 5 trillion searches a year. We’re serving up 10 trillion across these registries. And that was kind of used in the context of how we have to work to make the repositories more sustainable. This is the thing you and I talked about last time, right? That the growth of these things is largely unbounded, but it’s not free, you know. And
I think Michael did a good job of kind of reminding people, you know, that when more people use open source, the cost of using more open source is effectively zero. Like, that’s the economics of software, right? More or less. But when you’re talking about a service that delivers binaries, these are actual electrons being pushed around the world. Those are not yet free. We don’t have limitless free energy. And so at the end of the day, all these things cost money. So the more this goes up, the more the actual infrastructure supporting that open source goes up. And so that’s one of the sections we talked about in here. But 9.8 trillion is a crazy number. The latest number of malware packages that we’ve tracked, and we have the biggest database in the industry, is 1.2 million intentionally malicious open source components. And we saw a huge explosion of that this year because of several of the worms, the Shai-Hulud worm.
Version two, version three, and there were a few others, the Lazarus Group out of North Korea doing a lot of stuff, right? So we’ve seen cases where individual campaigns are producing hundreds of thousands of components all by themselves, just littering the wayside.
Josh Bressers (06:41) And I want to point this one out as well. You have a whole section for malware, and kind of as a note, I’m actually talking to Michael Winser next week, like calendar next week. I suspect it’ll come out right after this episode, so these will tie together perfectly. But your malware graph, I mean, the amount of malware from last year to this year looks like it’s pretty much doubled on your graph, which is a terrifying level of growth, right?
Brian Fox (06:46) Mm-hmm.
Interesting. Okay. Okay. Yeah.
Mm-hmm. Mm-hmm.
It is. I mean, there were three years in a row in the early days where the growth was like 750% each year. But yes, those were smaller numbers. We’re talking big numbers now, and the fact that it still doubled is also not cool. And I think it was in last year’s report where I called out the fact that, ’cause we did some research across all of these big five on how many components are in use in enterprise applications, the
number was about 750,000 components, right? So there’s tens of millions of various things out there, but most of them are not used regularly in enterprise applications. So 750,000 as of last year, and last year was also when we crossed 750,000 fake, malicious components. And I thought that was interesting, that the amount of actual intentional garbage out there
was more than what people were typically using. So now we’re vastly past that. The noise-to-signal ratio is going in the wrong direction.
Josh Bressers (08:05) Right, right. And you also have a, I can’t remember what you call it, and I’m trying to find it on the page, but it’s not coming up, where people are uploading like test projects and kind of just random, I wouldn’t call it garbage, but things that are intended never to be used by anyone. And the amount of that software is huge as well.
Brian Fox (08:26) It is, and we talk about that in the registry sustainability section, because it was sort of an example of people doing somewhat innocent things without realizing the costs that they’re producing, right? So it was a case of someone who was continuously, on every single commit, doing a build, uploading
their snapshot, effectively trying to pretend that it was going to be a release, pushing it through the Maven Central validation, and then dropping it at the last minute automatically, because one time they went to do a release and they had a burp in their plugin or something like that. And I was like, dude, what are you doing? Why are you doing that? Like, I know continuous is sort of a thing, but we all need to step back and think about that. Continuous everything is effectively waste. You know,
you don’t actually intend to release every single commit. That’s crazy. Nobody really does that. So why did you build it this way? And that was just an example of when people don’t understand the cost of their actions and assume that they’re free, it leads to these decisions that don’t scale. That was kind of the moral of that story.
Josh Bressers (09:36) Right. And look, I think historically a lot of people just never thought about what happens after I push something into CI, right? Like, Google’s giving everyone free computers, or the Linux Foundation is paying for it. That’s my favorite one, when people are like, the Linux Foundation pays for open source. It’s like, no.
Brian Fox (09:53) Right.
Right. Yeah. And that was kind of the point that I was trying to make. And that’s why we did the open letter that we talked about. If people don’t know what we’re talking about: we joined together with other registries and published, it was at the beginning of October, I believe, the open letter on sustainability. You can find that on the OpenSSF blog. And we kind of laid this out and talked about the fact that, you know, the registries are going to have to shift. We’re going to have to shift to align
how these things get paid for with how the costs are produced. And that’s really kind of code for: heavy users are going to have to start paying or they’re going to be limited. There’s just no way not to. And these registries were designed to help host and share and support open source, and we all want to continue to do that. But I think it’s hard to argue that we should allow giant social networks downloading the same 10,000 jars a million times every month.
That’s not the same thing as supporting open source, right? And so it becomes a slippery slope if you’re not careful there. And in the early days, these numbers were small. It was easy to write them off as rounding. But now, I think it’s estimated that the cost per year to run one of these public registries is between $5 million and $10 million, depending on the infrastructure.
And in many cases, registries get their bandwidth donated and it still costs that much. So that doesn’t include the bandwidth. These things are very much not cheap to run anymore, and relying on the never-ending goodwill of large benefactors is also not going to work. Again, when the numbers start to get really huge, it becomes much more difficult. So we’re all working together to kind of figure out how to balance those costs.
Josh Bressers (11:44) Yeah, yeah. And I will put a link in the show notes. Brian and I talked about his letter shortly after it was released and I’ll dig it up and shove it in there for anyone who hasn’t seen it. It’s very good.
Brian Fox (11:52) Yeah. Yeah.
And Michael’s talk at FOSDEM was really good as well.
Josh Bressers (11:56) I will also put a link to that in the show notes. And if I remember, I’ll update the show notes when I talk to Michael, but there’s a 90% chance that won’t happen.
Brian Fox (12:02) Yeah.
So yeah, we covered that. You know, vulnerabilities we also talked about. In fact, I just did a webinar yesterday with CRob and some others on the state of vulnerability management in 2026. And, you know, I made the joke that it’s kind of coming up on NVD CVE funding crisis season, right? Like, the last two years in a row, around March, April,
a bomb was dropped on the world that one of them was going to lose funding, and then somebody came in at the last minute. You know, it’s that time of year again, so we should all prepare ourselves for that. But, you know, the vulnerability ecosystem is being underfunded. No surprise; just like anything else, the scale of this stuff is blowing up. The costs and the funding are not aligned.
And that creates massive lag. I think in our study this year, let me see, we found only 35% of open source vulnerabilities could be triaged from the NVD. So basically 65% of them had not yet been scored in the NVD. So tools that are dependent upon that are effectively blind to 65% of last year’s open source vulnerabilities. Further, 46% of those CVEs
were actually high or critical. So it’s not the case that they’re focusing on the high or critical ones and ignoring the little ones. It’s just not that at all. They’re not getting triaged at all in many of these cases, right? And additionally, and this is something we’ve talked about for a long time, even when they are scored, even when they are out there, the data that is available in public is not really vetted in many cases, right? The project submits the data and it kind of just goes through.
We’ve long talked about how that can lead to mismatches in the exact components being flagged and things like that. So one in seven, we found, had to be corrected last year. So, you know, that’s still not great. So you’re missing 65%. Half of those are high or critical. And even the ones that do get triaged, one in seven are effectively wrong. So our vulnerability feeds collectively, when you’re relying on the public data, you can’t run a business on that, you know?
Josh Bressers (14:23) No, no, for sure.
Brian Fox (14:25) And
that’s one section that we looked at for sure.
Josh Bressers (14:28) Yeah, I mean, we have that problem at Anchore; we have Grype, which is an open source vulnerability scanner, and we do a ton of work triaging and understanding vulnerabilities, because you’re right, the NVD kind of collapsed. Now I will say the data GitHub publishes is very good. That one I’m quite pleased with. And in fact, Maven publishes Java vulnerability information, which is quite good as well.
Brian Fox (14:33) Mm-hmm.
Yeah, I mean, GitHub is doing triage data elaboration like we have long done, right? We’ve been doing that for a long time for our products, because I didn’t want to have to get into that business back in the day. I just wanted to build products. But then we realized we couldn’t build products that weren’t loaded with false positives and a bunch of false negatives by just relying on that data. So we became accidental data scientists out of this.
It was a good accident; when we talk about the AI stuff, I think it’s pretty interesting, but that’s why that happened: we recognized right away that the data that was out there wasn’t built for open source. It was built for chips, pieces of hardware, big things like Microsoft Office or Windows, Red Hat, these kinds of things, not micro components. And that was the ultimate flaw. So yeah.
And then, you know, our favorite, Log4Shell. We’ve talked about this. It’s four years now since it was discovered at the end of 2021. Last year, 14% of the downloads of Log4j were of those known vulnerable versions.
Josh Bressers (16:01) which is way lower than I thought it would be.
Brian Fox (16:04) Yeah,
me too, but it took four years to get there, right? Last year it was like 30%. And, you know, we did take a look this year at several other components that are also popular, also had large vulnerabilities, vulnerabilities that had been out for a while. And we find similar patterns that show the truth that we kind of know: the consumers of these things aren’t,
on average, paying attention enough and updating their dependencies. Hence all the push and mandates around SBOMs and everything else, right? Because left to our own devices, companies just aren’t doing it. It’s not that the data is not available. It’s not that it was low profile. I mean, gosh, where were you living if you didn’t hear about Log4Shell? My mom was asking about it, right? I saw it on screens in the news, chyrons in elevators.
It was everywhere, and yet… So if you didn’t hear about that, how many of the other things did you miss? So we looked at that and report on that as well.
Josh Bressers (17:06) So I also want to touch on the fact that in the vulnerability section this year, you have some end-of-life information. And I know that was part of the topic you talked about with Dave Welch from Hero Devs on your webinar yesterday, and you have some Hero Devs data in here on end of life. And I’m curious about your thoughts on why people are suddenly caring about end-of-life software. I feel like this has exploded in the last six months, whereas before this, like, no one cared.
Brian Fox (17:11) Mm-hmm.
Mm-hmm.
Yeah,
it has. It’s funny, we chatted about this before: we did a study on this a few years ago, and we were exploring algorithms to try to understand, you know, given a component that hasn’t had a release in a handful of years, is it end of life? It might be, it might not be. It might just be working, doing what it was intended to, with no vulnerability, so therefore there’s no reason to change it. And, you know, we proposed and shared some algorithms in the report a couple of years ago, and I think it’s linked in this year’s report as well,
that talk about how you might look at that. You might look at: is the ecosystem generally moving away from it more rapidly than you would expect from normal attrition? That’s a pretty good sign of something that’s actually end of life, versus something that just does its job and is a quiet hero underneath the dependency stack. It was probably two years ago at this point that I first started to hear from customers really poking at end of life, and it was onesie-twosie.
The Federal Reserve banking audit system had started asking about it. Why did they start? I have no idea, but they did. And so we started to hear that across our finance customers a couple of years ago, and it’s continued to grow from there. So I think some of these more progressive auditors understood the risk and started asking these questions underneath the hood. And I think that’s what really finally got the ball rolling.
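To make the attrition idea above concrete, here is a toy sketch in Python of the kind of signal Brian is describing: flag a component when its users are leaving noticeably faster than the baseline attrition you’d expect for a healthy-but-quiet library. The threshold and numbers are invented for illustration; this is not Sonatype’s actual algorithm.

```python
# Toy "ecosystem drift" check: a component looks end-of-life when usage
# declines much faster than normal attrition. All numbers are illustrative.

BASELINE_ANNUAL_ATTRITION = 0.10  # assumed normal year-over-year user loss

def looks_end_of_life(downloads_by_year: list[int],
                      drift_multiple: float = 3.0) -> bool:
    """Flag a component whose latest-year decline outpaces normal attrition."""
    if len(downloads_by_year) < 2:
        return False  # not enough history to judge
    prev, last = downloads_by_year[-2], downloads_by_year[-1]
    if prev <= 0:
        return False
    decline = (prev - last) / prev  # fraction of usage lost in the last year
    return decline > drift_multiple * BASELINE_ANNUAL_ATTRITION

# A quiet hero: mild decline, still doing its job -> not flagged.
print(looks_end_of_life([1_000_000, 950_000, 920_000]))  # False
# The ecosystem is fleeing -> flagged as likely end of life.
print(looks_end_of_life([1_000_000, 900_000, 400_000]))  # True
```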
Josh Bressers (18:55) So the end of life in financial services, I suspect PCI 4.0 is what drove a lot of that, ’cause PCI 4.0 has specific language about using end-of-life software, which I don’t think PCI 3 had.
Brian Fox (19:08) You’re right. That was a while ago now, I forgot about that, but you’re right. I think those two things are correlated. So that might’ve been why the auditors had started to ask it, and why these things have popped up, certainly within finance. It is a good endorsement for why these things matter. You know, we get impatient that it takes so long for obvious things to get implemented, but when they do get implemented in the right places, in the right type of regulation and industry standards, it does eventually get people to care.
And so I think that’s why we’re seeing it. Certainly, I would say, just the maturity of the industry overall: understanding things like Log4Shell, things like Shai-Hulud, have made it more front of mind, top of mind, whatever you want to call it, for people. And then, you know, identifying things that have no support path becomes a natural outgrowth. I think it’s probably a little bit of all of the above. Yeah.
Josh Bressers (20:06) Yeah, yeah. I mean, as much as we’d love to say we can define end of life in open source, we can’t today. But I suspect over the next couple of years that definition will evolve, and I’m sure there will be much wailing and gnashing of teeth as we go.
Brian Fox (20:19) Yeah.
Yeah. It is hard. It’s not impossible, but it is difficult, because most projects don’t do a big public, you know, I’m out, put-up-the-white-flag announcement. It happens, some in response to the CRA, but most of them don’t end that way. They kind of get to a point where it’s working for me, nobody’s asking me to do stuff, life happens. Things get old and then they kind of solidify.
And then eventually nobody wants to fix them anymore. That’s the more natural thing. And that’s why it’s so hard to put your finger on when exactly this thing turned the lights out. It never did. It just slowly faded away.
Josh Bressers (20:58) Yeah, yeah, for sure. All right, I want to move on to the AI section, because I think there’s a couple of interesting data points in here. And I especially want to talk about your thing at the end. What did you call it? The false economy of the latest version. We’ll get to that later. But first, I want to just talk about, you have some data on, I guess, typical AI kind of open source use. And I guess the fundamental message I took away here was just that, like,
Brian Fox (21:10) Yeah, okay.
Josh Bressers (21:24) AIs love to hallucinate things that don’t exist, and they love to install old versions of software. And we’ve all talked about this for a long time, but I’m glad someone actually applied some scientific rigor and wrote it down. So thank you.
Brian Fox (21:32) Yep.
Mm-hmm.
Yes. Thank you. So what we did here is we looked at, in the report, we did GPT-5. I have more late-breaking data on some of the more recent models that we can cover. But I think in general, what we’re kind of highlighting here is that models generally have a point in time where they were trained, and then they don’t have new information anymore. And so that means they are not aware of new versions, new vulnerabilities,
and any shifts in the public stance on these things since they were last trained. Now, what I found interesting in many cases is the models aren’t even recommending the latest that they know about. They’re picking some other, older version. And, you know, I’ve had chats with Claude about that. Like, hey, build me a Spring app, and it picks Spring 5. And I’m like, okay, do an upgrade. And it goes, okay, I’ll upgrade it to like Spring 7 or something. It’s like,
okay, then why did you pick 5? And it gives an answer like a toddler would explain why they wrote on the wall. It seems plausible, but you’re like, I think you’re just making that up. And so, you know, the point here is that this was a real challenge, but MCP servers allow us to compensate for that, right? The Model Context Protocol allows the tools to reach out to other experts, to ask for better information, to help produce more correct answers. And so
that’s kind of what we’re trying to highlight here: that just going with coding agents all by themselves is going to produce pretty bad results. You know, 27% of the time it was just making up a version. And that one’s interesting, and it’s funny, but the build’s going to immediately fail, right? It’s a failure case that becomes immediately obvious. So
really you’re talking about wasted tokens, wasted time, because it’s going to figure out its mistake fairly quickly. When it makes up a project that didn’t exist, and the attackers figure out that this is a thing that happens fairly repeatedly, then they can go typo squat on those domains, right? We’ve seen those things happen. Yeah. Slop squat. Yes. Slop squat. You know, so…
Josh Bressers (23:55) No, no, not typo squat. Thank you, Seth Larson, slop squatting, which is such a marvelous term.
Brian Fox (24:02) Pick a project name that you know AI likes to make up and then make it real for your malware. That’s a thing, so there’s real risk there. The bigger risk is just that the tools are in general producing low-quality dependency stacks that then have to be fixed. And worse, if you’ve got,
you know, sort of a dumbing down of the people that are developing software generally, because it’s become more accessible, they don’t know or care what’s happening underneath the hood. They just want to look at it. So it’s going to lead to less secure, less stable software, because these things are producing worse recommendations out of the gate, right? So that’s kind of what we’re exploring here. What’s interesting is we looked at this, we did the test,
just within the past few weeks, and we tested, for example, Opus 4.6 and GPT-5.2 and a bunch of these other models. What we found, one of the interesting things, is that yes, the hallucination rate is going down. We ran the exact same test with, for example, Opus 4.6, and the hallucination rate went from 27% in
the 3.7 model down to 6.2%. That sounds like tremendous progress. But when we actually looked at what was happening, inversely, the number of times it basically just didn’t choose to make a recommendation has gone way up, from basically 10% to now 30%. So what it seems like is the tools are basically saying, I’m not really sure, very often. And so it just basically says, I don’t know, don’t do anything.
Right? So it’s one of these things where on first glance it looks like the hallucination got better, but the result was, it’s sort of like, I can’t make up my mind, so yeah, just use the version you’ve already got, you know, no recommendation. When we specifically said make a recommendation, make an update, instead of making up a version it goes, nah, you’re good. So, which…
Josh Bressers (25:58) I gotcha.
Brian Fox (26:08) which arguably is again worse, because it gives you a false sense of security. I would rather it make up something so I know it failed spectacularly and then I can go address it, right? So that’s what we found in our latest results. And I think we’ll do an addendum to the report in the next month or so sharing those graphs, but it’s kind of interesting.
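To ground the MCP point Brian raises here, below is a minimal sketch of what a version-advice tool exposed over MCP could look like, using the `FastMCP` helper from the official `mcp` Python SDK. The `ADVISORIES` table and the `recommend_version` tool are hypothetical stand-ins for a real intelligence feed, not Sonatype’s product.

```python
from mcp.server.fastmcp import FastMCP

# Minimal MCP server that gives a coding agent live version intelligence
# at decision time, instead of whatever was frozen into its training data.
mcp = FastMCP("dependency-advisor")

# Hypothetical advisory data; a real server would query a live feed.
ADVISORIES = {
    "log4j-core": {"recommended": "2.24.3", "avoid": {"2.0-beta9", "2.14.1"}},
}

@mcp.tool()
def recommend_version(package: str, current: str) -> str:
    """Recommend an upgrade target for a package, flagging vulnerable pins."""
    info = ADVISORIES.get(package)
    if info is None:
        # Refusing to guess beats hallucinating a version that doesn't exist.
        return f"No advisory data for {package}; do not invent a version."
    status = "KNOWN VULNERABLE" if current in info["avoid"] else "ok"
    return f"{package} {current} is {status}; recommended: {info['recommended']}"

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an agent can call the tool
```

An agent wired to a server like this can ask at generation time instead of pinning a version from memory, which is the compensation mechanism Brian describes.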
Josh Bressers (26:28) Interesting. Yeah.
And I suspect securing AI-generated code and projects is probably going to be a topic for the rest of our lives, if not well beyond.
Brian Fox (26:41) Yeah, for sure.
Yeah. And we’re just going to be the old guys saying, we already solved that problem. Like, we know about this, right? And it’s the same stuff all over again. But the point of all of this is that with MCP capabilities, you can inject real-time information about vulnerabilities, about popularity, about which components are compliant with your policy, directly into the tool when it’s making the decision. So
in sort of the first shot, if you will, the code comes out using the right frameworks and you don’t have to rework it. And in talking to our customers, we’re finding similar things to what the DORA report talked about. The DORA report last year said something like 90% of organizations are experimenting with it, but like 35% are the ones who are actually getting real value. Everybody else doesn’t trust the output. And we believe
Josh Bressers (27:13) Yeah.
Brian Fox (27:34) part of it is this: that you do the thing and it recommends an insane version, or a really old version, or something that your company won’t allow anyway, and it’s wasted time and effort. So we’re saying use an MCP server, get it integrated, and make those things better. And like I mentioned before, we were sort of the accidental data
people, trying to fix the tools. Well, that becomes really interesting, because all those things we’ve been building for years, trying to make it so that you could scale human developers consistently in large organizations, make sure that they had good policy guidance, good vulnerability information, making the best choices along all these different dimensions, we can now feed to the AI agents directly via MCP, and then the end results are tremendously better. And that’s part of what we’re showing
in the chart on the economy of the latest version. We give a chart here kind of comparing how much wasted effort, if you just went with what the LLM recommended out of the box. It’s 7,200 weeks of wasted effort across, what is this? It’s a hypothetical large organization with, I don’t remember how many.
200 apps? No. There’s a lot of numbers in here. With a lot of applications.
Josh Bressers (28:57) Yeah. Yeah.
Brian Fox (29:04) across 856 enterprise applications is what we looked at. So in the study, across those, if you were just going with the out-of-the-box per-application recommendations, you’re wasting 192 hours per year; if you go and you’re constantly upgrading to the latest version every time something changes, you’re wasting 300 hours per year, right? So it’s almost double, just taking random kind of
recommendations. But what we’re suggesting is, if you’re using better intelligence to produce better results that are consistent with the organization, you can cut that in half. So you’re able to get better output right away. And the issue with latest, we did a study on this a couple of years ago using similar math, is that these things change,
what, on average, four times a year, maybe more than that now. And so if you’ve got a Java application, you easily have 150 dependencies. If it’s JavaScript, it could be 10 times that, each of those changing four times a year. So now you’re going to update every time something changes? That’s a crazy amount of entropy introduced into the system, and a lot of times for no real value. You know, not every time there’s a change is it a security thing, right? You’re making a change,
and it might be a feature you don’t use or a fix for a bug you never experienced, right? So that’s why the math behind just using latest all the time isn’t always the greatest answer. So I know you have thoughts on that. Yeah.
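As a back-of-the-envelope check on those figures as quoted (the report’s own model and rounding differ, so this won’t reproduce its 7,200-week number exactly), the arithmetic looks roughly like this:

```python
# Rough arithmetic on the churn Brian describes, using the quoted figures.

apps = 856               # enterprise applications in the study
deps_per_java_app = 150  # typical Java dependency count
releases_per_dep = 4     # average releases per dependency per year

# Potential version bumps per app per year if you chase every release.
print(deps_per_java_app * releases_per_dep)  # 600

hours_per_app = {
    "always-latest": 300,   # hours/year wasted chasing "latest"
    "LLM out-of-box": 192,  # hours/year wasted on default LLM picks
}
for strategy, hours in hours_per_app.items():
    total = apps * hours
    # always-latest: 256,800 hours ≈ 6,420 work-weeks
    # LLM out-of-box: 164,352 hours ≈ 4,109 work-weeks
    print(f"{strategy}: {total:,} hours ≈ {total / 40:,.0f} work-weeks")
```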
Josh Bressers (30:40) Okay, I want to push on that a little bit because
I find this topic fascinating. On one hand, you’ve got a whole bunch of people saying you should absolutely run the latest bleeding edge, you can’t run old things. And this EOL discussion kicks in a little bit here, right? Because what…
Brian Fox (31:00) But the point
is those two things are not mutually exclusive. And that’s actually the point we’re making, right? Like, there is a lot of choice between the two. One extreme is too old, unsupported, not upgradable, and maybe never getting fixed. The other extreme is I’m taking every version every time it comes out, right? And, yeah.
Josh Bressers (31:17) Yes, but here’s my question,
then: how do we tell them apart, right? I feel like there’s obviously very old things, right? And I don’t know the answer. Like, it’s easy to say upgrade slowly, pay attention, but now we’ve pushed the burden from dealing with breaking changes to, like…
Brian Fox (31:25) Mm-hmm.
Josh Bressers (31:40) Who is paying attention to know when an event happens that should trigger an update, right? Which I don’t think we know how to do very well today.
Brian Fox (31:49) We do. We have the data. We shared it in the report a few years ago, but it requires data; it’s not something you can do on your own. And this is where you have to get into the realm of, like, yes, it’s a product, right? It’s not a thing that is out there freely available. But we are able to do it because we have under management, you know, millions of enterprise applications. We run a giant public registry, so we can see every upgrade that’s happening inside of Maven Central. We also have, you know, a hundred and
fifty thousand Nexus open source instances out there that people are using. Many of them are sending us telemetry, so we can see upgrade patterns, not just from Java, but also JavaScript and Python and everything, you know, ’cause Nexus proxies all of these different ecosystems. So you take all that data together and you can kind of look at what we call the herd. What is the herd doing? And, you know, take a look at the older report.
The charts there are really interesting. But if you picture, you know, a herd crossing a river with an island in the middle: what you don’t want to be is the first, you know, take your pick, gazelle on the other side of the river, because somebody might be waiting for you, right? That’s the equivalent of the latest version. You also don’t want to be the last gazelle still on the first side of the river, because somebody might be tracking you, and now you’re screwed.
You want to be in the middle somewhere. You want to be close enough to the front that if a vulnerability happens, it’s likely to be patched and it’s probably easier for you to upgrade, and you don’t want to fall too far behind. But there is no one answer to that, because the ecosystem, the way the project supports backporting, the way they support backwards compatibility, changes how the herd migration goes. It’s not always a thin single-file line.
But when you look at that across all versions, all users of those particular components, the patterns become very obvious. And so that’s what we use to basically recommend the best version. And that’s kind of what we’re talking about in the math here. We can provide that now via MCP to AI agents, but this is an algorithm we’ve had for a number of years. So we can say, what is the pattern? If you want us to auto-generate a pull request to make a recommendation,
latest is going to burn a bunch of cycles, and that’s not actually what you’re afraid of. What you’re afraid of is, you know, forgetting about it for three years. So how do we keep you in the safe zone?
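As a toy illustration of the herd idea, here is a sketch that recommends the safe version sitting at the middle of observed adoption, skipping known-vulnerable versions. The data and the 50% cutoff are invented; Sonatype’s production algorithm is, per Brian, much richer than this.

```python
from bisect import bisect_left

def herd_version(adoption: dict[str, int], vulnerable: set[str],
                 percentile: float = 0.5) -> str:
    """Pick the safe version at the given cumulative-adoption percentile.

    adoption maps version -> current user count, ordered oldest to newest.
    """
    safe = [v for v in adoption if v not in vulnerable]
    if not safe:
        raise ValueError("no safe versions to recommend")
    total = sum(adoption[v] for v in safe)
    cumulative, running = [], 0
    for v in safe:
        running += adoption[v]
        cumulative.append(running)
    # First safe version at or past the target share of the herd:
    # not the bleeding-edge gazelle, not the straggler.
    idx = bisect_left(cumulative, percentile * total)
    return safe[min(idx, len(safe) - 1)]

usage = {"1.2": 500, "1.3": 2_000, "1.4": 6_000, "1.5": 1_200, "1.6": 300}
print(herd_version(usage, vulnerable={"1.2"}))  # -> "1.4", the herd's center
```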
Josh Bressers (34:15) Right. Man, that’s fascinating. I’ll be thinking about this probably for the rest of the week at least. So.
Brian Fox (34:20) Yeah,
it’s a big data problem. That’s why I say there is no simple answer. The answer for what you would do for Spring is very different than for any other component you might pick, right? And it’s interesting. That’s why I would suggest people interested in this look it up, and maybe, Josh, I’ll find which of the previous years’ reports it was so you can put the link in there, because the visualizations tell the story. There’s a couple of different ones where you look at it and you go, oh, I can see it.
Josh Bressers (34:39) Yeah.
Brian Fox (34:47) That’s cool. As a human, you can see it, but to write algorithms behind the scenes to make the machines make those decisions has historically been harder. AI, you probably could give it the image and it could figure it out just like we can. But it is really neat to look at the differences between the different projects and how that unfolds.
Josh Bressers (35:02) Man.
I love it. I love it. All right, Brian, it’s time, man. I’ll give you the floor. Take us home. What do you want us to know? What do you want us to do?
Brian Fox (35:14) man, take a look at the report. We put these things in here because they’re important and because they’re not solved problems. We’re not talking about problems that are well solved. know, malware is still a huge thing and growing because the industry is not doing enough to combat it. You know, number one, ⁓ you know, we’re still seeing poor practices and management of their dependencies and
I think there’s an opportunity to do a much better job with AI as people evolve it, but you have to be aware of the downsides and the gaps. The trust gap is kind of what we’re starting to call it between what the tools out of the box can recommend and where you can get to when you have intelligence added on top of it, the stuff that we’ve been talking about here. So how do you make a recommendation that is the best version versus do nothing, which might get you the oldest version.
⁓ So take a look at these things. It’s not an unsolvable problem. It is very solvable. It’s just people need to be aware of it and choose to take action.
Josh Bressers (36:14) Right on. All right, Brian, my friend, it’s been a treat as always. Thank you so much.
Brian Fox (36:19) Alright, thanks for having me.