I recently sat down with Brian Fox, CTO and co-founder of Sonatype, to talk about a report they recently published on malware in open source ecosystems. This isn’t a surprise to anyone paying attention, but some of the things Sonatype is doing in this space are very clever. I’ve known Brian for a long time, so it was a treat to catch up and see what they found, and what it means for the future.
2024 in Open Source Malware Report
This episode is also available as a podcast. Search for “Open Source Security” on your favorite podcast player.
We’re gonna need a bigger boat
I’ll start with the most terrifying and important point in all this: there are currently over 820,000 malicious packages floating around in the open source ecosystem. Not vulnerabilities or bugs – intentionally malicious code designed to compromise systems and steal data. That number jumped by 42,000 packages in just the first two months of 2025 and it’s going to keep growing as the year progresses.
Brian had a great point during the chat about this malware. The target isn’t the end users of applications, the target is the developers using open source. That’s not really something I had thought about before now. We like to think about attacks on end users, not on developers. Developers exist in a space that many of our tools aren’t designed to work with.
The attacks happen the instant a dev installs a malicious package. Brian called it a “smash and grab”. The dev installs the package, the payload is delivered, and whatever data the attackers exfiltrated is gone.
We pick on NPM, a lot
The data shows the vast majority of attacks are coming from NPM. We pick on them quite a bit. There’s a lot to unwind in why it’s NPM though. Firstly, it’s the biggest ecosystem by a wide margin. It’s the classic “why do you rob banks? It’s where the money is”. There are also some things NPM does by default that make it easier to attack. It’s very easy to add a package to NPM: you basically just sign up for an account and you’re done. There are no namespaces (this is an often argued topic though). NPM packages can execute scripts on install, which is basically remote code execution, and NPM loves to upgrade packages aggressively by default.
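To make the install script point a bit more concrete, here is a minimal sketch in Python that looks through a package.json for scripts that run automatically during an install. The hook names (preinstall, install, postinstall) are real NPM lifecycle scripts; the function name and the example manifest are made up for illustration, and this is not how Sonatype or NPM do their checks.

```python
import json

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text):
    """Return {hook: command} for every install-time script in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest, invented for this example.
example = json.dumps({
    "name": "totally-legit-utils",
    "version": "1.0.0",
    "scripts": {
        "test": "node test.js",
        "postinstall": "node collect.js",
    },
})

print(find_install_scripts(example))
```

An install script is not malicious by itself, since plenty of legitimate packages use one to build native code. The point is only that anything listed there runs on the developer’s machine before they have written a single line against the package. Running `npm install --ignore-scripts` skips these hooks, at the cost of breaking packages that really need them.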
It’s easy to claim all these things are shortcomings in NPM, but it’s just as likely many of these are the reasons NPM is so popular. Regardless of what is good or bad, this is an ecosystem problem, and the ecosystems are where the solutions should exist. It should also be said that the ecosystems are grossly underfunded. It’s complicated.
We do touch on how Maven Central stacks up in this context during the chat. Maven Central is where all the Java developers get their packages from. Maven does a bunch of things differently and isn’t attacked nearly as much. Some of this is no doubt due to a higher barrier to entry, but we should keep in mind Maven is also much smaller than NPM, so there is a numbers game to play.
Watching the behavior
One of my first questions for Brian was how they knew the packages were malicious. 820,000 isn’t a number you can vet with humans, it has to be robots doing the work. The way Sonatype does this is very clever I thought.
Brian compares their detection strategy to the evolution of credit card fraud prevention: Modern fraud detection looks for behavioral anomalies – like suddenly buying 15 TVs in another part of the country.
It’s not always possible to know what “normal” open source development looks like, but we can know what abnormal open source development looks like. For example, a project suddenly appearing with version number 1000, or a popular package unexpectedly depending on an obscure new library.
Apparently this detection picked up Alex Birsan’s famous “dependency confusion” attack six months before it became public knowledge, because his behavior deviated from normal development patterns.
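The two examples of abnormal behavior mentioned above can be sketched as simple rules. This is a toy illustration in Python, not Sonatype’s detection logic. The Release class, the anomaly_signals function, and both thresholds are assumptions made up for this post, and a real system would weigh far more signals than this.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical record describing one newly published package version."""
    name: str
    version: str
    prior_release_count: int  # how many versions existed before this one
    new_dependencies: list = field(default_factory=list)


def major_version(version):
    """Return the leading numeric part of a version string, or 0 if it has none."""
    head = version.split(".")[0]
    return int(head) if head.isdigit() else 0


def anomaly_signals(release, dependency_age_days, major_threshold=100, min_age_days=30):
    """Collect human-readable reasons a release looks unusual.

    dependency_age_days maps a package name to its age in days; a missing
    entry means nothing is known about that package. Both thresholds are
    arbitrary values chosen for illustration.
    """
    signals = []
    # A brand new project does not normally start at version 1000.
    if release.prior_release_count == 0 and major_version(release.version) >= major_threshold:
        signals.append("first release has an unusually high version number")
    # A newly added dependency on a very young or unknown package is worth a look.
    for dep in release.new_dependencies:
        age = dependency_age_days.get(dep)
        if age is None or age < min_age_days:
            signals.append(f"new dependency on very young or unknown package: {dep}")
    return signals


# Hypothetical releases, invented for this example.
newcomer = Release("internal-billing-lib", "1000.0.0", prior_release_count=0)
established = Release("popular-lib", "2.3.1", prior_release_count=40,
                      new_dependencies=["new-helper"])

print(anomaly_signals(newcomer, {}))
print(anomaly_signals(established, {"new-helper": 2}))
```

Neither rule proves anything on its own. Like the credit card comparison, the idea is to flag behavior that deviates from the usual pattern so that something or someone takes a closer look.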
New things are hard
During the chat Brian explained how security teams often respond to this threat. They like to talk about vulnerability management and scanning tools. But the thing is, developers are working in a space before these tools kick in. Even if we try to use a package repository to know what a developer is using, they can still download things to their machine. It’s a really hard problem.
Many of these components aren’t real components, so they won’t work. The developer will then realize their mistake and fix it, but by then it’s too late: the payload was already delivered at install time.
How does it get better?
An important part of this conversation was that we don’t really know how to fix this problem. The reality is the attackers have an incentive to create this malware, but the defenders don’t seem to have enough incentive to fix it properly. The repositories themselves suffer little when malware is uploaded, and we just keep using those repositories because it’s where the open source we need lives.
The open source world we’ve built runs on trust. Maybe that trust isn’t as strong as we want to believe. But the reality is, trust it or not, open source is here to stay.