I’ve started a project to put the CVE data into Elasticsearch and see if there is anything clever we can learn about it. Even if there isn’t anything overly clever, it’s fun to do. And I get to make pretty graphs, which everyone likes to look at.
I stuck a few of my early results on Twitter because it seemed like a fun thing to do. One of the graphs I put up was comparing the 3 BSDs. The image is below.
You can see that none of these graphs has enough data to really draw any conclusions from. Again, I did this for fun. I did get one response claiming NetBSD is the best, because their graph is the smallest. I’ve actually heard this argument a few times over the past month, so I decided it’s time to write about it. Especially since I’m sure I’ll find many more examples like this while I’m weeding through this mountain of CVE data.
Let’s make up a new law. I’ll call it the “Inverse Law of CVEs”. It goes like this: “The fewer CVE IDs something has, the less secure it is”.
That doesn’t make sense to most people. If you have something that is bad, fewer bad things is certainly better than more bad things. This is generally true for physical concepts our brains can understand. Less crime is good. Fewer accidents are good. When it comes to something like how many CVE IDs your project or product has, this idea gets turned on its head. Less is probably bad when we think about CVE IDs. There’s probably some sort of line somewhere where, if you cross it, things flip back to bad (wait until I get to PHP). We’ll call that the security Maginot Line, because bad security decided to sneak in through the north.
If you have something with very few CVE IDs, it doesn’t mean it’s secure; it means nobody is looking for security issues. It’s easy to understand that if something is used by a large, diverse set of users, it will get more bug reports (some of which will be security bugs), and it will get more security attention from both good guys and bad guys because it’s a bigger target. If something has very few users, it’s quite likely there hasn’t been a lot of security attention paid to it. I suspect what the above graphs really mean is that FreeBSD is more popular than OpenBSD, which is more popular than NetBSD. Random internet searches seem to back this up.
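For anyone who wants to play along, the numbers behind a graph like this come down to counting CVE documents per project per year. Here is a minimal sketch of an Elasticsearch search body that would do that. It is not necessarily how the graphs above were built: the field names `product` and `published`, the lowercase product values, and the helper `build_cve_count_query` are all made up for illustration, and your index mapping will almost certainly differ.

```python
import json


def build_cve_count_query(products):
    """Build a search body that counts CVE documents per product per year.

    Assumption: each document has a keyword field `product` and a date
    field `published`. Both names are hypothetical.
    """
    return {
        "size": 0,  # return only aggregation buckets, not the documents
        "query": {"terms": {"product": list(products)}},
        "aggs": {
            "per_product": {
                # one bucket per distinct product value
                "terms": {"field": "product"},
                "aggs": {
                    "per_year": {
                        # one sub-bucket per calendar year of publication
                        "date_histogram": {
                            "field": "published",
                            "calendar_interval": "year",
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Product values are assumptions about how the data was indexed.
    body = build_cve_count_query(["freebsd", "openbsd", "netbsd"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the index's `_search` endpoint. Note that `calendar_interval` is the parameter name in newer Elasticsearch releases; older ones used `interval` instead. Keep in mind that the counts only tell you how many IDs were assigned, which is exactly why they say more about attention than about security.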
I’m not entirely sure what to do with all this data. Part of the fun is understanding how to classify it all. I’m not a data scientist, so there will be much learning. If you have any ideas, by all means let me know; I’m quite open to suggestions. Once I have better data, I may try to find the point at which a project has enough CVE IDs to be considered on the right path, and which projects have so many that they’ve crossed over to the bad place.