Part 3: Composition scanning

If you just showed up here, go back and start at the intro post. You’ll want the missing context before reading this article.

In this post we’re going to talk about a newer type of scanner called a composition scanner. The idea here is that when you build an application today, it’s never just what you wrote. It also includes source code from a large number of other sources. Usually these other sources are open source.

A composition scanner will look at your project, specifically the things you didn’t write, and attempt to alert you if you are including components that have known security vulnerabilities. It’s very common to not upgrade the open source we put into our projects. Upgrading is hard and can break things, so doing nothing is easier most of the time. Composition scanners let us see what’s hiding in the depths of our project, and sometimes it isn’t very pretty.

An easy example we can use is if you are including OpenSSL code in your application. Do you know if the version of OpenSSL you are using is still vulnerable to Heartbleed? You probably can’t say for certain if this is true or not, but a composition scanner probably can.
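To make that concrete, here is a minimal sketch of the core idea, not how any particular scanner is built. It assumes you already have an inventory of component names and versions (the `inventory` dictionary and the `Advisory` structure are made up for this illustration). The one real fact in it is that Heartbleed, CVE-2014-0160, affected OpenSSL 1.0.1 through 1.0.1f and was fixed in 1.0.1g.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """One known vulnerability and the exact versions it affects (illustrative structure)."""
    identifier: str
    component: str
    affected_versions: frozenset


# Heartbleed affected OpenSSL 1.0.1 and 1.0.1a through 1.0.1f.
HEARTBLEED = Advisory(
    identifier="CVE-2014-0160",
    component="openssl",
    affected_versions=frozenset(["1.0.1"] + ["1.0.1" + letter for letter in "abcdef"]),
)


def scan(dependencies, advisories):
    """Return (component, version, advisory id) for each dependency with a known issue."""
    findings = []
    for component, version in dependencies.items():
        for advisory in advisories:
            if advisory.component == component and version in advisory.affected_versions:
                findings.append((component, version, advisory.identifier))
    return findings


# Hypothetical inventory of what is bundled into the application.
inventory = {"openssl": "1.0.1e", "zlib": "1.2.11"}
findings = scan(inventory, [HEARTBLEED])
print(findings)
```

The hard parts of a real scanner are the ones this sketch skips: working out what is actually bundled into your project, and keeping the list of advisories accurate and current.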

Who let all this open source in?

Everything we build today is open source. I’m sure a nontrivial number of readers just jumped up and shouted “no it isn’t”. There are two kinds of open source today. The open source we know is open source, and the open source we think isn’t.

I’m only half joking here. The reality is everything is filled to the brim with open source now. If you’re building anything and you aren’t using as much open source software as you can find, you’re a fool. Sure you have some of your own features you’ve written that aren’t technically covered by an open source license, but when that’s only 10% of your codebase, you better be willing to pay attention to the other 90%.

When we use open source in our products, projects, and businesses, we lose the protection of security through obscurity. “But security through obscurity doesn’t work!” You shout. It does actually work, until it doesn’t. There is a huge number of applications out there that are only secure because nobody has ever looked at them, and nobody ever will. These applications have existed for years and will continue to exist for years hidden behind the veil of obscurity.

But once we include some open source, we’re going to start getting noticed. It’s like bringing the best looking person to the party. Everyone notices. It’s a bit poetic that by downloading and using something free, the cost is attention. What I mean by this is if you have a public web site, there are people scanning the internet looking for certain known libraries. There are attackers scanning the internet for certain applications running on it. There are researchers scanning all the source code on GitHub looking for known bad libraries. When your website was just some Perl code you wrote in 1995, nobody noticed or cared. Now that you’re using AngularJS, everyone sees.

Updating dependencies is harder than not updating them

As we mentioned just above, everything is open source now. Open source comes with a catch though. You include it in your application at a point in time, then the arrow of time marches forward. There are some who like to say software ages like milk, not wine. But I say it ages like humans. It’s exciting, good looking, and nimble when it’s young, then the ravages of time start to set in and things stop being young and beautiful pretty fast.

You have to update your open source dependencies. You can’t just grab some code off the internet and forget it’s there. There are going to be newer versions that fix bugs and security issues. You’ll want those fixes. This is further complicated because sometimes the new version of an open source library will break your application. It could be a bug in the new version, it could be you were using something incorrectly, or they might just break it on purpose because they decided the old way was bad. Every time you pull in a new version of something there will be a cost. Sometimes the cost is as small as pulling in the new version. Sometimes the cost will be refactoring half of your application.

The current best practice advice is to keep your dependencies updated. One could easily write a book debating the pros and cons of updating the open source you use, and there are many ways to manage this problem. Rather than delve into that problem right now, let’s just stick with the idea that we should be updating our open source, but that then leads to the question of when. If we pull in every update for every dependency, that’s going to be a lot of churn. We probably want to update things in a way that makes sense and is manageable. We all have more work to do than time to do it, so we have to be smart about these updates.

Which of these security problems do I need to care about?

OK, so let’s assume if you made it this far you agree that software is incredibly complex. It’s also mostly open source, and that open source will have security flaws that need to be fixed. But you also can’t fix everything, you can only fix some things.

As we’ve already mentioned several times, there are going to be false positives, true positives, and false negatives. The vast majority of your findings will still be false positives, but in composition scanning, there are two types of false positives. There are false positives of the sort where the vulnerability reported doesn’t exist in your dependency. And there is the false positive where the vulnerability exists in the dependency, but you don’t use the vulnerable code, so you’re not vulnerable.

This idea of a vulnerability existing in a dependency but being a false positive can be difficult to understand, so here is a laughably simple example. Let’s say you have a library in your application that has two features. One of the features is designed to remove dangerous HTML from strings. The other feature adds two numbers together (this is meant to be a very simple and ridiculous example). You only need the feature that adds numbers together, so you ignore the string sanitizer. There were a number of security vulnerabilities found in the string sanitizer, but since you don’t use it, you never upgraded the library. Now that you run a composition scanner, you see it light up like a Christmas tree due to all the unfixed vulnerabilities in the sanitizer. The vulnerable code is there, but you don’t use it. Are you vulnerable? There isn’t a single answer to this question. I say it is a false positive, but it’s up to you really. This is a very common type of false positive with composition scanners.
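The same ridiculous example can be sketched in a few lines of Python. Everything here is hypothetical: the `textmath` library, its `add` and `sanitize_html` functions, and the `EXAMPLE-` advisory names are invented for illustration. The sketch only looks for direct calls of the form `module.function(...)` in your own source, which is a far cruder check than a real reachability analysis would need (it ignores aliases, indirect calls, and code reached through other dependencies).

```python
import ast

# Hypothetical application code: it only ever calls the "add" feature.
APP_SOURCE = """
import textmath

total = textmath.add(2, 3)
"""

# Hypothetical advisories, each mapped to the library function that contains the flaw.
VULNERABLE_FUNCTIONS = {
    "EXAMPLE-0001": "textmath.sanitize_html",
    "EXAMPLE-0002": "textmath.sanitize_html",
}


def called_functions(source):
    """Collect direct calls written as module.function(...) in the given source."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            calls.add(f"{node.func.value.id}.{node.func.attr}")
    return calls


def triage(vulnerable_functions, source):
    """Label each advisory by whether the application calls the vulnerable function."""
    used = called_functions(source)
    return {
        advisory: "used" if function in used else "present but unused"
        for advisory, function in vulnerable_functions.items()
    }


result = triage(VULNERABLE_FUNCTIONS, APP_SOURCE)
print(result)
```

Both advisories come back as "present but unused" here, which is exactly the second kind of false positive. Whether you treat that label as "safe to ignore" or "fix it eventually" is still your call.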

What now?

It’s important we keep in mind why we are running these scanners. Are we doing it just to run a scanner, or are we doing it to make our application more secure? I think today a lot of users are running them for the sake of running them. But the real reason should be to make things more secure. A vulnerability an attacker can’t exploit isn’t a vulnerability. It’s important we invest our limited resources into fixing vulnerabilities that attackers can attack. There is a lot of nuance in this explanation, and I expect to write some future posts about it after this series is complete.

As composition scanners are the new kid on the block, they also currently show the most promise. But that optimism is worthless if we don’t work with the scanner vendors. Today the scanners produce relatively low quality results: the number of false positives is still unacceptably high and the reports are enormous. But composition scanning is a much easier problem to understand than any of the other scanning problems. I do think it has a bright future.

Part 4: Application scanning