We’ve already discussed the perils of code and composition scanning. If you’ve not already read those, you should go back to the beginning.
Now we’re going to discuss application scanning. The basic idea here is we have a scanner that interacts with a running application and looks for bugs. The other two scanners run against static content. A running application is dynamic and ever changing. If we thought code scanning was hard, this is even harder. Well, it can be harder. It can also be easier. Sometimes.
Scanning a running application is hard
Back in the code scanning post, we talked about how much harder it is to scan a weakly typed language because there is a distinct lack of rules. Scanning an application can be comparable to this. It mostly depends on what you’re scanning. There are many types of applications. Some have fancy user interfaces. Some are just libraries that get included in other things. Some are just APIs that you access over a network. Of course, it’s common for application scanners to pretend everything is the same.
First, a word on fuzzing. If you’ve never heard of fuzzing that’s OK. It was hugely popular a few years back as a way to stress test certain applications. The basic idea is you stress test inputs by subtly modifying whatever the application is trying to read. An example is you have a program that converts image files from one format to another. You take one good image and flip some bits at random, then see if it crashes your application. Repeat this a few million times and you’re fuzzing. Fuzzing is an example of application scanning that works really well, but generally only with languages that are not memory safe. Fuzzing is less effective on memory safe languages. We will not be talking about fuzzing in the rest of this post.
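For the curious, here is a minimal sketch of the bit-flipping loop described above. Everything in it is an assumption made for illustration: parse_toy_image is a made-up parser standing in for the image converter, and the "IMG" file layout is invented. Real fuzzers are far more sophisticated than this.

```python
import random


def flip_random_bits(data: bytes, flips: int, rng: random.Random) -> bytes:
    """Return a copy of data with `flips` randomly chosen bits inverted."""
    mutated = bytearray(data)
    for _ in range(flips):
        position = rng.randrange(len(mutated))
        mutated[position] ^= 1 << rng.randrange(8)
    return bytes(mutated)


def parse_toy_image(data: bytes) -> int:
    """Hypothetical parser under test: b'IMG', one length byte, then payload."""
    if data[:3] != b"IMG":
        raise ValueError("bad magic")
    length = data[3]
    payload = data[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return sum(payload)


def fuzz(good_input: bytes, iterations: int, seed: int = 0) -> list:
    """Mutate one good input many times and collect unexpected failures."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = flip_random_bits(good_input, flips=3, rng=rng)
        try:
            parse_toy_image(candidate)
        except ValueError:
            pass  # a clean rejection of bad input is the expected behavior
        except Exception as exc:  # anything else is worth a closer look
            crashes.append((candidate, exc))
    return crashes


crashes = fuzz(b"IMG\x04abcd", iterations=10_000)
```

In this toy the parser only ever rejects bad input cleanly, so the crash list stays empty. That lines up with the point above: in a memory safe language a corrupted length byte tends to become a tidy error rather than an out-of-bounds read.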
Scanning user interfaces
I want to start with application scanners that go after user interfaces. Today the most common user interface is a website accessed in a browser. It seems like this would be a space that is ripe for scanning since we can teach a robot to parse HTML. You would be right that we’re really good at scanning HTML. What we’re not really good at is finding good results when we scan HTML.
The single biggest challenge an application scanner has in almost every instance is a lack of situational awareness. What I mean by that is the scanner approaches a web app without any real knowledge about how the app works or what it does. The scanners just start throwing spaghetti at the walls to see what sticks. It’s very common for a webapp scanner to tell you about security flaws affecting a webserver you’ve never even heard of. Because you returned some HTML snippet that a webserver from 1994 once returned, the scanner assumes that’s what you’re using.
Webapp scanners also make many assumptions about the results they get back. If a scanner sends a request that it thinks should have returned a 400 error code, and it doesn’t get one, that’s going to end up in the report. My favorite example of this was a scanner putting a / at the end of every URL, which resulted in a JSON error message being returned. The scanner reported that as a security flaw for every URL it could find. Every. Single. URL. It was very silly, and not very useful.
Some scanners will let you configure them to be a bit more clever with respect to your webapp. If you can configure them you should, but even with that the results aren’t going to be amazing. You’re going to see an increase in quality from completely terrible to mostly terrible. You should definitely do some work to decide if running a webapp scanner makes sense for you. This is a great place to figure out some return on investment calculations. And of course keep in mind the big question of “what is our reason for doing this?” If you want a more secure application, keep that in mind while you’re parsing the reports.
Scanning APIs
If your application has an API, that’s great news. APIs are ways for machines to talk to each other, so logically one would expect an application scanner to do a great job scanning an API for problems. One would expect …
The reality here is many application scanners will treat an API as if it were a user interface that is returning HTML. Many scanners will report things like error messages as being security flaws because the API doesn’t respond in a way the scanner is familiar with. Web browsers don’t treat content that has a type of application/json as HTML. Scanners don’t seem to understand this.
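To make the Content-Type point concrete, here is a rough sketch of the check a scanner could make before running HTML-oriented rules against a response. The function names and the set of HTML types are my own assumptions for illustration, not how any particular scanner works.

```python
# Media types that a browser would actually render as HTML (assumed list).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def should_run_html_checks(content_type_header: str) -> bool:
    """Only apply HTML rules (for example XSS checks) to HTML responses."""
    return media_type(content_type_header) in HTML_TYPES


# A JSON error body from an API is not a page, so HTML rules are skipped.
print(should_run_html_checks("application/json; charset=utf-8"))
print(should_run_html_checks("text/html; charset=utf-8"))
```

A check like this wouldn’t make a scanner smart, but it would cut out the whole class of findings that come from reading a JSON error message as if it were a web page.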
If you are building an API using modern design principles, it’s very likely you already have an application scanner running against your API. You just don’t call it an application scanner, you call it “continuous integration”. That’s right, if you have a robust test suite against your API, you can expect far better results from that than you’ll ever see from an automated scanner. If you have a finite budget, you should write more tests, not buy an application scanner for your APIs.
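Here is a small sketch of what “more tests” can look like when the tests are written with security in mind. The handler, the token, and the data are all hypothetical stand-ins for a real API. The point is only that a test written by someone who knows what the endpoint should do can assert things a generic scanner has to guess at.

```python
import json
import unittest

# Hypothetical in-memory API used only for this example.
ITEMS = {"1": {"name": "widget"}}
VALID_TOKEN = "secret-token"


def handle_get_item(item_id, token):
    """Return (status, json_body) for a GET of one item."""
    if token != VALID_TOKEN:
        return 401, json.dumps({"error": "unauthorized"})
    item = ITEMS.get(item_id)
    if item is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(item)


class GetItemSecurityTests(unittest.TestCase):
    def test_missing_token_is_rejected(self):
        status, _ = handle_get_item("1", token=None)
        self.assertEqual(status, 401)

    def test_unknown_item_does_not_leak_details(self):
        # A path traversal attempt should look like any other unknown id.
        status, body = handle_get_item("../etc/passwd", token=VALID_TOKEN)
        self.assertEqual(status, 404)
        self.assertEqual(json.loads(body), {"error": "not found"})

    def test_valid_request_returns_item(self):
        status, body = handle_get_item("1", token=VALID_TOKEN)
        self.assertEqual(status, 200)
        self.assertEqual(json.loads(body), {"name": "widget"})
```

Saved in a test file, something like this runs under python -m unittest on every commit, which is exactly the continuous integration job described above.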
What can we do?
For the “what action can we take” part, I’m going to point at the conclusion for source code scanners rather than try to write something new and interesting. These things have been around for a while and they’ve not improved a lot. Everyone should calculate their own ROI here, but if I were writing the check I would look into composition scanning.
The next post will be far more exciting as we start to tackle how to parse a phone book sized report.
Part 5: Which of these security problems do I need to care about?