Who are the experts

These are certainly strange times we are living in. None of us will ever forget what’s happening, and we will all retell stories for the rest of our days. Many of us asked “tell me about the Depression, Grandma”; similar questions will be asked of us someday.

The whirlwind of confusion and chaos got me thinking about advice and who we listen to. Most of us know a staggering number of people who are apparently experts in immunology. I have no intention of talking about the politics of the current times; goodness knows nobody in their right mind should care what I think. What all this does have me pondering is: what is an expert, and how can we decide who we should listen to?

So I’ve been thinking a lot about “experts” lately. Especially in the context of security. There have been a ton of expert opinions on how to work from home, and how to avoid getting scammed, which video conferencing software is the best (or worst). There are experts everywhere, but which ones should we listen to? I’m not an expert in anything, but there are some topics I know enough about to question some of these “experts”.

It seems like everyone has something to say about almost everything these days. It feels a bit like the market outside the train station. Whatever you need, someone is selling it, but you better buy it fast because everyone else also wants one!

I have a tweet from a few weeks ago, from when I really started to think about all this. I called it “distance to the work”.

The basic idea is if someone is trying to present themselves as an expert on a topic, how close are they actually to the topic? One of my favorite examples is when I see talks about DevSecOps. I’ve known people who have given DevSecOps talks who have never been developers or system administrators or worked in the field of security. In my mind you aren’t qualified to impart knowledge you don’t have. There are certain ideas they can grasp and understand, but part of being an expert at something is having done it, often for a long time. Would you let someone operate on you because they thought about the problem really hard and decided they are now a surgeon? Of course not!

So this brings us to a place where we have to start deciding who we should be listening to. I like to break people up into a few groups in my mind when deciding if they should be listened to.

  1. Have they ever done actual work in this space?
  2. Do they have a history of doing work in this space, but aren’t currently?
  3. Are they doing work in this space now?

It’s not hard to see where I’m going with this. I think we all know people who fall into every group. It’s very related to my distance to the work idea. If someone has never done the work, I’m not going to consider them an expert. One of the poster children for this is whenever someone titles themselves a “thought leader”. That’s usually doublespeak for “I have no idea what I’m doing but I have very nice clothes and speak very well”. For a number of these people, their primary skill is speaking well, so they can sound smart, but they can’t fool the real experts.

There are also groups of people who did a lot of work in a space long ago, but aren’t very active now. An easy example here would be the Apollo astronauts. Are these people experts on going to the moon? Yes. Are they experts on space? Yes. Would I trust them to help build a modern day rocket? Probably not.

There are plenty of parallels here in any industry. There are plenty of people who did amazing things a decade ago, but if you look at what they’ve done recently, a resume of “talking about the awesome thing I did a decade ago” doesn’t make them an expert on modern day problems. Look at what people are doing now, not what they did.

And lastly we have our group of people who are actually doing the work. These are the people who are making a real difference every day. Many of these people rarely talk about what they do; many don’t have time because they’re busy working. I find there are two challenges when trying to listen to the people doing the real work.

Firstly, they’re usually drowned out by others making more noise. If your job is getting attention, your incentive is, well, getting attention. When your job is doing technical tasks, you’re not going to fight for attention. This means it’s up to us as the listener to decide who is full of gas and who can teach us new things. It’s a really hard problem.

The second problem is finding the people doing the work. They aren’t going to a lot of conferences. They’re usually not publishing blog articles 😎. You won’t find them on social media with millions of followers. A lot actively avoid attention for a variety of reasons. Some don’t have time, some got burnt and don’t want to stick their neck out, some just don’t want to talk to anyone else. The reason is unimportant, it is what it is.

I could end this one with some nonsense about getting outside your comfort zone and making more effort to encourage others to talk about what they’re doing, but I don’t want to. If people don’t want to give talks and write blogs, great; I’m tired of seeing an industry that bases success on how many conferences you attend each year. My suggestion this time is to just look around. You are working with people who are making a real difference. Find them. Talk to them (don’t be a pest). Go learn something new.

Part 6: What do we do now?

Well, we’ve made it to the end. What started out as a short blog post ended up being 7 posts long. If you made it this far I commend you for your mental fortitude.

I’m going to sum everything up with these 4 takeaways.

  1. Understand the problem we want to solve
  2. Push back on scanner vendors
  3. Work with your vendors
  4. Get involved in open source

Understand the problem we want to solve

In security it’s sometimes easy to lose sight of what we’re really trying to do. Running a scanner isn’t a goal in itself, the goal is to improve security, or it should be if it isn’t. Make sure you never forget what’s really happening. Sometimes in the excitement of security, the real reason we’re doing what we do can be lost.

I always hate digging out the old trope “what’s the problem we’re trying to solve” but in this instance I think it’s a good question to ask yourself. Defining problems is really hard. Staying on goal is even harder.

If we think our purpose is to run the scanners, what becomes our goal? The goal will be to have a clean scan. We know a clean scan is impossible, so what really happens is our purpose starts to twist itself around a disfigured version of reality. I’ve said many times the problem is really insecure applications, or at least that’s the problem I tend to think about. You have to figure this out for yourself. If you have a scanner running make sure you know why.

Push back on scanner vendors

When a scan has 80% false positives, that’s not because your project isn’t built well, it’s because the scanner has a lot of serious bugs. High false positive rates mean the product is broken, it doesn’t mean your project is broken. Well, it might be, but probably not. The security industry has come to accept incredibly high false positive rates as normal. We have to break this cycle. It holds the industry back and it makes our jobs horrible. If you are a scanner vendor go make a sign that says “ZERO FALSE POSITIVES” and hang it on the wall. That’s your new purpose.

Set up a weekly or monthly call with your vendor. Make sure they understand your purpose and goals (remember your purpose isn’t just to run a scanner). Make them help you through rough patches. If you feel pain because of their product, they should feel it with you. A vendor who won’t work with you is a vendor who needs to be replaced. Good vendors are your partner, your success is their success. Your pain is their pain.

Work with your vendors

Now, when you find a scanner that has a lot of bugs, you basically have two choices. You can give up (and sometimes this is an acceptable decision). Or you can work with the vendor. At this stage in the technology, it’s very important we start working with these vendors. Report every false positive as a bug. Make them answer hard questions about the results you see. If nobody pushes back we’re going to see worse results in the future, not better. Products improve because of feedback. They don’t improve if we all just pretend everything is fine. I do think part of the reason code and application scanning seems to have plateaued is because we accepted poor results as normal.

If you are a vendor, remember that reports about false positives are gifts. Make sure you treat false positives like they are important bugs. Like all new markets, there will be winners and there will be losers. If your scanner reports the most results, but most of those are false positives, that’s not progress. The scanners with the most false positives will be on the losing side of history.

Get involved in open source

And lastly, help. This sounds like the old “patches welcome” we all love to throw around, but in this case I’m quite serious. Your product is basically open source. Some of the projects you are working with could use a hand to fix some of these findings. As annoying as you think a huge scan report is, imagine getting one when you’re working for free. It’s insulting and degrading. If you have actual real scan results that need fixing in an open source project, don’t dump it over the fence, get in there and help fix problems. If you include a dependency in your project, sending a patch upstream is the same as patching your own application.

If you’ve never contributed to open source it can be terrifying. I just spent longer than I want to admit trying to find a nice “getting started in open source” guide. I wasn’t terribly impressed with anything that came up (if you know of one, let me know, I’ll link it at the bottom). I’m going to start writing a post titled “How to get involved in open source for security people”, but until then, my advice is just go help. You know how github works, get in there and help. Be patient and kind, apologize when you mess up and don’t be afraid to ask stupid questions. Some say there are no stupid questions. There totally are, but that’s not a reason to be a jerk when asking or answering.

What now?

The one ask I have of everyone reading this is to help educate others on the extremely complicated and important topic of security scanners. It’s important we approach others with empathy and understanding. Security has a long history of being ill tempered and hard to work with. If someone is misunderstanding how a security scanner works or what it does, it’s our opportunity to help them understand. These scanners are too important to ignore, and they need a lot of work. We don’t change an industry by being annoying idiots. We change it by being respected partners.

I think we have an opportunity to see composition scanners make things better in the future. Composition security has been a taboo topic for a very long time. Many of us knew what was happening but we didn’t do anything because we didn’t have a solution. Security through obscurity works, until it doesn’t. There’s a lot of work to do, and we have to do it together.

Now go and teach someone something new.

Part 5: Which of these security problems do I need to care about?

If you just showed up here, go back and start at the intro post, you’ll want the missing context before reading this article. Or not, I mean, whatever.

I’ve spent the last few posts going over the challenges of security scanners. I think the most important takeaway is we need to temper our expectations. Even a broken clock is right twice a day. So assuming some of the security flaws reported are real, how can we figure out what we should be paying attention to?

I ran the scan

If you ran a security scanner, running it is the easy part. What you do with the results of your scan is a challenge. I’ve seen teams just send the scan along to the developers without even looking at it. Never do this. This tells your developers two very important things. 1) You think your time is worth more than theirs. 2) You aren’t smart enough to parse the scan. Even if one or both of these are true, don’t just dump these scans on someone else. If you ran it, you own it. Suddenly that phone book of a scan is more serious.

When you have the result of any security report, automated or human created, how you deal with the results depends on a lot of factors. Every organization has different processes, different resources, and different goals. It’s super important to keep in mind the purpose of your organization; resolving security scan reports probably isn’t one of them. Why did you run this scan in the first place? If you did it because everyone else is doing it, reading this blog series isn’t going to help you. Fundamentally we want to run these scanners to make our products and services more secure. That’s the context in which we should read these reports. Which of these findings make my product or service less secure? And which findings should I fix to make it more secure?

I was given a scan

If you were given a scan, good luck. As I mentioned in the previous section, if you were given one of these scans and it’s pretty clear the person giving it to you didn’t read it, there’s nothing wrong with pushing back by asking for some clarification. There’s nothing more frustrating than someone handing you a huge scan with the only comment being “please fix”. As we’ve covered at length, a lot (almost all) of these results are going to be false positives. Now you have to weed through someone else’s problem and try to explain what’s happening.

I’ve seen cases where a group claims they can’t run an application unless the scan comes back clean. That’s not a realistic goal. I would compare it to only buying computers that don’t crash. You can have it as a requirement, but you aren’t going to find one no matter how hard you try. Silly requirements lead to silly results.

Falsifying false positives

If you ran the scan or you were handed a scan, one of the biggest jobs will be figuring out which results are false positives. I don’t know of a way to do this that isn’t backbreaking manual labor. Every finding has a number of questions that you have to answer “yes” to in order for the finding to matter.

  1. Do you actually include the vulnerable dependency?
  2. Is the version you’re using affected by the issue?
  3. Do you use the feature in your application?
  4. Can attackers exploit the vulnerability?
  5. Can attackers use the vulnerability to cause actual harm?
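The five questions above can be sketched as a simple checklist. This is just an illustration: the `Finding` class and its field names are hypothetical, not the output format of any real scanner.

```python
# A minimal sketch of the triage questions as a checklist. All names here
# are invented for illustration; adapt them to your own tracking system.
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    # Answers to the five triage questions; None means "not yet answered"
    dependency_included: bool = None
    version_affected: bool = None
    feature_used: bool = None
    exploitable: bool = None
    causes_harm: bool = None

def matters(finding: Finding) -> bool:
    """A finding only matters if every triage question is answered 'yes'."""
    answers = [
        finding.dependency_included,
        finding.version_affected,
        finding.feature_used,
        finding.exploitable,
        finding.causes_harm,
    ]
    # Any "no" (or unanswered question) means it isn't actionable yet.
    return all(answers)

noise = Finding("CVE-2014-0160", dependency_included=True, version_affected=False)
real = Finding("CVE-2014-0160", True, True, True, True, True)
```

The point of writing it down this way is that a single “no” anywhere in the chain is enough to deprioritize the finding, which is why working through the questions in order saves so much effort.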

As humans it’s hard work to do these steps, it’s likely you can’t do them by yourself. Find some help, don’t try to do everything yourself.

One really important thing to do as you are answering these questions is to document your work. Write down as much detail as you can because in three months you’re not going to remember any of this. Also, don’t use whatever scanner ID you get from the vendor, use the CVE ID. Every scanner should be reporting CVE IDs (if they don’t, that’s a bug you should report). Then if you run a second scanner you can know right away if something has already been investigated since you’ve already documented the CVE ID. Using only scanner IDs isn’t useful across vendors.
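As a sketch of why CVE IDs beat vendor IDs, here is what cross-scanner deduplication can look like. The scanner names, report shapes, and notes are all invented for illustration.

```python
# Hypothetical reports from two different scanners. Each vendor uses its own
# internal ID, but both report the industry-standard CVE ID.
scanner_a = [{"id": "VENDORA-1234", "cve": "CVE-2021-44228"},
             {"id": "VENDORA-5678", "cve": "CVE-2014-0160"}]
scanner_b = [{"id": "B-99", "cve": "CVE-2014-0160"}]

# Notes we documented the first time through, keyed by CVE ID.
investigated = {"CVE-2014-0160": "false positive: vulnerable code not shipped"}

triaged, todo = [], []
for finding in scanner_a + scanner_b:
    # Because the notes are keyed by CVE ID, scanner B's duplicate of a
    # finding scanner A already reported is recognized immediately.
    (triaged if finding["cve"] in investigated else todo).append(finding["id"])
```

Had the notes been keyed by `VENDORA-5678` instead, scanner B’s `B-99` would look like brand new work.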

Parsing the positive positives

Let’s make the rather large leap from running a scan to having some positive positives to deal with. The false positives have been understood, or maybe the scanners have all been fixed so there aren’t any false positives! (har har har) Now it’s time to deal with the actual findings.

The first and most important thing to understand is all of the findings aren’t critical. There is going to be a cornucopia of results. Some will be critical, some will be low. Part of our job is to rank everything in an order that makes sense.

Don’t trust the severity the scanner gives you. A lot of scanners will assign a severity rating to the findings. They have no idea how you’re using a particular piece of code or dependency. Their severity ratings should be treated with extreme suspicion. They could be an easy way to do a first-pass ranking, but those ratings shouldn’t be used for anything after the first pass. I’ll write a bit more on where these severities come from in a future post; the short version is the sausage is made with questionable ingredients.

It makes a lot of sense to fix the critical findings first, nobody will argue this point. A point that is a bit more contentious is not fixing low and moderate findings, at least not at first. You have finite resources. If fixing the critical issues consumes all of your resources, that’s OK. You can mark low findings in a way that says you’re not fixing them now, but might fix them later. If your security team comes back claiming that’s not acceptable and you have to fix everything, I suggest a very hearty “patches welcome” be sent to them. In typical software development minor bugs don’t always get fixed. Security bugs are just bugs: fix the important stuff first, and don’t be afraid to WONTFIX silly things.

It’s also really important to avoid trying to “fix” everything just to make the scanner be quiet. If your goal is a clean report, you will suffer other consequences due to this. Beware the cobra effect.

Can’t we all just get along

The biggest takeaway from all of this is to understand intent and purpose. If you are running a scanner, understand why. If you’re receiving a report, make sure you ask why it was run and the expectations of whoever gave it to you. It’s generally a good idea not to assume malice, these scanners are very new and there is a huge knowledge gap, even with the people who historically would consider themselves security experts. It can get even more complicated because there’s a lot of open source thrown into the mix. The amount of knowledge needed for this problem is enormous, don’t be afraid to ask lots of questions and seek out help.

If you are doing the scanning, be patient.

If you are receiving a scan, be patient.

Remember, it’s nice to be nice.

Part 6: What do we do now?

Part 4: Application scanning

We’ve already discussed the perils of code and composition scanning. If you’ve not already read those, you should go back to the beginning.

Now we’re going to discuss application scanning. The basic idea here is we have a scanner that interacts with a running application and looks for bugs. The other two scanners run against static content. A running application is dynamic and ever changing. If we thought code scanning was hard, this is even harder. Well it can be harder, it can also be easier. Sometimes.

Scanning a running application is hard

Back in the code scanning post, we talked about how much harder it is to scan a weakly typed language because there is a distinct lack of rules. Scanning an application can be comparable to this. It mostly depends on what you’re scanning. There are many types of applications. Some have fancy user interfaces. Some are just libraries that get included in other things. Some are just APIs that you access over a network. Of course it’s common for many application scanners to pretend everything is the same.

First, a word on fuzzing. If you’ve never heard of fuzzing that’s OK. It was hugely popular a few years back as a way to stress test certain applications. The basic idea is you stress test inputs by subtly modifying whatever the application is trying to read. For example, say you have a program that converts image files from one format to another. You take one good image and flip some bits at random, then see if it crashes your application. Repeat this a few million times and you’re fuzzing. Fuzzing is an example of application scanning that works really well, but generally only with languages that are not memory safe. Fuzzing is less effective on memory safe languages. We will not be talking about fuzzing in the rest of this post.
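A toy version of that bit-flipping loop might look like the sketch below. The `parse_image` function is a hypothetical stand-in for whatever program you are actually testing, and a real fuzzing campaign would run millions of iterations against a real binary, not a thousand against a toy parser.

```python
# A toy sketch of the bit-flipping fuzzing loop described above.
import random

def flip_random_bits(data: bytes, flips: int = 8) -> bytes:
    """Return a copy of data with a few randomly chosen bits flipped."""
    buf = bytearray(data)
    for _ in range(flips):
        pos = random.randrange(len(buf))
        buf[pos] ^= 1 << random.randrange(8)  # flip one bit at a random offset
    return bytes(buf)

def parse_image(data: bytes) -> None:
    # Hypothetical stand-in for a real target like an image converter.
    if not data.startswith(b"IMG"):
        raise ValueError("bad magic bytes")

seed = b"IMG" + bytes(64)  # one known-good input
crashes = 0
for _ in range(1000):  # a real campaign repeats this millions of times
    try:
        parse_image(flip_random_bits(seed))
    except ValueError:
        pass  # a handled error is fine; we are hunting for real crashes
    except Exception:
        crashes += 1  # anything unexpected is worth investigating
```

The reason this works so well against memory-unsafe code is that “crash” often means memory corruption, which is exactly the class of bug attackers exploit.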

Scanning user interfaces

I want to start with application scanners that go after user interfaces. Today the most common user interface is a website accessed in a browser. It seems like this would be a space that is ripe for scanning since we can teach a robot to parse HTML. You would be right that we’re really good at scanning HTML. What we’re not really good at is finding good results when we scan HTML.

The single biggest challenge an application scanner has in almost every instance is a lack of situational awareness. What I mean by that is the scanner approaches a web app without any real knowledge about how the app works or what it does. The scanners just start throwing spaghetti at the wall and seeing what sticks. It’s very common for a webapp scanner to tell you about security flaws affecting a webserver you’ve never even heard of: because your app once returned an HTML snippet that looks like something a webserver from 1994 produced, they assume that’s what you’re using.

Webapp scanners also make many assumptions about the results they get back. If a scanner sends a request that it thinks should have returned a 400 error code, and it doesn’t get one, that’s going to end up in the report. My favorite example of this was a scanner that put a / at the end of every URL, got a JSON error message back, and reported a security flaw for every URL it could find. Every. Single. URL. It was very silly, and not very useful.

Some scanners will let you configure them to be a bit more clever with respect to your webapp. If you can configure them you should, but even with that the results aren’t going to be amazing. You’re going to see an increase in quality from completely terrible to mostly terrible. You should definitely do some work to decide if running a webapp scanner makes sense for you. This is a great place to figure out some return on investment calculations. And of course keep in mind the big question: “what is our reason for doing this?” If you want a more secure application, keep that in mind while you’re parsing the reports.

Scanning APIs

If your application has an API, that’s great news. APIs are ways for machines to talk to each other, so logically one would expect an application scanner to do a great job scanning an API for problems. One would expect …

The reality here is many application scanners will treat an API as if it were a user interface returning HTML. Many scanners will report things like error messages as security flaws because the responses don’t look the way the scanner expects. Web browsers don’t treat content with a type of application/json as HTML. Scanners don’t seem to understand this.

If you are building an API using modern design principles, it’s very likely you already have an application scanner running against your API. You just don’t call it an application scanner, you call it “continuous integration”. That’s right, if you have a robust test suite against your API, you can expect far better results from that than you’ll ever see from an automated scanner. If you have a finite budget, you should write more tests, not buy an application scanner for your APIs.
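To make that concrete, here is a sketch of the kind of test a CI suite can run that a generic scanner can’t match: the test knows that a JSON error body with the right status code is correct behavior, not a finding. The `get_user` handler below is a made-up stand-in for your API, not any real framework.

```python
# Hypothetical API handler and the CI test that exercises it. A scanner
# with no situational awareness might flag the JSON error response; the
# test suite knows it is exactly what should happen.
import json

def get_user(user_id: str):
    """Stand-in for a /users/<id> endpoint: returns (status, content_type, body)."""
    if not user_id.isdigit():
        body = json.dumps({"error": "user_id must be numeric"})
        return 400, "application/json", body
    return 200, "application/json", json.dumps({"id": int(user_id)})

def test_invalid_id_returns_json_error():
    status, ctype, body = get_user("abc")
    assert status == 400                      # correct error code
    assert ctype == "application/json"        # structured error, not HTML
    assert "error" in json.loads(body)        # a message, not a stack trace

test_invalid_id_returns_json_error()
```

Because the test encodes intent, a “weird” response is a failing build rather than a page of scanner noise three weeks later.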

What can we do?

For the “what action can we take” part, I’m going to point at the conclusion for source code scanners rather than try to write something new and interesting. These tools have been around for a while and they’ve not improved a lot. Everyone should calculate their own ROI here, but if I were writing the check I would look into composition scanning.

The next post will be far more exciting as we start to tackle how to parse a phone book sized report.

Part 5: Which of these security problems do I need to care about?

Part 3: Composition scanning

If you just showed up here, go back and start at the intro post, you’ll want the missing context before reading this article.

In this post we’re going to talk about a newer type of scanner called a composition scanner. The idea here is when you build an application today it’s never just what you wrote. It also includes source code from a large number of other sources. Usually these other sources are open source.

A composition scanner will look at your project, specifically the things you didn’t write, and attempt to alert you if you are including components that have known security vulnerabilities. It’s very common to not upgrade the open source we put into our projects. Upgrading is hard and can break things, so doing nothing is easier most of the time. Composition scanners let us see what’s hiding in the depths of our project, sometimes it isn’t very pretty.

An easy example we can use is if you are including OpenSSL code in your application. Do you know if the version of OpenSSL you are using is still vulnerable to Heartbleed? You probably can’t say for certain if this is true or not, but a composition scanner probably can.
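As a rough sketch of what a composition scanner does under the hood, here is a simplified version of that check. Heartbleed (CVE-2014-0160) affected OpenSSL 1.0.1 through 1.0.1f and was fixed in 1.0.1g; the string matching below is deliberately simplified for illustration and is not how a production scanner parses versions.

```python
# Simplified sketch of a composition scanner's version check for Heartbleed.
# Real scanners use proper version parsing and curated vulnerability data.
def openssl_heartbleed_vulnerable(version: str) -> bool:
    # Affected releases were 1.0.1 and 1.0.1a through 1.0.1f (fixed in 1.0.1g).
    if not version.startswith("1.0.1"):
        return False
    suffix = version[len("1.0.1"):]
    return suffix == "" or suffix in "abcdef"
```

The value is exactly what the paragraph above says: you probably can’t answer “is my bundled OpenSSL still vulnerable?” from memory, but a lookup like this can, instantly and across every dependency you ship.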

Who let all this open source in?

Everything we build today is open source. I’m sure a non trivial number of readers just jumped up and shouted “no it isn’t”. There are two kinds of open source today. The open source we know is open source, and the open source we think isn’t.

I’m only half joking here. The reality is everything is filled to the brim with open source now. If you’re building anything and you aren’t using as much open source software as you can find, you’re a fool. Sure you have some of your own features you’ve written that aren’t technically covered by an open source license, but when that’s only 10% of your codebase, you better be willing to pay attention to the other 90%.

When we use open source in our products, projects, and businesses, we lose the protection of security through obscurity. “But security through obscurity doesn’t work!” You shout. It does actually work, until it doesn’t. There is a huge number of applications out there that are only secure because nobody has ever looked at them, and nobody ever will. These applications have existed for years and will continue to exist for years hidden behind the veil of obscurity.

But once we include some open source, we’re going to start getting noticed. It’s like bringing the best looking person to the party. Everyone notices. It’s a bit poetic that by downloading and using something free, the cost is attention. What I mean by this is if you have a public web site, there are people scanning the internet looking for certain known libraries. There are attackers scanning the internet for certain applications running on it. There are researchers scanning all the source code in github looking for known bad libraries. When your website was just some perl code you wrote in 1995, nobody noticed or cared. Now that you’re using AngularJS, everyone sees.

Updating dependencies is harder than not updating them

Now, as we mentioned just above, everything is open source now. Open source comes with a catch though. You include it in your application at a point in time, then the arrow of time marches forward. There are some who like to say software ages like milk, not wine. But I say it ages like humans. It’s exciting, good looking, and nimble when it’s young, then the ravages of time start to set in and things stop being young and beautiful pretty fast.

You have to update your open source dependencies. You can’t just grab some code off the internet and forget it’s there. There are going to be newer versions that fix bugs and security issues. You’ll want those fixes. This is further complicated because sometimes the new version of an open source library will break your application. It could be a bug in the new version, it could be you were using something incorrectly, or they might just break it on purpose because they decided the old way was bad. Every time you pull in a new version of something there will be a cost. Sometimes the cost is as small as pulling in the new version. Sometimes the cost will be refactoring half of your application.

The current best practice advice is to keep your dependencies updated. One could easily write a book debating the pros and cons of updating the open source you use, and there are many ways to manage this problem. Rather than delve into that problem right now, let’s just stick with the idea that we should be updating our open source, but that then leads to the question of when. If we pull in every update for every dependency, that’s going to be a lot of churn. We probably want to update things in a way that makes sense and is manageable. We all have more work to do than time to do it, so we have to be smart about these updates.

Which of these security problems do I need to care about?

OK, so let’s assume if you made it this far you agree that software is incredibly complex. It’s also mostly open source, and that open source will have security flaws that need to be fixed. But you also can’t fix everything, you can only fix some things.

As we’ve already mentioned several times, there are going to be false positives, true positives, and false negatives. The vast majority of your findings will still be false positives, but in composition scanning, there are two types of false positives. There are false positives of the sort where the vulnerability reported doesn’t exist in your dependency. And there is the false positive where the vulnerability exists in the dependency, but you don’t use the vulnerable code, so you’re not vulnerable.

This idea of a vulnerability existing in a dependency but being a false positive can be difficult to understand, so here is a laughably simple example. Let’s say you have a library in your application that has two features. One of the features is designed to remove dangerous HTML from strings. The other feature adds two numbers together (this is meant to be a very simple and ridiculous example). You only need the feature that adds numbers together, so you ignore the string sanitizer. There were a number of security vulnerabilities found in the string sanitizer, but since you don’t use it, you never upgraded the library. Now that you run a composition scanner, you see it lights up like a Christmas tree due to all the unfixed vulnerabilities in the sanitizer. The vulnerable code is there, but you don’t use it, are you vulnerable? There isn’t a single answer to this question. I say it is a false positive, but it’s up to you really. This is a very common type of false positive with composition scanners.
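Here is that ridiculous example sketched in code. Both functions are invented for illustration; imagine the sanitizer carries a pile of known CVEs.

```python
# The two-feature library from the example above. Pretend sanitize_html is
# the function with years of unfixed security vulnerabilities.

def sanitize_html(text: str) -> str:
    # The "vulnerable" feature our application never calls.
    return text.replace("<", "&lt;").replace(">", "&gt;")

def add(a: int, b: int) -> int:
    # The only feature our application actually uses.
    return a + b

# Our whole "application": the vulnerable code ships with us, but no code
# path ever reaches it. A composition scanner still lights up.
result = add(2, 3)
```

The scanner can see that `sanitize_html` is present in what you ship; it usually can’t see that nothing ever calls it, which is why this class of false positive lands on a human to resolve.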

What now?

It’s important we keep in mind why we are running these scanners. Are we doing it just to run a scanner, or are we doing it to make our application more secure? I think today a lot of users are running them for the sake of running them. But the real reason should be to make things more secure. A vulnerability an attacker can’t exploit isn’t a vulnerability. It’s important we invest our limited resources into fixing vulnerabilities that attackers can attack. There is a lot of nuance in this explanation, I expect to write some future posts about it after this series is complete.

As composition scanners are the new kid on the block, they also currently show the most promise. But that optimism is worthless if we don’t work with the scanner vendors. Today the scanners produce relatively low quality results; the number of false positives is still unacceptably high and the reports are enormous. But composition scanning is a much easier problem to understand than any of the other scanning problems. I do think it has a bright future.

Part 4: Application scanning

Part 2: Scanning the code

If you just showed up here, go back and start at the intro post, you’ll want the missing context before reading this article.

The first type of scanner we’re going to cover are source code scanners. It seems fitting to start at the bottom with the code that drives everything. Every software project has source code. It doesn’t matter what language you use. Some is compiled, some interpreted, it’s all still source code. The idea behind a source code scanner is to review the code a human wrote and find potential security problems with it. This sounds easy enough in theory, but it’s extremely difficult in practice.

Strongly typed languages like C, C++, and Java lend themselves to code scanning. An oversimplified explanation (the pedantic distinction is really static vs. dynamic typing, but the effect on scanners is the same) is that a strongly typed language is one where a named variable has to be a certain type. For example, if I have a variable named “number” that is a number, I can’t assign a string to it. It can only ever be a number.

Weakly typed languages, such as JavaScript and Python, are incredibly difficult to properly scan. These are languages where I can assign the string “potato” to my variable named “number”. While weakly typed languages offer great flexibility to developers, they are a nightmare for code scanners.
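A tiny Python sketch makes the point. This isn’t a vulnerability, it’s just the kind of flexibility that makes life hard for a scanner:

```python
# In Python, a variable's type can change at runtime, which is exactly
# what makes static scanning hard.
number = 42        # starts life as an int
number = "potato"  # perfectly legal: now it's a string

# A scanner looking at a later use of `number` can't know which type it
# holds without tracing every possible execution path.
print(type(number).__name__)  # str
```

The equivalent reassignment in C or Java simply refuses to compile, which is one reason scanners have an easier time with those languages.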

I have software and nobody knows how it works

Software today is infinitely complex. That statement isn’t a joke, it really is infinitely complex. There is no limit to what computers, and by extension software, can do (this is related to the concept of Turing completeness). An infinitely complex problem will have an infinitely complex solution. It’s important to keep in mind how big infinity is. Since humans can barely solve finite problems, it’s safe to say we can’t actually solve problems that are infinitely complex, even with a scanner. Now, just because you can’t solve a problem doesn’t mean you can’t make things better. There’s a lot of space between “solved” and “do nothing”.

So the real problem is basically if you have software running today in any environment, it’s so complex nobody really knows how it all works. If you write software, you’re going to accidentally include security vulnerabilities. Finding those vulnerabilities is a nearly impossible task in many instances. One way to try to uncover some of them is, you guessed it, scanning the source code for security vulnerabilities.

It turns out that trying to scan for those flaws is a really, really hard problem.

The only thing harder than writing secure software is writing a code scanner

So if software is infinitely complex, it’s safe to say building a scanner is more complex than infinity. I’m not sure what that is, but I’m comfortable assuming it’s really hard. Being able to scan code that can do anything is an incredibly difficult problem. Now, just because it’s really hard doesn’t mean we should do nothing, but it’s important we have reasonable expectations. When I point out shortcomings in something it doesn’t mean we should throw our hands up and declare the problem too hard to solve. This has been the default reaction in the security industry to many problems. It doesn’t work.

A code scanner isn’t going to catch all your bugs. It’s probably not going to catch half of your bugs. Code scanners are plagued by the problem of very high false positive rates and extremely high false negative rates. Most code scanners can only find a certain subset of security vulnerabilities, and of the subset they can find, they will be wrong a lot.

I mentioned strongly and weakly typed languages in the intro. You can imagine that weakly typed languages are incredibly difficult to scan. The flexibility you gain from not declaring types can lead to a lot of complexity. A subroutine that can return an integer or a string means your scanner has to try to figure out what is getting returned, and hope it can work out whether there will be problems when processing the output.
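Here is a sketch of that pattern. The function is made up, but the shape is common in dynamic languages:

```python
# A function whose return type depends on which branch runs -- an int on
# the happy path, a string on the error path.
def parse_port(value):
    try:
        return int(value)        # int when the input parses
    except ValueError:
        return "invalid port"    # str when it doesn't

# Code that does arithmetic on the result is fine on one branch and a
# runtime TypeError on the other. A scanner has to prove which branch
# can be taken -- in general, it can't.
print(parse_port("8080") + 1)  # 8081
```

`parse_port("oops") + 1` would crash, and whether a scanner flags the call site, the function, or nothing at all varies wildly between tools.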

Scanning a strongly typed language will have a slightly higher rate of success if you structure your code in a way the scanner likes. Some scanners can be augmented with certain comments to help them understand what’s happening. Even if you do everything right, your scanner will still produce a high number of false positives. Scanning code is hard.
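One concrete example of those scanner-directed comments: Bandit, an open source Python code scanner, accepts `# nosec` annotations telling it a flagged line has been reviewed and should be skipped (check your own scanner’s docs for its equivalent syntax):

```python
import subprocess

# Bandit normally flags subprocess calls with shell=True (its check
# B602). The command here is a constant string, so after reviewing it
# we suppress that one finding with a nosec comment:
result = subprocess.run(
    "echo reviewed", shell=True,  # nosec B602 - constant command, reviewed
    capture_output=True, text=True,
)
print(result.stdout.strip())
```

Use these sparingly: every suppression comment is a claim that a human actually looked at the line, and they rot just like any other comment.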

The other important thing to keep in mind is these scanners generally only pick up a subset of possible security vulnerabilities. Even if you ran a code scanner and it came back clean, you should not assume your code is free from security vulnerabilities. Scanners tend to be good at finding problems like buffer overflows, but not good at finding logic problems for example.

Every scanner will also have false positives. Some scanners will have a lot of false positives. As mentioned in the last post, make sure you report false positives to the scanner vendors. They are bugs. False negatives are also bugs, but they’re a lot harder to pick out and report.

What can we do?

I would love to tell you security code scanners will get better with time. They’re already about a decade old and the progress we’ve seen is not super impressive. Like most technology you should understand your return on investment for using a code scanner. If that return is negative, you’re wasting resources scanning the code.

One of the most dangerous traps we can fall into in security is using tools or processes “because that’s the way we do it”. We should constantly be evaluating everything we do and making it better. Because of the arrow of time, a process that isn’t getting better is getting worse. Nothing ever just stays the same. Using this logic, I would probably argue code scanners are mostly staying the same (feel free to draw a conclusion here). Newer, safer languages are likely the future, not better code scanners.

In the next post we will cover composition scanners. Composition scanning is newer and currently shows promise. It’s also a problem that’s a lot easier to understand and solve than code scanning.

Part 1: Is your security scanner running? You better go catch it!

This post is the first part in a series on automated security scanners. I explain some of the ideas and goals in the intro post; rather than rehashing that content here as filler, just go read it.

There are different kinds of security scanners, but the problem with all of them is basically the same. The results returned by the scanners are not good in the same way catching poison ivy is not good. The more you have, the worse it is. The most important thing to understand, and the whole reason I’m writing this series, is that scanners will get better in the future. How they get better will be driven by all of us. If we do nothing, they will get better in a way that might not make our lives easier. If we can understand the current shortcomings of these systems, we can better work with the vendors to improve them in ways that will benefit everyone.

The quick win: I did something, and something is better than nothing!

One of the easiest problems we can see when running a security scanner is the idea that doing anything is better than doing nothing. Sometimes this is true, sometimes it’s not. You have to decide for yourself if running a scanner is better than not running a scanner, every organization is different. I see a common theme in the security industry where we take actions but then never follow through. Just running a scanner isn’t the goal, the goal is making our products and services more secure.

If you’re running scans and not reviewing the output, doing something is not better than doing nothing. What I see on a regular basis is a security team handing a phone book sized scan report to the development team and demanding they fix everything, clearly not having looked at it themselves. If you can’t be bothered to review the massive report, why should the dev team? A good dev team will push back with a firm “no”. Even if they wanted to review all the findings, practically speaking it can’t be done in a reasonable amount of time.

So what do we expect to see in the report we didn’t read?

False positives

False positive findings are probably the single biggest problem with security scanners today. A false positive is when the scanner flags a particular line of source code, or dependency, or application behavior, as being a security problem when it isn’t one. A very high percentage of the findings a report spits out will be false positives. This is problematic because dealing with false positive results has a cost.

No scanner will ever have zero false positives, but a scanner that has orders of magnitude more false positives than actual positives is a broken scanner. Anything it tells you should be treated with skepticism. I did some searching to find out what other industries consider an acceptable false positive rate. I couldn’t find anything I want to link to, it’s all quite dull, but the vast majority of industries seem to land somewhere around 1%-10% as acceptable. If I had to guess, most scanners today have a false positive rate very close to 100%. I’ve seen many scan reports where all of the findings were false positives. This is not acceptable.
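Some back-of-the-envelope arithmetic shows why the rate matters so much. These numbers are illustrative, not measurements:

```python
# Triage cost of a noisy scan report (all numbers are made up).
findings = 1000           # total findings in one scan report
false_positives = 950     # a 95% false positive rate
minutes_per_triage = 15   # human time to rule each finding in or out

real_issues = findings - false_positives
wasted_hours = false_positives * minutes_per_triage / 60

print(real_issues)   # 50 real vulnerabilities
print(wasted_hours)  # 237.5 hours spent dismissing noise
```

Fifty real issues buried under more than two hundred hours of triage is not a tool anyone keeps using; that math is why the phone book reports go straight in the bin.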

Our job as users of security scanners is to report false positives to whoever creates the scanner. They won’t get better if we don’t tell them what we’re seeing. When you’re paying a company for their product it’s quite appropriate to make reasonable requests. One challenge any product team has is receiving a large number of unrelated requests. When you have 100 customers with 100 different feature requests it’s really hard to prioritize. If you have 100 customers with the same request, the product team can have a razor sharp focus. We should all be asking our scanner vendors for fewer false positives. Every false positive is a bug. Bugs should be reported.

Maybe positives

There’s another class of issues I’ve seen from security scanners that I’m going to call “maybe” positives. The idea here is the scanner flags something, and when a human reviews the result they can’t tell whether it is or isn’t a problem.

These are tough to deal with, and they can be dangerous. I bring it up because this problem created some trouble for Debian some time ago. DSA-1571 was a security flaw that resulted in OpenSSL generating predictable secret keys. The patch that caused the bug was the result of fixing a warning from the compiler. Sometimes warnings are OK, it can be very difficult to know when.

The reason I include this is to warn that fixing errors just to quiet the scanner can be dangerous. You’re better off getting the scanner fixed than changing code you don’t really understand just to make the results go away.

Low quality positives

One of the other big problems you will see from security scanners is findings that are positives, but they’re not really security problems. I call these “low quality positives.” They are technically positives, but they shouldn’t be.

An easy example I saw recently was a scanner claiming a container had a vulnerable package in it. The vulnerability didn’t have a CVE ID. After some digging I managed to find a link to the upstream bug. The bug was closed by the upstream project as “not a bug”. That means the finding from the scanner wasn’t actually a vulnerability (or even a bug). One could argue the scanner may have added the finding before it was marked “not a bug.” I would argue back that they should be double checking these findings to notice when the bug gets fixed (or closed, in this instance).

These scanners should be getting CVE IDs for their findings. If a scanner isn’t a CVE naming authority (CNA), you should ask them why not. Findings that don’t have CVE IDs are probably low quality findings. Actual security issues will get CVE IDs. If the security issue can’t get a CVE ID, it’s not a security issue.

False negatives

The last point I want to cover is false negatives. A false negative is when there is a vulnerability present in a project but the scanner doesn’t report it. There will always be false negatives with every scanner, and they are very difficult to identify.

Something we can do here is to try out different scanners every now and then. If you only ever run one scanner you will have blind spots. Running a different scanner can give you some insight into what your primary scanner may be missing.

Wrapping up

If you take away anything from this post, I hope it’s that the most important thing we can do about the current quality problems with security scanners is to report the low quality results as bugs. I think we are currently treating these unacceptable reports with more seriousness than they deserve. Remember this industry is in its infancy. It will grow up, but without guidance it could grow into a horrible monster. We need to help nurture it into something beautiful. Or at least not completely ugly.

Part 2 is when we start to talk about specific types of scanners. Source code scanners are up next.

The Security Scanner Problem

Are you running a security scanner? It seems like everyone is doing it, maybe it’s time to get with it. It’s looking like automated security scanning is the next stage in the long winding history of the security industry. If you’ve never run one of these scanners that’s OK. I’m going to explain what they are, how they work, how we’re not using them correctly, and most importantly, what you can do about it. If you are running a scanner I’m either going to tell you why you’re doing it wrong, or why you’re doing it REALLY wrong. If you’re a vendor who builds a security scanner I assure you I understand there is a high probability I am indeed an idiot and don’t know what I’m talking about. I’m sure everything will be fine.

Automated scanning IS changing the world, but right now it’s not changing it for the better, it’s currently the security industry version of lead paint. The technology is still REALLY new, so it’s important we have proper expectations and work together to make things better. One of the challenges with new technology is understanding what you have now, and more importantly understanding what you need next. Like any tool, if you use it wrong it can make things worse than doing nothing at all. Let’s talk about how to make things better.

If you’ve never seen the sort of report an automated scanner generates you should probably consider yourself lucky. The best way to describe these reports is if you had a 10 page report that wasn’t very good, then you made 100 copies of every page, shuffled them around a bit and stapled it all together. There are some useful findings in the report, but they’re really hard to find. Expecting anyone to parse a 1000 page report for one or two findings has a terrible return on investment. It’s even less helpful if you send the report to someone else with unrealistic demands, such as requesting they fix all of the findings. By Friday. If you didn’t read the report, why should they?

There’s also the problem of incentives. Today a lot of scanner vendors talk about all the findings their scanner will … well, find. They fail to mention how many false positives are in those findings. In the case of security reports, more findings is like bragging your house has the most lead paint. It’s not really a contest you want to win, and if you are winning you probably have a lot of work to do. That’s OK though, this is new technology trying to solve a REALLY hard and, up to this point, mostly unsolved problem. Someday we’re going to look back on all this the same way we look back at food safety in the 1900s. Asbestos was an ice cream flavor, motor oil counted as a vegetable, and nobody worried about where the meat came from.

There are a lot of moving parts in the scanner story, so I’m going to write a series of blog posts to help explain the problem, explain what these scanners are doing, and finally what we can do about it.

The rough outline is going to look something like this:

  1. Is your scanner running? You better go catch it!
    1. The quick win: I did something, and something is better than nothing!
    2. False positives
    3. Maybe positives
    4. Low quality positives
    5. False negatives
  2. Source code scanners
    1. I have software and nobody knows how it works
    2. The only thing harder than writing secure software is writing a code scanner
  3. Composition scanners
    1. Who let all this open source in?
    2. Updating dependencies is harder than not updating them
    3. Which of these security problems do I need to care about?
  4. Application scanning
    1. Scanning a running application is hard
    2. Scanning user interfaces
    3. Scanning APIs
  5. Which of these security problems do I need to care about?
    1. I ran the scan
    2. I was given a scan
    3. Falsifying false positives
    4. Parsing the positive positives
  6. What do we do now?
    1. Understand the problem we want to solve
    2. Push back on scanner vendors
    3. Work with your vendors
    4. Get involved in open source

Ending a blog post with an outline is pretty lame. Since writing good conclusions is hard work, I’m going to just link you to the first post of actual content in the series. I have a number of posts on this blog that talk about open source dependencies and the supply chain. I’m not going to torture you with links to all of them; I’m going to explain the problem from a slightly different angle in this series. If your time has very little value, you can dig through the archives and see if there’s anything there worth reading.

Part 1: Is your security scanner running? You better go catch it!

Backdoors in open source are here to stay

Unless you’ve been living under a rock for the past few … forever, you may have noticed that open source has taken over the world. If software ate the world, open source is the dessert course. As of late there has been an uptick in stories about backdoors in open source software. These backdoors were put there by what are assumed to be “bad people”, which is probably accurate since everyone is a villain in some way.

The reactions I’ve seen to these backdoors range from “who cares I don’t use that” to “we should rewrite everything in house and in assembler and go back to using CVS on a private network”. Of course both of those extremes are silly, it’s far better to land somewhere in the middle. And as much fun as writing assembler can be, the linker is probably an open source project.

This brings us to the question: what do all these backdoors really mean for open source? In most instances, not much. There’s a lot happening that’s not well understood yet, and no doubt we’ll see more changes in the future as we understand the problem better. I think there’s a tendency to overcorrect when something new happens; in this case I’m not sure we could overcorrect even if we wanted to.

The first and most important point is to understand that a huge number of open source projects are run by a couple of people doing it for fun. They’re not security experts, and they will never be security experts. They’re also not going to adopt some complex security process. If they get a nice looking pull request they’ll probably merge it. Security isn’t at the top of their list when working on the project; the whole point of their project is to solve some sort of problem. While I’m sure many would love getting a few donations, it’s a steep climb from there to working on your open source library full time. The reality is these projects will always be hobbies.

Secondly, you can’t claim you will only use “trusted” open source. There are now a number of vendors who tell you if you come sit by their fire everything will be OK. They’re a safe space and the open source they have is only the finest quality artisan open source crafted by Himalayan monks, but only on Tuesdays because that’s the day the open source karma is best. I don’t think these vendors are trying to mislead, I think they’re just as confused as the rest of us.

Open source is like an enormous tapestry depicting an epic struggle between evil and something slightly less evil than the first thing. Even the big projects and vendors that everyone thinks have it all together are a part of this tapestry. Everything is connected to everything else, it’s open source all the way down. Sometimes it’s a library. Sometimes it’s part of the build system. Sometimes it’s a tool running on the developer workstations. You can’t ignore the point that everything is connected. Claiming to only use trusted open source is just as realistic as claiming you’ll rewrite it all in assembler.

So what can we do about this problem? The collective “we” probably can’t do much, but the tapestry of open source is doing something; it’s just not super obvious, and likely not even intentional. Backdoors are like insects chewing holes in our wonderful tapestry. How do we get rid of them? We don’t.

We can’t prevent backdoors

This is probably going to be a controversial position, but I’m going to say backdoors are just going to be a part of open source. They are here to stay and we can’t stop them from happening. The things that make open source development work are the same things that let backdoors happen. Getting rid of backdoors means getting rid of open source. All the positives of open source drastically outweigh the negatives of backdoors.

I imagine this sounds a bit loony to some, but things are happening that should give us all hope. If you made it past the previous paragraph, I’m going to explain why backdoors don’t really matter now, and won’t really matter in the future. A comparison here would be security vulnerabilities in software. We used to think we could get rid of vulnerabilities if we just had more training and forced people to care. If we take this stance on backdoors we’re in for a decade of disappointment.

How did we find out about the last few backdoors? They were found by the community, generally pretty quickly. The mostly discredited Linus’s Law says “given enough eyeballs, all bugs are shallow”. While I don’t think that’s true, I would be willing to amend it to say “with open source, bugs can’t hide for long”. A backdoor needs to hide to be useful. The open source community seems to be pretty good at finding backdoors. More importantly, when a backdoor is found it usually gets fixed in a few hours. Being fast is really important. Fixing security vulnerabilities, backdoors, and even bugs is a lot different when you can fix them in a few hours vs a few days.

As our infrastructures grow and evolve, as our development tools get better, and as we pay attention to what’s happening in our applications like never before, we are seeing an evolution in computing that’s making it harder and harder for a backdoor to stay in place for a long period of time. This is what I mean when I say open source is doing something but doesn’t exactly understand what. We aren’t doing these things to find backdoors; finding backdoors is a side effect.

Researchers are also starting to look for backdoors in open source. If you are a security researcher, start looking for these things. If you find a backdoor you’ll get a ton of free PR. There are some tools that can help do this today; we need a lot more. While some backdoors can hide, once we have more people looking and better tooling in this space, it’s going to get a lot harder. It’s always a game of cat and mouse. The defenders need to catch up a bit, and I have no doubt we will.

It’s also important to point out in the case of open source we can find the backdoors. Claiming closed source is some magic bullet is even worse than rewriting everything in assembler. I don’t want to hear it!

Lastly, we can and should also account for these in our risk models. If we know backdoors will happen how will that change our behavior? It certainly should, a backdoor is a pretty big deal if you think about it. How will you architect your networks and applications if there is a chance a backdoor exists somewhere in the software? Anyone interested in this should think about it, implement it, and do some writing. I think it will be a very interesting space in the future.

The most important thing you can take away from this post isn’t that we should all just ignore backdoors. The real purpose of this is to help explain how this crazy thing we call open source is going to grow and evolve. Help it grow. If you’re a researcher, look for backdoors. If you’re an open source project, I guess look too, and keep doing whatever it is you do. If you’re an architect, account for backdoors in your risk models and talk about it. Part of the open source community is sharing what you know and learn. We have a lot of room to learn.

Appsec isn’t people

Recently there was a thread on Twitter I stuck my nose into about appsec and why it doesn’t work.

I have a response in there that I believe is a nice way to explain my biggest problem with appsec. I would sum it up as “Appsec isn’t people”. Here is a clever image to help.

[image: appsec-isnt-people]

You know you can take it seriously because the text is green.

The best way to think about this is to ask a different but related question. Why don’t we have training for developers to write code with fewer bugs? Even the suggestion of this would be ridiculed by every single person in the software world. I can only imagine the university course “CS 107: Error free development”. Everyone would fail the course. It would probably be a blast to teach, you could spend the whole semester yelling at the students for being stupid and not just writing code with fewer bugs. You don’t even have to grade anything, just fail them all because you know the projects have bugs.

Humans are never going to write bug free code, this isn’t a controversial subject. Pretending we can somehow teach people to write bug free code would be a monumental waste of time and energy so we don’t even try.

Now it’s time for a logic puzzle. We know that we can’t train humans to write bug free code. All security vulnerabilities are bugs. So we know we can’t train humans to write vulnerability free code. Well, we don’t act like we know it; if you look at history, we keep believing training will work. The last twenty years have had an unhealthy obsession with getting humans to change their behaviors to be “more secure”. The only things that have come out of these efforts are 1) nobody likes security people anymore 2) we had to create our own conferences and parties because we don’t get invited to theirs 3) they probably never liked us in the first place.

I imagine a non trivial number of readers are concocting clever responses in their minds explaining why this statement is wrong in every possible way. Feel free to @ me.

But seriously, we’ve been at this for more than twenty years. Things really aren’t much better than they were back at the beginning. Nearly all forward progress has been the result of better tooling and languages and better process for handling security incidents. I’ve yet to see anything that really changes human behavior in a way that creates real progress.

I should mention at this point that I don’t think security training, or any training, for humans is a bad thing. Awareness is a big deal and it’s important. Training is also useful to learn new skills and ideas. As a professional we should never stop learning, it’s the most important thing we can do.

What I’m saying is that training won’t change behavior at scale. Humans will continue to human even after training, ESPECIALLY after training they didn’t want to take.

The future of appsec has to be technology. Automated tools that can detect vulnerable components. Languages that don’t suffer from memory corruption. Frameworks that make cross site scripting impossible. Operating system hardening. Those are the things that have created real change in the last two decades.

Yelling at people to be more secure never worked. It will never work.