The Titanic of security

I listen to a lot of podcasts. A lot of podcasts. I was listening to episode 212 of the Dave and Gunnar Show podcast with guest David A. Wheeler, where the Titanic was used as an example of changing process after a security incident. This opened up a flood of thoughts for me, but not for the reasons intended in the conversation. The point of the comparison was that the sinking of the Titanic drove changes to international requirements to help avoid a similar disaster, and that we should view SolarWinds the same way: we should use the SolarWinds event to drive meaningful change that makes security better. Why no change will come of this is a different conversation (TL;DR: nobody important died because of SolarWinds, while the Titanic killed a lot of important people). But I think this is an interesting way to compare how we tend to deal with problems in software with how we deal with them in real life.

I won’t bore you with a bad retelling of what happened to the Titanic. Historians have covered this topic at length and from every angle you can imagine. What’s not always covered is what happened after the Titanic’s historic voyage. Wikipedia has a page dedicated to the safety changes the Titanic prompted. If you read that list, every item seems pretty obvious in hindsight. This is how many disasters work. A lot of little things have to go wrong for an accident to turn into a disaster. If we can avoid even one or two of those little things, the result can change dramatically, turning the story from “disaster” into “accident”.

So what does this have to do with software? When we have failures in the world of software, it seems like we rarely change anything. How many organizations have changed their processes because of SolarWinds? There was a lot of momentary panic and A LOT of empty talk, but from what I’ve seen, most companies have gone right back to doing things the same old way. Most of the suggestions about what to do were what I like to call “security harder!” This is where leadership says they take security very seriously, but their actions tell a different story.

If we had reacted to the Titanic the way we handle software incidents, the solution would have been “Titanic harder next time”. We would have launched thousands of boats through iceberg-filled waters with varying degrees of success. Every time a boat sank, we would have told the shipbuilders to use more Titanic next time. They, of course, would have no idea what that means, but as long as the checks cleared they would keep building boats.

I think we lack a realistic understanding of what happened to SolarWinds. If you listen to the Dave and Gunnar Show episode mentioned above, the proposed solution to all of this is something called reproducible builds. I’ve written and talked about reproducible builds at length. They are a modern solution to a malicious build system in the same way an iceberg satellite tracking system would have been a solution for the Titanic: it’s not a realistic option because the technology doesn’t exist. I hope it does someday, but the tooling just isn’t there right now, no matter how badly we want it to be. If we want to have an honest conversation about reproducible builds, it needs to be with the creators of build systems. And when I say “conversation” I really mean patches. If this is something society wants, we have to do the work. Talking isn’t work (says the guy with a blog). Anyone who has reproducible builds today has done it in spite of the tools, not because of them. The number of organizations and projects that can do this is very, very small.

The obvious question about all this supply chain talk is “what can we do?” There isn’t a great answer to this question today. There is no shortage of opinions about how SolarWinds could have Titanic’d harder, but there is a shortage of actionable advice the rest of us can use. If you ask someone how to better secure your supply chain, you aren’t going to get advice; you’re going to get guesses. There aren’t a lot of people who actually understand what a modern open source supply chain looks like. There are, however, plenty of opinions on how to secure one.

I of course know how to secure the supply chain because I have a blog named “Open Source Security”. This is sadly the closest thing to a real qualification that exists in this space. While I offer a great deal of advice on various open source topics, the most important takeaway is that anything I talk about can be verified by anyone. This is open source after all. You don’t have to take my word for it; go figure it out for yourself. If you find I’m wrong, let me know so I can correct what I’m saying.

So you don’t want to be the next SolarWinds, but you also don’t know what that means. I just wrote a blog post about not trying to manage your supply chain yourself: find someone else to help you. There are many companies and projects that are better at supply chain tasks than you will ever be (unless you’re in the business of open source software, don’t spend your valuable time worrying about it). But even that post misses some important points. For example, “what should you do first?” is a pretty important question.

Lately I’ve been imagining the work I have to do as an apple tree. There are some apples close to the ground, and some high up in the tree that you need a ladder to reach. If you’re like any normal person, you pick the low apples first. I don’t even have to explain why (if you’re the sort of person who picks the apples on top first, this is why nobody likes you). The work of securing our supply chain is a lot like this. There are some tasks we can do right away because they’re the low-hanging fruit, and low-hanging fruit is easy to pick.

By doing the easy things first, we get two huge benefits. The first is that we are doing the work we need to do. We’re not talking about it, we’re not building some plan, we’re actually doing work. Work beats talk every day of the week. The second benefit is that we start to learn more about the problems. Even if you do a task that ends up failing or not really mattering, you will learn something along the way. The more we learn, the better we can decide which apples should be picked next. And when in doubt, just pick the apple closest to you.

Here’s a simple thing you can go do right now that can make a difference. Aqua Security has a scanner called Trivy. Go run it against your code (it can also scan containers). Here’s an example that scans the Signal Desktop repository:

docker run --rm aquasec/trivy repo https://github.com/signalapp/Signal-Desktop.git

Of course, I’ve also talked about what this scanner output really means, so be sure to take it with a grain of salt :)

But the point is that if you run this tool and look at the output, you are going to learn new things and gain a different understanding of what you’re doing. It will help shape your future work and ideas.

And the absolute most important thing you can do once you start this journey is to talk about it. When you do something that works, tell the rest of us. When you do something that doesn’t work, tell the rest of us. We have enough talking heads giving out unrealistic advice. I want advice from people doing the work. I don’t need to hear what the guy in the expensive suit thinks. He’s the reason we’re in this mess.