Josh welcomes back David Bernstein to talk about creating a disaster recovery plan. It’s a very timely topic given current events: there are more supply chain attacks and compromises than ever before. There are some great resources for this planning, but as David tells us, it’s really not that hard to put some plans together. It’s easy to over-plan, so David gives some great tips on getting started with planning for an eventual incident.
Episode Links
- David
- Disaster Recovery Journal
- Continuity Insights
- Association of Continuity Professionals
- Disaster Recovery Institute International
- Business Continuity Institute
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
Episode Transcript
Josh Bressers (00:00) Today, open source security welcomes back Dave Bernstein. He’s a certified emergency manager and certified business continuity professional.
Dave is coming back on the show for what I think is going to be one of the more interesting episodes I’ve ever recorded. I have virtually no context, but after we recorded the last episode with Dave, where we were talking about disaster management and emergency, no, disaster recovery, emergency management, whatever, I don’t know, I don’t know, I’m not the pro here. But I mentioned to Dave that I think it would be interesting to come back and talk about kind of what it means to create a disaster recovery or emergency management plan, as well as kind of what do we do to test that?
I told Dave right before I hit record, I am like 100% unprepared for this episode because I don’t know what I don’t know. So Dave is here to educate and entertain and Dave, it is basically your show at this point. So like take it away, man. And thank you so much.
David Bernstein (00:54) Thanks for having me. So rarely am I given 30 minutes of soap box. I’ll take it. But I have a tendency to rant and rave. So please, if there’s a thing you want to follow up on, if there’s a question you have, just interrupt and jump into it.
Josh Bressers (01:01) Yes.
Yeah, yeah.
Don’t worry, I will be the invisible hand, right, to keep us in the direction.
David Bernstein (01:20) All right, so the way I’m thinking about this is I want to start kind of really, really big picture about how you start thinking about emergency programming, crisis management programming, continuity programming, and then we’re going to dial into how do you actually get started on writing a plan? Because I think sometimes when I talk to colleagues, when I talk to clients, when I talk to other people who are getting started on this, even if they’ve written a lot of plans before,
it can feel so large, like how do we even get started? And so I think maybe talking about the first few steps you would take and then kind of expanding on that would probably be most helpful. Yeah.
Josh Bressers (02:00) I love it. And I want to add
one other kind of data point for you, Dave, is so in the last couple of weeks, there have been an enormous number of attacks happening against like open source maintainers, companies with open source software. And then obviously we get those kinds of knock-on effects for people running this software. And there’s a group called Team PCP that has claimed a bunch of the credit for this, but I feel like this is incredibly timely that we’re talking because
I have basically been working on a lot of this stuff nonstop for about three weeks now. And so I’ll be interested to learn what tips and tricks you have that I didn’t even think about or things I thought I was doing right and I’m not. So this is cool.
David Bernstein (02:41) Yeah,
absolutely. Well, and there’s a bigger concept at play here also that, obviously, as you’ve mentioned, the open source area has been under attack pretty consistently the last few weeks. But also, depending on where you look, you’re going to find that there is some emergency, some terrible thing happening in the physical or digital realm. And that happens year round, which brings us back to this, what does this cycle look like, or how do we kind of
keep these things alive because one of the topics that I think I’ve kind of talked about before that I think will be a big theme in today’s discussion as well is the concept that all of these plans have to be alive. The risk profile, the way the world is working and accelerating, to think that these things are static in any way, I think, fundamentally misses the point of how we’re supposed to be planning and preparing for
these eventualities. So much of the world that we work in is not an if, it’s a when, even if it doesn’t actually look exactly like we thought it was going to look like. I think that still counts. So the big topic, the big starting, big picture piece is that, as I just said, everything is alive. The plans need to be updated, they need to be touched, they need to be revisited. And so what we find ourselves doing is establishing this continuous improvement cycle. And if you are going to
get started on any kind of emergency program or plan unless you are starting from literally zero, like this is the biggest thing that you want to make sure that you’re starting to get in place. And you start to see this a lot in plans or programs. They can talk about doing it rather informally, but if you are formally talking about management systems or continuous improvement programming,
then you might see it in things like ISO 22301 or some of the NIST requirements where they say, like, you need to have a plan, it needs to be updated annually, and you need to understand your risks. And so functionally, what this looks like, there’s a lot of different nomenclature around it, is that you have an opportunity to plan, right, understand your risks, understand your hazards, start creating plans around all of that. Then once you’ve established that plan, start to do training around that plan. So, all right, like, I wrote this plan,
extensively brought in the stakeholders I need to bring in. Does everyone know all the assumptions that we made? Probably not. Let’s train them and get them familiar with what it says. Then let’s exercise it or validate it. This is a little bit of a plug for our next one, which is we’re going to talk about plan validations in our next discussion. But validate it, right? Did everything work out the way we thought it was going to work out?
If we’re supposed to have something sequenced in a particular way, did that actually happen? And that validation is not a demonstration. A demonstration means that you’re showing off that everything is going right. Validation is not that. Exercising is not that. You want to make sure you’re actually testing it. Stress the plan. The point is to see when and how it fails. If you’re just showing everyone that everything’s hunky-dory, then you’re not actually
helping yourself. There’s a place for that, right? Like if you want to build confidence, there’s definitely a place for that. I don’t mean to say that it’s that, you know, that has no place in emergency planning. But like if you are meant to improve your programs, you’ve got to stress them and figure out where it fails so you know where to improve. And then, you know, the last thing is, if there’s a response element or anything else, then you take care of that. Otherwise, you kind of get into planning again. And so this continuous improvement cycle,
Josh Bressers (06:18) Yeah. Yeah.
David Bernstein (06:29) goes through like plan, do, check, act. The act is, once you’ve identified those improvements, you have to implement them. So depending on who you’re talking to, that can fall in a couple of different buckets. But plan, do, check, act; prepare, plan, exercise, respond. I think I actually just kind of mixed that one up in a couple of different places. But that continuous improvement cycle kind of runs a bunch of different ways.
Josh Bressers (06:53) That’s okay.
David Bernstein (06:58) and you know if you’re following on the emergency
Josh Bressers (07:00) I mean,
look, this is like software engineering, right? Like it’s the same basic idea.
David Bernstein (07:04) 100%. Yeah,
nothing I’m talking about here is rocket science. It’s just different sides of the same coin, right? And the language that you use around it is going to be different. So if you use the ISO and the business continuity language, you’ll be talking about it one way. If you use the FEMA language, you’ll be talking about it a different way. FEMA really talks about it much more in a physical realm, as opposed to CISA in a digital realm. And so they have another piece of that continuous improvement
Josh Bressers (07:28) Yeah.
David Bernstein (07:30) cycle where it’s organize and equip. And so that’s like, you know, if you need to buy equipment or if you need to get people trained on particular equipment, they include that. And so it’s a five-part improvement cycle instead of four. But that’s really, I mean, that’s kind of the biggest difference. And as we talk about it, it’s really just, you know, I don’t want to be dismissive of it, but it’s kind of just the minutiae of how they’ve decided to talk about it, right? Like everyone’s talking about the same thing:
write the plan, train the plan, exercise it, and then make the improvements you need to make. And so that continuous improvement cycle is pretty consistent through any kind of crisis management programming. The piece that we’re going to talk about as we’re starting to write the plans, which I want to transition to now, is the plan should not be a set of handcuffs. The plan should be a framework for you to work in because otherwise,
you’re going to find yourself responding to an incident that doesn’t align with the plan you’ve written. And then what are you supposed to do? Right? You have to give yourself the leeway to meet the expectations that you’re setting for everyone, as opposed to saying things have to happen in such a regimented, structured way. Because, you know, Josh, in the things you’ve done over the last few weeks, and certainly in all the incidents I’ve been a part of,
nothing has happened in such a linear way, in the way that I expected it.
Josh Bressers (09:02) Right, right. Well,
and look, that’s been part of the challenge over the last couple of weeks. It’s just been the fact that, I don’t have the language to describe this, but I’d say it’s like we all kind of knew these things could happen, but I don’t think we’ve seen it go down exactly the way it did. Right. So there were definitely many instances of, OK, we need to do these things in this way, which is something we’ve never done before, especially where we were hardening some things in GitHub.
David Bernstein (09:06) Yeah.
Josh Bressers (09:31) We were making sure we were cataloging, you know, where our secrets were going and where they were being stored and things like that. And we basically had it before, but it wasn’t as thorough as we wanted it to be to deal with this particular incident. So yeah, there was a lot of kind of making it up as we go on this one, just because necessity said we had to.
David Bernstein (09:51) Yeah.
Well, and at the pace that the world is becoming digitized and the networks and systems and good and bad actors that you’re working with, you’re kind of always on your heels. And so we have to remember that all of these emergency plans are helping us to be more prepared, but they are not the silver bullet that’s going to resolve all of the problems that we have.
Josh Bressers (10:20) Yeah. And
it, by definition, we can only react, right? Like when you’re dealing with an emergency.
David Bernstein (10:29) Yeah, well, so yeah, so there’s a response element to it. Yeah, so, but you bring up a really good point, right? And so I think it’s worth discussing that a little bit. There is something to help you prepare that will make your response more effective and better. But I would argue against the idea that you can only be reactionary. You can be responsive, for sure, right? Being reactive means that you are not putting a lot of…
Josh Bressers (10:32) I mean, you can be a little prepared.
Yes.
David Bernstein (10:56) at least the way I think about it. It means you’re not putting a lot of forethought into it. You’re just kind of like going piece to piece to piece or action to action to action. And response, okay.
Josh Bressers (11:05) Okay,
I think it’s a lack of vocabulary on my part. I think being responsive is a better description than being reactionary. Because you’re right, you can be sort of ready, but the response you prepared for isn’t the response you’re going to have, right?
David Bernstein (11:10) Okay, sure.
Right,
100%. And the emergency plan is meant to give you the building blocks so that you are not trying to figure everything out in the moment. Information is coming in and you still have to do something with that information. But having a plan, or at least having the relationships to build on for that plan, gives you the tools you need to make better decisions so that when you talk about the impact that you’re dealing with,
Josh Bressers (11:25) Okay.
Right, right.
David Bernstein (11:52) you have a sense of what’s going to happen from that and how you’re going to react or respond to that thing happening. But you’re coming also from a place of knowledge as opposed to just total hair on fire, which is not a bald joke, but I guess you could take it that way. I’ve been in so many responses that I don’t have any hair left. But you have these hair on fire moments where all you’re just trying to do is
figure out what’s going on and try and get all the building blocks in place to start getting you to respond, versus if you have your plan and you’ve exercised it, you may not have all the pieces figured out, but you have the parts in place to help you start making that response much, much quicker than trying to set it up in the moment. I think that’s the key difference between reacting and responding. Yeah.
Josh Bressers (12:48) Gotcha,
gotcha.
David Bernstein (12:50) And this is not a hill I’m going to die on. This is just like the language that I use to talk about it. It’s not like the language of crisis management per se. Yeah.
Josh Bressers (12:59) Okay. Okay.
I mean, I’m okay with it. You can die on whatever hill you want, but again, I don’t know what I don’t know. I’m very aware of my knowledge gap here. So, okay. So David, you’re here to tell us how we get started building this plan. I’m very excited to learn. Let’s jump into that.
David Bernstein (13:03) Yeah, of course
Yeah. So
I mean, you know, I used to have a colleague that would say, you know, how do you eat an elephant? And you do it one bite at a time, right? It’s kind of the same thing here. Have you ever, you know, played 20 questions, right? Is it a mineral? Is it something like who, what, why, when, where, and how, right?
Josh Bressers (13:35) Yes.
Only yes, no questions, right? Or are you thinking something else?
David Bernstein (13:39) Yeah. I’m thinking
of something else, but the point is, you probably have a list of questions you want to go through to start answering when you’re playing that game. Same thing for an emergency plan. Yeah, yeah, yeah, exactly.
Josh Bressers (13:49) Yes. You mean like animal, vegetable, mineral, I think is what,
like that, where you can ask like more than yes, no questions. Okay.
David Bernstein (13:57) Yeah, exactly.
The piece that I’m trying to get at is, and part of it was just trying to come in at a lighter sense instead of coming in super strong-willed on my soapbox. But you probably have a set of questions that you want to start asking people whenever you’re playing any of these games. Either it’s a checklist that you’ve gone through in your head, or if it’s a mnemonic you’re trying to work through,
whatever it is to start gathering the little pieces of information that will tell you more about what’s coming. It’s the same thing for an emergency plan. The biggest first question is, why are we doing this? What are we trying to accomplish? Is there a plan that already covers some of this? Is there another emergency plan that covers some of this already? Is there an incident that we’ve had recently that we did not have a plan for?
Because if we understand the why, that’ll give us a lot of the direction for what we want to achieve from this plan. And that will help us understand also, is the plan really the thing we need to do? Yeah. Sure.
Josh Bressers (15:08) Okay, hold on, let me stop and ask a clarifying question.
The way you’re talking, it sounds like we should expect to have different plans for different scenarios. There is no one catch-all plan.
David Bernstein (15:18) So if you were starting from scratch, you should have a catch-all plan. The catch-all, yeah. So if you have nothing, if you’re starting with nothing, then yeah, start with a catch-all plan. And the catch-all plan is really less about like, here is the type of incident we’re going to respond to. It is if anything happens. If anything happens that’s going to disrupt operations, like forget the reason why this is happening. Like who’s in charge?
Josh Bressers (15:24) Okay, gotcha, right.
David Bernstein (15:47) What are the roles and responsibilities that people have? How are we getting people together and how are we communicating? And what resources do we need to start doing that? That is just foundational building block stuff that all the other plans are going to start relying on. If we talk about this, and that’s a great question because I started jumping in, I appreciate you pulling me back. If we talk about this from the emergency management perspective or crisis management, this is like,
core emergency operations plan, core business continuity program. It is what we would call an umbrella plan. It is the plan that all other plans stem from. This sets the core expectation of how we’re going to respond and who’s able to start making decisions and how you’re going to start tracking all of these things. If you are in the FEMA world or you’ve talked about an incident command model,
you see a lot of this coming down from FEMA and you’ll see it for, you know, continuity of operations or COOP planning. You’ll see it for emergency operations plans. They will give a lot of this kind of information, recommendations for how you can organize yourselves. Maybe a different discussion, but I feel very strongly that you should not take your organization and imprint it on top of whatever FEMA says you need to do. I feel very strongly that, like,
you should take the principles that FEMA provides and then mold those to make them work for your organization. Otherwise, you have 40 titles that you’re just trying to throw people into, half of which don’t make any sense.
Josh Bressers (17:15) That seems fair.
Well, and look, let
me maybe clarify this a little more. Let’s make the assumption, anyone listening to us right now is in a situation, we’ll say similar to myself, where it’s a small company, we don’t have lots of resources, like FEMA is not on my list of places to go look when I’m looking for guidance for my disaster recovery and emergency response, right? Because, I mean, well, first of all, we just don’t have…
We don’t have physical locations we’re concerned with, but more importantly is from my perspective, I’m more concerned with the like IT side of this, right? Where I’m, my concern is for example, for the last couple of weeks, like what do you do when one of your competitors gets like completely pwned and now you want to make sure you don’t get completely pwned in the near future, right? So like that’s the plan. What do we do now?
David Bernstein (18:17) Yeah. Yeah, totally.
Well, and I kind of talk about FEMA just because there’s so much information that’s out there. And while, you know, Josh, you may not use it and some of the listeners of this podcast may use it, it is out there and, you know, from my client base, small organizations are using it. I think, yeah, there are things there for you. But this goes back to, like, reactionary or not.
Josh Bressers (18:36) Well, should I use it? Okay, so there’s things there for me.
Okay.
David Bernstein (18:45) Just because your competitor is experiencing something doesn’t necessarily mean you need to do a knee-jerk reaction and prepare for all of that. If you don’t have anything in place to prepare for your response, constantly trying to figure out how you’re going to organize against these other threats means you’re always going to be on your heels and you’re never going to be able to catch up to it. So what I might recommend doing is take a step back and say, all right, well,
Before we get into this really reactionary mode, which is what we talked about before, maybe what we need to do is figure out just the core base. Any organization of any size can do this. How are we going to organize ourselves in a response? Who has the authority to start making decisions? Maybe it’s that first person that’s responding. Give them a ton of authority to start making decisions because they are the person that’s on the ground before they can start escalating issues. Or however you want to define it.
but start making those kinds of decisions around who can make decisions, how are they going to start contacting other people, how are you going to recognize and escalate issues. If you don’t have that part figured out, then the rest of it is going to fall apart really quick.
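Editor’s note: the foundational questions David lists here (who is in charge, who has authority to decide, how people contact each other, how issues get escalated, what resources are needed) could be captured in a small structured checklist. The following is only a minimal sketch in Python under stated assumptions: the `UmbrellaPlan` class, its field names, the `unanswered_questions` helper, and the placeholder values are all hypothetical illustrations, not taken from FEMA, ISO, or any of the continuity organizations mentioned in the episode.

```python
from dataclasses import dataclass, field


@dataclass
class UmbrellaPlan:
    """Hypothetical skeleton for a catch-all response plan (illustrative only)."""
    incident_lead: str = ""                                    # who is in charge
    decision_makers: list[str] = field(default_factory=list)  # who may decide without sign-off
    contact_channel: str = ""                                  # how responders get together
    escalation_path: list[str] = field(default_factory=list)  # who to escalate to, in order
    resources: list[str] = field(default_factory=list)        # what responders need on hand


def unanswered_questions(plan: UmbrellaPlan) -> list[str]:
    """Return the foundational questions this plan still leaves open."""
    checks = {
        "Who is in charge?": bool(plan.incident_lead),
        "Who has authority to make decisions?": bool(plan.decision_makers),
        "How do we contact each other?": bool(plan.contact_channel),
        "How do we escalate issues?": bool(plan.escalation_path),
        "What resources do we need?": bool(plan.resources),
    }
    return [question for question, answered in checks.items() if not answered]


# Example with placeholder values; replace them with your own organization's answers.
draft = UmbrellaPlan(
    incident_lead="first responder on call",
    contact_channel="dedicated incident chat channel",
)
for question in unanswered_questions(draft):
    print("Still open:", question)
```

With the placeholder draft above, the loop lists the three questions that have no answer yet (decision authority, escalation, resources), which is roughly the gap David warns about before layering hazard-specific plans on top.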
Josh Bressers (19:52) Okay, David.
Right. Okay. So let me ask about that. Is there somewhere I can go to find, let’s say, a framework or a skeleton document or something that I can use to help me, you know, start keeping track of all this stuff, right? Some kind of thing to fill out.
David Bernstein (20:14) Yeah, so the great thing about the internet is you can find just about anything you want for anything. It’s a great thing, the internet, I don’t know if you’ve heard about it. Or like all the AI models. The problem with some of the AI plans, some of the AI questions you might ask, is they’re gonna give you an answer that they think you want to hear as opposed to the right answer. But to your question of is there a place you can go?
Josh Bressers (20:25) Familiar with it.
David Bernstein (20:44) There’s a lot of great starting points that you can go for for plans and regulations and expectations around and best practices around this. Some of the business continuity organizations might be like DRI, ⁓ DRJ, Disaster Recovery Journal, ⁓ Continuity Insights, Association of Continuity Planners. There’s a bunch of groups out there that either have these professional practices that are freely available or can help connect you with people to help start pulling this together.
⁓ The thought process of how you’re going to provide that guidance is completely up to you. There’s no organizational guidance through these continuity organizations or through some of the federal programs that I’ve talked about that will say, is who should be making decisions and this is the leverage you should give them. You’re not going to find that. A lot of it is going to be based on how much trust do people have in their teams
Josh Bressers (21:15) Okay.
David Bernstein (21:44) to start making decisions on behalf of the organization when time is of the essence.
Josh Bressers (21:52) Yeah, yeah.
David Bernstein (21:54) Yeah, I will say, yeah.
Josh Bressers (21:55) Okay, and I
will have you send me links to those organizations and I will put them in the show notes for anyone who wants to follow up with that stuff, because I know I’ve forgotten already the things you just said.
David Bernstein (21:59) Sure.
Yeah, I’m very happy to
provide all those. Listen, if anyone has any questions, they are free to reach out to me. And I’m very happy to answer any basic questions before we get into, you know, the need for consulting stuff, because at some point I need to feed my family on this information. So we’ll get to that later on. But yeah, so from an emergency plan or crisis management perspective, if you can lay out those expectations, the rest of it is just figuring out
Josh Bressers (22:21) Yeah, yeah. Right, right, right.
David Bernstein (22:34) what is the hazard you’re trying to respond to, and how do you orient everything towards that hazard? This is where we start heading into a little bit of a double-edged sword. If you create too many plans based on too many hazards, it very quickly becomes impossible to implement. There’s just too much you’re trying to manage and keep on top of, because if you recall, all of these have to be living. You can’t just say, we’re going to create this plan for this very specific
Josh Bressers (22:53) Yeah. Yeah.
David Bernstein (23:03) type of hazard, and then it’s done and we never touch it again. People are going to forget about it. It’s going to become impossible to implement, and then you’ve just set yourself up for failure. What I recommend doing is when we ask ourselves the question of why are we doing this plan, if the answer is because we don’t have anything yet, that tells you pretty directly, like, all right, so our first thing is we have to… The why of this plan is we have to figure out
the base response of our organization. If the why is, we have identified a new hazard that we’ve never come across before, it’s totally different than everything else we’ve defined or discovered, then that will help you reorient to whatever else you need to work on as well. But again, it’s that why, why are you doing it, that is a big question to ask. And it also leads you to the question of…
A policy helps you set an expectation. A plan tells you how to implement it. If you have never set the expectation before, maybe a plan isn’t where you need to go. Maybe you need to set a policy or something first. Once you’ve asked the why, then you have to ask the who. Who do we need to bring in on this? Who’s impacted by this plan?
or this expectation that we’re trying to implement. If it is a small group of people, then that’s fine. You need to collect those stakeholders. I would always, always advise bringing in a corporate champion or some kind of executive leader as well. They don’t need to do a lot of work. Just tell them you need money. Or staff time. They don’t need to take this on themselves, but having an executive there to help make sure that this work is prioritized and someone who
Josh Bressers (24:54) Yeah, yeah, right.
David Bernstein (25:01) intuitively understands how important this work is goes a really long way as well. So you define the stakeholders, start getting them together so that when you start saying, this is how we’re going to implement it, you have people who can talk reliably about what people can accomplish and what they can’t accomplish and what the organization is willing to spend. So these are typically manager level, director level, sometimes VP level, and the titling
varies a lot based on the organization and how they use titles and everything as well. But you want someone that is able to speak on behalf of the organization and who can make decisions about how their staff will be spending their time, especially in an emergency. And also someone who actually understands how those teams operate. Because if you get someone too senior, they’ll be like, I’ll direct all the money, but just like,
Josh Bressers (25:35) Right, right.
David Bernstein (25:58) John in data security, like, actually knows how to do all of this stuff. You need a manager who actually understands all of that. So.
Josh Bressers (26:06) Yeah. Yeah. And so look, everything you’ve described, I will explain what I’ve been doing and you can maybe harshly judge. So it’s fine. I’m comfortable with that. But so right, we’ve got an incident with a competitor. And obviously the first thing we do is I spun up a Slack channel and we collect the usual suspects. And this part is written down. Like anytime there’s an event, create Slack channel,
David Bernstein (26:16) I would never.
Yeah.
Josh Bressers (26:35) pull in these specific people plus anyone else you think you need. And then we start, you know, kind of focusing on timelines and to-do lists and things like that, right? And I don’t create a lot of detail. Then we pull in the people we need. And you mentioned, you know, having someone who understands how this works. We have the individual who kind of, we’ll say, manages the engineering team, right? Keeps everyone on task and keeps the machine running, so to speak. Like.
We pulled them in and they were super supportive, which is, I’ll say, one other piece of this: I try really hard to maintain a very positive rapport with all of these people. Cause I know someday I’m going to need their help. And if they hate me, I have two problems now instead of one. So, you know, they came in and they were really good about saying, you know, this person’s working on this thing, this person’s working on this thing, and we’re going to have to delay these feature requests because, you know, we just have to do this other stuff first. And it was.
David Bernstein (27:15) Mm-hmm.
Josh Bressers (27:29) I will say like the team I’ve been working with has been absolutely amazing, like picking things up. The other thing I’ve noticed that I will say in most incidents I’ve done in the past that I haven’t, I guess, I don’t know if I just didn’t notice or it hasn’t happened as much is because of the scope of what we’re dealing with. Everyone’s been really good about saying, why don’t we also work on this? And it could be like, yes, or let’s do that later, right? Like type activities, which I’m used to just being someone’s in charge, they’re barking orders.
people are doing what they’re told and there’s like zero input coming from the team because they’re just so busy running around like crazy dealing with it. So it’s been it’s been interesting to say the least.
David Bernstein (28:08) I’m sure it has been interesting. You said some really interesting things there that I think come back to the planning piece as well. If you’ll allow me, I’m gonna finish my thought and then I’m going to give you excess criticism. I promise that it’ll be light criticism because I think you guys are doing a lot right as well. Yeah. So when you get that plan set, get the stakeholders in place, make sure the expectations are realistic. And then as you start planning,
Josh Bressers (28:20) Yes, please.
Bring it.
David Bernstein (28:36) if you say, this is how we’re going to implement it, just get everyone’s feedback of, is that realistic? And then start laying out, all right, these teams have these objectives as part of this plan. This is how they’re going to do it. These other teams have these objectives. This is how they’re going to do it. Pull it together. And then, like, that’s your plan. I don’t want to overcomplicate it. There’s no sense in having 100-page plans because no one’s going to read them, no one’s going to use them. At that point, like,
Josh Bressers (28:58) Yeah, yeah, for real.
Yup. Yup.
David Bernstein (29:06) the plan is really just the overarching theme of, this is collectively how we’re going to achieve this objective for implementation. Then their own individual teams may have their own protocols associated with that; that’s on their teams. That brings us back to what you had said, Josh, about you create a Slack channel, you have it written out for who is going to be on that Slack channel, and then people just go until they…
and people are barking orders and they kind of go and then sometimes issues are brought up and then sometimes they’re accounted for and sometimes they’re kind of set aside. First things first, you had like alarm bells going off in my head when you said, you know, everyone goes, people are awesome, but there’s orders being barked, people just kind of, you know, implement it and kind of run on it. And there is…
a really key component that I think I might have brought up also on an earlier podcast that I don’t want to be dismissive of. I want to think about this for a moment. And that is the resilience of your team is not the same as the resilience of your organization. If you totally burn out your team responding to this one incident, you’ve lost them for some period of time to respond to whatever else is coming up. You need to make sure that you supplement
Josh Bressers (30:29) Yes.
David Bernstein (30:30) the resilience of your team with a plan that will allow them to be nimble, that will allow you to cycle staff in and out so that you’re not burning all your staff all at once. There are definitely all-hands-on-deck situations, but when that stuff starts coming down, you still need someone to man the ship. So, yeah, you’ve got to be thoughtful about that. The other piece of it is there has to be, in any incident,
Josh Bressers (30:52) Yes, 100%.
David Bernstein (31:01) some mechanism for bringing up new issues. And I would recommend that there be some formality to that process because even though you have something on fire that you’re dealing with right now, if you see the next one coming and it’s coming quick, it’s best to try and be proactive about that upcoming issue than to…
put one fire out only to have another fire be just as big, you know, in the meantime. That’s kind of coming down the path. And so the nice thing about some of these really formal incident response organizational groupings or programs is that it provides that formal issue raising, if you will. And it doesn’t have to look like the federal programs. You can make one up that totally makes sense for you.
It doesn’t really matter as long as someone can raise an issue and it’s taken seriously and someone in charge of the organization gets to properly prioritize it with the right information. Right? You know, if there’s a manager that says like, not right now I’m dealing with this other issue, like,
You can’t just let something drop if you think it’s really important. It needs to be sussed out by someone and then evaluated. Now, typically, if you’re elevating issues, it’s because you can’t resolve something yourself and you need more leadership input. But if you see something coming down the pipeline that’s going to affect a bunch of people, I think that totally meets that definition.
Josh Bressers (32:21) Yeah. Yeah.
I agree with that and I don’t think that’s something I’ve ever really thought of before: having a kind of formal process for the next thing on fire. I feel like in my experience, you know, I’ve generally been at smaller organizations. We never had like tons of structure around this stuff, but we’ve just kind of, all right, you’re in charge of the new fire. So and so like good luck.
And that’s kind of what happened. Maybe we end up with another Slack channel. Maybe we’re using the same infrastructure. Maybe we have a new team that’s working on the new thing. I can’t think of an instance where something that needed to be dealt with was dropped on the floor, but I can totally see that being a concern. Where, you’re right, if you have one group working on something critical and there’s something we’ll say maybe slightly less critical behind it.
They’re just not gonna touch it, maybe they should. I need to think about this, I like this.
David Bernstein (33:31) Yeah. Or
like we talk about this a lot in business continuity, like it’s less, it might be less critical in the frame of the incident you’re responding to right now, but is it going to become a more critical issue later on? Right? And so, you know, we talk about this formality and I don’t want anyone thinking that like they have to have a huge, like they have to buy E-Team, they have to buy like, you know, WebEOC or…
Josh Bressers (33:44) That too, yes.
David Bernstein (33:57) or VHC or whatever it is, any of these other tools to manage these incidents. That’s not what I’m recommending. What I’m saying is maybe it makes sense that every few hours you get all the teams together and just say, all right, how are things going? Is there anything coming up that we need to be worried about? Is the stuff that is currently on fire, is that being resolved? Does anyone have any issues or anything that needs to be leveled up right now? No? Great, everyone back to your corners to keep responding.
Josh Bressers (34:26) Yeah, yeah.
David Bernstein (34:26) It doesn’t have to be big,
but there should be a little bit of structure built in just to make sure that at least your leadership or the person who’s managing the incident knows the landscape of what’s happening.
Josh Bressers (34:37) Yeah. Okay. So if I’m going to sum up everything I think you’ve said, and you can correct me if I’m wrong, it’s that I’ve felt in the past like proper and good disaster recovery and incident management plans kind of needed to be pretty detailed and a big deal. But I feel like everything you’ve said is like, they don’t really need that. You need some basic structure, I guess, to get the right people in the right place at the right time.
And then you kind of, you go from there, right? Where over-planning obviously doesn’t have any. And in fact, I’ve over-planned a million times and never once has my over-planning been useful in any way. Now that I think of it, like it’s always been like, it very quickly, I think. And I think the most valuable thing of all this is just making sure like the right people are talking to each other is it has to be, I mean, now that
David Bernstein (35:18) Yeah, you hit a point of diminishing returns for sure.
Josh Bressers (35:35) This is really funny because hearing someone talk about some of this, it’s like, it’s so obvious now, like just putting the right people together is literally step one. It has to be step one. And they decide what step two is. Like it has to be that way.
David Bernstein (35:44) Yeah, it has to be.
Yeah. Yeah.
And I would say, I would add one other note onto your point, which is the level of complexity in your planning has to meet the organization. So if you’re a small organization that’s not super complex, you don’t need all this crazy planning. If you’re in a 50,000 person organization, you should probably get a little bit more in place. Yeah.
Josh Bressers (36:03) Yes.
And I’ll agree with that. I think that
makes sense. And so the one thing I will add from that is like, I worked at Red Hat many, many years ago and they were pretty good sized when I left and I did a lot of, you know, security response work where there’d be a critical vulnerability and we had to, you know, triage and get the right people together and all that stuff. And I will say in that particular instance, there was a pretty well-defined number of steps we had to take because like we had to reach out to teams
that were pretty big and there wasn’t like, like in my particular instance, like, you know, I’m at a small company. I know everyone’s first name, you know, I know where they all live, you know, whereas when you’re in a company with, you know, you’re right, like 40, 30,000 people. I don’t remember how big Red Hat was at that point. It was pretty big, but like, I knew the people I was going to work with, I had probably never spoken to ever before that particular incident, which does change the dynamic considerably because
David Bernstein (36:52) Mm-hmm.
Josh Bressers (37:12) The other aspect of that is now I have to introduce everyone. You know what I mean? Like it’s very different than saying, all right, the five of us, we have to solve this problem, you know, right now. Like we all know each other. We all work together every day. Great. Let’s go.
David Bernstein (37:26) And that’s the soft benefit of the planning process: you start to form those relationships so that even if you don’t know what’s happening and how you’re going to resolve it, you at least know each other and you know how each other works. I mean, you brought that up, Josh. You don’t want to create enemies in this process, otherwise you’re going to have the hardest time ever. But if you at least know everyone and you’re… You don’t have to be friends with them, but you know what to expect from them, that goes… I mean, that’s one of the hardest parts of responding to an incident. It’s kind of just…
Josh Bressers (37:39) Yeah. Yeah.
David Bernstein (37:55) meeting people and knowing them.
Josh Bressers (37:57) Yeah. Right. Oh, for sure. 100%. All right. All right, David. So we’re kind of, we’re coming to the end here. What, what have we left out? No, it’s fine. Like what else, what else, what else do we need to know? Are there any pieces we missed? Is there something we need to go deeper on?
David Bernstein (38:00) Yeah. Right time, I know.
I don’t think so. I think if I had to leave everyone with one thought, it would be this. There’s no need to overcomplicate the process. Keep it as simple as possible because when you are in a really dynamic incident, there’s only so much information you can hold in your head at that time. Keep it simple. Have really clear expectations and just work with people to resolve the issue and use the plans that you have.
Josh Bressers (38:32) Yes.
Yeah, yeah, for sure. Wow. Okay. So I just need to say, David, like, I feel like I hit record four minutes ago on this conversation. Like this has been a crazy, you’ve given me so much to think about and I love this conversation so much. And for anyone listening, I’m going to have David back in a couple of weeks and we’re going to talk about kind of, all right, so let’s say we put our plan together. What do we do next? Because you do need to test this stuff. And I will say,
testing these plans, I remember when I was young, I’d be like, this is stupid. This is such a waste of time. But as soon as you do it a couple of times, you have just, like, ridiculous epiphanies in the process. Like my favorite, I think we talked about this last time you were here, was when I was doing my last disaster recovery kind of tabletop. And I realized like, if Google goes away, there’s literally nothing we can do and the business is basically screwed. Like, I guess like,
David Bernstein (39:32) Yeah.
Josh Bressers (39:35) Everyone go to the pub. That’s the disaster recovery plan. Yeah. Yeah. Right. There’s literally nothing we can do. So we. That’s right.
David Bernstein (39:37) Yeah, sometimes the end of the plan is just shut it down, and then wait for stuff to come back online.
Plans don’t solve the problem. Plans give you the tools to respond
until it’s resolved.
Josh Bressers (39:50) Well, but I will say the advantage to that is I know if Google like vanishes off the face of the earth for an extended period of time, there’s literally no point in trying to do anything about it. We’re just going to be like, well, that’s that, everyone have fun. Like, well, no, no, no, they’ll come back. I mean, if Google disappears in a way that they don’t come back, like modern society has collapsed, right? Like I’m going to be looking for cans of beans in a pile of rubble.
David Bernstein (40:00) Yep. Rebuild on something else.
There’s a figure.
Josh Bressers (40:18) Like that’s the outcome of that. But all right. All right, David. So I want to thank you. This has been super cool and I can’t wait to talk to you again soon because this is, man, I have so much to think about now. This is awesome. Thank you so much.
David Bernstein (40:31) Such a pleasure.