What is Risk Storming?

Risk Storming is a method for identifying risks in a system quickly, collectively, and visually. It is a group exercise: to get a broader view of the system, the participants can include people from different areas and with different skills.

Method

The method consists of several consecutive steps. Let’s look at each of them.

Step 1. Draw architecture diagrams

Since the method looks for risks in a system that already exists or is still under construction, we need some way to see that system.

Architecture diagrams are a great help here. They illustrate the main components of the system and the relationships between them.

Step 2. Individual search for risks

At this step, each participant writes down on stickers any risks and problems that come to mind, one risk per sticker.

This step is performed in complete silence and must be time-boxed. For example, 5-10 minutes is usually enough.

A risk can be any assumption about something failing. For example:

  • The data format in the third-party system has changed

  • External services are not available

  • Data inconsistency

Step 3. Sharing assumptions

In this step, all participants of the session place their stickers on the architecture diagrams they used when looking for risks.

The stickers are placed close to the part of the diagram to which the described risks belong.

If several participants described similar risks, their stickers are placed together.
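The placement rule from this step can be sketched in Python. This is only an illustration under stated assumptions: the `Sticker` class, the `place_stickers` function, and the component names are hypothetical, and "similar" is reduced to "the same text ignoring case and spacing". In a real session, people decide which stickers belong together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sticker:
    author: str
    component: str  # part of the diagram the risk belongs to
    risk: str       # text written on the sticker


def place_stickers(stickers: list[Sticker]) -> dict[str, dict[str, list[str]]]:
    """Group stickers by component, then put matching risks together.

    Returns {component: {normalized risk text: [authors]}}.
    """
    board = defaultdict(lambda: defaultdict(list))
    for sticker in stickers:
        # Crude stand-in for "similar": same text ignoring case and extra spaces.
        key = " ".join(sticker.risk.lower().split())
        board[sticker.component][key].append(sticker.author)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {component: dict(groups) for component, groups in board.items()}
```

A group with several authors suggests a risk that more than one participant noticed independently, which is useful context for the discussion in the next step.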

Step 4. Prioritization

At the final step, each of the risks found is reviewed and discussed collectively.

The purpose of this step is to determine how high a priority each potential problem should have.

There are several ways to prioritize. I will describe two of them:

  • Planning Poker: participants assess each risk using numbered cards and then discuss the results together. The discussion should end with a shared understanding.

  • A risk matrix, with the probability of the problem occurring (Probability) on one axis and its impact (Impact) on the other.
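The matrix approach can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed part of the method: the three-level Low/Medium/High scale, the numeric values 1 to 3, and the probability-times-impact score are assumptions made for the example, and a team may agree on a different scale.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed three-point scale used for both axes of the matrix."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def risk_score(probability: Level, impact: Level) -> int:
    """Score a risk as probability x impact (an assumed convention)."""
    return int(probability) * int(impact)


def matrix_cell(probability: Level, impact: Level) -> tuple[int, int]:
    """Return the zero-based (row, column) of a risk in a 3x3 matrix.

    Rows correspond to probability and columns to impact, starting at LOW.
    """
    return (probability - 1, impact - 1)
```

For instance, `risk_score(Level.MEDIUM, Level.HIGH)` evaluates to 6, and `matrix_cell(Level.MEDIUM, Level.HIGH)` is `(1, 2)`: the middle row and the last column.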

Example

To see the method in action, we need an architecture diagram of a system. Since the approach is completely universal, I took an arbitrary diagram from the Internet:

Now that we have an architecture diagram, we can get started.

In the next step, each participant tries to identify the risks that are possible in any part of the system. Each risk is written on a sticker, and the sticker is then placed on the part of the diagram it refers to.

I got the following:

Let me remind you that the purpose of this method is not to enumerate every conceivable risk. Of course, there must be reasonable limits. “Shark attack on data center employees” is a creative idea, but it is practically guaranteed not to happen (unless the data center is located at the bottom of the ocean).

The next step is to collectively assess each risk found, using an assessment method the team has agreed on. For this example, I will use the matrix method. Let’s try to prioritize the risk of a giant file being uploaded to our service (to clarify, by a giant file I mean any content whose size exceeds what is reasonable and acceptable for this system).

Uploading a giant file can force our system to spend a lot of resources on processing and storing it. Moreover, let’s assume that our Image Storage is the S3 service from AWS, which means that storing huge amounts of data will cost us a lot of money. I believe the probability that someone will try this is well above zero, so I rate it Medium, while the impact on the system’s performance and maintenance is High. Having assessed the risk in this way, I placed it in the corresponding row and column of the matrix.

Each of the remaining risks is assessed in the same way. As a result, we get approximately the following picture:

We now have an assessment for each risk. The matrix already lets us conclude which risks should be addressed before others. For example, an unlimited number of requests to the service can lead to its complete failure under a DDoS attack.

The list of risks with their assessments is excellent input for forming the technical backlog of the service team.
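Turning assessed risks into an ordered backlog could look like the sketch below. The `AssessedRisk` class, the `to_backlog` function, and the numeric weights are hypothetical. Only the Medium/High rating of the giant file upload comes from the example above; the other two ratings are invented purely to have something to sort.

```python
from dataclasses import dataclass

# Assumed numeric weights for the matrix levels.
WEIGHTS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class AssessedRisk:
    description: str
    probability: str  # "Low", "Medium" or "High"
    impact: str       # "Low", "Medium" or "High"

    @property
    def score(self) -> int:
        # Assumed convention: probability weight times impact weight.
        return WEIGHTS[self.probability] * WEIGHTS[self.impact]


def to_backlog(risks: list[AssessedRisk]) -> list[AssessedRisk]:
    """Order risks from highest to lowest score; ties keep their input order."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


risks = [
    AssessedRisk("External services are not available", "Low", "Medium"),  # illustrative rating
    AssessedRisk("Giant file upload", "Medium", "High"),                   # rating from the example
    AssessedRisk("Unlimited number of requests", "High", "High"),          # illustrative rating
]
backlog = to_backlog(risks)
```

With these ratings, `backlog` starts with the unlimited requests risk, followed by the giant file upload and then the unavailable external services. A plain sort by score is a deliberate simplification: a team may also weigh the effort of mitigation before committing to an order.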

Additional materials

The original of this article, as well as many others on various IT and development topics, can be found on my website #fullstackguy.
