Risk Management
Abstract
Risk management is a critical component of any successful engineering project. Understanding, evaluating, and planning for risk properly helps a project stay on schedule and avoid potential disaster. This article gives an overview of the basics of risk management: risk identification, risk assessment, and risk mitigation. It also presents commonly used methods and practices that may be useful tools for the risk manager.
Introduction
We manage risk every day: the Wall Street investor sinking thousands into a hot commodity, the teacher showing a controversial movie to a class, the grocery shopper deciding to purchase expired goods at a discount, or the child getting onto a roller coaster for the first time. Risk is the possibility that an event with a negative impact on the risk-taker will occur. In the case of the Wall Street investor, the risk-taker is the investor, and the risk is the possibility of losing money on the investment. The event is a drop in the price of the commodity, and the threat is the resulting loss of money. Threats can have minor to severe repercussions for a project's goals and can vary in probability of occurrence.
Risk management is the process of identifying, assessing, and mitigating risks. Successful risk management is the difference between someone who gets lucky and someone who makes things work every time. As severity of repercussions and probability of occurrence get higher, skillful risk management becomes more important. In an engineering setting, the severity of repercussions tends to be very high, such as a short circuit that renders a multi-million-dollar satellite useless. Because of this severity, engineers must be trained in risk management to be successful in their work.
Theory
This article will focus on a three-stage process of risk management: risk identification, risk assessment, and risk mitigation. The first two stages, identification and assessment, are also collectively called risk analysis. As defined by Markeset and Kumar (2001, p. 119), “Risk analysis in general consists of answers to the following questions. What can go wrong that could lead to system failure? How likely is this to happen? If it happens, what are the consequences?”
There are numerous methods for each aspect of risk management and varying opinions on the effectiveness of each method. Each risk management scenario is unique, and the successful risk manager will adapt known methods that best suit the situation.
Risk Identification
Risk identification is the first and foundational stage of risk management. Subsequent steps rely on the completeness of risk identification. Identifying every possible risk cannot be guaranteed, which is why risk identification can be the trickiest stage to handle.
There are three elements that must be considered in the risk identification process: events, threats, and sources. Events are possible occurrences that can end up negatively impacting project objectives, threats are the negative results of events, and sources are elements that are able to trigger events. For example, for a pilot landing a plane, the source of risk could be the weather, the event could be stormy conditions, and the threat could be losing control of the landing.
In order to identify as many threats as possible, it is important to start from the sources and work down the chain through events and then through threats. Once a list of sources is compiled, it becomes much easier to create lists of events and threats and to check that those lists are as complete as possible.
There is no method that will ensure complete lists, but being mindful of a few points can be helpful. First, risk identification should be focused on project objectives. Examine what the objectives of the project are and think of sources of risk or events that could impede the achievement of those objectives. Second, most projects, especially engineering projects, are similar to projects that have been completed in the past. By examining which risks predecessors identified, what they missed, how they succeeded, and how they failed, you can complete the bulk of the risk identification with some basic research. Learning from the past can also help with the assessment and mitigation stages that follow. Third, keep the identification process well organized. Understand how the sources, events, and threats are linked, and draw a chart or diagram to keep track of these relationships.
These guidelines are not the only strategy for risk identification, but they are reliable and adaptable. In certain cases, it may be better to depart from them and create an approach that is more suitable for the specific case. For example, it may be easier for the Wall Street investor to identify threats first (making money and losing money) before moving on to events and sources.
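The links between sources, events, and threats described in this section can be recorded in a small data structure. The following is a minimal sketch in Python, not a prescribed tool: the class names `Source` and `Event` and the helper `flatten` are hypothetical, and the only data used is the pilot example from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A possible occurrence, with the threats (negative results) it can cause."""
    description: str
    threats: list[str] = field(default_factory=list)


@dataclass
class Source:
    """An element that is able to trigger one or more events."""
    name: str
    events: list[Event] = field(default_factory=list)


def flatten(sources: list[Source]) -> list[tuple[str, str, str]]:
    """Return one (source, event, threat) row per identified threat."""
    return [
        (source.name, event.description, threat)
        for source in sources
        for event in source.events
        for threat in event.threats
    ]


# The pilot example from the text: weather -> stormy conditions -> loss of control.
register = [
    Source("weather", [Event("stormy conditions", ["losing control of the landing"])]),
]

for row in flatten(register):
    print(" -> ".join(row))
```

Working from sources down to threats in this way makes it easy to spot a source with no events listed, or an event with no threats, which usually means the identification step is not finished.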
Risk Assessment
Risk assessment is the second and often the most straightforward step in risk management. Now that the risks have been identified, they must be organized according to severity of threat and to probability of occurrence. Often, not every risk can be fully addressed without compromising the overall goals of the project or without compromising the mitigation strategy for another risk.
For example, to minimize the probability of wires burning out, you could alter the materials in the wire or use a thicker wire. Drawbacks could include increasing the cost of manufacture or increasing the risk of the components not fitting inside the enclosure due to the larger size of the wires. The first drawback could conflict with a project objective of keeping manufacturing costs down or maximizing profit. The second drawback could force a consideration of risk tradeoffs. In both cases, data from risk assessment is used to come up with the best risk mitigation plan.
The most common method for risk assessment is to determine a composite risk index (CRI) by taking the product of the severity of threat and the probability of occurrence (Leung, 1996). For example, brake failure in an old car has a very high severity and a high probability of occurrence. For the sake of this example, we'll assign the severity as 8/10 and the probability as 40%. The resulting CRI is 8 × 0.4 = 3.2. The CRI can then be used to assign priority, with a greater CRI meaning a higher priority. This method is simple and straightforward and can generate a priority list very quickly. The tricky part is determining the weight to use for severity of threat. The weight determines how much the severity of the threat increases the rank on the priority list. The ideal weight varies for each situation.
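The CRI calculation and the resulting priority list can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, severity is taken on a 0 to 10 scale and probability as a fraction, and only the brake-failure figures (severity 8, probability 40%) come from the text. The other two risks are made up to show the sorting.

```python
def composite_risk_index(severity: float, probability: float) -> float:
    """CRI = severity (0-10 scale) multiplied by probability (0-1)."""
    if not 0 <= severity <= 10:
        raise ValueError("severity must be between 0 and 10")
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return severity * probability


def prioritize(risks):
    """Sort (name, severity, probability) tuples by CRI, highest first."""
    return sorted(
        risks,
        key=lambda risk: composite_risk_index(risk[1], risk[2]),
        reverse=True,
    )


risks = [
    ("flat tire", 3, 0.2),       # hypothetical values
    ("brake failure", 8, 0.4),   # values from the example in the text
    ("radio failure", 1, 0.5),   # hypothetical values
]

for name, severity, probability in prioritize(risks):
    print(f"{name}: CRI = {composite_risk_index(severity, probability):.1f}")
```

With these inputs, brake failure ranks first, followed by the flat tire and then the radio failure.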
Investors may be able to swallow a 1% probability that all of their investments will become worthless due to a catastrophic event in the marketplace. However, engineers may not be willing to accept a 1% probability that the gas tanks in the car design they are working on will explode during operation. Therefore, they may want to consider creating a risk mitigation plan for this particular risk.
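One way to express this difference in risk appetite is to compare the CRI against a tolerance chosen for the situation. The sketch below is only an illustration: the function `needs_mitigation`, the two tolerance values, and the severity of 10 assigned to both scenarios are assumptions made here, not figures from the text.

```python
def needs_mitigation(severity: float, probability: float, tolerance: float) -> bool:
    """Return True when the CRI (severity * probability) exceeds the tolerance."""
    return severity * probability > tolerance


# Hypothetical tolerances: the engineer accepts far less risk than the investor.
INVESTOR_TOLERANCE = 0.5
ENGINEER_TOLERANCE = 0.05

# Both scenarios are assumed to have severity 10 and a 1% probability.
investor_acts = needs_mitigation(10, 0.01, INVESTOR_TOLERANCE)
engineer_acts = needs_mitigation(10, 0.01, ENGINEER_TOLERANCE)

print("investor needs a mitigation plan:", investor_acts)
print("engineer needs a mitigation plan:", engineer_acts)
```

The same risk figures lead to different decisions because the tolerance, not the CRI, is what changes between the two situations.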
Risk Mitigation
Risk mitigation is the third and most dynamic step in risk management. At this point, the risk manager has at his or her disposal a priority list of threats and the corresponding probability, severity, event, and source of the risk. Using this information, the risk manager should decide on how to deal with each risk. Possible strategies in engineering settings usually fall under three categories: avoidance, reduction, and acceptance.
Avoidance is removing the probability of occurrence of a risk by avoiding an action or decision that could carry risk. Reduction, sometimes called mitigation or contingency planning, is reducing the severity of repercussions by having some sort of contingency plan. Acceptance is simply accepting the risk. For example, a software project engineer might want to add a new feature to a program, but the feature may break the rest of the code. Avoidance would be to not create the new feature at all, removing any risks involved with the change. Reduction would be to create a backup copy of the old code to ensure that a working copy will always be available if the new feature causes major problems. Acceptance would be to simply move forward with adding the new feature, judging that the potential benefits outweigh the risk (Gonen, 2011).
There is no ultimate answer for dealing with a certain risk. The risk manager must be aware of all of the factors involved in ultimately achieving project goals and objectives. Just like any engineering design, there are always tradeoffs in the design of a risk mitigation plan. Often, the risk manager must compromise on a tradeoff between elements such as cost, risk, and benefit (Markeset & Kumar, 2001).
After a strategy is decided upon, it should be documented in a risk response plan (Gonen, 2011). Each risk that was identified and evaluated should have an entry in the risk response plan describing how it will be handled. Strategies and assessments tend to vary, as individual risk managers have differing opinions on how valuable certain objectives are or how critical certain risks are. This subjectivity means that the quality of the risk response plan depends on the risk manager. Therefore, the risk manager must strive to be as objective and scientific as possible to maximize the effect of the risk management (Bristow, Fang, & Hipel, 2012).
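The requirement that every identified risk appears in the risk response plan can be checked mechanically. The following Python sketch assumes a hypothetical `Response` record and `undocumented_threats` helper. The first threat and its reduction strategy follow the software feature example above, while the second threat is invented to show a gap in the plan.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The three strategy categories named in the text."""
    AVOIDANCE = "avoidance"
    REDUCTION = "reduction"
    ACCEPTANCE = "acceptance"


@dataclass
class Response:
    """One entry of a risk response plan."""
    threat: str
    strategy: Strategy
    details: str


def undocumented_threats(identified: list[str], plan: list[Response]) -> list[str]:
    """Return the identified threats that have no entry in the response plan."""
    covered = {response.threat for response in plan}
    return [threat for threat in identified if threat not in covered]


identified = [
    "new feature breaks existing code",
    "new feature delays release",  # hypothetical threat, not from the text
]
plan = [
    Response(
        "new feature breaks existing code",
        Strategy.REDUCTION,
        "keep a backup copy of the old code",
    ),
]

print(undocumented_threats(identified, plan))
```

An empty result would indicate that every identified threat has a documented response.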
After a risk response plan is complete, it must be approved by the appropriate levels of management (depending on the project, risk, and mitigation strategies), who may require the risk manager to revise the plan. An approved risk response plan is a milestone for the risk manager, marking the completion of the overall risk management strategy. However, risk management is dynamic, and the risk manager must periodically reevaluate each stage of the process to keep the plan relevant, adapt to changes in risks and their probabilities, and respond to changes in project objectives.
Application to Senior Project
Risk management is useful at any project scale. For its senior design project, the Burgundy Team has been working on a project to track the location of devices in space in real time using RF. The Burgundy Team’s project, like any project, has risks that must be planned for in order to ensure the successful and timely delivery of a complete product. By using the three-stage process detailed in the Theory section of this article, the Burgundy Team greatly increases its chances of success.
As shown in Table 1, there are four sources of risk, five events, and six threats identified. This quick example matrix was developed starting with brainstorming possible sources of risk, followed by brainstorming of resultant possible events and threats. Another technique discussed in the section on Theory is to research similar projects and to copy the relevant risks involved there. This technique was also used, but such risks were already resolved at this stage of the project and thus were not included in this example.
Table 1
Sources | Events | Threats |
---|---|---|
ZigBee device | networking failure | triangulation failure |
ZigBee device | battery depletion | total loss of power |
Main PC | hard drive failure | data and programs lost |
Environment | RF traffic within frequency range | interference |
Personnel | illness | software development delayed |
Personnel | illness | testing delayed |
Next, a risk assessment was conducted using the results from the identification step. The results were recorded in a similar matrix (Table 2). This project is very time-sensitive, so the highest severity weights were given to project failure and project delay. Project failure was given the highest weight, 10 on a scale of 10, while project delay was given a weight based on the expected length of the delay, scaled against the total duration of the project (e.g., a 1-month delay on a 3-month project gives 1/3 × 10 ≈ 3.3 on a scale of 10). Probability was difficult to determine exactly, so four qualitative levels of probability were used: highly unlikely (~10%), unlikely (~30%), likely (~50%), and highly likely (~80%).
Table 2
Threats | Delay (days) | Severity | Likelihood | Probability | CRI (severity * probability) |
---|---|---|---|---|---|
triangulation failure | 24.0 | 2.7 | unlikely | 0.3 | 0.8 |
data and programs lost | 7.0 | 0.8 | unlikely | 0.3 | 0.2 |
software development delayed | 5.0 | 0.6 | unlikely | 0.3 | 0.2 |
testing delayed | 5.0 | 0.6 | unlikely | 0.3 | 0.2 |
interference | 10.0 | 1.1 | highly unlikely | 0.1 | 0.1 |
total loss of power | 0.0 | 0.0 | highly likely | 0.8 | 0.0 |
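The severity and CRI columns of Table 2 can be reproduced with a short Python sketch. Note the assumption: the total project duration is not stated explicitly, so 90 days is used here because it matches the 3-month example and the severities in the table. The function names and the `LIKELIHOOD` mapping are hypothetical, while the delays and likelihood levels are those of Table 2.

```python
PROJECT_DAYS = 90  # assumption: inferred from the 3-month example, not stated

# Qualitative likelihood levels and their approximate probabilities from the text.
LIKELIHOOD = {
    "highly unlikely": 0.1,
    "unlikely": 0.3,
    "likely": 0.5,
    "highly likely": 0.8,
}


def delay_severity(delay_days: float, project_days: float = PROJECT_DAYS) -> float:
    """Scale a delay against the project duration onto a 0-10 severity scale."""
    return 10 * delay_days / project_days


def assess(threat: str, delay_days: float, likelihood: str):
    """Return (threat, severity, probability, CRI), rounded to one decimal."""
    severity = delay_severity(delay_days)
    probability = LIKELIHOOD[likelihood]
    return threat, round(severity, 1), probability, round(severity * probability, 1)


table = [
    assess("triangulation failure", 24, "unlikely"),
    assess("data and programs lost", 7, "unlikely"),
    assess("software development delayed", 5, "unlikely"),
    assess("testing delayed", 5, "unlikely"),
    assess("interference", 10, "highly unlikely"),
    assess("total loss of power", 0, "highly likely"),
]

for row in table:
    print(row)
```

Because the CRI is a product, a highly likely event such as battery depletion still ranks last when it causes no delay.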
Finally, a risk response plan (Table 3) was created. These particular threats had fairly obvious solutions, but in more complex projects some degree of creativity is required. Since the risk managers were also the project leads and engineers, there was no problem communicating technical and managerial issues. Additionally, there was no problem getting the response plan approved. On a larger and more complex project, the response to a risk may not be clear-cut, good communication and collaboration between technical and non-technical personnel is required, and managerial approval is not always easily obtained.
As the Burgundy Team's project moves forward, the risk management process will be repeated and the records updated. This will give the team a realistic view of possible risks and will allow response plans to be executed immediately following an event.
Table 3
CRI (severity * probability) | Threats | Strategy | Details |
---|---|---|---|
0.8 | triangulation failure | reduction | extensively research devices and similar projects to ensure that accurate triangulation is possible. |
0.2 | data and programs lost | reduction | keep a backup copy of all data and programs on an external hard drive and/or on a cloud server |
0.2 | software development delayed | acceptance | sickness is unpredictable and unavoidable. |
0.2 | testing delayed | acceptance | sickness is unpredictable and unavoidable. |
0.1 | interference | avoidance | in the unlikely case of interference, the project will be conducted in an isolated environment |
0.0 | total loss of power | acceptance | this is easily fixed by replacing the batteries, with essentially zero impact on the project |
Cited References
- Bristow, M., Fang, L., & Hipel, K. W. (2012). System of Systems Engineering and Risk Management of Extreme Events: Concepts and Case Study. Risk Analysis, 32(11), 1935–1955. DOI: 10.1111/j.1539-6924.2012.01867.x
- Gonen, A. (2011). Optimal Risk Response plan of project risk management. 2011 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), 969–973. DOI: 10.1109/IEEM.2011.6118060
- Leung, H. K. N. (1996). A Risk Index for Software Producers. Journal of Software Maintenance: Research and Practice, 8(5), 281–294. Retrieved from Wiley Online Library
- Markeset, T., & Kumar, U. (2001). R&M and risk-analysis tools in product design, to reduce life-cycle cost and improve attractiveness. Reliability and Maintainability Symposium, 116–122. DOI: 10.1109/RAMS.2001.902452
Recommended Reading
- Goddard, P., & Davis, R. (1985). The Automated, Advanced Matrix FMEA Technique. Proceedings Annual Reliability and Maintainability Symposium, (NSYM), 77–81. OCLC WorldCat Permalink: http://www.worldcat.org/oclc/12675707
- Huang, D., Chen, T., & Wang, M.-J. J. (2001). A fuzzy set approach for event tree analysis. Fuzzy Sets and Systems, 118(1), 153–165. DOI: 10.1016/S0165-0114(98)00288-7
- Ulrich, K. T., & Eppinger, S. D. (2012). Product design and development (5th ed.). New York: McGraw-Hill. OCLC WorldCat Permalink: http://www.worldcat.org/oclc/706677610
- Xiao-lin, L., Yan-xia, Z., & Zeng-hui, Z. (2010). Research on application of fuzzy fault tree analysis in the electronic equipment fault diagnosis. 2nd International Conference on Computer and Automation Engineering (ICCAE), (Vol. 2, pp. 65–67). DOI: 10.1109/ICCAE.2010.5451381