Data-centric security leaders from across industries have embraced Mean-time-to-Detection (MTTD) and Mean-time-to-Resolution (MTTR) as key metrics for measuring effectiveness in security operations. With these metrics rapidly becoming the standard, it’s important to define and unpack them mathematically.
MTTD is universally understood as the average amount of time that passes between when a threat first enters a network and when it is prioritized or dismissed as a viable event. Detection typically occurs when a correlation on a suspicious signature or indicator generates an event.
MTTR is the average amount of time it takes for a company to respond to a threat. Once the detection has occurred, the context around the signature or indicator can be used to determine the severity and nature of response.
For data-centric security operations leaders, the objective function of the mission can be precisely stated: minimize both MTTD and MTTR.
Intelligence, as defined by the Data-Centric Security Automation paper and endorsed by the Cloud Security Alliance and others, is a critical piece of the security leader’s toolkit for improving these metrics. Intelligence can be used to drive detections, triage and investigation with context. If the primary purpose of intelligence for enterprise security is to accelerate automation in detect-and-respond operations, then the data-centric security leader measures the effectiveness of their investments in intelligence by the proportion of unlabeled data that their intelligence sources are labeling. We call this ‘Coverage’.
Intuitively speaking, it would appear that the more indicators that intelligence sources cover, the more likely they are to detect events, and sooner. Simple, right? Not quite.
In 2020, researchers from universities in the Netherlands and Germany compared threat indicators from four open source threat intelligence feeds and two commercial feeds. They concluded that, even in tracking the same advanced persistent threat (APT) groups, threat intelligence vendors do not seem to collect the same data.
Focusing on 22 threat groups that both commercial vendors claimed to be tracking, the researchers found, at most, a 4% overlap in threat indicators. “This raises questions about the coverage that these vendors are providing,” says Xander Bouwman, a PhD candidate at Delft University of Technology. “This is what we refer to as a market with asymmetric information,” he said. “The sellers know what they are selling, but the buyers don’t know what they are buying.”
With so many threat intel vendors in the market, it becomes difficult to figure out who to buy from. It’s nearly impossible to figure out a priori which vendor’s data would create the most accurate detections for your organization. However, a discussion around coverage helps us focus on the questions that matter:
- Which threat intel vendors are the best suited for maximizing your coverage?
- What kind of coverage also minimizes MTTR?
The Role of Coverage in MTTD
MTTD = Total Time to Detection / Number of Potential Events
Total Number of Events = Positive Detections + Negative Detections + Unknown Events
Covered Events = Positive Detections + Negative Detections
Coverage = Covered Events / Total Events
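The definitions above can be sketched in a few lines of Python. The event counts and detection times below are purely hypothetical and only illustrate how coverage and MTTD are computed.

```python
# Minimal sketch of the coverage and MTTD definitions above.
# All counts and times are hypothetical illustrations.

def coverage(positive, negative, unknown):
    """Coverage = Covered Events / Total Events."""
    covered = positive + negative   # Covered Events
    total = covered + unknown       # Total Number of Events
    return covered / total

def mttd(total_time_to_detection_days, potential_events):
    """MTTD = Total Time to Detection / Number of Potential Events."""
    return total_time_to_detection_days / potential_events

print(coverage(positive=120, negative=60, unknown=20))                 # 0.9
print(mttd(total_time_to_detection_days=2500, potential_events=200))   # 12.5
```

Note how unknown events sit entirely in the denominator of coverage: every unknown you convert into a positive or negative detection raises coverage directly.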
TTD, or Total Time to Detection, is the time it takes to convert unknown observables into IOCs which trigger an Alert for triage and investigation. This is what we want to minimize and this is where coverage comes in. With the definition of Covered Events, we can rewrite the equation for Total Number of Events as:
Total Number of Events = Covered Events + Unknown Events
Out of the total events, if we maximize the number of covered events - thereby minimizing the number of unknown events - we can have a direct and powerful impact on TTD.
According to the Ponemon Institute, the average dwell time for an attacker in 2020 was 191 days, of which the time between detection and remediation was 66 days. This leaves 125 days on average for TTD. Threats stay in a company’s environment for over four months. We cannot control how long it takes for an IOC to be first noticed and scored by a threat intel company, but we can control how quickly that information appears in your SIEM once that has happened.
So how does that impact TTD? By virtue of probability, the more prioritized indicators there are in your SIEM, the more likely you are to see an increase in correlations between indicators and your events. However, the solution isn’t as simple as increasing indicator throughput. It’s about finding the intelligence sources that provide you with the maximum amount of coverage.
Now let’s zero in on the Covered Events. Covered Events directly relate to the Total Number of High Priority Indicators. Let’s formalize some more concepts.
Total Number of Indicators = Covered Indicators + Unknown Indicators
Covered Indicators = Number of Indicators with Priority Score + Whitelisted Indicators + Blocklisted Indicators
Indicators Coverage = Total Number of Indicators / Total Number of Observables
That last definition is key. The denominator is the number of observables across all your security logs, and the goal is to maximize the number of observables that have a priority score. Some will be dismissed as benign while others will become indicators. The more indicators your intel sources cover, the more correlations they generate, the more detections occur, and the fewer threats go undetected. Your MTTD goes down.
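The indicator definitions above can be sketched the same way, treating coverage as the share of observables that carry a priority score or a whitelist/blocklist label. The observables and labels below are hypothetical examples.

```python
# Sketch of the indicator-coverage definitions above. Observables
# and their (hypothetical) labels from intel sources:

observables = {
    "1.2.3.4":      {"priority_score": 85},   # scored indicator
    "evil.example": {"blocklisted": True},    # blocklisted indicator
    "cdn.example":  {"whitelisted": True},    # whitelisted (benign)
    "10.0.0.7":     {},                       # unknown observable
    "203.0.113.9":  {},                       # unknown observable
}

def covered_indicators(observables):
    """Covered = scored + whitelisted + blocklisted indicators."""
    return sum(
        1 for attrs in observables.values()
        if "priority_score" in attrs
        or attrs.get("whitelisted")
        or attrs.get("blocklisted")
    )

def indicator_coverage(observables):
    """Share of observables your intel sources have labeled."""
    return covered_indicators(observables) / len(observables)

print(indicator_coverage(observables))  # 0.6
```

In this toy example, three of five observables are covered; the two unknowns are where dwell time accumulates.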
The Role of Coverage in MTTR
Before we understand the role of coverage in MTTR, let’s formalize some definitions:
MTTR = Time to Respond / Total Detections
Total Detections = True Positives + False Positives
Time to Respond = Time Spent Resolving + Time Spent Remediating
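These definitions translate directly into a small sketch; the analyst-hour figures and detection counts below are hypothetical.

```python
# Sketch of the MTTR definitions above; all figures are hypothetical.

def mttr(time_resolving_hours, time_remediating_hours,
         true_positives, false_positives):
    """MTTR = (Time Spent Resolving + Time Spent Remediating)
    / (True Positives + False Positives)."""
    time_to_respond = time_resolving_hours + time_remediating_hours
    total_detections = true_positives + false_positives
    return time_to_respond / total_detections

# 400 analyst-hours resolving, 200 remediating, across 150 detections.
print(mttr(400, 200, true_positives=100, false_positives=50))  # 4.0
```

Because false positives sit in the denominator alongside true positives while still consuming resolving time in the numerator, a noisy feed inflates both terms and drags MTTR in the wrong direction.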
Going back to the Ponemon Institute’s numbers on Dwell Time, we saw that, on average, it takes over 2 months between detection and remediation. Another study from the Ponemon Institute revealed that security teams spend 25% of their time chasing after False Positive (FP) alerts. This takes up significant time and bandwidth which would be better allocated towards resolving real threats.
A high number of false positives can be the curse of high coverage. This raises an important point about the quality of coverage. Organizations have widely varying and unique intelligence needs which only certain threat intel vendors fulfill well. It naturally follows that coverage quality is also subjective.
Due to this documented subjectivity, it makes sense to allow each organization to hold the levers on coverage quality. Imagine a scenario in which you control how priority scores get aggregated across sources by dampening or completely doing away with sources that create too many false positive alerts. You get to throttle the noise and directly impact your MTTD and MTTR.
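One way such a lever could look, sketched under assumptions: a per-source weight table (the source names and weight values are hypothetical) that dampens noisy feeds and drops sources with a weight of zero from the aggregated priority score.

```python
# Hedged sketch of per-source score aggregation: each organization
# sets a weight per intel source, dampening (or zeroing out) feeds
# that generate too many false positives. Sources and weights are
# hypothetical.

def aggregate_priority(scores_by_source, weights):
    """Weighted average of per-source priority scores; a weight
    of 0 removes a source from the aggregate entirely."""
    weighted = [(weights.get(src, 1.0), score)
                for src, score in scores_by_source.items()
                if weights.get(src, 1.0) > 0]
    if not weighted:
        return None  # no trusted source scored this indicator
    total_weight = sum(w for w, _ in weighted)
    return sum(w * s for w, s in weighted) / total_weight

weights = {"open_feed": 0.25,   # noisy feed: dampened
           "vendor_a": 1.0,     # trusted source: full weight
           "vendor_b": 0.0}     # too many FPs: removed

print(aggregate_priority(
    {"open_feed": 90, "vendor_a": 40, "vendor_b": 95}, weights))  # 50.0
```

Here the removed source’s high score (95) never reaches the aggregate, and the noisy feed’s 90 is pulled down by the trusted source’s 40, throttling the alert before it reaches an analyst.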
Realistically, some false positives may still make their way in, eating into Time Spent Resolving (TSR). Context can make or break the resolution of such alerts. Poor-quality context, or a complete lack of context available in-workflow and on-demand, forces analysts to spend precious time manually hunting for enrichment. Quality, readily available context makes this process increasingly automated, minimizing Time Spent Resolving.
Reducing MTTD does not have to come at the cost of increasing MTTR. Customized weights across sources, combined with intelligence workflows that prepare and prioritize data upstream of your core detect-and-respond applications, will prevent a high volume of FPs from flowing in and inject context where it is needed to triage, investigate and respond.
For more information on how intelligence management solutions like TruSTAR can help bridge the information gap and minimize MTTD and MTTR, see our blog: How TruSTAR Uses MTTD and MTTR as North Star Metrics.