The beautiful lies of machine learning in security

Contrary to what I have read, machine learning (ML) is not magic pixie dust. In general, ML is useful for narrowly scoped problems with large data sets available, and where the patterns of interest are highly repeatable or predictable. Most security problems neither require nor benefit from ML. Many experts, including the folks at Google, suggest that when solving a complex problem you should exhaust all other approaches before trying ML.

ML is a broad set of statistical techniques that allow us to train a computer to estimate the answer to a question even when we have not explicitly coded the correct answer. A well-designed machine learning system applied to the right kind of problem can unlock insights that would not have been possible otherwise.

A good example of successful ML is natural language processing (NLP). NLP allows computers to “understand” human language, including things like idioms and metaphors. In many ways, cybersecurity faces the same challenges as language processing. Attackers may not use idioms, but many of their techniques are analogous to homonyms, words that have the same spelling or pronunciation but different meanings. Some attacker techniques also closely resemble actions a system administrator might take for perfectly benign reasons.

IT environments vary across organizations in terms of purpose, architecture, prioritization, and risk tolerance. It is impossible to create algorithms, ML or otherwise, that broadly address security use cases in all scenarios. This is why most ML implementations in security combine multiple approaches to address a very specific problem. Good examples include spam filters, DDoS or bot mitigation, and malware detection.

Garbage in, garbage out

The biggest challenge in ML is having relevant, usable data available to solve your problem. For supervised ML, you need a large, correctly labeled data set. To build a model that identifies pictures of cats, for example, you train the model on many pictures of cats labeled “cat” and many pictures of things that are not cats labeled “not cat.” If you don’t have enough images, or if they are poorly labeled, your model will not perform well.
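As a rough sketch of what supervised training looks like in practice, here is a minimal Python example using scikit-learn. The feature values and labels are entirely hypothetical stand-ins (real image classification would train on pixel data, typically with a deep network), but the shape of the workflow, labeled examples in, fitted model out, is the same.

```python
# A minimal sketch of supervised learning, assuming scikit-learn is
# installed. "Cat vs. not cat" is reduced to invented numeric features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical features (e.g., ear pointiness, whisker count) and labels.
X = [[0.9, 12], [0.8, 10], [0.7, 14],   # labeled "cat"
     [0.1, 0],  [0.2, 1],  [0.05, 0]]   # labeled "not cat"
y = [1, 1, 1, 0, 0, 0]                  # 1 = cat, 0 = not cat

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Too few examples or mislabeled data will show up here as a poor score.
print(model.score(X_test, y_test))
```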

In security, a well-known supervised ML use case is the detection of previously unseen malware. Many endpoint protection platform (EPP) vendors use ML on huge volumes of labeled malicious and benign samples to train a model on what malware “looks like.” These models can correctly identify mutated malware, obfuscation, and other tricks where a file is altered just enough to dodge a signature while remaining malicious. ML does not match a signature; it predicts maliciousness using a host of other features, and it can often catch malware that signature-based methods miss.
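The sketch below is heavily hedged: the features (entropy, suspicious imports, packing) and the tiny training set are hypothetical stand-ins for what an EPP vendor would extract from millions of real samples. It only illustrates the core idea that a model trained on file features can score a mutated file no signature has ever seen.

```python
# A hedged sketch of feature-based malware classification; all feature
# names and values below are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [file entropy, imports suspicious API (0/1), is packed (0/1)]
samples = [[7.8, 1, 1], [7.5, 1, 0], [7.9, 0, 1],   # labeled malicious
           [4.2, 0, 0], [5.1, 0, 0], [3.9, 1, 0]]   # labeled benign
labels  = [1, 1, 1, 0, 0, 0]

clf = LogisticRegression().fit(samples, labels)

# A mutated file: no signature would match it, but its features still
# look malicious, so the model assigns it a high probability.
mutated = [[7.6, 1, 1]]
print(clf.predict_proba(mutated)[0][1])  # probability of "malicious"
```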

However, since ML models are probabilistic, there is a trade-off. ML can catch malware that signatures miss, but it can also miss malware that signatures would catch. That is why modern EPP tools use hybrid methods that combine ML and signature-based techniques for optimal coverage.
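Here is a minimal sketch of such a hybrid decision, assuming a known-bad hash set and an ML score computed elsewhere; the hashes, threshold, and function name are all hypothetical.

```python
# A toy hybrid verdict: signatures first (deterministic on known bads),
# then an ML probability for everything no signature has seen.
KNOWN_BAD_HASHES = {"deadbeef"}  # stand-in for a real signature database

def verdict(file_hash: str, ml_malicious_prob: float,
            threshold: float = 0.9) -> str:
    # Signature match: certain, no false-positive risk on known samples.
    if file_hash in KNOWN_BAD_HASHES:
        return "block (signature match)"
    # ML prediction: probabilistic, covers mutated or novel files.
    if ml_malicious_prob >= threshold:
        return "block (ML prediction)"
    return "allow"

print(verdict("cafebabe", ml_malicious_prob=0.95))  # block (ML prediction)
print(verdict("deadbeef", ml_malicious_prob=0.10))  # block (signature match)
```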

Something, something, false positives

Even if the model is well designed, ML presents some additional challenges when it comes to interpreting the output, including:

  • The output is probabilistic.
    An ML model produces the probability that something is true. If your model is designed to recognize cats, you’ll get results like “this thing is 80% cat.” This uncertainty is an inherent feature of ML systems and can make the result difficult to interpret. Is 80% cat enough?
  • The model cannot be tuned, at least not by the end user. To deal with probabilistic outputs, a tool may apply vendor-set thresholds that collapse them into binary outcomes. For example, a cat-identification model might report that anything over 90% “cat” is a cat. Your organization’s tolerance for cats may be higher or lower than what the vendor chose.
  • False negatives (FN), the failure to detect actual evil, are one of the painful consequences of ML models, especially poorly tuned ones. We dislike false positives (FP) because they waste time, but there is an inherent trade-off between FP and FN rates. ML models are tuned to optimize this trade-off, prioritizing the “best” balance of the two. However, the “right” balance varies between organizations depending on their individual threat and risk assessments. When using ML-based products, you must trust your vendors to choose the right thresholds for you (a toy illustration of this trade-off follows this list).
  • There is not enough context for alert triage. Part of ML’s magic is extracting powerful but arbitrary predictive “features” from a data set. Imagine that identifying a cat happened to be highly correlated with the weather. No human would reason this way, but that is the point of ML: to find patterns we could not find otherwise, and to do so at scale. Yet even when the reason behind a prediction can be exposed to the user, it is often unhelpful in an alert-triage or incident-response situation, because the “features” that ultimately drive an ML system’s decision are optimized for predictive power, not practical relevance to security analysts.
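To make the threshold trade-off concrete, here is a toy sketch. All scores and labels are invented; the point is only that sweeping a single vendor-style cut-off over the same model outputs shifts errors between false positives and false negatives.

```python
# Invented model scores and ground truth for six samples.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]   # model output per sample
truth  = [1,    1,    0,    1,    0,    0]      # 1 = actually malicious

# Raising the threshold trades false positives for false negatives.
for threshold in (0.5, 0.8, 0.9):
    fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, truth) if s < threshold and t == 1)
    print(f"threshold={threshold}: {fp} false positives, {fn} false negatives")
```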

Would “stats” by any other name smell as sweet?

In addition to the pros and cons of ML, there’s another catch: not all “ML” is really ML. Statistics gives you conclusions about the data you have. ML makes predictions about data you did not have, based on the data you did. Marketers have enthusiastically latched onto “machine learning” and “artificial intelligence” to signal a modern, innovative, cutting-edge product of some kind. However, there is often very little attention paid to whether the technology even uses ML, let alone whether ML is the right approach.
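The distinction fits in a few lines of Python. Using invented login counts: the statistic summarizes the data you already have, while the (deliberately trivial) ML model predicts a value you have not yet observed.

```python
# Statistics vs. ML on invented data, assuming scikit-learn is installed.
from statistics import mean
from sklearn.linear_model import LinearRegression

daily_logins = [100, 110, 105, 120, 115]

# Statistics: a conclusion about observed data.
print("average logins:", mean(daily_logins))

# ML: a prediction about an unobserved day, based on the trend so far.
days = [[0], [1], [2], [3], [4]]
model = LinearRegression().fit(days, daily_logins)
print("predicted logins on day 5:", model.predict([[5]])[0])
```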

So, can ML detect evil or not?

ML can detect evil when “evil” is well defined and narrowly scoped. It can also detect deviations from expected behavior in highly predictable systems; the more stable the environment, the more likely ML is to correctly flag anomalies. But not all anomalies are harmful, and the operator is not always equipped with enough context to respond. ML’s greatest strength is not in replacing existing methods, systems, and teams but in extending them, for optimal coverage and efficiency.
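As a closing illustration of “deviations in a predictable system,” here is a minimal z-score sketch on invented traffic numbers. Note what it cannot do: it flags the deviation, but it supplies none of the context needed to say whether the deviation is evil.

```python
# Flag a deviation from a stable baseline using a simple z-score.
from statistics import mean, stdev

requests_per_minute = [50, 52, 49, 51, 48, 50, 200]  # last value is unusual
baseline = requests_per_minute[:-1]
mu, sigma = mean(baseline), stdev(baseline)

latest = requests_per_minute[-1]
z = (latest - mu) / sigma
if abs(z) > 3:
    # An anomaly, yes - but benign load spike or attack? The statistic
    # alone cannot tell you; that context must come from elsewhere.
    print(f"anomaly: {latest} req/min (z-score {z:.1f})")
```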