UEBA: Don’t Be Fooled by False Positives!

This is how user and entity behavior analytics (UEBA) can process huge volumes of log data while surfacing only the cyberthreats that matter

The following is adapted from a presentation (see slides, below) by Interset’s CTO Stephan Jou, who recently received an ICSIC accolade for penning a seminal article about machine learning.


Detecting cybersecurity anomalies can be like diagnosing a human patient. Self-diagnosis isn’t always the most accurate option.

For example, when my child was born, she had what I thought were all the signs of plagiocephaly, or flat-head syndrome. With my sample size of one, and the easily accessible Dr. Google, I was sure she needed treatment to address this. I also had concerns that she had too many bowel movements, was spitting up frequently, had a horrifying rash, and had a high temperature. I was convinced she had a series of ailments, all requiring medical treatment.

Then I consulted an expert with a much larger sample size, who told me that everything I had observed was within the norm and nothing to worry about. In other words, these were all false alarms, common among first-time parents who have only a limited sample to draw on.

The first lesson of analytics is that you need a large enough sample for your conclusions to be valid, just as classical statistics requires a sufficiently large sample before a result can be considered statistically valid.

When it comes to cybersecurity defense strategy, this must be taken to the next level. Here, you are not trying to analyze what is normal; you want to analyze what lies beyond the norm, because those are the threats that matter. That means you need even more data to spot nuanced deviations (while avoiding false positives) in the seemingly typical behavior of a stealthy hacker doing reconnaissance and lurking inside your network as an advanced persistent threat (APT).

Just as human babies (or any human beings, for that matter) differ from one another, analytics need to account for the unique nature of each possible attack vector. As with a baby, rigid rules and thresholds do not work. To detect true anomalies of concern, parents must learn what is normal for their baby. From that baseline, they can look for, and quantify, deviations from what is normal.
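The baseline idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Interset's algorithm: the function name `deviation_score`, the 30-observation minimum, the fixed threshold of 100, and the sample data are all hypothetical, and a simple z-score stands in for whatever models a production UEBA system would use.

```python
from statistics import mean, stdev

MIN_SAMPLES = 30       # hypothetical minimum history before we trust a baseline
FIXED_THRESHOLD = 100  # hypothetical one-size-fits-all rule: "alert above 100 files/day"


def deviation_score(history, observed, min_samples=MIN_SAMPLES):
    """Return how many standard deviations `observed` lies from this
    entity's own baseline, or None when there is too little history."""
    if len(history) < min_samples:
        return None  # too few samples: refuse to judge rather than raise a false alarm
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly constant history: any change at all is a deviation.
        return 0.0 if observed == baseline else float("inf")
    return (observed - baseline) / spread


# Made-up daily file-download counts for two users with very different habits.
light_user = [4, 5, 6] * 10         # normally about 5 files a day
heavy_user = [480, 500, 520] * 10   # normally about 500 files a day

light_score = deviation_score(light_user, 90)   # sudden jump to 90 files
heavy_score = deviation_score(heavy_user, 510)  # an ordinary day at 510 files

# What the fixed rule would have done with the same two observations.
fixed_rule_flags_light = 90 > FIXED_THRESHOLD
fixed_rule_flags_heavy = 510 > FIXED_THRESHOLD
```

In this toy data, the fixed rule alerts on the heavy user's ordinary day (a false positive) and stays silent when the light user's activity jumps to roughly eighteen times its norm. The per-user score does the opposite, and it declines to score anyone whose history is too short to support a baseline.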

The same principles apply to cybersecurity defense. You need enough samples to avoid a high false-positive rate.

When Interset performs user and entity behavior analytics (UEBA), we make sure there is enough data and experience so that the threats surfaced are the ones that matter, without creating false positives. SOC teams today have too much on their plates to go on wild goose chases and pursue dead ends.

For some customers, we process more than nine billion events each day through our big-data storage and processing architecture. As a result, our self-learning algorithms help SOC teams sift through the vast volumes of cybersecurity data, producing meaningful lists of prioritized threat leads on which SOC teams can focus.

My parental alarms were triggered by a series of false positives. In the same way, your SOC teams should not become distracted by the false positives of rules- and thresholds-based systems.