Teach Your AI Well: A Potential New Bottleneck for Cybersecurity

October 8, 2018

Artificial intelligence (AI) holds the promise of easing the skills shortage in cybersecurity, but implementing AI may result in a talent gap of its own for the industry.

Ann Johnson is leaning forward from her seat in the lobby of a tourist-district Hilton as she shares her excitement about the promise of AI. “Microsoft sees 6.5 trillion security signals a day,” she says. “AI helps rationalize them down to a quantity that humans can deal with.”

As corporate vice president in Microsoft’s Cybersecurity Solutions Group, Johnson spends a lot of time thinking about tools to make human security analysts more effective. “The goal is to reduce the number of humans required – since there aren’t nearly enough humans to do the work – and automate simple remediation, leaving humans to do more complex work,” she explains.

The shortage of qualified security analysts is an issue that the IT security industry has been dealing with for years. There is little question that technology tools – from better analytics engines, to increased automation, to artificial intelligence – are seen as methods for dealing with the shortage. But will the fact that artificial intelligence, like its human analog, must be carefully trained limit its ability to help the industry out of its expertise deficit?

A Blank Slate

Whether the technology is labeled artificial intelligence or machine learning, it is almost never a “one size fits all” proposition. Michelle Laurence, a data scientist at the University of Nebraska Institute for Applied Science, explains: “Every client’s environment is different,” she says. “The machine learning algorithm needs to learn on the client’s data.”

And every AI engine deployed in a real-world situation must be trained on environment-specific data, whether the AI is looking at a problem as narrowly defined as stopping phishing messages – or as broad as a generalized SIEM.

WatchGuard, for example, uses AI as part of its anti-malware protection product. “There’s definitely a ‘big data’ aspect behind the training, as we’re training machine learning algorithms,” says Corey Nachreiner, WatchGuard’s CTO. “We gather millions and millions of files that, over time, have been known to be bad; and millions and millions of files that have been known to be good and we throw them through these various types of machine learning algorithms.”

These millions of files are where AI can slow down, because the files don’t appear by magic. Someone (or a team of someones) must choose the files and make sure that they’re in a format usable by the AI engine. And someone must make sure that the files chosen train the AI to do the right thing.
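As a rough illustration of what "putting files in a usable format" and "training on known-good and known-bad files" can mean, here is a minimal sketch. It is not WatchGuard's method: the byte-histogram features, the nearest-centroid classifier, the function names, and the four toy samples are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Convert raw bytes into a 256-value vector of byte frequencies."""
    counts = Counter(data)
    total = len(data) or 1  # avoid dividing by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average a list of equal-length vectors, column by column."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(samples: list[tuple[bytes, str]]) -> dict[str, list[float]]:
    """Build one average feature vector per label from labeled samples."""
    by_label: dict[str, list[list[float]]] = {}
    for data, label in samples:
        by_label.setdefault(label, []).append(byte_histogram(data))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model: dict[str, list[float]], data: bytes) -> str:
    """Return the label whose centroid is closest to the sample."""
    vec = byte_histogram(data)
    return min(model, key=lambda label: distance(model[label], vec))


# Hypothetical, hand-made training set: someone had to pick and label these.
training_set = [
    (b"hello world, this is a plain text report", "good"),
    (b"meeting notes for the quarterly review", "good"),
    (bytes([0x90] * 40 + [0xCC] * 10), "bad"),
    (bytes([0x90] * 30 + [0xEB, 0xFE] * 5), "bad"),
]
model = train(training_set)

print(classify(model, b"another ordinary text document"))
print(classify(model, bytes([0x90] * 50)))
```

Even in a toy like this, the quality of the result depends entirely on which samples were chosen and how they were labeled, which is the human work the paragraph above describes.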

Intelligence v. Learning

Training is the critical ingredient in AI, but is it also crucial for machine learning (ML) success? For that matter, is there a meaningful difference between AI and ML when it comes to their application in security?

“When people use ‘AI’ in cybersecurity, more often than not they are referring to the application of machine learning, either unsupervised or supervised, for various tasks,” says David Atkinson, founder of Senseon and former commercial director of Darktrace. He explains that, for him, machine learning is where the engine is looking at a known set of data, and through analysis produces a predictable range of outcomes. Artificial intelligence, on the other hand, can produce outcomes that lie completely outside the range of predicted results because its techniques can go beyond those reachable through linear algorithms.

It’s not just that commercial security companies haven’t developed the technology to do “true” AI – in most cases customers wouldn’t be comfortable with the possibilities AI presents, says Ariel Herbert-Voss, a Ph.D. candidate at Harvard University. “We like to have things that are interpretable; you want to know why your algorithm is making a particular decision because if you don’t know, then you might be making some terrible, horrible decision,” she explains.

Whether your preferred term is AI or ML, training is as vital to success as the specifics of the engine, both to increase the chances of application success and to keep the AI from turning to evil purposes.

At the DEF CON hacker conference back in August, University of Nebraska’s Laurence spoke in the AI Village about techniques for mis-training an AI engine – and how those engines can be protected. Because both AI and ML engines learn from the data they receive, she said, flooding one with bad data results in bad lessons learned. Perhaps the best-known example of this was the “Tay” chatbot Microsoft created in 2016. Designed as a machine learning bot that would learn to converse with humans on Twitter, Tay was taught to be an abusive racist in less than 24 hours through a concerted effort to feed it bad information.
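The effect of flooding a learner with bad data can be shown with a deliberately tiny model. The sketch below is an assumption-laden toy, not anything presented at DEF CON: the one-dimensional "risk scores", the midpoint-threshold classifier, and all names are hypothetical.

```python
from statistics import mean


def fit_threshold(samples: list[tuple[float, str]]) -> float:
    """Learn a cutoff halfway between the mean benign and mean malicious score."""
    benign = [score for score, label in samples if label == "benign"]
    malicious = [score for score, label in samples if label == "malicious"]
    return (mean(benign) + mean(malicious)) / 2


def predict(threshold: float, score: float) -> str:
    """Scores above the learned cutoff are flagged as malicious."""
    return "malicious" if score > threshold else "benign"


# Clean, correctly labeled training data (hypothetical scores).
clean = [
    (1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
    (8.0, "malicious"), (9.0, "malicious"), (10.0, "malicious"),
]

# An attacker floods the training feed with high scores mislabeled as benign.
poison = [(9.0, "benign")] * 20

clean_threshold = fit_threshold(clean)
poisoned_threshold = fit_threshold(clean + poison)

suspicious_score = 7.0
print(predict(clean_threshold, suspicious_score))
print(predict(poisoned_threshold, suspicious_score))
```

The model code is identical in both runs; only the training data changed, and with it the verdict on the same suspicious sample.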

Laurence pointed to the stickers recently developed by researchers at Google that tricked image recognition systems into seeing toasters where none existed. The key, she said, is in understanding how the AI system learns, and the factors it uses for recognition.

Sven Cattell, a researcher at Endgame, explains why that understanding can be so difficult to develop. For example, consider the dimensions of objects in the theoretical space where they exist – a space that may have many more dimensions than the four-dimensional world humans inhabit. Most people are quite comfortable thinking about the three dimensions in which we move, plus one more for time. Four dimensions is what we’re taught in geometry and trigonometry classes.

But these four dimensions used by human brains to figure out virtually everything about our environments can be insufficient for machine-based AI. AI engines may need to resort to representing objects in many dimensions – hundreds, in fact. Humans who build and train the AI must then employ advanced mathematical models to teach it tasks that range from visual recognition to handling the many dimensions of potentially malicious human behavior. That’s why training the AI – building the models the system will use to understand and act – is a rigorous discipline.
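To make "hundreds of dimensions" a little more concrete: the distance formula people learn for two or three dimensions works unchanged on much longer lists of numbers. The sketch below assumes a hypothetical 300-feature representation of two events; the feature count and the random values are placeholders, not anything described by the researchers quoted here.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points with any number of coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Familiar case: a point in three dimensions, measured from the origin.
print(euclidean((1.0, 2.0, 2.0), (0.0, 0.0, 0.0)))

# Same function, but each "event" is now described by 300 numbers
# (hypothetical features such as counts, sizes, or timing measurements).
DIMENSIONS = 300
rng = random.Random(0)  # fixed seed so repeated runs give the same values
event_a = [rng.random() for _ in range(DIMENSIONS)]
event_b = [rng.random() for _ in range(DIMENSIONS)]
print(euclidean(event_a, event_b))
```

The arithmetic scales easily; what does not scale is human intuition about which of those 300 numbers the system is actually relying on.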

Moving the Bottleneck

The paradox this creates is that AI is seen as a potential fix for the shortage of trained security professionals, yet there’s a shortage of skilled AI trainers.

“At the end of the day it’s people building these systems, and it’s people maintaining these systems, and it’s people using these systems,” says Harvard’s Herbert-Voss. “You have very few machine learning professionals that can handle and clarify and gain meaning from the data, right? So in [my] presentation [at DEF CON] there was a number of 22,000 professionals worldwide as estimated by elementAI that can perform research in this area,” she says.

And as the discussion progresses, those professionals may be as much a limiting factor as anything else on how quickly AI can rescue security from its talent deficit.

It’s not just a question of throwing bodies at the problem — they need to be the right bodies, notes Microsoft’s Johnson. “We have learned that volume isn’t the key in training,” she says. “Diversity is the key in type, geography, and other aspects. That’s important because you have to have non-bias in training.”

Bias can include things like making assumptions about gender, social network profiles, or other behavioral markers, and items like looking for a specific nation-state actor and missing actors who are from other areas, she explains.

Even with the difficulty in training AI and ML engines, though, machine intelligence is increasingly becoming a feature in security products. “We are incrementally building the presence of AI in security,” Johnson says. “It’s not flipping a switch.”

One of the issues around human resources is that the people with expertise in security and the people with expertise in training AI engines are rarely the same people. “Data scientists don’t have the subject matter expertise about which devices are vulnerable or why they’re acting in a particular way,” University of Nebraska’s Laurence says. “These are questions that most data science professionals don’t have answers to. And then on the flip side, cybersecurity experts, they have all of this data and they don’t understand how to train the machine learning algorithms to get alerts, or get additional automation to reduce their overhead or their labor.”

Ultimately, she says, “I think it’s kind of an amplification effect, where you have one group of subject matter experts and another group of data science experts – both of which the talent pool is lacking.”

Hybrid Win

For Chris Morales, head of security analytics at Vectra, the answer to both shortages is an approach in which AI augments human effort rather than seeking to replace it.

“Machine learning allows us as defenders to adapt much more quickly in real-time to threats that are constantly changing,” he says. “What machine learning is good at doing is learning over time and adapting. As environments change, the machine can start to change.”

Morales explains his thinking. “The threat constantly changes and adapts; and if we have a changing landscape, and we have a changing threat, we don’t know what’s going to happen next.”

But machine learning is well suited to those dynamic environments. “I think that’s going to continue to be true that machine learning allows us, as defenders, to adapt much more quickly, in real time, to threats that are constantly changing,” he says.


Curtis Franklin Jr. is Senior Editor at Dark Reading. In this role he focuses on product and technology coverage for the publication. In addition he works on audio and video programming for Dark Reading and contributes to activities at Interop ITX, Black Hat, INsecurity, and …
