How Machine Learning Can Expose and Illustrate Network Threats
Although machine learning algorithms have been around for years, new use cases are being discovered and applied all the time, particularly in network and data security. Over the years, hackers' skills and approaches have grown more sophisticated and their attacks have risen in severity and frequency, so white hats as well as enterprise IT and security leaders must use every tool at their disposal to stem the tide of threats.
It's only natural, then, to deploy the latest techniques and processes to safeguard key network components and critically sensitive data. Machine learning has recently come to the forefront of IT security efforts, and researchers have identified several successful ways that machine learning tools can support overall IT protection.
Machine learning: The bigger picture
Before we take a closer look at the ways machine learning is being put to use for network security, it's important to establish a foundational understanding of the technology.
As defined by Trend Micro researchers, machine learning is a process that relies on specialized technological tools that enable a computer to learn from and use new information without human intervention. Robust, intelligent algorithms allow a computerized platform to process and “understand” large repositories of information, drawing out results based on the data and patterns it observes.
“This system analyzes these patterns, groups them accordingly, and makes predictions,” Trend Micro explained. “With traditional machine learning, the computer learns how to decipher information as it has been labeled by humans – hence, machine learning is a program that learns from a model of human-labeled datasets.”
As the machine learning program repeats this task of identifying and categorizing patterns and drawing insights from them, it further “learns” how best to complete this objective without the crutch of human guidance or specific human-directed programming.
Current real-world use cases
Machine learning is being put to work across many industry sectors, enabling stakeholders to learn from processed data and use the results in ways that support their missions.
Netflix, for example, has utilized machine learning for several years now to make more customized streaming entertainment recommendations for users. According to data gathered by Statwolf, the streaming company saved an estimated $1 billion with its use of machine learning.
Machine learning is also being used to support customer service, saving time and effort for human agents. Gartner predicted that by 2020, the vast majority of all customer service interactions, 85 percent, would be enabled by machine learning-aided chatbots.
Powerful machine learning tools don't just support savings in customer service. Statwolf noted that marketing professionals lose about 12 percent of their working time to data collection, which is equivalent to over five hours, or 11 working days over the course of a year. With the help of machine learning, this precious time can be won back and put to good use.
Machine learning in threat identification: Classifying network traffic
One of the biggest current arenas for machine learning tools, however, is IT security, including efforts to pinpoint threats to the network and to the sensitive data stored on and accessed through it.
As noted in Trend Micro's definition, machine learning can process considerable amounts of data, highlight the patterns within it, and leverage those patterns for predictions and insights. This foundational process is well suited to analyzing network traffic: it can help identify regular, legitimate traffic (including user activity) and separate it from suspicious and potentially malicious traffic.
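Before traffic can be classified, each flow has to be turned into numbers that an algorithm can compare. The Python sketch below shows one plausible way to do that; the `Flow` fields, the derived features and the min-max scaling are illustrative assumptions, not the feature set Trend Micro used.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Flow:
    """A simplified network flow record (hypothetical fields, for illustration only)."""
    duration_s: float
    bytes_out: int
    bytes_in: int
    packets: int
    dst_port: int


def flow_features(flow: Flow) -> List[float]:
    """Convert one flow record into a numeric feature vector."""
    total_bytes = flow.bytes_out + flow.bytes_in
    return [
        flow.duration_s,
        float(total_bytes),
        flow.bytes_out / total_bytes if total_bytes else 0.0,  # share of bytes sent
        total_bytes / flow.packets if flow.packets else 0.0,   # mean bytes per packet
        1.0 if flow.dst_port in (80, 443) else 0.0,            # common web port?
    ]


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every feature column to [0, 1] so no single feature dominates distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


if __name__ == "__main__":
    # Synthetic flows, not real traffic.
    flows = [
        Flow(1.2, 500, 20000, 30, 443),
        Flow(0.1, 60, 0, 1, 23),
        Flow(30.0, 90000, 1200, 80, 8080),
    ]
    for vector in min_max_scale([flow_features(f) for f in flows]):
        print([round(v, 3) for v in vector])
```

Scaling matters because the distance-based methods discussed in this article would otherwise be dominated by raw byte counts, which are orders of magnitude larger than ratios or flags.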
Supervised machine learning
As Trend Micro noted in the research paper Ahead of the Curve: A Deeper Understanding of Network Threats Through Machine Learning, this kind of network traffic classification relies on supervised machine learning. In other words, while the machine learning tool is able to process and pull key insights from data on its own, it is also guided, in the sense that human users “train” it on how to process the data being fed into the system. This type of model relies on human-labeled data to achieve accuracy.
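To make the idea of training on human-labeled data concrete, here is a minimal supervised classifier in Python. It uses k-nearest neighbors purely as a simple stand-in (this is not necessarily the model used in the paper), and the feature vectors and labels are invented examples of already-scaled flow features.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A training example: (scaled feature vector, label assigned by a human analyst).
LabeledFlow = Tuple[Sequence[float], str]


def knn_predict(train: List[LabeledFlow], x: Sequence[float], k: int = 3) -> str:
    """Predict a label for flow `x` by majority vote among its k nearest labeled flows."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical human-labeled training set (two features per flow, already scaled).
TRAIN: List[LabeledFlow] = [
    ([0.10, 0.20], "benign"),
    ([0.15, 0.25], "benign"),
    ([0.20, 0.10], "benign"),
    ([0.90, 0.80], "malicious"),
    ([0.85, 0.95], "malicious"),
    ([0.80, 0.90], "malicious"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, [0.88, 0.85]))  # malicious
    print(knn_predict(TRAIN, [0.12, 0.18]))  # benign
```

The classifier can only be as good as the `TRAIN` labels it is given, which is why producing those labels is the main cost of the supervised approach.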
Unsupervised machine learning
While supervised machine learning can certainly enable the identification of potential threats by analyzing human-labeled network traffic flow data, this data is not labeled by default. Supervised learning therefore demands considerable time and effort to prepare, since most real-world data, including network traffic flow data, is unlabeled.
However, unsupervised machine learning can also support threat detection. In this type of process, unlabeled data is fed into the machine learning platform, which analyzes and groups it through data clustering. The benefit of this approach is that it demands less time and guidance from human users, as data doesn't need to be labeled before processing. In addition, the results of unsupervised analysis of network traffic flow data can be put to work in real time to pinpoint zero-day and other new threats.
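As a rough illustration of clustering unlabeled flows, the following Python sketch runs plain k-means, one of the simplest clustering algorithms, on feature vectors that carry no labels, and then flags a new flow as unusual if it sits far from every learned cluster center. The farthest-point initialization, the fixed iteration count and the distance threshold are illustrative assumptions; a real detector would need a carefully tuned notion of "far".

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def farthest_first_init(points: List[Point], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first point, then repeatedly add the farthest one."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means on unlabeled points; returns the cluster centers."""
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def is_unusual(x: Point, centroids: List[List[float]], threshold: float) -> bool:
    """Flag a flow that is farther than `threshold` from every learned cluster center."""
    return min(math.dist(x, c) for c in centroids) > threshold


if __name__ == "__main__":
    flows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]  # synthetic, unlabeled
    centers = kmeans(flows, k=2)
    print(centers)
    print(is_unusual([5, 5], centers, threshold=2.0))      # True
    print(is_unusual([0.5, 0.5], centers, threshold=2.0))  # False
```

No labels are used anywhere: the groups emerge from distances alone, and a flow that fits none of them is surfaced for a human to inspect.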
Identification of malware characteristics through cluster classification
Trend Micro technology researchers Joy Avelino, Jessica Balaqui and Carmi Loren Mora combined supervised and unsupervised machine learning, an approach known as semi-supervised learning, to demonstrate how the process can be applied to identifying threats within network traffic. Their goal was to process significant amounts of unlabeled network data in order to pinpoint key characteristics of current malware samples and the potential relationships among them.
The results of this study, described by Avelino, Balaqui and Mora in the research paper, were illuminating. Using machine learning and specific data clustering algorithms, including the density-based algorithms DBSCAN and HDBSCAN, the researchers were able not only to separate legitimate network traffic from malicious data flows but also to identify the threats based on their analyzed characteristics.
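How exactly the researchers combined the two modes is detailed in their paper. As a generic illustration only, one common semi-supervised pattern is self-training: start from a handful of labeled samples and let those labels spread to nearby unlabeled ones. The Python sketch below implements a greedy nearest-neighbor version; the family names, the data and the `max_dist` cutoff are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[Sequence[float], str]


def self_train(
    labeled: List[LabeledPoint],
    unlabeled: List[Sequence[float]],
    max_dist: float,
) -> Tuple[List[LabeledPoint], List[Sequence[float]]]:
    """Greedy nearest-neighbor self-training.

    Repeatedly finds the unlabeled point closest to any labeled point and gives it
    that point's label. Stops when the closest remaining pair is farther apart than
    `max_dist`, leaving outliers unlabeled for a human analyst to review.
    """
    known = list(labeled)
    pending = list(unlabeled)
    while pending and known:
        # (index into pending, label to copy, distance) for the closest pair overall
        idx, label, dist = min(
            ((i, lbl, math.dist(p, q)) for i, p in enumerate(pending) for q, lbl in known),
            key=lambda t: t[2],
        )
        if dist > max_dist:
            break
        known.append((pending.pop(idx), label))
    return known, pending


if __name__ == "__main__":
    seeds = [([0, 0], "family_A"), ([10, 10], "family_B")]  # hypothetical labeled samples
    unlabeled = [[0, 1], [0, 2], [10, 11], [50, 50]]
    known, leftover = self_train(seeds, unlabeled, max_dist=1.5)
    for point, label in known:
        print(point, label)
    print("still unlabeled:", leftover)  # still unlabeled: [[50, 50]]
```

A few human labels thus go a long way, while points that resemble nothing known stay unlabeled instead of being forced into an existing family.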
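DBSCAN, one of the density-based algorithms named above, treats a point as a “core” point when at least `min_samples` points (itself included) lie within distance `eps`, grows clusters outward from core points, and labels anything unreachable as noise. The Python sketch below is a minimal from-scratch version for illustration; the `eps` and `min_samples` values and the data are arbitrary, a real analysis would more likely use a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN`, and HDBSCAN (which avoids committing to a single fixed `eps`) is not shown.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: List[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within `eps` of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster id per point; NOISE (-1) marks points that belong to no cluster."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but does not expand it
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return [NOISE if lbl is None else lbl for lbl in labels]


if __name__ == "__main__":
    flows = [[0, 0], [0, 0.5], [0.5, 0], [10, 10], [10, 10.5], [10.5, 10], [50, 50]]
    print(dbscan(flows, eps=1.0, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters up front, and its explicit noise label is a natural place for unusual flows to end up.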
“[T]he clustering model was able to find similarities in the network flows, allowing them to be grouped together,” Avelino, Balaqui and Mora wrote. “From the multiple characteristics seen in each malware family … the clustering model was able to identify which ones constitute a certain profile that correlates among the similar samples.”
From the unlabeled network traffic flow data, the machine learning program pinpointed well-known threats, including the Rig, Flashpack, Neutrino, Blacole and Angler exploit kits, and used a color-coded system to show and identify the individual characteristics of each. This way, threats can be recognized and any overlapping attributes can be compared and analyzed.
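Color coding is a visual device; a text-only way to express the same comparison is to summarize each cluster by the attributes most of its flows share and then measure how much two summaries overlap. The Python sketch below does this with a majority-share threshold and Jaccard similarity; the attribute names, cluster names and the 0.8 threshold are hypothetical choices, not taken from the research.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def cluster_profile(members: List[Set[str]], min_share: float = 0.8) -> FrozenSet[str]:
    """Attributes present in at least `min_share` of a cluster's flows."""
    counts = Counter(attr for flow in members for attr in flow)
    return frozenset(a for a, n in counts.items() if n / len(members) >= min_share)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_clusters(
    clusters: Dict[str, List[Set[str]]], min_share: float = 0.8
) -> Tuple[Dict[str, FrozenSet[str]], Dict[Tuple[str, str], Tuple[List[str], float]]]:
    """Build a profile per cluster, then list shared attributes and a Jaccard score per pair."""
    profiles = {name: cluster_profile(m, min_share) for name, m in clusters.items()}
    overlaps = {
        (x, y): (sorted(profiles[x] & profiles[y]), jaccard(profiles[x], profiles[y]))
        for x, y in combinations(sorted(profiles), 2)
    }
    return profiles, overlaps


if __name__ == "__main__":
    # Hypothetical per-flow attributes, grouped by the cluster each flow landed in.
    clusters = {
        "cluster_0": [
            {"flash_payload", "redirect_chain", "obfuscated_js"},
            {"flash_payload", "redirect_chain"},
        ],
        "cluster_1": [
            {"redirect_chain", "java_payload"},
            {"redirect_chain", "java_payload", "obfuscated_js"},
        ],
    }
    profiles, overlaps = compare_clusters(clusters)
    for pair, (shared, score) in overlaps.items():
        print(pair, shared, round(score, 2))  # ('cluster_0', 'cluster_1') ['redirect_chain'] 0.33
```

The shared-attribute list plays the role of the overlapping colors: it shows at a glance which traits two threat groups have in common, and the score gives a rough measure of how similar they are.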
“[M]achine learning plays a key role in the process of successfully clustering network threats,” Avelino, Balaqui and Mora noted. “Using machine learning for analysis vastly improves the speed at which data is organized and conclusions are obtained.”
Check out the research paper to learn more.