Artificial intelligence (AI) and machine learning (ML) are all the rage. In particular, large language models (LLMs) have become the current frontrunner in this space, especially with the release of ChatGPT as an interface to OpenAI’s Generative Pre-trained Transformer (GPT) models. As Microsoft’s Security Trailblazer for 2023, we have (for some time) been exploring use cases for the LLM tools Microsoft provides, which we can leverage to drive our modern SOC.
Bridging Machine Learning and Cybersecurity
LLMs excel at interpreting, “understanding,” and predicting language. They are also pretty good at reasoning. So, what roles do language and reasoning play in a Modern SOC?
First, let’s explore some common characteristics of problems you should tackle with ML:
- Complex: The problem is complex enough to warrant the use of something other than `If → then` logic, a simple algorithm, or a quick set of human eyes. Bonus points if the complexity and data are changing.
- Predictive: The ML model can make a “prediction” (as opposed to an absolute answer) to the problem.
  - We approach cyber security with a “threat informed” lens. Adversaries have a say in the fight, so we are unlikely to absolutely determine their future actions. But we can predict what they may do with some likelihood.
  - Example: Given a history of a threat, what industries/companies are they likely to attack in the future?
- Inexpensive: The cost of being wrong is low. An example of an “inexpensive wrong answer” is search. If an ML engine returns a poor result for a user search, the user simply won’t click it. Not a big deal. Compare that with ML predicting military targeting, where a wrong answer could be catastrophic.
- Data Availability: We have access to data or information about the problem.
How to Apply in a Modern SOC
As a refresher, our “Modern SOC” is composed of:
- Intel Cell: Ingest, curate, analyze, and distribute threat intelligence for the organization.
- Current Ops (COPS): Manage and respond to alerts.
- Future Ops (FOPS): Plan, test, update, and automate SecOps processes.
- Adversary Team: Test defensive capabilities from an adversary’s perspective.
Intel Cell: CTI and Report Analysis
Attackers are continuously targeting new things, changing behaviors, emerging, changing teams, and responding to defenses. CTI analysts constantly learn and share new information about said adversaries. As consumers of threat intel, we can become overwhelmed with the number of bulletins, reports, analysis, feeds, and information coming across our desks.
These reports often contain nuggets of information we want to “mine.” For example, what MITRE ATT&CK techniques does a report mention? Sometimes, these can be difficult to extract using normal programmatic techniques. What if the report is in a PDF of tables? Or an image? Or the techniques are described, rather than explicitly indicated? Many security shops lack the human resources to dig into every applicable report, news feed, social media platform, and CTI feed and manually extract pertinent information.
Other times, we simply want to summarize a report to share or digest more quickly. LLMs do a solid job of synthesizing language into more concise forms.
- Is complex: These reports are often filled with technical jargon designed for other CTI analysts, organized in any number of formats, may or may not explicitly define information we want, and are written in human language, not a protocol which can be easily programmed.
- Needs a predictive response: Often, when reading these reports, we are not looking for absolute answers. We are more interested in developing a level of confidence about whether this threat actor belongs in our threat model and should be tracked. A question we can ask is: does this report about a threat actor suggest potential future targeting of our organization, based on past attacks or emerging trends/campaigns?
- Is inexpensive: The false-positive cost of analyzing a report is low: we are still likely to learn something. The false-negative cost is also low. It’s unlikely we will make high-stake decisions based on the output of a single report. These are often aggregated and drive tactical decisions which should be layered across other defenses.
- Has access to data: The large language models have been trained on an enormous corpus of human language. Because the reports are written in human language, the models have already seen a significant amount of relevant data. And we have a constant flow of new reports for input.
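To make the “mining” idea concrete, here is a minimal Python sketch. It first pulls out technique IDs that a report states explicitly, which is the easy, programmatic part, and then builds a prompt asking an LLM to find techniques that are only described. The function names, the prompt wording, and the sample report are assumptions for illustration, and no particular LLM API is called; the prompt would be sent to whichever model you use.

```python
import re
from typing import List

# ATT&CK technique IDs look like T1059, or T1059.001 for a sub-technique.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_explicit_techniques(report_text: str) -> List[str]:
    """Return unique technique IDs written out in the text, in order of first appearance."""
    seen = {}
    for match in TECHNIQUE_ID.finditer(report_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def build_extraction_prompt(report_text: str, known_ids: List[str]) -> str:
    """Build a prompt asking an LLM for techniques that are described but not labeled.

    The wording is a hypothetical starting point, not a tuned prompt.
    """
    already = ", ".join(known_ids) if known_ids else "none"
    return (
        "You are assisting a cyber threat intelligence analyst.\n"
        "List the MITRE ATT&CK techniques this report describes, including "
        "behaviors that are described without a technique ID.\n"
        "For each one, quote the supporting sentence and state your confidence "
        "as low, medium, or high.\n"
        f"Technique IDs already found by pattern matching: {already}\n\n"
        f"Report:\n{report_text}"
    )


# Made-up report text: two techniques are labeled, one (credential dumping) is only described.
sample_report = (
    "The actor used PowerShell (T1059.001) for execution and later relied on "
    "T1566 for initial access in a second campaign. Operators also dumped "
    "credentials from LSASS memory before moving laterally."
)

explicit_ids = extract_explicit_techniques(sample_report)
prompt = build_extraction_prompt(sample_report, explicit_ids)
print(explicit_ids)
```

The pattern match catches the two labeled techniques but misses the credential dumping sentence, which is exactly the gap the LLM is asked to fill. Asking for a supporting quote and a confidence level keeps the output reviewable by an analyst.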
Current Operations Cell (COPS), SOC, SIEM: Explanation of Malicious Code
Behind the scenes of these “chat” interfaces are models, some of which have been highly trained to understand code. They excel at both interpreting and describing code as well as generating code.
When we receive alerts where code is present, a script was executed, or email headers are involved, there is often a technical hurdle to overcome before we can adequately handle the alert. What was the code doing? Where did this email originate? These are the types of questions an LLM can answer for an analyst, shortening the time to triage. Now, an analyst doesn’t need to understand Go or obscure PowerShell commands. Instead, they can simply read the human language describing what the code does.
- Is complex: Code can be complex, even when intended to be clear! This is especially true for newer analysts who haven’t had time to experience multiple platforms, languages, etc. When adversaries write code, they often purposefully make it complex to disguise the intent. This can be confusing for humans, but LLMs can often make quick work of obfuscation.
- Needs a predictive response: In this case, we are using ML to predict what the code could be doing. It’s accompanied by line-by-line explanations which give an analyst the opportunity to make a final determination.
- Is inexpensive: Because this is augmenting an analyst and provides more of a step-by-step analysis, there are multiple opportunities for the LLM to be “wrong” and still end up with a good result. It’s more about deciphering the intent and possibilities of the code vs. the actual execution results.
- Has access to data: Again, these models have been trained on copious amounts of code, and we will continue feeding them more code.
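As a rough illustration of this triage flow, the Python sketch below takes a command line from an alert, decodes a PowerShell encoded command (PowerShell expects Base64 over UTF-16LE text), and builds a prompt asking for a line-by-line explanation. The helper names, the prompt wording, and the benign sample command are assumptions for illustration; sending the prompt to a model is left out.

```python
import base64
from typing import Optional


def find_encoded_payload(command_line: str) -> Optional[str]:
    """Return the Base64 argument of -EncodedCommand (or a commonly seen abbreviation), if any."""
    tokens = command_line.split()
    for flag, value in zip(tokens, tokens[1:]):
        name = flag.lstrip("-").lower()
        # Accept prefixes such as -e and -enc, plus the -ec alias.
        is_encoded_flag = "encodedcommand".startswith(name) or name == "ec"
        if flag.startswith("-") and name and is_encoded_flag:
            return value
    return None


def decode_powershell_payload(payload: str) -> str:
    """Decode a PowerShell encoded command: Base64 over UTF-16LE text."""
    return base64.b64decode(payload).decode("utf-16-le")


def build_explanation_prompt(script: str) -> str:
    """Hypothetical prompt asking for an explanation an analyst can verify."""
    return (
        "Explain what the following script does, line by line, in plain language. "
        "Point out anything that looks like obfuscation, persistence, or network "
        "activity. Do not execute it.\n\n"
        f"{script}"
    )


# Benign stand-in for a script seen in an alert, encoded the way PowerShell expects.
script = "Write-Output 'hello from the analyst sandbox'"
encoded = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
alert_command_line = f"powershell.exe -NoProfile -ExecutionPolicy Bypass -enc {encoded}"

payload = find_encoded_payload(alert_command_line)
decoded = decode_powershell_payload(payload) if payload else ""
prompt = build_explanation_prompt(decoded)
print(decoded)
```

Decoding the obvious layer first means the model spends its effort on intent rather than on unwrapping Base64, and the analyst can compare the explanation against the decoded text before making a final determination.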
Future Operations Cell (FOPS), Detection Engineering, SOAR: Detection Analysis and Generation
Like the COPS example, some LLMs can turn human language into code. As an example, GitHub Copilot is a staple in our development processes for code generation, unit testing, and documentation. It can also be used in detection engineering. So, can we describe a desired detection in human language and have an LLM generate the code? With some creative prompt engineering, you bet we can!
- Is complex: Detection engineering is a complicated process often requiring experienced security practitioners to generate thoughtful queries which reduce false positives, while not creating false negatives. Too often, this results in layers upon layers of detections which can become deprecated or create noise.
- Needs a predictive response: We recommend using this process as a starting point for detections, not as a final solution. The code produced is a time-saver and can get an analyst moving in the right direction but is dependent on the quality of the prompt. That’s why we build tools to assist our Detection Engineers with automating prompts and simplifying the workflow.
- Is inexpensive: The focus is on augmenting our analysts. Code completion is used to save time, not necessarily to generate complete, production-ready code. Good prompt engineering in the form of comments can improve the quality of the output and get closer to the real intent. It’s very easy to test the generated code (via a query validator or sandbox) and adjust as necessary. We are huge proponents of test automation and pipelines, and we are in the process of building fully automated solutions for this exact problem.
- Has access to data: The models have already been trained on query languages. We expect Microsoft will continue to feed and train the models using KQL which can then be used to produce higher quality detections for Sentinel or M365 Defender.
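The comment-driven prompting and quick validation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the allow-list of tables, the function names, and the candidate query (a hand-written stand-in for model output) are all hypothetical, and the check is a cheap sanity gate, not a replacement for a real query validator or sandbox.

```python
from typing import List

# Hypothetical allow-list; replace with the tables actually present in your workspace.
ALLOWED_TABLES = {"SecurityEvent", "DeviceProcessEvents", "SigninLogs"}


def build_detection_prompt(description: str, table: str, notes: List[str]) -> str:
    """Render a detection request as KQL comments for a code-completion model to continue."""
    lines = [f"// Detection: {description}", f"// Source table: {table}"]
    lines += [f"// Requirement: {note}" for note in notes]
    lines.append(table)  # Start the query so the completion continues from the table.
    return "\n".join(lines)


def first_table(query: str) -> str:
    """Return the first non-comment, non-blank line, which in a simple KQL query is the source table."""
    for line in query.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            return stripped.split("|")[0].strip()
    return ""


def passes_basic_checks(query: str) -> bool:
    """Cheap gate before a real validator: known table and at least one filter."""
    return first_table(query) in ALLOWED_TABLES and "| where" in query


prompt = build_detection_prompt(
    "PowerShell launched with an encoded command",
    "DeviceProcessEvents",
    ["Match the flag case-insensitively", "Keep false positives low"],
)

# Hand-written stand-in for what a model might return; not real model output.
candidate = """// Detection: PowerShell launched with an encoded command
DeviceProcessEvents
| where FileName =~ "powershell.exe"
| where ProcessCommandLine has "-enc"
"""

print(passes_basic_checks(candidate))
```

Writing the intent as comments keeps the request and the generated query in one reviewable artifact, and a gate like `passes_basic_checks` is the kind of step that can run in a pipeline before the query reaches a sandbox.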
These are just a few of the many possible examples of using (a small subset of) ML to augment a modern SOC. Note that in none of these cases did we eliminate humans from the loop. Rather, we augment analysts to help them perform more efficiently and effectively. Stay tuned for more posts where we explore each of these cases in more depth and continue to implement ML in our Modern SOC!
If you’re interested in determining threats most likely to target your business, reach out to us at firstname.lastname@example.org for more information.