Hunters after ransomwares


Ransomware is one of the biggest buzzwords in security nowadays. Vendors advertise their products by claiming they can stop ransomwares, while on the other side of the field ransomwares, ransomware kits, and ransomware services are selling pretty well. Over the last year, one could read an article every month claiming that ransomwares are no longer relevant, and another one about the rising number of more and more sophisticated ransomware attacks. This article is not going to be a general description. It is rather about some specific behaviors of ransomwares I observed lately that can help us detect and hunt for these nasty things.

Not all ransomwares are the same. All of them try to take something hostage and force the user to pay for it, but there are multiple ways to do this. The first type simply locks the user out of the machine. Nothing else is touched, but the user won’t be able to log in again, so the machine itself is taken hostage. The second type steals your files and threatens to leak them if you do not pay a specific amount of money. The third type destroys the files on your machine but tells you they are encrypted and tries to make you pay for the encryption key. The last one I want to mention here really encrypts your data with a secure encryption algorithm and only decrypts it for you if you pay the ransom. Because this last one is the most famous and notorious, I’m going to dig deeper into this type (and somewhat touch on the third type as well).

Is it worth detecting?

While I see a lot of ransomware attacks in the news (and these are just the disclosed ones), a lot of companies still don’t focus on detecting these infections. They rely on their anti-malware or EDR systems, which are frequently ineffective, instead of creating their own rules or threat hunting scenarios. Using third-party solutions alone, without any modification, is not a good idea. To increase true positive (TP) detections and decrease false positives (FPs), a company has to adjust these tools and detection logics to its needs. If you have a rule that triggers on specific malicious traffic, but the same pattern is normal in your network, then as a detection engineer you can’t get away with that FP-heavy rule without adjusting it. The queries I’m going to share in this post are really general ones. I tested them in multiple environments. In some of them they worked without any fine-tuning, in others they needed some adjustment or whitelisting. For example, the OneDrive process created a lot of FPs in one of the networks, but in another one OneDrive was turned off by policy, so it couldn’t interfere with the detection. I recommend trying the provided searches in your network, but adjust them and do not rely on them out of the box.

But why would we want to detect ransomware or organize a threat hunt around it, you might ask. Well, maybe you wouldn’t, but I’ve heard this question from managers, with the following reasoning. (Be aware, I disagree with the upcoming sentences.) By the time a ransomware is detected it has already done the damage, and has possibly even finished its actions. A ransomware is not a subtle infection. It doesn’t try to hide; it wants to make sure the user knows about it, otherwise nobody will pay the ransom. At this point a user can report the ransomware as well, and a user will generate fewer FPs than a rule. So, there is no reason to create a detection. A threat hunt doesn’t make sense either, because it tries to capture something that is already in the network, possibly for a long time. If a ransomware has been there for a long time then it is already too late, and it is probably noisy and conspicuous enough to raise some tickets (manually). Thus, the Incident Response team should already be aware of it.

Well, this was the opinion of some managers I had the chance to talk to lately. However, my experience from real-life scenarios and my own testing says otherwise:

  1. I encountered a user who did not report a ransomware infection but simply sent his machine to the helpdesk for a re-image. He didn’t report the issue because he was afraid he had done something wrong. The machine was found during a threat hunt for ransomwares using old logs in the SIEM. While users can report a ransomware attack, they are not going to do it every time. This is especially true if you punish your users when their machines become infected. They will be afraid of retaliation and try to hide the problem.

  2. Make sure the helpdesk knows how to report similar attacks to the security team, or add a field to the ticket that the helpdesk can fill in and that you can query, so you are notified of attacks like this. I haven’t seen this go wrong with ransomware infections specifically, but I have seen a lot of communication problems between the helpdesk and other teams. The helpdesk is the first point of contact for users, so sometimes it will know about an incident before the SOC.

  3. During my tests, I also found some ransomware samples that crashed in the middle of execution. The outcomes differed: in some cases nothing happened on the machine, in other cases some files were encrypted in a subfolder, but not everything was touched. In a situation like this, there is a chance that the user won’t notice the infection soon, because he does not use the encrypted files or folders frequently. For example, the Hydra ransomware crashed on my machine after a really short period. I checked some folders randomly and did not see any changes or encryption. Only later, when I checked the logs, did I realize that Hydra had encrypted a lot of files in some folders and created a lot of ransom notes, but so deep in the folder structure that I simply had not noticed.

  4. Ransomwares are pretty quick; in some cases they can encrypt a whole machine in 5 minutes. You have to react quickly to prevent any further damage, and for this you need automated incident response logic. This is only feasible if you also have detections in place that can trigger the IR actions.

Detecting a cryptor ransomware

To create detection logic, we first have to understand how a ransomware works. A ransomware behaves like any other malware in many respects but has some unique behaviors as well, and here I’m going to focus on these.

  1. Initial Access: Every ransomware has an Initial Access phase first, but this is the same as for other malware. It can infect a machine via phishing, or it can exploit a vulnerability. You won’t need any ransomware-specific detection to catch this activity, so I’m not going to go into details.

  2. Encryption: The most typical step is to encrypt the files (or fake-encrypt them) so the user won’t be able to access them.

  3. Notification: Notify the user about the encryption and demand ransom. This is also a typical step, without this, the user won’t know what happened on the machine and the attacker won’t be able to get the money.

  4. Prevent restoration: If the goal of the ransomware is to encrypt the files, it also has to make sure that the files can’t be restored. It can prevent any further backup or overwrite/destroy the existing backups to prevent restoration.

  5. Propagation: Propagate to other machines. This is not specific to ransomware infections, but there is one interesting activity I’m going to point out.

The steps don’t have to happen in this specific order; the order can be different, or the actions can be carried out simultaneously.

I ran my tests with Sysmon installed and configured on my machine. Unfortunately, Sysmon does not pick up every action a ransomware can carry out. Different logging, such as Windows File System Auditing or an EDR installed on the machine, can provide additional data. Because of this, I couldn’t implement some of the logic based on Sysmon logs. In these cases, I describe the logic and provide a query based on an imaginary log source (which is similar to a lot of EDR logs though). The queries are written and tested in Azure Sentinel KQL.

File encryption

The main goal of the ransomware is to make the files unusable. With real encryption, the ransomware has to read the file, encrypt its data, save the encrypted file, and destroy the original. This can happen in three different ways (classes):

  1. Class 1: The original data is replaced by the encrypted data in-place. For this, the data is read from the file (file read), data is encrypted, then the encrypted data is written back to the file (file content modification) and finally, the file is closed.

  2. Class 2: Still an in-place replacement, but the file is copied (or moved) to a different folder where the in-place encryption happens. Then the encrypted file is moved back to overwrite the original file (or to take its place).

  3. Class 3: The ransomware reads the data from a file, but stores and encrypts it in a new file. In the end the original file is removed (after closing it).

So, in general, the following things can happen during encryption:

  1. Batch file read (Class 1/2/3): Sysmon doesn’t log access to files, or file reads.

  2. Batch encryption (Class 1/2/3): No specific encryption event is logged by Sysmon.

  3. Batch file content modification (Class 1 and 2 can use it): If the file content is directly modified then it is not logged by Sysmon.

  4. Batch file overwrite (Class 1 and 2 can use it): Overwrite is handled as file creation by Sysmon so it is logged as EventID=11.

  5. Batch file move or copy (Class 2 and 3)

  6. Batch file removal (Class 3 uses it, Class 2 has a non-direct removal (overwrite)): Sysmon from version 11.0 can detect removal (File Delete).

  7. Batch file rename: Files are frequently renamed as well. This is not mandatory; most ransomwares do it, in different ways, while others don’t. Sysmon doesn’t log file rename activity.

And here are the actions which could be logged by Sysmon and how they are logged:

  1. File creation: Logged by Sysmon with EventID == 11.

  2. File overwrite: Sysmon handles different types of overwriting in different ways (as does Windows itself). My tests resulted in the following:

    1. Overwrite using ctrl+c and ctrl+v: 1 event with ID 11 (File create) and 1 with ID 23 (File delete).

    2. Overwrite using ctrl+x and ctrl+v: Not logged at all.

    3. Overwrite using powershell Copy-Item command: 1 event with ID 11 (File create) and 1 with ID 23 (File delete).

    4. Overwrite using powershell Move-Item command: 1 event with ID 23 (File delete). The file_name parameter is the overwritten file.

  3. File move: Not logged by Sysmon (except if the file overwrites another one, see above).

  4. File copy: Logged as a file creation at the destination of the copy.

  5. File deletion: This is a new Sysmon feature since April 2020. Deleted files are logged under event ID 23, and Sysmon can also archive a file before it is deleted. This can be useful if a malware tries to remove itself (most ransomwares do this).

As we can see, some of the ransomwares theoretically can’t be detected by Sysmon. For example, one that reads the file content, encrypts it, and writes it back to the file stays hidden because “File Read” and “File Content Modification” are not picked up by Sysmon.
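The logging behavior listed above can be condensed into a small lookup table. The following Python sketch is only an illustration: the action names and the two example behavior chains are hypothetical labels, not Sysmon terminology, and the mapping reflects the test results described in this section rather than a complete specification of Sysmon.

```python
# Which Sysmon event ID (if any) each file action produces, per the list above.
# The action names are illustrative labels, not Sysmon terminology.
SYSMON_EVENT_ID = {
    "file_read": None,             # not logged
    "content_modification": None,  # in-place write, not logged
    "file_create": 11,
    "file_copy": 11,               # logged as a creation at the destination
    "file_move": None,             # not logged unless it overwrites a file
    "file_rename": None,           # not logged
    "file_delete": 23,             # Sysmon 11.0 and later
}


def visible_event_ids(actions):
    """Return the sorted Sysmon event IDs a chain of file actions would leave behind."""
    ids = {SYSMON_EVENT_ID[action] for action in actions}
    ids.discard(None)
    return sorted(ids)


# Hypothetical behavior chains for two of the encryption classes.
in_place_encryption = ["file_read", "content_modification"]        # Class 1
new_file_encryption = ["file_read", "file_create", "file_delete"]  # Class 3

print(visible_event_ids(in_place_encryption))  # []
print(visible_event_ids(new_file_encryption))  # [11, 23]
```

An empty result means that a detection built only on Sysmon file events has nothing to work with for that behavior, which is why some of the later rules assume an EDR-like log source.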

Detections

(1) Huge amount of file creation by one process in a short time

As we saw, one of the signature moves of ransomware is to create a lot of files (encrypted files), or to overwrite already existing ones with their encrypted pairs. Both of these actions are logged as File Create (11) action by Sysmon, so we can create a rule that detects a huge amount of file encryption by checking the created files. During my tests, I executed ransomwares on machines that contained copies of filesystems from other production machines. (So the ransomware action looked like it happened on a real, heavily-used machine.) Normally, a ransomware ran for 5-15 minutes, so this is what I consider a “short time”.

Query:

Sysmon
| where EventID == 11
| project TimeGenerated, process_id, process_path, file_name, Computer
// count the distinct files created by each process on each machine in 15-minute windows
| summarize dcount_file_name = dcount(tostring(file_name)) by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path)
| where dcount_file_name > 1200

Possible False Positive Scenarios:

  1. An installer can create a lot of files as well.

  2. Downloading files with a script or cloning a repository from GitHub can also create many files in a short period.

  3. File restore software or a backup restoration can also create a large number of files on the machine.

In my case, the OneDrive and TIWorker processes created some false positives. These can be whitelisted in the rule, or OneDrive execution can be blocked to stop it from creating files on the machine.

Possible False Negative Scenarios:

  1. If the machine does not hold enough files to be encrypted (fewer than the threshold), the ransomware will never trigger this rule.

  2. Ransomware can execute multiple processes and each of them can create only a few new files. Therefore, collecting the files process-by-process won’t be the best approach in a situation like that.
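The second false negative can be illustrated with a small offline experiment. The Python sketch below is not part of the tested queries; it works on synthetic events with hypothetical field names (`timestamp` in seconds, `Computer`, `process_id`, `file_name`) and compares grouping by process with grouping by machine only.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60
THRESHOLD = 1200  # same threshold as the query above


def distinct_files(events, key):
    """Count distinct created files per group; `key` maps an event to its group."""
    groups = defaultdict(set)
    for event in events:
        groups[key(event)].add(event["file_name"])
    return {group: len(files) for group, files in groups.items()}


def per_process(event):
    return (event["Computer"], event["process_id"], event["timestamp"] // WINDOW_SECONDS)


def per_host(event):
    return (event["Computer"], event["timestamp"] // WINDOW_SECONDS)


# Synthetic data: 10 worker processes, each creating 150 files within one window.
events = [
    {
        "Computer": "PC1",
        "process_id": 4000 + worker,
        "timestamp": worker * 60,  # seconds since an arbitrary start
        "file_name": f"C:\\data\\{worker}\\file{n}.enc",
    }
    for worker in range(10)
    for n in range(150)
]

by_process = distinct_files(events, per_process)
by_host = distinct_files(events, per_host)

print(max(by_process.values()) > THRESHOLD)  # False: each process stays below the threshold
print(by_host[("PC1", 0)] > THRESHOLD)       # True: the host as a whole exceeds it
```

Grouping by machine catches the split workload, but it also adds up the file creations of every benign process on that machine, so it will likely need a higher threshold or more whitelisting.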

(2) File creation in a lot of different folders by one process in a short time

As I mentioned in the previous scenario, relying solely on the number of created files can trigger some false positives. One GitHub repo (or any other download) can contain more files than our threshold, which makes it pretty hard to define a threshold that keeps the number of FPs low without missing TPs. What FPs like this have in common is that they frequently have a folder somewhere deep down in the folder structure, and every file creation happens in that one folder. In the following picture, one can see that the github_repo folder is the only one that is touched, and no other folders are modified at all.

folder structure

In the picture, the github_repo folder is at level 5. During the cloning, no folder above it will receive any newly created files. A ransomware, on the other hand, tries to infect as many files in as many different folders as it can, including other partitions and network shares. So, one idea is to check how many different folders have been touched by one process. To eliminate the FPs created by installers, folder downloads, or some backup restorations, we are not going to weigh all folders equally.

Let’s say a ransomware really tries to infect every partition, so the partition level (level 1) gets a score of 50: every partition touched by the process increases the overall score by 50. I arbitrarily assigned 35 points to level 2, 30 to level 3, 4 to level 4, and 1 point to level 5. This assignment is arbitrary and doesn’t take any other indicator into consideration. For example, the “Default” user profile is rarely used, so it could have a higher score.
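To see how the weights behave, here is a minimal Python sketch of the scoring, using the weights from the text. The two path lists are synthetic examples, and the function only approximates the path handling of the KQL query (it treats a path shorter than five components as its own prefix).

```python
from collections import defaultdict

# Weight per folder depth, as assigned in the text (level 1 is the partition).
WEIGHTS = {1: 50, 2: 35, 3: 30, 4: 4, 5: 1}


def folder_score(paths):
    """Sum, over depths 1..5, the number of distinct path prefixes times the weight."""
    prefixes = defaultdict(set)
    for path in paths:
        parts = path.split("\\")
        for level in WEIGHTS:
            prefixes[level].add("\\".join(parts[:level]))
    return sum(len(prefixes[level]) * weight for level, weight in WEIGHTS.items())


# A repository clone: 2000 files, all below one level-5 folder.
clone = ["C:\\Users\\bob\\src\\github_repo\\file%d.txt" % i for i in range(2000)]

# A ransomware-like spread: only 32 files, but on 2 drives and in many folders.
spread = [
    "\\".join([drive, "Users", user, folder, year, "report.docx"])
    for drive in ("C:", "D:")
    for user in ("alice", "bob", "carol", "dave")
    for folder in ("Documents", "Pictures")
    for year in ("2019", "2020")
]

print(folder_score(clone))   # 120
print(folder_score(spread))  # 506
```

The clone creates far more files but touches a single prefix at every level, so it scores 50 + 35 + 30 + 4 + 1 = 120. The spread touches 2, 2, 8, 16, and 32 distinct prefixes at levels 1 to 5 and ends up at 506, above a threshold of 500.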

Query:

Sysmon
| where EventID == 11
| extend file_name_list = split(file_name, "\\")
// build the path prefix of the file at each depth (level1 is the drive)
| extend level1 = strcat(file_name_list[0]),
    level2 = strcat(file_name_list[0], "\\", file_name_list[1]),
    level3 = strcat(file_name_list[0], "\\", file_name_list[1], "\\", file_name_list[2]),
    level4 = strcat(file_name_list[0], "\\", file_name_list[1], "\\", file_name_list[2], "\\", file_name_list[3]),
    level5 = strcat(file_name_list[0], "\\", file_name_list[1], "\\", file_name_list[2], "\\", file_name_list[3], "\\", file_name_list[4])
| project TimeGenerated, process_id, process_path, file_name, Computer, level1, level2, level3, level4, level5
// count the distinct prefixes touched at each depth by one process in 15 minutes
| summarize l1 = dcount(tostring(level1)), l2 = dcount(tostring(level2)), l3 = dcount(tostring(level3)), l4 = dcount(tostring(level4)), l5 = dcount(tostring(level5)) by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path)
| extend total_score = l1*50 + l2*35 + l3*30 + l4*4 + l5*1
| where total_score > 500
| sort by total_score

And here is a result from my test network. This network contains approximately 20 Windows systems; a few of them are production systems used for everyday scenarios, and some of them are test machines. (I tested the rule in a big company environment as well, but I can’t share those results.) Here is a picture of the output of the search:

Rule #2 output

In the picture, the lines which contain msiexec.exe and svchost.exe are FPs; everything else is a true positive. Not all of the malicious activities were caught by this rule though, so we will need some additional rules. Also, please be aware that I used some exclusions (whitelisting) in the rule that I can’t share here. The scoring should also be fine-tuned for your environment. For example, if you do not have multiple drives, then giving level 1 a high score can distort the results.

False Positive:

  1. Backup restoration can still create FPs. For example, after a re-image a machine can download the whole user folder/profile from a central backup server so the user can continue his work where he left off.

  2. With bad scoring, you can include a lot of processes that are otherwise benign.

False Negatives:

  1. This only detects ransomwares which try to encrypt as many things on the machine as possible. If the ransomware only encrypts the User’s Documents folder, or the ransomware crashes during execution then this rule won’t be able to capture it.


(3) Huge amount of files deleted in 1 folder

Normally, a folder contains a limited number of files. However, ransomwares from Class 2 copy the files to a different directory (every file to the same directory) and encrypt them there. After this, the files can be moved back to their original place in different ways. One way is to copy the files back and then remove them from the temporary directory. This last removal step can be detected by Sysmon. Not all ransomwares do this, though, so this rule doesn’t have high reliability.

This rule is really noisy, so you definitely have to fine-tune it in your network or correlate it with something else. Some tools and folders generate a lot of File Delete events, so it can be worthwhile to whitelist them. But be aware that with this rule you can also hide TPs by whitelisting processes or excluding folders: if the ransomware uses the one folder you have just whitelisted, you won’t be able to detect it at all. Whitelisting a folder is less of an issue for the other rules because those check the activity across every folder, while this rule aggregates data by folder.

Query:

Sysmon
| where EventID == 23
| extend DirectoryPath = tostring(parse_path(tostring(file_name)).DirectoryPath)
| project TimeGenerated, process_id, process_path, Computer, DirectoryPath, file_name
// group by folder as well, so the count is the number of deletions in one folder
| summarize count(), make_set(file_name) by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path), DirectoryPath
| where count_ > 200

FP:

  1. Normally, a user folder doesn’t contain too many files, but Windows has multiple cache and temp folders with a lot of files in them, and the data in these is frequently deleted. In general, this is a noisy rule.

FN:

  1. It is easy to hide true positives by whitelisting a folder.

  2. Only works on ransomwares that collect every file to 1 folder before encrypting them. And even in that case, it only works if the files are deleted from that folder and not moved back to their original place.

(4) File deletion in a lot of different folders

Some ransomwares delete the original file before “replacing” it with the new one. Also, as I stated previously, some of the overwrite operations are logged by Sysmon as file deletion (partly).

This rule is practically the same as the second rule, but it uses file deletion logs (23) instead of file creation logs (11). I’m not going to copy the query here; you just have to change EventID == 11 to EventID == 23, and everything else is the same. It still needs testing because it can create different false positives.

Some malware tries to bypass logging. Depending on its capabilities, it can circumvent everything or only the logging of some specific actions. If the logging of file creation has been tampered with, it can still be a good idea to have a detection based on file deletion.

One of the ransomwares I tested didn’t generate any File Create events. (This was possibly due to my Sysmon configuration, but who knows.) However, I got a File Delete event for every encrypted file. As I found earlier, Sysmon only logs an event with ID 23 if a file is deleted or if a file is overwritten by another file with the same name that is moved there via the command line. I did not investigate any deeper, but my guess is that the ransomware copied every file to a folder that I possibly did not monitor, encrypted the files there, and moved them back to their original place, overwriting everything. Finally, it renamed the files.

(5) Lot of (different) files renamed by 1 process in short time

Unfortunately, Sysmon doesn’t log file rename operations, therefore I can’t create a detection based on them. However, a lot of EDRs log rename operations at some level, so I can still create a pseudo-rule. It is also unfortunate that the data in the RENAME logs is not always sufficient to build a good rule on. Different solutions log different data, so what can or can’t be done depends on the information in the log.

For the rules related to the RENAME operation I’m going to assume that a log contains this information in general:

  1. TimeGenerated: timestamp.

  2. dvc: Name of the machine.

  3. action: Name of the executed action, in this case: “RENAME”.

  4. process_id: ID of the process (or its name/path).

One possible rule is to trigger if a lot of files have been renamed by one process on the system in a short period. Batch rename operations are quite rare, so even with poor logging we can create some detection. We can create this rule if your EDR (or any other log source) events contain either the original name of the renamed file or its new name. Either of them is good for us: we only want to detect that a huge number of files were renamed, and we do not care about anything else.

Additionally, for this rule I need:

  1. file_name: The full filename with the path, name, and extension. (new or old)

Query:

EDR
| where action == "RENAME"
| project TimeGenerated, process_id, process_path, file_name, Computer
| summarize dcount(file_name) by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path)
| where dcount_file_name > 100

You can also add a “different folder” restriction as we did earlier in rule#2 or you can use a different method as well.

If you have RENAME logs but without any information about the old/new filename, you can still create a rule. In this case, you are only going to check the amount of renamed files by one process. Most of the time the two rules will have the same results.

(6-7) File extension changes (Rename)

A lot of ransomwares change the extension of the files. Some of them change every file’s extension to a specific one, while others use an (almost) random extension. I also saw one that simply HEX-encoded the filename and its extension.

To be able to detect any changes like this, your log needs to contain at least the new filename, but for some of the detections, you will need the old and the new filename as well.

Additional information needed in the log:

  1. old_file_path

  2. old_file_name

  3. old_file_extension

  4. new_file_path

  5. new_file_name

  6. new_file_extension

One logic that can be created using only the new filename is a detection that triggers if a lot of files have been renamed to have the same extension. File renames in general are not as frequent as file creations or deletions. It is also rare that a process renames a lot of files to the same extension. The following query detects if more than 100 files have been renamed in 15 minutes by the same process and all of the renamed files have the same new extension.

Query rule #6:

EDR
| where action == "RENAME"
| project TimeGenerated, process_id, process_path, new_file_extension, Computer
| summarize counter = count() by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path), tostring(new_file_extension)
| where counter > 100

This rule doesn’t specifically check whether the extension has been changed or not. It only checks whether a lot of files have the same extension or not. I’m explaining the FP/FNs based on this.

FPs:

  1. A situation where a lot of files keep their extension but not their name, when somebody batch renames them so their name will follow a pattern. These activities are mostly done in 1 or few folders, so adding the folder condition (summarize by folder as well) can eliminate these FPs.

FNs:

  1. Only triggers if files are indeed renamed. But it happens most of the time.

  2. If the ransomware starts multiple processes to rename the files the rule won’t trigger.

On the other hand, if you have the old and the new filename as well, you can create a more reliable rule. In this case, you can check whether a lot of filenames have been changed to have the same extension, or simply whether a lot of extensions have been changed to something different. In my query, I’m only going to cover the former scenario, but the latter one is not hard to implement either.

Query rule #7:

EDR
| where action == "RENAME"
| project TimeGenerated, process_id, process_path, old_file_extension, new_file_extension, Computer
| where old_file_extension != new_file_extension
| summarize count() by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path), tostring(new_file_extension)
| where count_ > 100

FPs:

  1. Some encryption software encrypts the data and turns it into an exe file. The encrypted file can later be executed, and with the proper password it will be decrypted (you can whitelist the exe extension).

  2. Something similar can happen if you archive a lot of files one by one, or if you encode a lot of data to a different format (like encoding every movie on your machine to wmv).

FN:

  1. Won’t trigger if the files are renamed but their extensions are not changed, or not changed to the same string.

As we did previously, you can also add a constraint that’s going to check the number of distinct folders and not just the amount of files. As a ransomware is going to rename files in various folders, this can be even more reliable.
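The other scenario mentioned for rule #7, counting extension changes regardless of what the new extension is, can be sketched offline as well. The Python example below is only an illustration on synthetic data. It uses the assumed field names of the imaginary EDR log and leaves out the 15-minute binning for brevity.

```python
from collections import Counter

THRESHOLD = 100  # same threshold as the rename queries above


def changed_extension_counts(renames):
    """Count, per machine and process, the renames that changed the extension at all."""
    counts = Counter()
    for event in renames:
        old = event["old_file_extension"].lower()
        new = event["new_file_extension"].lower()
        if old != new:
            counts[(event["Computer"], event["process_id"])] += 1
    return counts


# Synthetic data: one process gives each of 150 files its own pseudo-random extension.
renames = [
    {
        "Computer": "PC1",
        "process_id": 4242,
        "old_file_extension": "docx",
        "new_file_extension": format(n * 7919 % 65536, "04x"),
    }
    for n in range(150)
]

counts = changed_extension_counts(renames)
same_target = Counter(event["new_file_extension"] for event in renames)

print(counts[("PC1", 4242)] > THRESHOLD)      # True: 150 extension changes
print(max(same_target.values()) > THRESHOLD)  # False: no single extension is shared
```

A ransomware that uses (almost) random extensions would stay below the threshold of a rule that groups by the new extension, while a plain count of extension changes still catches it. The price is more FPs from batch converters and archivers.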

Interesting tidbit: As I already said Sysmon does not log file rename operations. Sometimes an investigation can be tricky without this information. For example, a user downloaded a file from the internet. The file was renamed after the download. Then the file was executed, and it triggered an AV solution. It can be hard to make the connection between the downloaded file and the detected suspicious one because the rename operation is not logged. The same can happen with ransomwares. You can see on a machine that the files are encrypted and all of them have the same extension. However, they were renamed after creation so you will never be able to identify the process which created them just based on their common extension.

(8) Ransom note creation

The ransomware has to notify the user about the infection and tell him how and how much to pay. Most ransomwares create ransom notes for this purpose. Usually, these files are created in every folder the ransomware went through or in every folder in which it encrypted a file. One way or another, this normally means a lot of folders and a great number of files. It is also typical that the ransom notes on a machine have the same name and content. Content (hash or even size) is not provided by Sysmon, so I did not create a rule for that, but you can try if you have the necessary data. Still, we can create a rule that checks whether a lot of files with the same name are generated on a machine by one process. We do not have to check for different folders separately, because multiple files with the same name can’t exist in the same folder. But we can still use the folder information to eliminate some false positives.

This rule was especially noisy for me, so it needed way more whitelisting than the other rules so far (whitelists are not in the query).

If the rule is not working properly for you, you can still try the weighted method I used in the 2nd rule.

Query:

Sysmon 
| where EventID ==11 
| extend file_info = parse_path(tostring(file_name)) 
| extend folder = file_info.DirectoryPath, filename = file_info.Filename 
| project TimeGenerated, process_id, process_path, file_name, filename, Computer, folder 
| summarize dcount(tostring(folder)) by tostring(process_id), Computer, bin(TimeGenerated, 15min), tostring(process_path), tostring(filename) 
| where dcount_folder > 100 

FP:

  • Some Windows processes generate a lot of files with the same name in different folders. UpdateNotificationMgr.exe is one of them, and Chrome also triggered a lot of alerts for me. These could be eliminated by using the weighted method, or by excluding some subfolders or processes.

FN:

  • Some ransomwares do not create a large number of ransom notes.

This rule was created to detect if a process creates files with the same name in different directories. Ransomware notes are generally like this. However, I also caught a ransomware which during the encryption process created a tmp file for every file in the folder the original file was placed in. Furthermore, it named every tmp file the same. After encryption, it saved the data to the tmp file and then renamed the tmp file and overwrote the original file with it. This is the same pattern, many files created with the same name in different folders, so this action was detected as well by accident.

(9) Killing backup/restore services

Ransomwares tend to target various backup/restore processes. That means if you find a process that kills a backup service, you should be alert. If the process tries to terminate various other processes as well, mainly backup-related programs, it is even more suspicious. If, besides this, you also know that you do not have those programs installed on your system, then you should definitely investigate that machine. It is an odd scenario in which a legitimate process tries to kill completely random backup/restore-related processes that you are not even using.

I’m not 100% sure what the purpose of killing backup services is in some cases. I get that it tries to prevent further backups, or backups that are triggered by file modification or removal, as some backup tools save a file before it is changed. However, in some cases this process termination can actually save your data. If you only store one version of your files as a backup, then encrypting the original and letting the backup tool overwrite the backup copy with the encrypted file would destroy your chances of a proper restoration.

In some cases, the ransomware tries to kill a lot of other processes as well, so in this rule I’m collecting every process I’ve seen killed by ransomwares. The rule is a general one that covers every service shutdown and process termination I saw; you can also create an individual rule for every initiating process (net.exe, cmd.exe, etc.).

Some explanations for the upcoming query. First, I defined the processes that ransomwares frequently use to kill other processes or do other nasty things (commandList). After that, I defined the processes and services that are frequently terminated by ransomwares and saved them in the suspicious_process variable. Lastly, I defined the processes that are on the suspicious list (so we want to monitor them) but are otherwise valid in our network (known_suspicious_process). If a command line contains any entry of suspicious_process, I give it a score of 2 points. If the process is normal in the environment, I decrease the score to 1. In the end, I check how many points a Computer collected this way in a short period. I’m not summarizing by process here, because ransomwares frequently start a new process (a new cmd execution, for example) for each service termination.
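The scoring described above can be tried out on a few command lines before touching the SIEM. This is a hedged Python sketch, not the rule itself: the lists are shortened examples, the command lines are made up, term matching is simplified to splitting on non-alphanumeric characters, and the time window is left out.

```python
import re
from collections import Counter

# Shortened, lowercased example lists; the names mirror the variables in the query.
SUSPICIOUS_PROCESS = {"acronisagent", "sqlwriter", "mongodb", "outlook", "thunderbird"}
KNOWN_SUSPICIOUS_PROCESS = {"outlook", "thunderbird"}  # assumed to be in use locally


def score_command_line(command_line):
    """Return 0 (not relevant), 1 (known target) or 2 (target not used locally)."""
    terms = set(re.split(r"[^0-9a-z]+", command_line.lower()))
    if not terms & SUSPICIOUS_PROCESS:
        return 0
    return 1 if terms & KNOWN_SUSPICIOUS_PROCESS else 2


def score_per_computer(events):
    """Sum the scores of all process creations per machine, ignoring the parent process."""
    totals = Counter()
    for computer, command_line in events:
        totals[computer] += score_command_line(command_line)
    return totals


# Made-up process creation events: (Computer, process_command_line).
events = [
    ("PC1", "net stop AcronisAgent /y"),
    ("PC1", "net stop SQLWriter /y"),
    ("PC1", "taskkill /F /IM outlook.exe"),
    ("PC2", "net stop Spooler"),
]

totals = score_per_computer(events)
print(totals["PC1"])  # 5
print(totals["PC2"])  # 0
```

Summing per machine instead of per process is the important design choice here: three separate kill commands, each started from its own process, still add up to one score for PC1.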

The services and processes in the query are just examples, you can download a full list of services and processes from here.

Please be aware that this query is not optimal. To check whether the command line contains any of the services from the list, it uses the “has_any” operator. However, has_any only returns true if it finds the exact term in the string, bounded by delimiters (non-alphanumeric characters). Here is an example:

  • “net stop firefox” has_any (“fire”,“firef”) –> this will return False

  • “net stop fire.fox” has_any (“fire”,“firef”) –> this will return True, because in the string “fire.fox” the “fire” part is bounded by a space (before it) and a dot (after it)

The downloadable list contains strings that were used with an asterisk at the end. Those prefix matches are not going to be captured by this rule.
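To make the limitation concrete, here is a minimal sketch that can be pasted into a KQL query window. It runs has_any and two looser operators, hasprefix and contains, against the two example strings above. The datatable is made up for the illustration, and using hasprefix for the asterisk-terminated entries is only one possible workaround, not something the rule itself does.

```kusto
// Toy input: the two command lines from the example above.
datatable(process_command_line:string)
[
    "net stop firefox",
    "net stop fire.fox"
]
| extend has_any_hit   = process_command_line has_any ("fire", "firef"),  // whole-term match only
         hasprefix_hit = process_command_line hasprefix "fire",           // a term that starts with "fire"
         contains_hit  = process_command_line contains "fire"             // plain substring match
```

Both hasprefix and contains should also flag "net stop firefox", at the cost of more FPs and, for contains, a slower search. Each of them takes a single string rather than a list, so every asterisk entry needs its own condition.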

Query:

let commandList = dynamic(["powershell.exe","cmd.exe","taskkill.exe","net.exe","net1.exe"]); 
let suspicious_process = dynamic(["winword", "wordpad", "outlook", "thunderbird", "AcronisAgent", "MSSQLServerADHelper100", 
  "CobianBackup11", "mysqld.exe", "Apache2.4", "SQLWriter", "MSSQL$SQLEXPRESS", "MongoDB", "SQLAgent$SQLEXPRESS", 
  "SQLBrowser", "cbVSCService11", "QBCFMonitorService", "QBVSS", "cerber.exe", "Microsoft.Exchange", "MSExchange"]); 
let known_suspicious_process = dynamic(["winword", "wordpad", "outlook", "thunderbird", "AcronisAgent"]); 
Sysmon
| where EventID == 1 
| where file_name in~ (commandList) 
  // only keep events that mention a monitored process/service, and give each event 2 points as a score 
| where process_command_line has_any (suspicious_process) 
| extend score = 2 
  // if the process is valid on your machines, remove 1 point 
  // this way a kill attempt against a service you do not use is worth 2 points 
  // and a kill attempt against a process that is valid in your network is worth 1 point 
| extend score = iif(process_command_line has_any (known_suspicious_process), score-1, score) 
| summarize sum(score) by Computer, bin(TimeGenerated, 15min) 
| where sum_score > 5 

False Positive:

  • The rule can generate lots of false positives if you did not define your lists well.

  • FPs can also occur if it is typical in your environment that these processes are started or killed by powershell/cmd/etc. The query doesn’t check whether the process has been started or terminated; it only checks whether the command line contains its name. You can refine this, but each tool has its own commands to stop a service/process, so in that case you need to create an individual rule for each of them.
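One rough way to reduce the second kind of FP without writing a rule per tool is to also require a word that usually shows up when something is being stopped. The sketch below is only an illustration: the stop_verbs list is an assumption and certainly incomplete, and the suspicious_process list is shortened for readability.

```kusto
// Assumption: words that usually appear when a service or process is stopped
// (net stop, sc stop, Stop-Service, taskkill, wmic ... call terminate, sc config ... start= disabled).
let stop_verbs = dynamic(["stop", "taskkill", "kill", "terminate", "disabled"]);
let commandList = dynamic(["powershell.exe","cmd.exe","taskkill.exe","net.exe","net1.exe"]);
let suspicious_process = dynamic(["AcronisAgent", "SQLWriter", "MongoDB", "QBVSS"]); // shortened example list
Sysmon
| where EventID == 1
| where file_name in~ (commandList)
| where process_command_line has_any (suspicious_process)
  // keep only command lines that also mention a stop-like word
| where process_command_line has_any (stop_verbs)
| project TimeGenerated, Computer, file_name, process_command_line
```

This trades FPs for FNs: any termination method that does not use one of the listed words is missed, so the list has to follow the tools seen in your own environment.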

False Negative:

  • Lots of ransomwares don’t utilize this method at all.

  • Ransomwares that first check which processes are actually running, and only kill those, can keep their score low enough to fly under the radar.

(10) Messing with shadow copy

Turning off backup services is not necessarily the best solution for the attacker. If the machine has been backed up already, then turning off the service won’t do anything to the existing backup. On the other hand, destroying the shadow copies of the machine will destroy the restore points as well. Thus, if a machine does not have any external backup, this method is not only going to prevent further backups but also make restoration impossible.

Most of the time these tools are used to configure shadow copy:

  • Cmd.exe

    • /c vssadmin resize shadowstorage /for=C: /on=C: /maxsize=401MB

    • /c vssadmin resize shadowstorage /for=C: /on=C: /maxsize=unbounded

    • /c vssadmin.exe delete shadows /all /quiet

  • wmic.exe

    • wmic shadowcopy delete

  • VSSADMIN.EXE

    • vssadmin Delete Shadows /all /quiet

    • vssadmin resize shadowstorage /for=c: /on=c: /maxsize=401MB

    • vssadmin resize shadowstorage /for=c: /on=c: /maxsize=unbounded

And there was a sneaky one as well: “C:\c\hmh....\Windows\mfjqm..\system32\autqg..\wbem\exs\ahxl\bcgv......\wmic.exe” shadowcopy delete

In this case, I tried to create a single short rule that covers all of the above scenarios. One could come up with a command that triggers this rule but is not malicious, or at least doesn’t do anything harmful. However, I do not consider this a risk.

Query:

let commandList =dynamic(["cmd.exe","wmic.exe","vssadmin.exe"]); 
let suspiciousCommand = dynamic(["resize", "delete"]); 
Sysmon
| where EventID == 1 
| where file_name in~ (commandList) 
| where process_command_line contains "shadow" and process_command_line has_any (suspiciousCommand) 
| project TimeGenerated, Computer, file_name, process_command_line 

There is barely any legitimate reason to just remove the shadow copies, and almost every ransomware does this. Therefore, both FPs and FNs are very rare.

(11) Bcdedit usage

This tool can be used to configure Windows Automatic Recovery during boot. The ransomware creator wants to prevent recovery, so the ransomware runs this tool with various command-line arguments.

The commands I found look like these:

  • bcdedit /set {default} recoveryenabled No

  • bcdedit /set {default} bootstatuspolicy ignoreallfailures

  • bcdedit /set {current} safeboot minimal

Query:

let commandList =dynamic(["cmd.exe","bcdedit.exe"]); 
Sysmon
| where EventID == 1 
| where file_name in~ (commandList) 
| where process_command_line contains "bcdedit" 
and ( 
    (process_command_line contains "recoveryenabled" and process_command_line contains "no") or  
    (process_command_line contains "bootstatuspolicy" and process_command_line contains "ignoreallfailures") or 
    (process_command_line contains "safeboot" and process_command_line contains "minimal")  
) 
| project TimeGenerated, Computer, file_name, process_command_line 

False Positive:

  • I have never encountered this command in the wild for legitimate purposes. Based on what it does, I think it is rarely used.

False Negative:

  • I saw this action used by ~30% of the ransomwares I tested. This means plenty of ransomwares are going to be missed by this rule.

(12) RDP usage

Using RDP for lateral movement or even for initial access is not new and definitely not a ransomware-specific activity. It is still interesting how many ransomwares have been using it recently (and maybe other malwares as well). But this is not the reason I’m mentioning it here. During my tests I saw multiple infected systems on which the RDP-related registry keys had been tampered with. The most surprising and funny thing, on the other hand, is that the meaning of one of the RDP-related keys is obviously not clear to everybody.

Here is the code to enable RDP connection:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections /t REG_DWORD /d 0 /f 

And here is the code to disable it:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections /t REG_DWORD /d 1 /f 

Maybe you already see why it is not clear to everybody. ‘0’ means enable in the code, and ‘1’ means disable the RDP connection. If you read the name of the key, the whole thing makes sense. Deny TS Connections? False. Then we don’t deny the connection, therefore we are going to allow RDP connections. But again, some people obviously just checked the 0 and the 1 in the code.

This could be the reason why I saw a ransomware that tried to use it incorrectly. It actually disabled RDP before it tried to propagate. Funnily enough, a later version of the same ransomware (same family) did the opposite and actually enabled RDP. But there is something even funnier. I also encountered a famous EDR solution that handled this activity incorrectly. The EDR created an “RDP Enabled” event when RDP had actually been disabled (and the enable event wasn’t even logged). So, I recommend that you test this activity on your machine and confirm that your EDR does the proper logging and that your rules are checking the proper actions.

Another registry key is also modified sometimes, so I’m adding that one to the query as well. Unfortunately, this isn’t as funny as the previous one.

 
reg add "HKLM\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp" /v "UserAuthentication" /t REG_DWORD /d 0 /f 

Query:

Sysmon 
| where EventID == 13 
| where registry_key_path == "HKLM\\System\\CurrentControlSet\\Control\\Terminal Server\\fDenyTSConnections" or 
   registry_key_path == "HKLM\\System\\CurrentControlSet\\Control\\Terminal Server\\WinStations\\RDP-Tcp\\UserAuthentication" 
| project TimeGenerated,Computer, process_path, registry_key_details, registry_key_path 
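Because the value semantics are so easy to get backwards, it can help to have the query spell out what each write to fDenyTSConnections means. The variant below is a sketch under one assumption: that registry_key_details holds the Sysmon Details text, which for a DWORD looks like "DWORD (0x00000000)". Check this against your own parser before relying on it.

```kusto
Sysmon
| where EventID == 13
| where registry_key_path endswith "\\Terminal Server\\fDenyTSConnections"
  // Assumption: registry_key_details contains the raw Sysmon Details string, e.g. "DWORD (0x00000000)".
| extend rdp_state = case(
    registry_key_details contains "0x00000000", "RDP enabled (deny = 0)",
    registry_key_details contains "0x00000001", "RDP disabled (deny = 1)",
    "unknown value")
| project TimeGenerated, Computer, process_path, registry_key_details, rdp_state
```

Running the two reg add commands from above on a test machine and comparing the result with rdp_state is a quick way to confirm that the logging and the rule agree on the direction.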

False Positive:

  • Anybody can turn it off and on, so it can trigger FPs.

False Negative:

  • Not every ransomware uses it.

Additional fine-tuning opportunities to prevent FPs

Having these rules for ransomwares can be beneficial. But as I explained, they can create a decent amount of FPs in some networks. You definitely have to adjust them to your network. Also, every threshold and score has to be configured based on your data.

At this point we have a bunch of rules to detect various activities of a ransomware:

  1. Detecting file encryption (rule 1-6)

  2. Notifying the user (rule 7-8)

  3. Preventing backup/restore (rule 9-11)

Correlate alerts

When we have this many different types of alerts, we can correlate them and their output. For example, one can create a rule that triggers if at least 2 basic rules from at least 2 different groups trigger, such as one rule from file encryption and one from ransom note creation.

You can also assign a score to the alerts. The one that triggers more TPs in your network and seems not too FP-heavy could have a higher score, while less reliable rules can have lower scores. You can create a correlation rule that triggers if the score reaches a threshold on a machine.

Creating correlation rules generally decreases the false positives in the network, which is nice, but they are also going to miss more true positives. The reason for this is that some of the real infections are going to be picked up by only 1 rule, or only by rules with a lower score. To balance this out, you can lower the thresholds in the individual rules. You can make them more FP-prone because the correlation will eliminate most of the FPs.
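The two ideas above (at least 2 groups, and a score threshold) can be combined in one query. The sketch below uses a made-up datatable in place of a real alert table. The column names rule_group and score, the group names, the scores, and the thresholds are all assumptions for the illustration.

```kusto
// Hypothetical alert rows; in practice these would come from wherever your rule results are stored.
let alerts = datatable(TimeGenerated:datetime, Computer:string, rule_group:string, score:int)
[
    datetime(2020-06-01 10:00:00), "PC-01", "file_encryption",  3,
    datetime(2020-06-01 10:04:00), "PC-01", "ransom_note",      2,
    datetime(2020-06-01 10:05:00), "PC-02", "backup_tampering", 2
];
alerts
| summarize total_score = sum(score), groups = dcount(rule_group) by Computer, bin(TimeGenerated, 15m)
  // trigger only if alerts from at least 2 different groups add up to the threshold
| where groups >= 2 and total_score >= 5
```

With this toy data only PC-01 should pass both conditions, because PC-02 has a single alert from a single group.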

Elaborate the whitelist of the ransomware

Many ransomwares whitelist some file extensions they don’t want to encrypt. These files are most of the time some kind of executables like exe, bat, cmd. There are two reasons behind it. The first is that these are normally not high-value files. If an executable is encrypted, there is a huge chance that the user can download the same file from the internet after a machine re-image. The source code of an executable can be important and high value, but the compiled version is rarely indispensable.

The other reason is that the actor wants to keep the machine up and running. This is the only way to let the user know about the infection and how to pay. These whitelisted files are often needed for the machine to work.

So you can add an additional filter to the rules you created. If a previously created rule has already found a process that looks like a ransomware, you can take its process id/process_path and check whether any process with that id/path touched any file with a whitelisted extension. If it did, it is possibly not a ransomware. You can look up on the internet which file extensions are not encrypted by most of the ransomwares.
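As a sketch of this follow-up check, the query below counts how many of the files created by one suspect process have an extension that ransomwares usually skip. The suspect path and the extension list are placeholders. It also assumes that file_name holds the full target path in EventID 11, as in the other queries of this post, and it only sees file creations and overwrites, not every kind of file access.

```kusto
let suspect_path = "C:\\Users\\Public\\suspect.exe";                   // hypothetical process found by an earlier rule
let skipped_ext = dynamic(["exe", "bat", "cmd", "dll", "sys"]);        // assumption: extensions ransomwares usually skip
Sysmon
| where EventID == 11
| where process_path =~ suspect_path
  // take the text after the last dot of the path as the extension
| extend ext = tolower(extract(@"\.([^.\\]+)$", 1, file_name))
| summarize touched_skipped = countif(ext in (skipped_ext)), total_files = count() by Computer, process_path
```

A process with a high total_files and a touched_skipped of zero fits the ransomware pattern better than one that writes executables freely.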

Additionally, you can check whitelisted process names and folders which are not going to be touched by a ransomware.

Default profile

Every user on a Windows machine has their own profile. Besides those, there are some others not directly related to a user. One of them is the Public profile, which is accessible by all users and can be used to share information and files between them. The other profile unrelated to a user is the Default profile. The Default profile serves as a template for all the user profiles. When a new user is created, the content and directory structure of the Default profile is copied.

It is pretty rare, though, that somebody touches the Default profile. It can be modified if somebody wants to change the structure and files in a new user’s folder. This rarely happens in a company environment. Normally, there is only 1 user on a machine, so new profiles are not going to be created. If the company still wants to modify the profile for new employees, it can do so in the default Windows image it installs on the machine. Modifying the Default profile on an already onboarded and used machine is not necessary.

Besides, this is not a user profile; it is only there to be a template. That means creating new files there, by a process or by a user, is not really useful. However, I saw that some of the ransomwares still created ransom notes there as well.

Both the newly introduced Maze ransomware and the famous WannaCry encrypted files or created ransom notes here.

However, a more important example is the “One” ransomware. During my test, this cryptor only created 4 ransom notes, so I wouldn’t be able to detect it based on the number of created files. On the other hand, touching the Default profile was suspicious enough in this case.

Even if not as a rule in itself, this could be implemented as part of a correlation rule for ransomware detection:

Query:

Sysmon 
| where TimeGenerated > ago(30d) 
| where EventID == 11 
  // the trailing backslash keeps other profiles such as C:\Users\DefaultAppPool out of the results 
| where file_name startswith "C:\\Users\\Default\\" 

Process without file on the disk

Lots of malwares tend to remove their original executable from the machine to prevent proper investigation and reverse engineering. If you have the proper capabilities, you can check whether a given process (based on its process_path) still has an existing executable on the disk.

You can either utilize some automation here and try to collect the file from the machine, or use a tool like osquery that can collect this information from the machine directly. Either way, it is worth automating an action like this.

Whitelisting

If you want to whitelist anything in your rule, you can use multiple fields. One of the most trivial ideas would be to whitelist a process. Whitelisting some of the most well-known Windows processes could really decrease the FP ratio. But in this case, you also have to be aware that you can hide some of the true positives, as a lot of malware copies a known Windows filename, replaces a valid Windows executable, or injects its code into another process.

I would rather recommend that you try to whitelist folders. A ransomware acts on tons of different folders and only excludes a few, so leaving a few folders out of a counter shouldn’t significantly distort your results. Also, in my queries, I used the “dcount” function frequently. You should be aware that this does not provide you an exact number. It only returns an estimate of the number of distinct values. If this is not good enough for you, you can try to use the “distinct” operator itself. But the gist is that the count is already not accurate, so removing 1-2 folders (or even more) from the rule shouldn’t significantly change anything.
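To see how far the estimate is from the exact value in your own data, the two can be put side by side. The sketch below excludes one hypothetical whitelisted folder, derives the folder from file_name (again assuming file_name holds the full path in EventID 11), and compares dcount with an exact count built from two summarize steps.

```kusto
let events = Sysmon
    | where EventID == 11
    | where file_name !startswith "C:\\Windows\\Temp\\"              // hypothetical whitelisted folder
      // everything before the last backslash is treated as the folder
    | extend folder = extract(@"^(.*)\\[^\\]+$", 1, file_name);
events
| summarize by Computer, folder                                      // one row per distinct (Computer, folder) pair
| summarize exact_folders = count() by Computer
| join kind=inner (
    events
    | summarize estimated_folders = dcount(folder) by Computer
  ) on Computer
| project Computer, exact_folders, estimated_folders
```

The exact variant is more expensive on large tables, so it is probably better suited for occasional validation than for a scheduled rule.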

Honeytoken

Creating “honeyfiles” or “honeyfolders” is also a good solution. It can be really reliable if the file/folder is hidden enough and a user can’t touch it. If you are not monitoring file reads (you can’t with Sysmon) then you shouldn’t have too many FPs with honeytokens.
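A honeyfolder check can be a very short query. In the sketch below the folder path is a placeholder, and it is an assumption that your parser exposes the target path of EventID 23 (file delete, available in newer Sysmon versions) in the same file_name column as EventID 11.

```kusto
// Hypothetical honeyfolder; pick a path that users and legitimate tools never touch.
let honey_folder = "C:\\Users\\Public\\Documents\\zz_archive\\";
Sysmon
| where EventID in (11, 23)          // 11 = file create/overwrite, 23 = file delete
| where file_name startswith honey_folder
| project TimeGenerated, Computer, EventID, process_path, file_name
```

Since nothing legitimate should write to or delete from that folder, every hit is worth a look, and the result can also feed the correlation score of the machine.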

Automation

This post is already way too long so I’m not going to dig deeper into automation, but here are some ideas you can think of:

  • Automatically terminate the process that is identified as possible ransomware process by your detection

  • Collect the process memory of the identified process

  • Temporarily archive the deleted files from the machine. If a ransomware is identified on the machine, keep the archive; otherwise just throw it out after half an hour or so (archiving can be done with the new version of Sysmon). This way you have a backup of the files, and you may also have the original file of the ransomware. The original binary is frequently removed by the malware, so it is a big deal if you can still get it somehow.

The End

This post could have been longer…