Analyse, Hunt, And Classify Malware Using .NET Metadata
Introduction
Early last week, I ran into a sample that turned out to be PureCrypter, a loader and obfuscator for many different kinds of malware, such as Agent Tesla and RedLine.
Upon further investigation, I developed Yara rules for the various stages, which can be found here (excluding the final payload):
With that out of the way, all of this reminded me that we can also write Yara rules based on unique identifiers specific to malware written in .NET, or to any other .NET assembly for that matter.
A bit of history
This isn’t my first encounter with analysing .NET malware at scale: several years ago, I co-authored a presentation with Santiago on hunting SteamStealer malware, which was surging exponentially at the time (the malware intended to steal your Steam inventory items and/or your account). A huge thanks goes to Brian Wallace who had developed a tool at the time called GetNetGUIDs with which it was trivial to extract all the GUID types and start clustering to identify patterns: basically, which of the malware samples are likely authored by the same person or belong to the same attack campaign.
.NET assemblies or binaries often contain all sorts of metadata, such as the internal assembly name and GUIDs; specifically, the MVID and TYPELIB.
-
GUID: Also known as the TYPELIB ID, generated when creating a new project.
-
MVID: Module Version ID, a unique identifier for a .NET module, generated at build time.
-
TYPELIB: the TYPELIB version – or number of the type library (think major & minor version).
These specific identifiers can be parsed with the strings command and a simple regular expression (regex): [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
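As a rough illustration of the "strings plus regex" approach, the Python sketch below scans raw bytes for GUID-shaped text in both ASCII and UTF-16LE ("wide") form. The function name find_guid_strings is made up for this example, and the sketch only finds GUIDs stored as text; it makes no attempt to tell a Typelib apart from any other GUID in the file.

```python
import re

# The same pattern as the regex above, applied directly to raw bytes.
GUID_ASCII = re.compile(
    rb"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
)


def _wide_hex(count):
    # 'count' hex characters, each followed by a NUL byte (UTF-16LE text).
    return rb"(?:[a-fA-F0-9]\x00){" + str(count).encode("ascii") + rb"}"


# Wide variant: five hex groups separated by a UTF-16LE dash.
GUID_WIDE = re.compile(rb"-\x00".join(_wide_hex(n) for n in (8, 4, 4, 4, 12)))


def find_guid_strings(data: bytes) -> set[str]:
    """Return every GUID-looking string in 'data', lowercased."""
    found = {m.group().decode("ascii").lower() for m in GUID_ASCII.finditer(data)}
    for match in GUID_WIDE.finditer(data):
        found.add(match.group().decode("utf-16-le").lower())
    return found
```

For a file on disk, this could be called as find_guid_strings(open(path, "rb").read()), where path is whatever sample you are looking at.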
Taking a sample of PureLogStealer posted by James_in_the_box, you could then write a Yara rule based on the MVID or Typelib detected.
As shown on VirusTotal for this sample:
Figure 1 – Sample with MVID 9066ee39-87f9-4468-9d70-b57c25f29a67
And the resulting (simple) Yara rule could then be as follows:
rule PureLogStealer_GUID
{
    strings:
        $mvid = "9066ee39-87f9-4468-9d70-b57c25f29a67" ascii wide fullword
    condition:
        $mvid
}
There are however some issues with this:
-
The MVID is stored as a binary value rather than a string, whereas the Typelib GUID is effectively stored as a string. Since we only have the MVID here, the rule above will not detect the sample.
-
It is important to note that VirusTotal does not seem to report the Typelib.
-
It is cumbersome to “do it the manual way” with strings and regex, especially on larger data sets – and it’s prone to issues such as:
-
false positives: if you run “strings” on the sample and then use the following CyberChef recipe – we get plenty of GUIDs, but only 1 is the actual Typelib;
-
false negatives: we miss out on unique identifiers, which means we might miss detection of samples, campaigns or actors.
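To illustrate the first issue: because the MVID sits in the file as 16 raw bytes, a string match on its textual form finds nothing, but the byte form can be derived from the text. The sketch below assumes the standard Windows GUID byte layout (the first three fields are little-endian), which is what Python's uuid.UUID.bytes_le produces. The helper name guid_to_yara_hex is invented for this example.

```python
import uuid


def guid_to_yara_hex(guid: str) -> str:
    """Convert a textual GUID to a Yara hex string of its 16-byte binary form."""
    raw = uuid.UUID(guid).bytes_le  # first three fields byte-swapped, rest as-is
    return "{ " + " ".join(f"{byte:02X}" for byte in raw) + " }"


print(guid_to_yara_hex("9066ee39-87f9-4468-9d70-b57c25f29a67"))
# { 39 EE 66 90 F9 87 68 44 9D 70 B5 7C 25 F2 9A 67 }
```

The result could be used as a hex string in a rule's strings section. A raw byte pattern like this can match anywhere in a file, so treat it as a fallback rather than a replacement for proper metadata parsing.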
Note that with tools such as ILSpy or dnSpy(Ex), you can also view the Typelib GUID and MVID. However, not all tools display all data, for example:
Figure 2 – dnSpy detects the Typelib GUID of the sample
And if we go the “old-school” route using ildasm:
Figure 3 – ildasm displays the MVID or Module Version ID
For all the above reasons, let’s go beyond and do more: both with Yara, and with a new Python tool I’ve created.
The now and the tooling
Before we dive into the tooling, one final bit of history: Yara has evolved, and we can now hunt and detect more effectively thanks to the following added modules:
This means that using the .NET module, we can now write a Yara rule like so instead:
import "dotnet"
rule PureLogStealer_GUID
{
    condition:
        dotnet.guids[0] == "9066ee39-87f9-4468-9d70-b57c25f29a67"
}
And indeed:
Figure 4 – Yara now detects the sample
Yara rule
Let’s now leverage the power of Yara and its dotnet and console modules to write a new Yara rule that displays useful data of any given .NET sample that can be leveraged to create meaningful rules, for example: assembly name, typelib and MVID.
Figure 5 – Yara rule to display .NET information to the console
We first verify whether the binary is a compiled .NET file. If so, we log certain Portable Executable (PE) or binary information to the console, and then display all relevant .NET information.
And the output will be, again for the same sample:
Figure 6 – Yara rule output: sample metadata!
Meaning we can now write a rule as follows:
import "dotnet"
rule PureLogStealer_GUID
{
    condition:
        dotnet.guids[0] == "9066ee39-87f9-4468-9d70-b57c25f29a67" or
        dotnet.typelib == "856e9a70-148f-4705-9549-d69a57e669b0"
}
Python tool
But what if we want to run this on a large set of samples and produce statistics, which we can then use to hunt or classify malware families, or cluster campaigns?
A newly developed Python tool will help you do exactly that. It supports both a single file and a whole folder of samples or a malware repository. It will skip over any non-.NET binary and simply report the typelib, MVID and typelib ID (if present, which is seldom the case and rarely useful).
If we run it on our single sample like before:
Figure 7 – New tool output on single sample
The tool (or script) has the following capabilities:
I highly recommend using the tool rather than the Yara rule, as it detects .NET metadata more reliably. Both the Yara rule and the Python tool can be adapted to display less or more information according to your needs.
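The "skip over any non-.NET binary" step can be approximated with the standard library alone. The sketch below is not how the tool itself is implemented. It is an assumption-laden illustration that checks whether a PE file's CLR Runtime Header data directory (index 14) is populated, which is the usual marker of a managed assembly. The names is_dotnet_pe and find_dotnet_files are hypothetical.

```python
import struct
from pathlib import Path

CLR_DIRECTORY_INDEX = 14  # "CLR Runtime Header" slot in the PE data directories


def is_dotnet_pe(data: bytes) -> bool:
    """Return True if 'data' looks like a PE file with a populated CLR header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return False
    optional = pe_offset + 4 + 20  # skip PE signature and COFF file header
    if optional + 2 > len(data):
        return False
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:    # PE32
        directories = optional + 96
    elif magic == 0x20B:  # PE32+
        directories = optional + 112
    else:
        return False
    entry = directories + 8 * CLR_DIRECTORY_INDEX
    if entry + 8 > len(data):
        return False
    # NumberOfRvaAndSizes sits directly before the first data directory.
    (count,) = struct.unpack_from("<I", data, directories - 4)
    if count <= CLR_DIRECTORY_INDEX:
        return False
    rva, size = struct.unpack_from("<II", data, entry)
    return rva != 0 and size != 0


def find_dotnet_files(folder):
    """Yield paths under 'folder' whose contents pass is_dotnet_pe()."""
    for path in Path(folder).rglob("*"):
        if path.is_file() and is_dotnet_pe(path.read_bytes()):
            yield path
```

This only answers "is it .NET?". Extracting the assembly name, Typelib and MVID requires a real metadata parser, which is what the tool and the Yara dotnet module provide.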
Clustering
Tracking attackers’ campaigns is always an exercise, and can be both fun and exhausting, depending on how many rabbit holes you (want to) go through. An example of clustering campaigns as well as malware developers was done in the work I did with Santiago as mentioned earlier, which resulted in the following graphics:
Figure 9 – Statistics from 2016 research (bonus obfuscation stats)
This was a pretty large dataset (1,300 samples!) and specific to SteamStealers at the time.
For our analysis purposes, I took 4 of the currently most popular malware families (that are .NET based or have at least a .NET variant) according to Any.run’s Malware Trends: https://any.run/malware-trends/. These are:
-
RedLine
-
Agent Tesla
-
Quasar
-
Pure*: basically anything related to PureCrypter, PureLogs, …
Downloading the latest available samples per family from MalwareBazaar, then running my DotNetMetadata Python script, and playing around with pandas and matplotlib, we can create the following graphs per family:
RedLine – 56 samples
Figure 10 – RedLine Typelib GUID frequency
Agent Tesla – 140 samples
Quasar – 141 samples
Figure 15 – Quasar MVID frequency
Pure* family – 194 samples
Figure 16 – Pure* Typelib GUID frequency
Figure 17 – Pure* MVID frequency
While these pie charts are certainly hypnotic and display the frequency of each Typelib or MVID, we can also use them to create meaningful Yara rules for clustering samples per family. In the case of Quasar especially, the MVID “60f5dce2-4de4-4c86-aa69-383ebe2f504c” appears to be a good candidate.
You might think that while these charts look visually appealing (depending on your art preferences), they may not be particularly useful because they don’t scale well with larger datasets. You’re exactly right! By limiting the number of results displayed, we can produce more readable output. Let’s run our visualisations again on the combined dataset for the 4 malware families above, a total of 531 samples, and this time we will:
-
Run it on the whole sample set
-
Extract the assembly name
-
List only the top 10 of assembly names
-
Use a bar chart instead of a pie
And the result:
Figure 18 – Assembly name frequency – looking better right?
The top 3 is then:
-
“Client”: Quasar family
-
“Product Design 1”: Pure family
-
“Sample Design 1”: Pure family
Client is likely the default assembly name when compiling the Quasar malware (project), and Product Design and Sample Design are likely default assembly names from the PureCrypter builder.
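The top-10 tally behind a chart like Figure 18 can be sketched with the standard library instead of pandas and matplotlib. The record layout below (one dict per sample with an "assembly_name" key) is an assumption made for this example, not the actual output format of the script. The function names are likewise illustrative.

```python
from collections import Counter


def top_values(records, field, limit=10):
    """Count how often each non-empty value of 'field' occurs; return the top 'limit'."""
    counts = Counter(record[field] for record in records if record.get(field))
    return counts.most_common(limit)


def text_bar_chart(pairs, width=40):
    """Render (value, count) pairs as a plain-text horizontal bar chart."""
    if not pairs:
        return ""
    biggest = max(count for _, count in pairs)
    lines = []
    for value, count in pairs:
        bar = "#" * max(1, round(width * count / biggest))
        lines.append(f"{value:<20} {bar} {count}")
    return "\n".join(lines)


# Hypothetical records, standing in for metadata extracted from real samples.
records = [
    {"assembly_name": "Client"},
    {"assembly_name": "Client"},
    {"assembly_name": "Client"},
    {"assembly_name": "Product Design 1"},
    {"assembly_name": "Product Design 1"},
    {"assembly_name": "Stub"},
    {"assembly_name": ""},  # samples without a name are ignored
]
print(text_bar_chart(top_values(records, "assembly_name")))
```

The same top_values call works for a Typelib or MVID field, which makes it easy to spot values shared by many samples.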
We can then write a Yara rule for Quasar based on the default assembly name:
import "dotnet"
rule Quasar_AssemblyName
{
    condition:
        dotnet.assembly.name == "Client"
}
But why stop there? We can build a Yara rule to classify our malware dataset or repository:
import "dotnet"
import "console"
rule DotNet_Malware_Classifier
{
    condition:
        (dotnet.assembly.name == "Client" and console.log("Likely Quasar, assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Product Design 1" and console.log("Likely Pure family, assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Sample Design 1" and console.log("Likely Pure family, assembly name: ", dotnet.assembly.name))
}
And we run this new Yara rule on the combined samples of the Pure family and Quasar:
Figure 19 – Simple “malware classifier”
We can combine sets of Yara rules based on assembly name, Typelib, MVID and so on to create rules with a higher confidence, and we can use this in further hunting, classification and… much more.
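As the list of known assembly names grows, writing each classifier clause by hand gets tedious. The sketch below generates a rule in the same shape as DotNet_Malware_Classifier from a name-to-family mapping. The helper names build_classifier_rule and yara_quote are invented for illustration, and the generated text should still be reviewed and compiled with Yara before use.

```python
def yara_quote(text):
    """Wrap 'text' in double quotes, escaping backslashes and quotes for Yara."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_classifier_rule(rule_name, labels_by_assembly_name):
    """Build Yara rule text with one logging clause per assembly name."""
    if not labels_by_assembly_name:
        raise ValueError("at least one assembly name is required")
    clauses = [
        "        (dotnet.assembly.name == {name} and "
        "console.log({message}, dotnet.assembly.name))".format(
            name=yara_quote(assembly_name),
            message=yara_quote(f"Likely {label}, assembly name: "),
        )
        for assembly_name, label in labels_by_assembly_name.items()
    ]
    return "\n".join([
        'import "dotnet"',
        'import "console"',
        "",
        f"rule {rule_name}",
        "{",
        "    condition:",
        " or\n".join(clauses),
        "}",
    ])


print(build_classifier_rule(
    "DotNet_Malware_Classifier",
    {"Client": "Quasar", "Product Design 1": "Pure family"},
))
```

Keeping the mapping in one place also makes it simple to add Typelib or MVID conditions later, using the same pattern.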
Bonus
If you’ve made it this far, it only makes sense to add one extra use case for all of this: finding new crypters or obfuscators!
When I ran the script on the 500+ samples, there was 1 assembly / binary that stood out:
Figure 20 – Potential new crypter “Cronos”
Making a simple Yara rule again:
import "dotnet"
rule cronos_crypter
{
    strings:
        $cronos = "Cronos-Crypter" ascii wide nocase
    condition:
        dotnet.is_dotnet and $cronos
}
Running this on the Unpac.me dataset yields:
Figure 21 – Unpac.me Yara hunt results
4 matches in 12 weeks: it appears this crypter is not popular (yet). So far, 2 Async RAT samples and 2 PovertyStealer samples have used it.
Bonus on Bonus
Let’s go with a final bonus round: improving the previous “classification” rule by also reviewing results for Async RAT. Seeing the previous crypter was used on at least 2 Async RAT samples, I wanted to see some statistics for this malware as well, for just the assembly name. This results in the following, based on 86 samples:
Figure 22 – Another pie chart: AsyncRat top used assembly names
Jumping out are the following assembly names:
-
AsyncClient
-
Client (also seen in Quasar!)
-
XClient
-
Output
-
Loader
-
Stub
AsyncClient is likely the default name when building the Async RAT project. But we are interested in widening the net: from the previous rule DotNet_Malware_Classifier, let’s update it with these new “generic” or default assembly names:
import "dotnet"
import "console"
rule DotNet_Malware_Classifier
{
    condition:
        (dotnet.assembly.name == "Client" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Output" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Loader" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Stub" and console.log("Suspicious assembly name: ", dotnet.assembly.name))
}
Figure 23 – Classifier Yara rule results
Conclusion
In this blog post, two new tools were presented to extract metadata from .NET malware samples. Specifically, we can now reliably extract 2 unique GUIDs: the Typelib and the MVID.
The Python script is capable of extracting the desired data from a large set of .NET assemblies, whereas the Yara rule is tailored for use with one particular sample. Of course, they can be used interchangeably: you can still fine-tune the Yara rule for a large set and work this way if you don’t want to rely on an external script. Similarly, the script can be extended to extract more data.
Based on the output of these tools, you can then create Yara hunting rules, combine them with your existing rule sets, or use them in an attempt to classify malware families or specific attack campaigns.
Some closing remarks:
-
GUIDs could be spoofed or even removed. No method is 100% reliable.
-
However, this method can enhance already existing rulesets, especially those where .NET obfuscators (e.g. SmartAssembly) obfuscate (user) strings, modules and more, making it harder to write Yara rules for a malware family. Detection based on GUIDs, however, can work regardless of the obfuscation method.
-
That said, obfuscating or deobfuscating may also alter the GUIDs. Keep this in mind when creating your detection rules based on an original or unpacked/deobfuscated sample.
-
If you encounter a GUID comprised entirely of zeros, such as 00000000-0000-0000-0000-000000000000, avoid using it for hunting since it’s an empty GUID.
This indicates the value may not be set or has been altered. It would make for a poor hunting rule, as it can be a default value for any .NET project.
-
You can also use this for .NET assemblies that are not malicious: extract developer information and other metadata per your use case or purpose.
Happy .NET hunting! You can find the tools and some of the example Yara rules in the repository: https://github.com/bartblaze/DotNet-MetaData
As always, feedback is welcomed.