Hunting Malicious Domains: Introducing DomainTools Threat Profile


“Every path you take, every domain registration you make, we’ll be watching you.”

Summary (TL;DR)

- We created Threat Profiles, a set of supervised machine learning classifiers, to find domains which could become weaponized by bad actors for phishing, malware, or spam campaigns
- We find these domains before they are weaponized; we think of them as domains registered with “malicious intent”
- The accuracy of our classifiers is pretty darn good
- We created a new machine learning infrastructure, called the Crank, to make quick changes, run experiments, and automatically evaluate the results
- We have a dedicated Data Science and R&D team so that we’ll constantly stay one step ahead of the bad actors
- You can easily integrate Threat Profile scores into your firewall rules, Splunk, or other threat intelligence processes
- It’s in beta now, and will be released soon


There are, unfortunately, bad actors on the Internet who register, weaponize, and deploy domains as part of phishing, malware, or spam campaigns. Malicious domains make the internet a less safe and more annoying place for everyone. Our goal is to identify and flag these domains—domains registered with “malicious intent”—before they are weaponized and they “Cry ‘Havoc!’, and let slip the dogs of war”.

So, we’ve invested heavily in Data Science and R&D over the last 18 months to create Threat Profile, a component of the DomainTools Risk Score that you can use to augment your existing threat intelligence processes. We think of domains with a high Threat Profile score as belonging on a “domain watchlist”, domains which we believe may become dangerous in the near future. An overview of Threat Profile is shown in Figure 1.

Threat Profile embodies our belief that bad actors make the Internet a less safe and more annoying place for everyone.

Figure 1: An overview of the DomainTools Threat Profile, showing how malicious domain data is used to train classifiers which then generate risk scores on new and updated domain registrations

What is a Threat Profile?

A Threat Profile is our view into the mindset of bad actors: how they determine which domains to register and how they set up their malicious infrastructure. From that view, we’ve created a set of three machine learning classifiers: one for phishing, one for malware, and one for spam.

Each classifier is independently engineered, trained, and optimized to find domains registered with malicious intent. We use the same high-quality blacklist information as the Proximity component of the Risk Score as well as our extensive Whois and DNS databases to identify important domain features against which we train our classification models. Each model is repeatedly tested and optimized over time to validate its accuracy.

These classifiers analyze all new and updated domain registrations, generating scores indicating our belief that a given domain has malicious intent. Specifically, our models look for domains that may become weaponized anytime within the next 18 months. A high score doesn’t guarantee badness: a bad actor may register many domains but only end up using a few. Even so, our Threat Profile classifiers are designed to find all such domains registered by bad actors, whether or not they ever become weaponized.

Important Note: we’re not looking for compromised domains, only domains we believe are registered with malicious intent by bad actors or their proxies.

Classifying Domains

We are using state-of-the-art supervised machine learning classifiers to build our Threat Profiles. Each classifier is selected and tuned independently to best identify phish, malware, and spam threats respectively. Here’s an overview of how we do it:

1. Create training and test datasets using curated blacklist data and our Whois and DNS databases
2. Use our extensive domain knowledge of bad actors, our expertise in cybersecurity TTPs, and detailed analysis of our data to determine which intrinsic properties of the domains, a.k.a. features, are most useful for identifying malicious intent
3. Run grid searches using our machine learning infrastructure over different sets of features and tunings to optimize our classification models
4. Compare model accuracy on the test dataset using standard classification metrics

A feature is machine learning terminology for an intrinsic property of an item that is used to train a classifier or to classify an item. “Raw” information about an item isn’t useful on its own; it needs to be encoded so that a computer can work with it. This “encoding” is the feature. For example, if you want to train a classifier to predict someone’s age, you might use height as measured in inches as a feature. The feature is encoded as a number to represent height. During training, a classifier looks at the set of features for each item and learns from the patterns it finds.

For Threat Profiles, we created features from three categories of data:

- The domain name itself, including TLD
- Domain registration information
- Domain infrastructure information

We found that different features are more or less important when predicting phish, malware, or spam intention. One concrete example: using a hyphen in your domain name, such as “com-online-today[.]test”. While hyphens are important to identify phishing domains, they aren’t nearly as important for identifying spam domains.
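To make the idea of encoding concrete, here is a minimal sketch of turning a raw domain name into a handful of name-based features. The specific features (length, hyphen count, digit count, character entropy, TLD) and the function names are illustrative assumptions, not the actual Threat Profile feature set, which the post does not enumerate.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the characters in text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def encode_domain(domain: str) -> dict:
    """Encode a raw domain name as a flat dict of illustrative features."""
    name = domain.lower().replace("[.]", ".")  # undo defanging such as "[.]"
    labels = name.split(".")
    tld = labels[-1]
    body = ".".join(labels[:-1])  # everything to the left of the TLD
    return {
        "length": len(body),
        "hyphen_count": body.count("-"),
        "digit_count": sum(ch.isdigit() for ch in body),
        "entropy": round(shannon_entropy(body), 3),
        "tld": tld,  # categorical; a real pipeline would encode this further
    }


features = encode_domain("com-online-today[.]test")
print(features)
```

For the example domain above, `hyphen_count` comes out as 2. A phishing model might weight that feature heavily while a spam model largely ignores it, which is the kind of per-classifier difference described in the text.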

For each classifier, we look for discriminatory features in two ways. First, we leverage our internal expertise in cybersecurity and domain registrations. For example, many of the features used in our phish classifier come from our expertise creating PhishEye. Second, we’re doin’ data science—we look for correlations and patterns among domain metadata. It’s deeper than just the characters in the domain name or TLD. We look at how and when a domain was registered, and we look at the infrastructure used to host the domain.

Train, Test, Repeat

We’ve spent the last 18 months in research and development of the Threat Profile classifiers. To that end, we created a robust machine learning infrastructure (affectionately called “the Crank”) to deploy and test changes to our features and classifiers quickly. Using the Crank, we can run not just a few, but hundreds of classifier experiments on our cluster at once. All of this to find interesting interactions between features and improve our models.

To make sure our models are awesome, we use a consistent training/testing methodology. We randomly sample domains to be in either a training or test dataset and then perform k-fold cross-validation over models built with the training dataset. This helps ensure that the models are not brittle or overly sensitive to the training data. We like k-folds so much, we built it into the Crank.
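As a rough illustration of that methodology, the sketch below randomly withholds a test set and then produces k train/validation folds from the remaining training data. The function names, the 80/20 split, and k=5 are assumptions made for the example; the post does not say how the Crank implements this.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly assign items to a training set and a withheld test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(items, k=5):
    """Yield (train, validation) pairs; each item is validated exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Placeholder domain names standing in for labeled training examples.
domains = [f"example-{i}.test" for i in range(100)]
train_set, test_set = train_test_split(domains)

for fold_train, fold_val in k_folds(train_set, k=5):
    # A real pipeline would fit a model on fold_train and score it on fold_val.
    print(len(fold_train), len(fold_val))
```

If the validation scores vary widely from fold to fold, that is a sign the model is sensitive to which examples it happened to see during training. The withheld test set is touched only for the final evaluation.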

We use a standard set of accuracy metrics to evaluate our models. Some metrics measure the classifier’s overall performance, and others measure its performance at a given threshold. We evaluate against the withheld test datasets. Our metrics include:

- Receiver Operating Characteristic (“ROC”) Curves
- Precision-Recall (“PR”) Curves
- Precision, Recall, and the F1 Score, at given thresholds

For both ROC and PR curves, it is common to look both at a visualization of the curve as well as the area under the curve (AUC). The higher the AUC, the better the classifier is doing, with 1.0 being “perfect”. The F1 score is the harmonic mean of precision and recall, and thus takes both false positives and false negatives into account. It’s more robust than precision or recall alone, and harder to achieve a high score. It’s perfect for us and our high standards. It also ranges from 0.0 to 1.0.
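Both summary numbers are simple to compute by hand, which helps when interpreting them. The sketch below implements F1 as the harmonic mean and ROC AUC through its rank interpretation: the chance that a randomly chosen positive example scores higher than a randomly chosen negative one. The labels and scores are made up for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores) -> float:
    """ROC AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This O(n^2) pairwise form is for clarity only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = malicious, 0 = benign.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))

# High precision cannot hide poor recall under a harmonic mean.
print(round(f1_score(0.99, 0.10), 3))
```

In the last line, a precision of 0.99 paired with a recall of 0.10 gives an F1 of roughly 0.18, far below the arithmetic mean of 0.545. That is why a high F1 score is harder to achieve than a high precision or recall alone.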

The Crank allows us to encode and execute hundreds of classification experiments at once. We can quickly compare each experiment’s results and use that data to help us improve our models over time.
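The Crank itself is not public, so the following is only a generic sketch of what encoding a batch of experiments can look like: a grid over feature-group subsets and tuning parameters, each scored by a caller-supplied function. The parameter names (`max_depth`, `n_trees`), the feature-group labels, and the toy scoring function are all hypothetical placeholders.

```python
from itertools import combinations, product


def experiment_grid(feature_groups, param_grid):
    """Yield one experiment config per (feature subset, parameter combination)."""
    subsets = [
        subset
        for size in range(1, len(feature_groups) + 1)
        for subset in combinations(feature_groups, size)
    ]
    names = sorted(param_grid)
    value_combos = list(product(*(param_grid[n] for n in names)))
    for subset, values in product(subsets, value_combos):
        yield {"features": subset, "params": dict(zip(names, values))}


def run_grid(feature_groups, param_grid, evaluate):
    """Score every config with evaluate(config); return results, best first."""
    results = [
        (evaluate(cfg), cfg) for cfg in experiment_grid(feature_groups, param_grid)
    ]
    return sorted(results, key=lambda r: r[0], reverse=True)


def toy_evaluate(cfg):
    """Placeholder score; a real run would train a model and return e.g. F1."""
    return len(cfg["features"]) + cfg["params"]["max_depth"] / 100


groups = ("name", "registration", "infrastructure")
grid = {"max_depth": [3, 5], "n_trees": [50, 100]}
ranked = run_grid(groups, grid, toy_evaluate)
print(len(ranked), ranked[0])
```

Three feature groups and four parameter combinations already produce 28 experiments, which hints at why running them in parallel on a cluster and comparing results automatically matters.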

Peeking Under the Hood

So, just how good is it? Let’s look at the metrics for one of our Threat Profile classifiers: Phishing. This data comes from one of our recent rounds of testing; we expect the performance of the released Threat Profile to be even better.

Table 1 shows some summary metric scores for our Phishing Threat Profile classifier. While AUC and F1 score performance is application dependent, we are very happy with these scores and the domains we classify as having phishing intent.

Table 1: Summary Metrics for Phishing

The classifier we’re using for the Phish Threat Profile returns a raw score between 0 and 1, where 0 means not “phishy” at all and 1 means totally “phishy”. To compare the classifier’s score against the test dataset, you select a threshold, typically 0.5, and do “a cut”. Everything below the threshold is considered a 0 (not phishy), and everything above it is a 1 (totally phishy). For this instance of Phish, we set our threshold at 0.46.

Figure 2 shows how the Precision, Recall, and F1 scores for Phish vary as we adjust the Threshold parameter from 0 to 1. In the figure, you can trace the tradeoffs between optimizing a classifier for precision versus recall: as one falls, the other increases. We are happy to see that for a broad set of thresholds our Phish classifier generates high F1 scores. This implies that most of the raw classification scores are not near 0.5 but rather towards the two ends of the spectrum, which gives us high confidence in the quality of our classifications.
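The cut and the threshold sweep can be sketched in a few lines. The labels and raw scores below are invented for illustration and are not DomainTools data; treating a score exactly equal to the threshold as a 1 is also an assumption, since the post only says "below" and "above".

```python
def cut(scores, threshold):
    """Binarize raw scores: 1 at or above the threshold, otherwise 0."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall_f1(labels, predictions):
    """Return (precision, recall, F1) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up test data: 1 = phishy, 0 = not phishy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.55, 0.10, 0.05]

for threshold in (0.25, 0.46, 0.75):
    p, r, f1 = precision_recall_f1(labels, cut(scores, threshold))
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 moves precision from 0.75 up to 1.00 while recall drops from 1.00 to about 0.67, the same tradeoff the figure traces across the full range.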

Figure 2: Precision, Recall, and F1 scores for Threat Profile Phish by Threshold. The x-axis is the threshold, and the y axis is the metric score.

Our Malware and Spam Threat Profile classifiers show even better performance, with F1 scores near or exceeding 0.9 and ROC AUC scores above 0.95. Why did we show you Phishing, the worst performer of the three classifiers? It is important that our customers trust our scores and the data science behind them. In security, trust is earned, not given. Our customers will want to know how we generate and update our scores before including them in their operational security practices and processes. Most importantly, we hold ourselves to the highest standards, both our own and those of our customers and partners.

Interpretation and Use

Threat Profiles are risk scores and should be used as part of your existing threat intelligence processes. Think of domains with a high Threat Profile score as belonging on a “domain watchlist,” domains that could become weaponized anytime within the next 18 months. Depending on how severely we score the domain and your organization’s risk tolerance, you may want to take different actions: everything from flagging their appearance in your server logs to blocking the domains outright.

The Threat Profile score format is similar to the Proximity format, following a 0 to 100 scale. The higher the Threat Profile score, the more likely the domain was registered with malicious intent:

- 0, domain is whitelisted
- 50+, suspicious
- 70+, our recommended threshold for indicating malicious intent
- 90+, strong confidence in near-term weaponization
- 100, domain is blacklisted

Along with our Proximity score, we generate a total of four risk scores to help you understand the kinds of threats appearing on your network. Each Threat Profile score should be interpreted independently. For example, if your organization is more worried about phishing than spam, set alerts at two different places: 65 for Phishing, and 90 for Spam.
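One way to act on these bands and per-category thresholds is sketched below. The dictionary keys, function names, and the shape of the input are assumptions for illustration only and do not reflect the actual DomainTools API response format. The 65 and 90 thresholds come from the example above, the malware threshold reuses the recommended 70, and the label for scores from 1 to 49 is a placeholder because the post does not name that band.

```python
# Hypothetical per-category alert thresholds.
ALERT_THRESHOLDS = {"phishing": 65, "malware": 70, "spam": 90}


def score_band(score: int) -> str:
    """Map a 0 to 100 Threat Profile score to the band described in the post."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 100:
        return "blacklisted"
    if score == 0:
        return "whitelisted"
    if score >= 90:
        return "strong confidence in near-term weaponization"
    if score >= 70:
        return "malicious intent (recommended threshold)"
    if score >= 50:
        return "suspicious"
    return "no band listed"  # placeholder label for scores from 1 to 49


def triggered_alerts(scores: dict, thresholds: dict = ALERT_THRESHOLDS) -> list:
    """Return the categories whose score meets or exceeds its own threshold."""
    return sorted(
        category
        for category, value in scores.items()
        if category in thresholds and value >= thresholds[category]
    )


# Made-up scores for a single domain; each category is judged independently.
example = {"phishing": 68, "malware": 40, "spam": 72}
print(triggered_alerts(example))
print({category: score_band(value) for category, value in example.items()})
```

With these made-up scores only the phishing alert fires: a spam score of 72 sits above the recommended threshold of 70 but below this organization's chosen spam alert level of 90.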

If you just want to mitigate your overall risk, use the DomainTools Risk Score, which is a combination of Proximity and the Threat Profiles scores. It is the “one score to rule them all” which you can easily integrate into your firewall rules or other automated threat intelligence processes.

A Note About Dormant Domains

Not every domain registered by a bad actor will be weaponized. Many will sit dormant until their registration period ends. The goal of Threat Profile is to find all domains registered with malicious intent, even if they remain dormant. From a classification perspective, these domains are not “false positives” but rather “future positives,” because we believe they have the potential to become weaponized at any time.

There’s always a tradeoff between providing users access to resources online and protecting your network from threats. We believe watching and/or blocking domains flagged with our Threat Profile score is an effective way to isolate potential threats while minimizing the impact to your users and customers.

The DomainTools Advantage

We have a dedicated R&D and Data Science team that continually monitors changes in our DNS databases and evaluates new blacklisted domains to determine how bad actors behave, then updates our models accordingly. Moreover, we built the Crank, our flexible machine learning infrastructure, to make quick changes to the features, run experiments, and automatically evaluate the results. This infrastructure is just as important as the Risk Scores themselves: it means we can keep putting out high quality predictions in the future, no matter how bad actors change their tactics.

In the cat-and-mouse game of domains registered with malicious intent, we’re not building a better mousetrap, we’re making the cats better hunters. Hunting malicious domains.

Take it for a Test Drive

The DomainTools Risk Score with Threat Profile will be available soon; we’re rolling it out in beta now. Moreover, our latest release of the DomainTools App for Splunk has built-in support for our new Risk Score with Threat Profile & Proximity.

Contact us today to get access to the API and try it out for yourself.

About DomainTools
DomainTools helps security analysts turn threat data into threat intelligence. We take indicators from your network, including domains and IPs, and connect them with nearly every active domain on the Internet. Those connections inform risk assessments, help profile attackers, guide online fraud investigations, and map cyber activity to attacker infrastructure. Fortune 1000 companies, global government agencies, and leading security solution vendors use the DomainTools platform as a critical ingredient in their threat investigation and mitigation work. Learn more about how to connect the dots on malicious activity at