ExtraHop Releases Massive Machine Learning Dataset to Bolster Cybersecurity Defenses Against Domain Generation Algorithms

ExtraHop, a leading provider of cloud-native network detection and response solutions, has released its extensive machine learning dataset to the public. The dataset, consisting of 16 million rows, is intended to help security teams detect and combat malware and botnets, particularly those that rely on domain generation algorithms (DGAs). By sharing its research and data, ExtraHop hopes to foster collaboration and innovation within the industry.

DGAs present a significant challenge to security teams because threat actors use them to maintain covert control over compromised systems inside an organization’s network. These attacks are notoriously difficult to identify and mitigate promptly. With the ever-evolving threat landscape, open sourcing research and datasets is becoming an important way for security teams to stay ahead.

Raja Mukerji, ExtraHop’s Chief Scientist and Co-founder, emphasized the importance of democratizing threat research and detection tools. By making its DGA detector dataset accessible on GitHub, ExtraHop aims to empower security teams of different sizes, backgrounds, and industries to proactively identify and address malicious activity in their environments. Mukerji also stressed the significance of collaboration in the cybersecurity community, asserting that sharing best practices is the key to staying one step ahead of attackers. He encouraged other teams to follow suit and contribute their insights to benefit the industry as a whole.

Initially developed for ExtraHop’s Reveal(x) NDR platform, the dataset is now available for any security researcher to use in building their own machine learning classifier. Such a model can identify DGA activity quickly and accurately, allowing security teams to intervene in attacks with greater speed and precision. The ExtraHop DGA model has demonstrated an accuracy rate of over 98% since its implementation in the Reveal(x) platform.

Todd Kemmerling, ExtraHop’s Data Science Director, outlined the motivation behind sharing this dataset. Recognizing the lack of accessible public datasets for security teams, he stated that ExtraHop aims to fill this gap by providing the data needed to swiftly detect and combat DGAs. As threat actors become more sophisticated at operating undetected, defending against DGAs is becoming increasingly vital.

ExtraHop’s move to release this massive machine learning dataset marks a significant step forward in enhancing cybersecurity defenses. By promoting collaboration and knowledge-sharing, ExtraHop aims to create a stronger and more secure digital landscape for organizations of all types. Security teams now have a valuable resource at their fingertips to stay ahead of emerging threats and protect their environments from malicious attacks.


What is a DGA?

A domain generation algorithm (DGA) is a technique employed by threat actors to generate an extensive list of domain names dynamically. Malware on a compromised machine uses these domains to reach its operators and maintain control within an organization’s network. Because the list of domains keeps changing, blocking individual names is ineffective, which makes detection and prevention of attacks more challenging.
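
To make the idea concrete, here is a minimal sketch of how a DGA can work in principle. It is a toy written for this explanation, not the algorithm of any real malware family and not taken from ExtraHop's dataset. The function name toy_dga, the seed string, and the SHA-256 hashing scheme are all illustrative assumptions. The point is only that two parties who share a seed and a date can independently derive the same list of random-looking domains.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration only: real DGAs differ widely in design.
    """
    domains = []
    for i in range(count):
        # Same seed + same date + same index always hashes to the same value.
        material = f"{seed}-{day.isoformat()}-{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) onto a letter a-p to get an alphabetic label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("demo-seed", date(2024, 1, 1)):
        print(name)
```

Running the function again with the same seed and date yields the same names, while a different date yields a different list. That property is what lets the domains rotate without the two sides communicating, and it is also why the resulting names tend to look unlike ordinary human-chosen domains.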

Why is open sourcing research and datasets important for cybersecurity?

Open sourcing research and datasets allows for collaboration and knowledge-sharing within the cybersecurity community. By making valuable data accessible to security teams, it enables them to develop more effective defenses against evolving threats and stay ahead of malicious actors.

How can security teams benefit from ExtraHop’s machine learning dataset?

ExtraHop’s machine learning dataset provides security teams with the data needed to train their own machine learning models for identifying DGAs. This aids in the early detection of malicious activity, allowing teams to respond swiftly and mitigate potential damage. ExtraHop’s own DGA model has demonstrated an accuracy rate of over 98% in the Reveal(x) platform.
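
As a rough illustration of what building such a classifier can involve, the sketch below trains a small character-bigram naive Bayes model in plain Python. It is not ExtraHop's model, and it does not assume anything about the format of the released dataset. The class name BigramNaiveBayes, the helper bigrams, and the handful of training domains are all hypothetical, chosen only to show the general approach of learning which character patterns are typical of ordinary versus algorithmically generated names.

```python
import math
from collections import Counter


def bigrams(domain):
    """Split the first label of a domain into padded character bigrams."""
    label = domain.lower().split(".")[0]
    padded = f"^{label}$"  # ^ and $ mark the start and end of the label
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramNaiveBayes:
    """Multinomial naive Bayes over character bigrams (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha      # Laplace smoothing strength
        self.counts = {}        # class -> Counter of bigram counts
        self.totals = {}        # class -> total number of bigrams seen
        self.priors = {}        # class -> log prior probability
        self.vocab = set()      # all bigrams seen during training

    def fit(self, domains, labels):
        class_sizes = Counter(labels)
        for cls in class_sizes:
            self.counts[cls] = Counter()
        for domain, cls in zip(domains, labels):
            grams = bigrams(domain)
            self.counts[cls].update(grams)
            self.vocab.update(grams)
        for cls, size in class_sizes.items():
            self.totals[cls] = sum(self.counts[cls].values())
            self.priors[cls] = math.log(size / len(labels))
        return self

    def log_score(self, domain, cls):
        v = len(self.vocab) + 1  # +1 reserves probability for unseen bigrams
        score = self.priors[cls]
        for g in bigrams(domain):
            num = self.counts[cls][g] + self.alpha
            den = self.totals[cls] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, domain):
        return max(self.priors, key=lambda cls: self.log_score(domain, cls))


# Tiny made-up training set; a real model would need far more data.
benign = ["google.com", "wikipedia.org", "amazon.com",
          "github.com", "microsoft.com", "weather.gov"]
generated = ["xkqzvbnwtr.com", "qwzjxkvpml.net", "zxqvjkwpbn.org",
             "vbqxzjwkpt.com", "kjzqxwvbnm.info", "pqzxvwjkbt.com"]

model = BigramNaiveBayes().fit(
    benign + generated,
    ["benign"] * len(benign) + ["dga"] * len(generated),
)

if __name__ == "__main__":
    for name in ["google.com", "xkqzvbnwtr.com"]:
        print(name, "->", model.predict(name))
```

A model trained on twelve examples says little about real-world accuracy. The value of a large labeled dataset is that the same kind of model, or a more capable one, can learn from millions of examples and be evaluated on held-out data before it is trusted in production.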