Automating Malicious Infrastructure Detection With Graph Neural Networks

Jan 14, 2025

Thomas Neumain, Enterprise Software Specialist

Threat actors inadvertently leave traces when they set up and maintain their attack infrastructure. Defenders can pivot on these reused, rotated, and shared pieces of infrastructure, starting from known Indicators of Compromise (IOCs), to uncover new attack vectors. In response to this cat-and-mouse game, Palo Alto Networks has introduced an automated, proactive method that uses Graph Neural Networks (GNNs) to discover malicious infrastructure efficiently. Three real-world case studies (a postal services phishing campaign, a credit card skimmer campaign, and a financial services phishing campaign) show how defenders can proactively monitor and identify evolving threat vectors using automated detection models.

Automated Threat Detection and Infrastructure Mapping

Automated pivoting on known indicators is essential for staying ahead of cyber threats. Using machine learning techniques such as Graph Neural Networks (GNN), cybersecurity defenders can significantly reduce the time needed to detect and block newly emerging malicious infrastructures. These automated methods reveal hidden connections that manual processes might overlook, enabling earlier detection and more robust protection against cyber threats.

To illustrate the benefits and functionality of this approach, several examples and a detailed breakdown of the GNN’s application are provided. For instance, detecting clusters of malicious infrastructure involves using a network crawler that leverages relationships among domains to discover network artifacts surrounding known indicators. These discovered artifacts and infrastructure are then employed to train a GNN model, which subsequently detects further malicious domains.
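The crawl step described above can be sketched as a breadth-first pivot over domain and IP relationships. This is a minimal illustration, not Palo Alto Networks' crawler: the `DOMAIN_TO_IPS` table, the `crawl` function, and the `max_hops` parameter are all hypothetical, and the data uses reserved example domains and documentation IP ranges.

```python
from collections import deque

# Hypothetical passive-DNS style lookup table (illustrative data only).
DOMAIN_TO_IPS = {
    "seed-phish.example": {"203.0.113.10"},
    "parcel-track.example": {"203.0.113.10", "203.0.113.11"},
    "pay-fee.example": {"203.0.113.11"},
    "unrelated.example": {"198.51.100.7"},
}

def build_ip_index(domain_to_ips):
    """Invert the domain -> IPs map so an IP can be pivoted to its domains."""
    index = {}
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            index.setdefault(ip, set()).add(domain)
    return index

def crawl(seed, domain_to_ips, max_hops=2):
    """Breadth-first pivot from a seed domain; returns (nodes, edges)."""
    ip_index = build_ip_index(domain_to_ips)
    nodes = {seed}
    edges = set()
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        # Domains pivot to their IPs; IPs pivot back to co-hosted domains.
        neighbors = domain_to_ips.get(node) or ip_index.get(node, set())
        for nb in neighbors:
            edges.add(tuple(sorted((node, nb))))  # undirected edge
            if nb not in nodes:
                nodes.add(nb)
                queue.append((nb, hops + 1))
    return nodes, edges

nodes, edges = crawl("seed-phish.example", DOMAIN_TO_IPS, max_hops=4)
print(sorted(nodes))
```

Starting from a single known indicator, the crawl reaches `pay-fee.example` through two shared IP addresses while leaving `unrelated.example` out of the graph. The resulting nodes and edges are the kind of structure a GNN would then be trained on.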

Pivoting on known infrastructure is another essential aspect, where various indicators like co-hosted domains, malware delivery URLs, and command-and-control (C2) domains are analyzed to uncover new connections. Additional pivots such as SSL/TLS certificates and phishing kits help discover associated domains and IP addresses. The power of GNN models lies in their ability to identify multiple associations between malicious domains, revealing more intricate relationships than shared hosting alone can indicate. By training a GNN classifier on an enriched graph of network artifacts, the model can accurately detect new domains with high confidence.
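To make the idea of multiple associations concrete, the following sketch counts how many distinct pivot types two domains share. The observation tuples, the pivot type names (`ip`, `tls_cert`, `phishing_kit`), and the threshold of two shared types are assumptions chosen for illustration, not values from the original research.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observations: (domain, pivot_type, value).
OBSERVATIONS = [
    ("a.example", "ip", "203.0.113.10"),
    ("b.example", "ip", "203.0.113.10"),
    ("a.example", "tls_cert", "cert-fp-01"),
    ("b.example", "tls_cert", "cert-fp-01"),
    ("c.example", "ip", "203.0.113.10"),
    ("c.example", "phishing_kit", "kit-77"),
]

def association_types(observations):
    """Map each domain pair to the set of pivot types they have in common."""
    by_artifact = defaultdict(set)
    for domain, pivot_type, value in observations:
        by_artifact[(pivot_type, value)].add(domain)
    shared = defaultdict(set)
    for (pivot_type, _value), domains in by_artifact.items():
        for pair in combinations(sorted(domains), 2):
            shared[pair].add(pivot_type)
    return shared

shared = association_types(OBSERVATIONS)
# Pairs linked by two or more independent pivot types are stronger leads
# than pairs that merely sit on the same shared-hosting IP.
strong = {pair for pair, kinds in shared.items() if len(kinds) >= 2}
print(strong)
```

Here `a.example` and `b.example` share both an IP address and a TLS certificate, so they stand out, while `c.example` is linked to them only through shared hosting.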

Postal Service Phishing Campaign

Spanning several countries, this particular campaign targeted postal services via domains registered and rotated through similar hosting infrastructure. Over the past year, approximately 4,000 domains hosted on around 1,200 IP addresses were linked to this campaign. The attackers’ strategy involved using short windows of live activity to evade detection, often employing a fast-flux pattern and frequently reusing specific IP addresses.

The attackers cleverly mimicked legitimate postal services by registering domains that made it challenging for users to discern between real and fake sites. By frequently rotating these domains, they managed to stay under the radar of traditional detection methods. However, the Graph Neural Network (GNN) model identified patterns and connections that led to the discovery of the entire malicious infrastructure. This proactive detection enabled cybersecurity professionals to mitigate the threats posed by such campaigns more effectively.

Credit Card Skimmer Campaign

Attackers in the credit card skimmer campaign compromised legitimate commercial websites, implanting malicious JavaScript (skimmers) to exfiltrate stolen data to attacker-controlled endpoints. The detection of these skimmers prompted an investigation via GNN, uncovering a larger infrastructure network dating back to 2022. This discovery included 65 domains and 815 IP addresses, with a noticeable uptick in hosting activities observed in early 2024.

The analysis of connections between compromised sites and the endpoints receiving the stolen data highlighted the attackers’ ability to infiltrate legitimate websites and remain undetected for long periods. The GNN model’s detailed mapping of the entire network of malicious infrastructure facilitated quicker response and mitigation efforts. By proactively identifying these connections, defenders could dismantle the infrastructure behind the skimmer campaign and reduce its impact on affected businesses and individuals.

Financial Services Phishing Campaign

The financial services phishing campaign targeted banking and financial services worldwide, utilizing 5,000 domains hosted on over 5,600 IP addresses from October 2023 to 2024. Attackers employed sophisticated techniques to spoof financial organization web pages, harvesting personal and financial data. A recurring theme in this campaign was the use of shared hosting infrastructure, with dozens of domains detected daily targeting a broad range of global financial services.

By relying on shared hosting infrastructure, the attackers launched large-scale phishing attacks with minimal effort while impersonating legitimate financial institutions. That same sharing gave the GNN model patterns to work with: by analyzing the hosting relationships among known malicious domains, it could identify new malicious domains as they emerged. This allowed defenders to react quickly to widespread phishing attacks, reducing potential data breaches and financial losses for financial institutions and their customers.

Detailed Analysis of Infrastructures and Detection Trends

The detailed analysis of the three case studies reveals several key trends and methods of detection in cybersecurity. One significant trend is the frequent reuse and sharing of hosting infrastructures, domains, and IP addresses by threat actors to conduct expansive campaigns. This reuse creates detectable patterns that automated systems like GNN can identify, improving the chances of early detection.

Fast-flux techniques are another common method used in many campaigns, where short-lived domains rotate rapidly to evade detection. The GNN model’s ability to detect these fast-flux patterns is crucial for timely detection and mitigation. Additionally, attackers’ automated setup processes inadvertently create detectable patterns that defenders using machine learning models can leverage.
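The rotation pattern described above, many short-lived domains cycling through a small set of reused IP addresses, can be approximated with a simple heuristic before any model is involved. The record layout, the `flag_rotation` function, and both thresholds below are hypothetical and would need tuning against real hosting data.

```python
from collections import Counter
from datetime import date

# Hypothetical hosting records: (domain, ip, first_seen, last_seen).
RECORDS = [
    ("track-a.example", "203.0.113.5", date(2024, 3, 1), date(2024, 3, 2)),
    ("track-b.example", "203.0.113.5", date(2024, 3, 3), date(2024, 3, 4)),
    ("track-c.example", "203.0.113.5", date(2024, 3, 5), date(2024, 3, 5)),
    ("shop.example", "198.51.100.9", date(2023, 1, 1), date(2024, 3, 5)),
]

def flag_rotation(records, max_live_days=3, min_domains_per_ip=3):
    """Return IPs that hosted at least `min_domains_per_ip` short-lived domains."""
    short_lived_per_ip = Counter()
    for _domain, ip, first_seen, last_seen in records:
        live_days = (last_seen - first_seen).days + 1  # inclusive window
        if live_days <= max_live_days:
            short_lived_per_ip[ip] += 1
    return {ip for ip, n in short_lived_per_ip.items() if n >= min_domains_per_ip}

suspicious_ips = flag_rotation(RECORDS)
print(suspicious_ips)
```

A rule like this only catches the most obvious reuse. The appeal of a graph model is that it can combine this signal with the other relationships in the graph rather than relying on fixed thresholds.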

By identifying these patterns, the GNN model can uncover the malicious infrastructure supporting an attack campaign. Proactive monitoring is essential for mitigating harmful activity before it causes significant damage. The case studies highlight how cybersecurity defenders can improve their response capabilities by continuously monitoring for new threats and employing advanced machine learning techniques.

Through these advancements, cybersecurity defenders equipped with sophisticated tools like GNN can more effectively counter large-scale, automated cyberattacks, translating to improved protection for organizations and individuals.

Graph Neural Networks and Machine Learning in Cybersecurity

Graph Neural Networks (GNNs) play a critical role in detecting correlations and infrastructure that manual methods might miss. By combining multiple data points, such as shared hosting providers and malware distribution, GNN models chart associations between malicious domains more effectively than any single correlation point can. Enriching the graph nodes with distinctive features gives a deeper understanding of malware campaign infrastructure.
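The core operation behind a GNN is neighborhood aggregation: each node updates its representation from its own features and those of its neighbors. The sketch below shows that step alone in plain Python, with a single hand-picked feature and a fixed blending weight. A real GNN learns weight matrices, applies nonlinearities, and uses far richer node features, and the graph, the `propagate` function, and the `self_weight` value here are assumptions for illustration only.

```python
# Toy graph: adjacency lists over domains and IPs (hypothetical data).
GRAPH = {
    "known-bad.example": ["203.0.113.10"],
    "203.0.113.10": ["known-bad.example", "new.example"],
    "new.example": ["203.0.113.10"],
    "benign.example": ["198.51.100.7"],
    "198.51.100.7": ["benign.example"],
}

# One feature per node: 1.0 = known malicious, 0.0 = no prior signal.
FEATURES = {
    "known-bad.example": 1.0,
    "203.0.113.10": 0.0,
    "new.example": 0.0,
    "benign.example": 0.0,
    "198.51.100.7": 0.0,
}

def propagate(graph, features, self_weight=0.5):
    """One round: blend each node's feature with the mean of its neighbors'."""
    updated = {}
    for node, neighbors in graph.items():
        if neighbors:
            neighbor_mean = sum(features[n] for n in neighbors) / len(neighbors)
        else:
            neighbor_mean = 0.0
        updated[node] = self_weight * features[node] + (1 - self_weight) * neighbor_mean
    return updated

scores = FEATURES
for _ in range(2):  # two rounds let signal travel two hops
    scores = propagate(GRAPH, scores)

print(scores["new.example"], scores["benign.example"])
```

After two rounds, `new.example` picks up a nonzero score because it shares an IP with a known malicious domain, while `benign.example`, which sits in a disconnected part of the graph, stays at zero.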

Classifiers trained on these enriched graphs enable ongoing, automated detection of new indicators, highlighting the efficiency and scalability of this approach. As the GNN model continuously learns from the data it processes, its detection capabilities improve over time, keeping cybersecurity defenders one step ahead of evolving threats. This adaptability is crucial in a landscape where attackers are constantly changing tactics to evade detection.

The combination of machine learning and GNN models offers a powerful toolset for cybersecurity professionals. By automating the detection process and leveraging advanced analytics, defenders can identify malicious infrastructure more quickly and accurately, reducing the time and resources needed for manual analysis. This approach not only enhances threat detection capabilities but also improves the overall security posture of organizations.

Protection and Response Measures

To shield against the threats discussed, Palo Alto Networks emphasizes the deployment of proactive measures. Its Advanced URL Filtering and Advanced DNS Security capabilities include proactive threat-hunting functions designed to rapidly uncover malicious URL infrastructure. These tools provide an additional layer of protection by identifying and blocking harmful URLs before they can cause significant damage.

In conjunction with Advanced URL Filtering, Advanced WildFire covers associated malware samples and further IOCs. By analyzing malware samples and identifying additional indicators of compromise, defenders can strengthen their defenses and reduce the risk of successful cyber attacks. This layered approach addresses potential threats from multiple angles.

Collaboration through the Cyber Threat Alliance (CTA) further bolsters protection efforts by sharing findings with CTA members. This coordinated approach enables a more systematic response to cyber threats, disabling malicious infrastructures more effectively. By working together, cybersecurity professionals can improve their collective defenses and minimize the impact of cyber attacks on organizations and individuals.

With these protection and response measures in place, defenders are better equipped to identify and mitigate threats posed by advanced cyber attack campaigns. By leveraging advanced technologies and fostering collaboration within the cybersecurity community, organizations can enhance their security posture and better protect their assets from malicious actors.

Conclusion

The escalating complexity of cyber threats calls for advanced, automated detection measures like those described in this article. By employing Graph Neural Network (GNN) models to analyze known indicators, defenders can stay a step ahead of attackers, discovering and disrupting malicious infrastructure early. The case studies presented reinforce the importance of proactive monitoring and detection facilitated by machine learning techniques, underlining the persistent nature of cyber threats and the continuous evolution of attackers' tactics.

Machine learning and automation therefore play a central role in modern cybersecurity strategies. By staying vigilant and continuously adapting to new threats, defenders can better protect their digital assets in an evolving threat landscape.