Boosting URL detection with syntactic features in spam emails

Friday 26 September 11:00 - 11:30, Red room

Antonia Scherz (Net at Work)

Traditional URL blocklisting often misses URLs produced by automated spam-generation techniques. For example, attackers may pair newly registered domains with near-identical paths or query parameters, so that the resulting URLs differ only in the domain and a random string. This constant rotation of domains frequently evades detection. Identifying spam URLs by their syntactic patterns addresses this issue directly and can classify many spam domains at once.
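As an illustration of the idea (a minimal sketch, not the implementation presented in the talk), the following Python snippet collapses a URL into a coarse, domain-independent pattern by replacing every alphanumeric run with a class token. The function names `url_pattern` and `_abstract`, the token classes `<alpha>`, `<num>` and `<alnum>`, and the example URLs are all assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

def _abstract(text: str) -> str:
    """Replace each alphanumeric run with a class token; keep delimiters."""
    def token(match: re.Match) -> str:
        run = match.group(0)
        if run.isdigit():
            return "<num>"
        if run.isalpha():
            return "<alpha>"
        return "<alnum>"
    return re.sub(r"[A-Za-z0-9]+", token, text)

def url_pattern(url: str) -> str:
    """Return a structural pattern that ignores the concrete domain."""
    parts = urlsplit(url)
    pairs = []
    if parts.query:
        for pair in parts.query.split("&"):
            # Assumption: query keys are stable across a campaign, values are not.
            key, _, value = pair.partition("=")
            pairs.append(f"{key}={_abstract(value)}")
    query = "?" + "&".join(pairs) if pairs else ""
    return f"{parts.scheme}://<domain>{_abstract(parts.path)}{query}"

# Hypothetical URLs that differ only in the domain and in random strings.
a = "https://new-shop-one.example/track/x7Kq92Lm?id=48213&u=ab12cd"
b = "https://fresh-deals.example/track/Pz03hT8a?id=90177&u=9fke21"
print(url_pattern(a))
print(url_pattern(a) == url_pattern(b))
```

Both example URLs map to the same pattern, so a single pattern-level decision would cover both domains. A token scheme this coarse has obvious limits: a random string that happens to contain only letters would be labelled `<alpha>` rather than `<alnum>`, which is one reason a later similarity-based grouping step is useful.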

In this work, we present a practical approach to identifying structural patterns in URLs generated by automation. Our goal is to enhance URL detection systems and reduce the risk of phishing, malware, and spam threats. The presented technique complements existing detection methods, which mainly focus on character-level (e.g. frequencies of special characters) or content-based features (e.g. suspicious domain names, email body text). Our multi-step approach enables us to link seemingly dissimilar URLs to the same spam campaign origin, even when traditional detection techniques fail.

We propose a four-step procedure for incorporating syntactic pattern recognition:

  1. Extract syntactic patterns from the URLs using a regex representation.
  2. Pre-filter the syntactic patterns based on similarity in n-gram usage.
  3. Cluster the pre-filtered patterns using Locality Sensitive Hashing and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
  4. Classify the URLs in each cluster as malicious or benign based on cluster information and additional (traditional) email metadata.
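Steps 2 and 3 can be sketched in plain Python as follows. The abstract does not say how the n-gram filter, the hashing and DBSCAN are combined, so this is only one plausible arrangement: character trigrams of each pattern string, MinHash signatures with banding as the locality-sensitive hash, an exact Jaccard check on candidate pairs, and a textbook DBSCAN over the resulting neighbour sets. All function names and the parameters `num_perm`, `bands`, `eps` and `min_pts` are assumptions, and step 4 (classification with email metadata) is left out.

```python
import hashlib
from itertools import combinations

def ngrams(text, n=3):
    """Set of character n-grams of a pattern string."""
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def _hash(seed, gram):
    digest = hashlib.blake2b(f"{seed}:{gram}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash(grams, num_perm=32):
    """MinHash signature: per seed, the smallest hash value over the set."""
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(num_perm))

def lsh_candidates(signatures, bands=16):
    """Index pairs whose signatures agree on at least one band."""
    rows = len(signatures[0]) // bands
    buckets = {}
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, set()).add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

def dbscan(neighbours, min_pts=2):
    """DBSCAN over precomputed neighbour sets; label -1 marks noise."""
    labels = {p: None for p in neighbours}
    cluster = -1
    for p in neighbours:
        if labels[p] is not None:
            continue
        if len(neighbours[p]) + 1 < min_pts:  # +1 counts the point itself
            labels[p] = -1
            continue
        cluster += 1
        labels[p] = cluster
        queue = list(neighbours[p])
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(neighbours[q]) + 1 >= min_pts:
                queue.extend(neighbours[q])
    return [labels[p] for p in sorted(labels)]

def cluster_patterns(patterns, eps=0.5, min_pts=2):
    """Cluster pattern strings by Jaccard distance on their trigram sets."""
    grams = [ngrams(p) for p in patterns]
    signatures = [minhash(g) for g in grams]
    neighbours = {i: set() for i in range(len(patterns))}
    for i, j in lsh_candidates(signatures):
        # LSH only proposes pairs; the exact distance decides.
        if 1.0 - jaccard(grams[i], grams[j]) <= eps:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return dbscan(neighbours, min_pts)
```

The point of the banding step is to avoid comparing every pattern with every other one: only patterns that land in a common bucket are compared exactly. Because the candidate search is probabilistic, a similar pair can occasionally be missed, so `num_perm` and `bands` trade recall against the number of exact comparisons.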

We apply this technique to a large corpus of URLs extracted from email bodies and show where syntactic features excel and where they fail. Our results, validated on a dataset of over 1 million emails, show that combining URL structural analysis with contextual email metadata – such as sender reputation and email headers – improves the robustness of our current spam email detection systems. This approach also provides additional signals to identify suspicious campaigns that evade traditional methods. By leveraging these insights, organizations can more effectively detect, mitigate, and adapt to evolving spam and phishing threats.

Antonia Scherz

Antonia Scherz is a machine learning specialist with a focus on AI-driven security systems. As a data scientist at Net at Work, she develops and optimizes machine learning models for NoSpamProxy, enhancing spam detection and defending against phishing and malware campaigns. Previously, she led the development of AI security applications for public sector clients. Her expertise includes spam detection, differential privacy, and synthetic data.


