Boosting URL detection with syntactic features in spam emails

Friday 26 September 11:00 - 11:30, Red room

Antonia Scherz (Net at Work)

Traditional URL blocklisting methods often overlook the recurring patterns produced by automated URL generation algorithms, which are prevalent in high-volume spam campaigns. Attackers register new domains at high frequency while reusing shared path structures or query parameters that vary only slightly, for example in the domain name or in embedded random strings. When systems rely solely on domain-level or character-based features, such superficial variations and fast domain switching often evade detection. We demonstrate the potential of recurring syntactic patterns, and of features derived from them, to capture URL similarities across spam campaigns. For URLs of sufficient string length, the approach offers a more robust detection strategy and immediately classifies related malicious domains. Where applicable, it increases detection coverage and improves resilience against the obfuscation techniques commonly used in large-scale spam and phishing operations. Our syntax-based detection complements existing methods that focus on character-level features (e.g. frequencies of special characters) or content-based features (e.g. suspicious domain names, email body text).
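To make the motivation concrete, here is a minimal Python sketch of why a domain-level lookup misses a campaign that a structural view can still catch. It assumes a hypothetical "structural key" made of the URL path (with tokens of six or more word characters that contain a digit masked as random strings) plus the sorted query parameter names. The masking rule, the key definition, the function name `structural_key` and the example URLs are illustrative assumptions, not the features used in the talk.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rule: a path token of 6+ word characters that contains a digit
# is treated as an embedded random string and masked.
RANDOM_TOKEN = re.compile(r"\b(?=\w*\d)\w{6,}\b")


def structural_key(url: str) -> tuple[str, tuple[str, ...]]:
    """Return a domain-independent key: masked path plus sorted query parameter names."""
    parts = urlsplit(url)
    masked_path = RANDOM_TOKEN.sub("<RND>", parts.path)
    query_names = tuple(sorted(name for name, _ in parse_qsl(parts.query)))
    return masked_path, query_names


# A URL seen in an earlier campaign wave (hypothetical example data).
known_url = "https://promo-deals1.example/c/a8f3k2x9/view.php?id=1&ref=mail"
blocklisted_domains = {urlsplit(known_url).hostname}
known_campaign_keys = {structural_key(known_url)}

# A freshly registered domain reusing the same path and query structure.
new_url = "https://cheap-offers7.example/c/q2w9e8r1/view.php?ref=news&id=42"

print(urlsplit(new_url).hostname in blocklisted_domains)  # False: domain lookup misses it
print(structural_key(new_url) in known_campaign_keys)     # True: the structure matches
```

Keying on structure rather than on the domain is what lets a new domain inherit the verdict of its campaign. The trade-off is that short or generic paths (such as `/` or `/index.html`) yield keys that benign sites share too, which is consistent with the caveat that the approach works for URLs of sufficient length.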

We propose a four-step procedure that incorporates syntactic pattern recognition:

1. extract syntactic patterns from the URLs using a regular-expression representation;
2. identify similarities in n-gram usage within the pre-filtered syntactic patterns;
3. refine the classification labels for URLs and domains based on n-grams from detected domains;
4. classify the URLs in each cluster as malicious or benign based on cluster information and additional (traditional) email metadata.
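The first two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the regex abstraction in the hypothetical `to_pattern` (letter-only, digit-only and mixed alphanumeric runs mapped to character classes, everything else kept as an escaped literal), the use of character trigrams on path and query, the Jaccard measure and the 0.25 threshold in `link_similar` are all invented choices for the example.

```python
import re
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlsplit


def to_pattern(url: str) -> str:
    """Step 1 (sketch): abstract a URL into a regex-like syntactic pattern."""
    parts = []
    for tok in re.findall(r"[A-Za-z0-9]+|.", url):
        if not (tok.isascii() and tok.isalnum()):
            parts.append(re.escape(tok))  # punctuation and non-ASCII stay literal
        elif tok.isdigit():
            parts.append(r"\d+")
        elif tok.isalpha():
            parts.append("[A-Za-z]+")
        else:
            parts.append("[A-Za-z0-9]+")  # mixed letters and digits
    return "".join(parts)


def char_ngrams(text: str, n: int = 3) -> set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_similar(urls: list[str], threshold: float = 0.25) -> list[tuple[str, str]]:
    """Step 2 (sketch): inside each pattern group, link URL pairs whose
    path+query character trigrams overlap by at least `threshold` (Jaccard)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[to_pattern(url)].append(url)  # pre-filter by syntactic pattern
    links = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            pa, pb = urlsplit(a), urlsplit(b)
            # Domains are expected to rotate, so only path and query are compared.
            sim = jaccard(char_ngrams(pa.path + "?" + pa.query),
                          char_ngrams(pb.path + "?" + pb.query))
            if sim >= threshold:
                links.append((a, b))
    return links


urls = [
    "https://promo-deals1.example/c/a8f3k2x9/view.php?id=1",
    "https://cheap-offers7.example/c/q2w9e8r1/view.php?id=42",
    "https://www.example.org/about",
]
print(to_pattern(urls[0]) == to_pattern(urls[1]))  # True
print(link_similar(urls))  # links only the first two URLs
```

Grouping by pattern first confines the quadratic pairwise n-gram comparison to URLs that already share a shape, instead of running it across the whole corpus.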

This multi-step approach links seemingly dissimilar URLs to the same spam campaign origin, even when traditional detection techniques fail.
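One way to turn pairwise links into campaign clusters, and to let known detections label their neighbours, is a union-find structure followed by simple label propagation. The sketch below assumes the links are already computed (for example by the n-gram similarity of step 2) and that a set of previously detected malicious domains is available. The class and function names and the example data are hypothetical, and the n-gram-based label refinement described in step 3 is not modelled here.

```python
from urllib.parse import urlsplit


class DisjointSet:
    """Minimal union-find that merges pairwise URL links into clusters."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def campaign_clusters(links: list[tuple[str, str]]) -> list[set[str]]:
    """Merge linked URL pairs into connected components (candidate campaigns)."""
    ds = DisjointSet()
    for a, b in links:
        ds.union(a, b)
    clusters: dict[str, set[str]] = {}
    for url in list(ds.parent):
        clusters.setdefault(ds.find(url), set()).add(url)
    return list(clusters.values())


def newly_flagged_domains(clusters: list[set[str]], known_bad: set[str]) -> set[str]:
    """Flag every domain that shares a cluster with an already detected domain."""
    flagged: set[str] = set()
    for cluster in clusters:
        domains = {urlsplit(u).hostname for u in cluster}
        if domains & known_bad:
            flagged |= domains
    return flagged - known_bad


links = [
    ("https://promo-deals1.example/c/a8f3k2x9/view.php",
     "https://cheap-offers7.example/c/q2w9e8r1/view.php"),
    ("https://cheap-offers7.example/c/q2w9e8r1/view.php",
     "https://fast-win3.example/c/z7y6x5w4/view.php"),
    ("https://shop.example/cart/item/1001", "https://shop.example/cart/item/1002"),
]
clusters = campaign_clusters(links)
# Flags the two other domains of the first cluster; the shop cluster is untouched.
print(newly_flagged_domains(clusters, {"promo-deals1.example"}))
```

Because clusters are connected components, two URLs that are never directly similar (here the first and third campaign URLs) still end up together through an intermediate link, which is the kind of transitive linking the paragraph above describes.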

We apply this technique to a large corpus of URLs extracted from email bodies, demonstrating where syntactic features excel and where they fail. Our results, validated on a dataset of over 1 million emails, show that combining URL structural analysis with contextual email metadata, such as sender reputation and email headers, improves the robustness of our current spam email detection systems. The approach also provides additional signals for identifying suspicious campaigns that evade traditional methods.
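The abstract does not specify how URL-level and email-level signals are combined. As a purely hypothetical illustration, a logistic combination of a cluster-derived URL score with sender reputation and a count of header anomalies could look like the sketch below; the `EmailSignals` fields, the weights and the bias are invented for the example and would in practice be learned from labelled data.

```python
import math
from dataclasses import dataclass


@dataclass
class EmailSignals:
    cluster_malicious_share: float  # share of the URL's syntactic cluster already labelled malicious (0..1)
    sender_reputation: float        # hypothetical score: 0 = unknown/bad, 1 = fully trusted
    header_anomalies: int           # hypothetical count of suspicious header findings


# Invented weights and bias for illustration only; a real system would learn these.
WEIGHTS = {"cluster": 4.0, "reputation": -3.0, "headers": 0.8}
BIAS = -1.0


def spam_probability(s: EmailSignals) -> float:
    """Logistic combination of a URL-structure signal with email metadata."""
    z = (
        BIAS
        + WEIGHTS["cluster"] * s.cluster_malicious_share
        + WEIGHTS["reputation"] * s.sender_reputation
        + WEIGHTS["headers"] * s.header_anomalies
    )
    return 1.0 / (1.0 + math.exp(-z))


print(round(spam_probability(EmailSignals(0.9, 0.2, 2)), 3))  # 0.973
print(round(spam_probability(EmailSignals(0.9, 0.9, 0)), 3))  # 0.475
```

With these made-up weights, the same suspicious URL cluster scores about 0.97 for a low-reputation sender with header anomalies but just under 0.5 for a trusted sender with clean headers, showing how contextual metadata can shift a decision that the URL structure alone would make.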

Antonia Scherz

Antonia Scherz is a machine learning specialist with a focus on AI-driven security systems. As a data scientist at Net at Work, she develops and optimizes machine learning models for NoSpamProxy, enhancing spam detection and defending against phishing and malware campaigns. Previously, she led the development of AI security applications for public sector clients. Her expertise includes spam detection, differential privacy, and synthetic data.
