How accurate is this?

Most scam tools just say 'trust the AI'. We don't think that's good enough. So we measured our detector against a labeled set of real-world-style messages and published exactly how it performs, the wins and the misses.

The single biggest tell

Manufactured urgency is about 11.2x more common in scams than in genuine messages.

Across thousands of real messages, this one signal separated flagged messages from genuine ones better than any other, and it comes down to a single, provable number.

Appeared in 11% of scams vs 1% of real messages.

Signals that almost never appear in a legitimate message

In this dataset, these tactics showed up only in flagged messages, which is why we treat them as strong warning signs.

Suspicious link: 3% of scams, 0% of real messages

How often it's right

The live AI detector, scored on real messages from a public dataset. We report every number, including the misses.

Precision: 91%
Recall: 95%
F1 score: 0.93
Accuracy: 93%
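As a quick sanity check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two figures above. The function name `f1_score` is ours, chosen for this sketch; it is not part of the published pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, written as fractions.
reported_f1 = f1_score(0.91, 0.95)
print(round(reported_f1, 2))  # 0.93
```

The result rounds to the 0.93 shown in the scorecard, so the three figures are consistent with each other.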

The full breakdown

Scams correctly caught

Real messages correctly cleared

False alarms (real messages we flagged)

Scams we missed
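The four outcomes above are the cells of a standard confusion matrix. Here is a minimal sketch of how they can be tallied from pairs of true label and detector verdict, and how precision, recall, and accuracy follow from them. The label strings, the function names, and the toy data are all assumptions made for illustration; they are not the benchmark's actual counts.

```python
from collections import Counter


def tally(pairs):
    """Count the four outcomes from (true_label, verdict) pairs.

    Both values are assumed to be the strings "scam" or "real".
    """
    counts = Counter()
    for truth, verdict in pairs:
        if truth == "scam" and verdict == "scam":
            counts["caught"] += 1          # scams correctly caught
        elif truth == "real" and verdict == "real":
            counts["cleared"] += 1         # real messages correctly cleared
        elif truth == "real" and verdict == "scam":
            counts["false_alarm"] += 1     # real messages we flagged
        else:
            counts["missed"] += 1          # scams we let through
    return counts


def summarize(c):
    """Derive precision, recall, and accuracy from the four counts."""
    flagged = c["caught"] + c["false_alarm"]
    scams = c["caught"] + c["missed"]
    total = sum(c.values())
    return {
        "precision": c["caught"] / flagged if flagged else 0.0,
        "recall": c["caught"] / scams if scams else 0.0,
        "accuracy": (c["caught"] + c["cleared"]) / total if total else 0.0,
    }


# Hypothetical toy data: 4 scams (3 caught, 1 missed), 6 real (5 cleared, 1 flagged).
pairs = (
    [("scam", "scam")] * 3
    + [("scam", "real")]
    + [("real", "real")] * 5
    + [("real", "scam")]
)
counts = tally(pairs)
metrics = summarize(counts)
print(counts, metrics)
```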

Does confidence mean anything?

For each confidence level, here's the share of messages that were actually scams. Higher confidence should mean a higher real scam rate.

When we said…	…this share were really scams	messages
high	46%	126
medium	71%	24
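A calibration table like the one above can be built by grouping messages by the confidence the detector reported and taking the share in each group that were truly scams. This is a rough sketch under assumed names (`calibration_table`, rows of `(confidence, is_scam)`) with made-up rows, not the data behind the table.

```python
from collections import defaultdict


def calibration_table(rows):
    """rows: iterable of (confidence_level, is_scam).

    Returns {confidence_level: (share_that_were_scams, message_count)}.
    """
    totals = defaultdict(int)
    scams = defaultdict(int)
    for confidence, is_scam in rows:
        totals[confidence] += 1
        scams[confidence] += int(is_scam)
    return {level: (scams[level] / totals[level], totals[level]) for level in totals}


# Hypothetical rows for illustration only.
rows = [
    ("high", True), ("high", True), ("high", False), ("high", True),
    ("medium", True), ("medium", False),
]
table = calibration_table(rows)
for level, (share, n) in table.items():
    print(f"{level}\t{share:.0%}\t{n}")
```

A well-calibrated detector would show the share rising with the confidence level; comparing the rows is the whole test.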

Every signal, ranked

How often each tactic appeared in scams versus genuine messages.

Tactic	In scams	In real messages	How much more common in scams
Manufactured urgency	11%	1%	11.2x
Too-good reward	9%	0%	445.9x
Suspicious link	3%	0%	scams only
Irreversible payment demand	0%	0%	scams only
Account threat	0%	0%	scams only
Guaranteed returns	0%	0%	scams only
Asking for a code	0%	0%	0x
Sworn to secrecy	0%	0%	0x
Impersonating authority	0%	0%	0x
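The last column is a ratio of two rates: how often a tactic appears in scams divided by how often it appears in genuine messages. The ratios are presumably computed from unrounded rates, which would explain why a row can show 0% in real messages and still have a finite multiple. The sketch below is one way to compute it; the function name `lift` and the zero-handling rules are our guesses, chosen to mirror the labels in the table, and the counts in the examples are hypothetical.

```python
def lift(scam_hits: int, scam_total: int, real_hits: int, real_total: int) -> str:
    """How many times more common a tactic is in scams than in real messages."""
    scam_rate = scam_hits / scam_total
    real_rate = real_hits / real_total
    if scam_rate == 0:
        return "0x"            # never seen in scams (assumed convention)
    if real_rate == 0:
        return "scams only"    # seen in scams, never in real messages
    return f"{scam_rate / real_rate:.1f}x"


# Hypothetical counts for illustration only.
print(lift(20, 100, 2, 200))  # 20% vs 1%
print(lift(3, 100, 0, 200))   # appears in scams only
print(lift(0, 100, 0, 200))   # appears nowhere
```

Formatting the ratio to one decimal place keeps the table readable; the extra digits carry no real information at this sample size.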

Method, in plain terms

1. We use a public dataset of thousands of real messages, each labeled as a flagged (spam/scam) message or a legitimate one.
2. We run the same detector that powers the live tool over each one and compare its verdict to the true label.
3. We count the hits and misses to get precision, recall, and calibration, and we publish all of it.
4. We measure how often each manipulation tactic appears in flagged versus legitimate messages to find the strongest single signal.

Where we're honest about the limits

• We test against the public SMS Spam Collection. Its 'spam' class is broader than high-harm scams (it includes old-style marketing), so this is a tough, honest test rather than a flattering one.
• The headline accuracy comes from the live AI detector scored on a 150-message stratified sample; the full dataset is also checked by our transparent rule-based baseline.
• The dataset is English-only. Scams target every language, and our live analysis now responds in several.
• No detector is perfect. This is a second opinion, so always verify anything important through an official channel too.
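For readers unfamiliar with stratified sampling: instead of drawing messages at random from the whole dataset, you draw a fixed number from each class so that both classes are represented. The sketch below shows the idea. The page does not say whether the strata were sized equally or in proportion to the dataset, so the per-class quotas are passed in explicitly; the function name, the quotas, and the toy messages are all assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(items, quotas, seed=0):
    """Draw quotas[label] items from each class.

    items: list of (text, label) pairs.
    quotas: dict mapping label -> number of items to draw from that class.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for text, label in items:
        by_label[label].append((text, label))
    sample = []
    for label, k in quotas.items():
        group = by_label[label]
        sample.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(sample)  # avoid scoring all of one class first
    return sample


# Hypothetical toy dataset: 20 spam and 80 ham messages.
items = [(f"spam {i}", "spam") for i in range(20)] + [
    (f"ham {i}", "ham") for i in range(80)
]
sample = stratified_sample(items, {"spam": 5, "ham": 5})
print(len(sample), sum(1 for _, label in sample if label == "spam"))
```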

Why we show our work

What this is

An honest scorecard. We run our detector over messages labeled as 'scam' or 'legitimate', then report how often it gets them right.

Why it matters

A verdict you can't verify is just an opinion. Showing real numbers, including our mistakes, is the difference between a demo and a product you can actually rely on.

Where the numbers come from

The public SMS Spam Collection, scored by the same engine that powers the live checker. The pipeline is reproducible and dataset-agnostic, so it can be re-run on other public benchmarks too.

How to read it

Higher precision means fewer false alarms; higher recall means fewer scams missed. Calibration shows whether 'high confidence' really does mean a higher chance of a scam.

Sources: SMS Spam Collection (spam + ham); FBI IC3 2024 Internet Crime Report

These numbers were generated by our benchmark on 17 June 2026 · SMS Spam Collection (UCI), 5,574 real SMS messages (747 flagged-class) · scored by the AI engine on a 150-message stratified sample