Engineering · Apr 18, 2026 · 6 min read

How Voipy Shield detects deepfake calls in under a second

The first deepfake call we intercepted in production used a cloned voice of a resident's grandson. He'd posted three minutes of himself on Instagram talking about a lacrosse game. That was enough. The attacker ran it through a voice-cloning model, called the resident, and, posing as the grandson, asked her to wire bail money because he had supposedly been arrested.

The grandmother in question is 82 and lives at a senior community in Pasadena. Voipy Shield blocked the call. In this post I'll walk through how.

The two-model pipeline

Every inbound call runs through two models in parallel:

Transcription model. Realtime speech-to-text, timestamped per word. Cheap, well-understood. Feeds the next stage.
Voice-authenticity model. Fine-tuned for synthetic-voice detection, specifically on output from the voice-cloning tools attackers commonly use (the ElevenLabs API, Tortoise, RVC). Outputs an authenticity confidence score from 0 to 1; lower scores mean the voice is more likely synthesized.

If the authenticity confidence drops below 0.4 at any point in the first 10 seconds, we pause the call and ping staff for review. No live resident conversation happens with a voice we think is fake.
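The threshold rule above can be sketched in a few lines. This is a minimal illustration, not Shield's production code: the 0.4 threshold and 10-second window come from this post, while `ScoreSample`, its field names, and `first_flag_time` are hypothetical names chosen for the example. It assumes the model emits timestamped scores where a lower value means the voice is more likely synthetic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

REVIEW_THRESHOLD = 0.4  # from the post: below this, pause and ping staff
SCREEN_WINDOW_S = 10.0  # from the post: only the first 10 seconds are screened


@dataclass(frozen=True)
class ScoreSample:
    """One model output (hypothetical shape, for illustration only)."""
    t_seconds: float     # offset from the start of the call
    authenticity: float  # 0..1, lower = more likely synthetic


def first_flag_time(
    samples: Iterable[ScoreSample],
    threshold: float = REVIEW_THRESHOLD,
    window_s: float = SCREEN_WINDOW_S,
) -> Optional[float]:
    """Return the time of the first in-window sample below the threshold.

    Returns None if no sample inside the screening window drops below it.
    """
    for sample in sorted(samples, key=lambda s: s.t_seconds):
        if sample.t_seconds > window_s:
            break  # past the screening window; stop looking
        if sample.authenticity < threshold:
            return sample.t_seconds
    return None
```

A caller would pause the call and notify staff as soon as `first_flag_time` returns a value, and let the call proceed if it returns `None`.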

What stops a false positive

Early in testing, our bar was 0.2, and we were auto-blocking legitimate calls from residents' family members whose phones had poor audio. Nothing destroys trust in an elder-safety product faster than blocking grandma's actual grandson.

Two changes fixed this:

Ambiguity → escalation, not block. When the model is unsure, it defers to a human instead of deciding alone. Below 0.4 confidence, we don't auto-disconnect: we flag the call, ring the staff line, and give staff a 15-second window to override before the caller is disconnected.
Trusted-caller allowlist. Families register their primary callers in advance; those numbers skip the authenticity check unless the call pattern is actively suspicious (unexpected hour, unusual duration, etc.).
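Taken together, the two changes amount to a small decision procedure. The sketch below is one way to express it, under stated assumptions: the 0.4 threshold and 15-second override window are from this post, but the `Action` enum, the function names, and the idea of passing "suspicious pattern" in as a precomputed boolean are all hypothetical. How Shield actually decides that a call pattern is suspicious is not covered here.

```python
from enum import Enum
from typing import Optional, Set

REVIEW_THRESHOLD = 0.4    # from the post
OVERRIDE_WINDOW_S = 15.0  # from the post: staff override window


class Action(Enum):
    CONNECT = "connect"
    ESCALATE = "escalate"      # flag the call and ring the staff line
    DISCONNECT = "disconnect"


def initial_action(
    caller: str,
    allowlist: Set[str],
    pattern_suspicious: bool,
    min_authenticity: float,
) -> Action:
    """Decide what to do with a call once its lowest score is known."""
    # Registered callers skip the check unless the pattern looks suspicious.
    if caller in allowlist and not pattern_suspicious:
        return Action.CONNECT
    # A low score escalates to staff; it never disconnects on its own.
    if min_authenticity < REVIEW_THRESHOLD:
        return Action.ESCALATE
    return Action.CONNECT


def resolve_escalation(staff_override_at: Optional[float]) -> Action:
    """Resolve an escalated call.

    staff_override_at is the number of seconds after escalation at which
    staff overrode the flag, or None if nobody did.
    """
    if staff_override_at is not None and staff_override_at <= OVERRIDE_WINDOW_S:
        return Action.CONNECT
    return Action.DISCONNECT
```

Keeping the escalation and its resolution as separate steps mirrors the design choice described above: the model can only raise a flag, and the disconnect happens only after a human has had the chance to step in.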

Why this matters right now

Voice cloning used to require an hour of source audio, a desktop GPU, and some ML experience. In 2026 it needs 30 seconds of any public video, a web form, and $5. The pool of attackers went from 50,000 to 50 million overnight. If your elder-care facility doesn't have a screening layer, you are the control group in a multi-year natural experiment, and I do not recommend it.

What's next

We're rolling out a family-facing weekly digest — a plaintext email to every registered family member listing what we blocked on that resident's line last week, categorized by threat type. This is the kind of report that turns a security product from "thank god we have this" to "look at what this saved us from" — the difference between insurance you forget about and a tool you recommend to every peer.
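As a rough illustration of what such a digest could look like, the sketch below groups one resident's blocked calls by threat type and renders a plaintext summary. Everything in it is an assumption made for the example: the function name, the line format, and the threat-type labels are hypothetical, and the real digest may be organized differently.

```python
from collections import Counter
from typing import Iterable


def build_digest(resident_name: str, threat_types: Iterable[str]) -> str:
    """Render a plaintext summary of last week's blocked calls.

    threat_types holds one label per blocked call; the labels themselves
    are placeholders, not Shield's actual categories.
    """
    counts = Counter(threat_types)
    total = sum(counts.values())
    lines = [f"Blocked calls for {resident_name} last week: {total}"]
    # Most frequent threat first; ties are broken alphabetically.
    for threat, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"- {threat}: {n}")
    return "\n".join(lines)
```

The output is a plain string, so it can be dropped into an email body without any templating.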

If you run a senior-living facility and want Shield on your line: start a 14-day trial — no card charged until day 15.