The call came in at 11pm. The voice on the other end was her son's, unmistakably, right down to the slight upward lilt in the way he said "Mum". There was even faint background noise, something like a car interior. He had been in an accident. He needed $2,400 transferred immediately to pay for a tow and cover a fine before he could leave. He would explain everything in the morning.
Her son was asleep in his bedroom down the hall. She had just been scammed out of $2,400 by someone using a voice model built from his TikTok videos.
This specific fraud, the grandparent or family emergency scam run with AI-cloned voices, is not hypothetical. It has been documented in the United States, Canada, the UK, and Australia, and it is accelerating. Building a convincing voice clone has gone from weeks of work requiring thousands of audio samples to a matter of minutes, using freely available tools and a short clip of publicly posted audio.
How Easy It Is Now
The uncomfortable truth is that building a working voice clone of almost anyone with a public social media presence requires no technical expertise and no specialised hardware. Services like ElevenLabs, which markets itself to podcasters and content creators, can produce a passable voice clone from under a minute of audio. The quality improves with more samples, but the basic capability is accessible to anyone with an internet connection and twenty dollars a month.
Criminal groups operating out of Eastern Europe, West Africa, and Southeast Asia have industrialised this. They are not sitting in their bedrooms making one-off calls. They are running operations with scripts, voice models organised by target demographics, and quality control processes. The $2,400 call described above is not an outlier; it is a product in a criminal supply chain.
The Phone Cannot Tell You Who Is Calling
Part of what makes this so effective is that it exploits a fundamental trust mechanism that most people have never examined: the assumption that a voice you recognise belongs to the person it sounds like. This was a reasonable assumption for most of human history. It is no longer a safe one.
Caller ID is easily spoofed and provides no real verification of identity. The voice itself, which used to be the backup verification, is now reproducible by machine. There is currently no reliable technical countermeasure available to ordinary people for distinguishing a real phone call from a convincing clone.
The FBI’s advice is to establish a family safe word — a word or phrase agreed upon in advance that anyone can use to verify identity in a suspicious call. It is analogue, old-fashioned, and genuinely effective. Use it.
The Liability Gap
The companies that built the voice cloning tools occupy a strange position in this. ElevenLabs and its competitors have terms of service prohibiting malicious use, filters that attempt to detect and block the cloning of public figures' voices, and consent mechanisms built into their professional tools. These measures are largely performative against determined bad actors and essentially irrelevant to the criminal groups actually using this at scale, who operate through accounts registered with fake identities and stolen payment details.
There is no legal framework that currently holds voice AI companies liable for downstream fraud. There is active lobbying against creating one. The companies argue, not unreasonably, that they cannot be responsible for misuse of their technology any more than a knife manufacturer is responsible for a stabbing. The counterargument is that the technology is more analogous to a weapon than to a general-purpose tool like a knife, and that building it without meaningful safeguards, in full knowledge of the fraud use cases, is a choice, not an inevitability.
What to Do Right Now
Create a family safe word. Tell your elderly relatives about voice cloning. Call back on a known number before sending any money. These are not foolproof, but they raise the cost of the attack enough that most operations will move on to easier targets.
The deeper problem is that we are moving into a world where you cannot trust that the voice you hear belongs to the person you think it does, and we have no infrastructure for dealing with that. The safe word is a stopgap. The real solution requires rethinking how identity is verified over any remote channel, and nobody in a position of authority is moving fast enough on it.