    Advanced · Lesson 7 of 9 · Phishing & social-engineering email

    Deepfake Voice Phishing in 2026 — When the Voice on the Phone Is Synthetic

    Deepfake voice phishing uses AI-synthesised speech to impersonate a real person — most often a CEO, CFO, parent, child, or other high-trust contact. In 2026 a credible voice clone needs three to thirty seconds of source audio and produces real-time conversational responses indistinguishable from the real speaker on a phone line. The only reliable defence is process: a callback protocol or a shared codeword that no AI model can guess.

    Reviewed by the HackersHub team, updated 13 May 2026 · 8 min read · Free to use under CC-BY-ND 4.0

    The scenario

    An executive assistant at a 1,800-person Dutch B2B SaaS company gets a phone call on Tuesday at 11:47. The display says Private number. The voice is unmistakably the CEO's: the same accent, the same cadence, the same 'right?' at the end of every other sentence. He sounds slightly hoarse 'because of the flight back from Singapore' and asks her to fast-track authorisation of a €420,000 partner payment that was 'agreed last week, but the contract is still with legal'. He says he'll forward the supplier's invoice from his personal Gmail because his corporate inbox is rate-limited on the hotel WiFi. The assistant pushes back once: 'Should I check with the CFO?' The voice answers, 'No, the CFO knows, we discussed this in the board pre-read. Just push it through standard AP and I'll sign off when I'm back tomorrow morning.' The responses are fluid, the pretext is plausible and the urgency is calibrated. She authorises the payment. The CEO had indeed been in Singapore, but he never made the call: the attacker had scraped 4 minutes of audio from his keynote at Web Summit six weeks earlier, trained a clone in under an hour, and ran the live call through commercial voice-cloning infrastructure that handles real-time prosody. The money moved at 11:53, and the CEO learned about it from the assistant's email on Wednesday morning. Less than €60,000 was recovered.

    How the attack works

    Voice cloning crossed the credibility threshold for real-time, phone-quality impersonation around 2023. By 2026 the pipeline is commoditised: source audio from any public appearance (podcast, conference talk, earnings call, YouTube video, even a social-media livestream) is fed to a fine-tuned voice model that produces real-time conversational audio. Some kits handle live two-way dialogue; others combine a short pre-recorded message with 'live' clips synthesised on the fly.

    The attacker also controls the channel layer. Caller ID is spoofed to match the number the target knows for the impersonated person, background ambience is added to match the claimed setting (airport, taxi, conference), and the pretext is seeded with detail (board meetings, recent travel, internal project names) scraped from LinkedIn, public press releases, leaked breach data or prior reconnaissance. The combination of voice, caller ID, pretext detail and emotional pressure (urgency, authority, secrecy) defeats most cognitive defences.

    Relevant MITRE ATT&CK techniques: T1566.004 (Phishing: Spearphishing Voice), T1656 (Impersonation), and T1585.002 (Establish Accounts: Email Accounts) for the supporting email paper trail.

    The only category of defence that consistently works is process: out-of-band callback to a verified number, shared-codeword authentication, a two-person rule on financial actions, and in-person or verified-video confirmation for high-value or off-process requests.
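    The callback rule is easiest to audit when it is written down precisely, because one property becomes explicit: nothing the caller says or displays can change which number gets dialled. Below is a minimal Python sketch under that assumption, not a product integration; `TRUSTED_DIRECTORY`, `InboundRequest` and `verified_callback_number` are hypothetical names, and the phone numbers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trusted directory, populated only from sources you control
# (phone contacts, CRM, contracts file, company switchboard).
# The numbers are placeholders.
TRUSTED_DIRECTORY = {
    "ceo": "+44 20 7946 0100",
    "cfo": "+44 20 7946 0101",
}


@dataclass
class InboundRequest:
    claimed_identity: str               # who the caller says they are
    displayed_caller_id: Optional[str]  # spoofable; never used for the callback
    offered_callback: Optional[str]     # number the caller suggests; never used


def verified_callback_number(req: InboundRequest) -> Optional[str]:
    """Return the number to call back on, or None if no verified number exists.

    displayed_caller_id and offered_callback are deliberately ignored:
    both are under the caller's control.
    """
    return TRUSTED_DIRECTORY.get(req.claimed_identity.strip().lower())


# The caller ID "matches" and the caller offers a hotel number: neither is used.
call = InboundRequest("CEO", "+44 20 7946 0100", "+44 7700 900123")
print(verified_callback_number(call))  # +44 20 7946 0100
```

    A `None` result means the request cannot be verified; route it to the switchboard or security rather than acting on it.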

    What to watch for

    • Inbound call from a private or unfamiliar number claiming to be a known contact in unusual circumstances ('flight delay', 'new mobile', 'borrowed phone')
    • Urgency framed around financial actions, credential resets, account changes, or sensitive document sharing
    • Caller resisting normal verification channels — 'no time to do callbacks', 'don't loop in finance, this is between us', 'I'll be on a flight in five minutes'
    • Voice prosody that is almost perfect but has subtle artefacts — oddly even breathing, flat affect on emotional words, unnatural pauses between sentences
    • Background ambience that is too clean (perfectly studio-quiet) or that doesn't match the claimed environment
    • Conversational mistakes a real person wouldn't make — incorrect names, wrong recent shared context, slightly off recall of an event you both attended
    • Calls outside normal hours combined with pressure to act before someone else (CFO, manager, partner) becomes aware
    • A coordinated multi-channel pretext: a voice call following an SMS, email, or push notification that softens you up beforehand

    What to do

    1. Hang up and call back on a verified number, every time, regardless of how convincing the voice is.
       Use the number in your phone contacts, your CRM, your contracts file or the company switchboard. Never use a number provided during the call.
    2. Use a shared codeword for sensitive requests within your family and your executive team.
       Agree on a simple phrase known only to the real parties. If the caller doesn't know it, or improvises around it, treat the call as adversarial. The codeword should not be guessable from public information.
    3. Apply the financial two-person rule and verification workflow with no exceptions.
       No phone call from any voice, synthetic or real, bypasses the workflow. Process beats perception.
    4. If you suspect a deepfake voice call, ask a question the real person would answer reflexively and the attacker would not know.
       Personal anecdotes, recent shared experiences, internal jokes. AI voice clones can synthesise speech but cannot reliably fabricate accurate context.
    5. Report attempted deepfake calls to security with as much detail as you can recall.
       Include the time, the claimed pretext, the caller ID and the requested action. Every attempt is reconnaissance evidence and may indicate active targeting of the executive.
    6. If money or credentials moved, escalate within the first hour for recall and revocation.
       Contact the bank's fraud line, finance leadership and security simultaneously. Hours matter for clawback.
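    For security teams that log reported attempts in a structured form, the reporting and escalation steps map onto a small record. This is a minimal sketch assuming a Python-based ticketing helper; `DeepfakeCallReport`, its field names and `escalation_deadline` are hypothetical, the default tags are the ATT&CK IDs cited in this module, and the one-hour window is assumed to start when the money or credentials moved.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Escalation window from the last step: bank, finance and security within the hour.
RECALL_WINDOW = timedelta(hours=1)


@dataclass
class DeepfakeCallReport:
    received_at: datetime            # time of the call
    claimed_identity: str            # who the caller claimed to be
    pretext: str                     # the story offered
    displayed_caller_id: str         # as shown on the handset; may be spoofed
    requested_action: str            # what the caller asked for
    moved_at: Optional[datetime] = None  # when money/credentials moved, if they did
    # ATT&CK techniques cited in this module (Spearphishing Voice, Impersonation).
    attack_tags: List[str] = field(default_factory=lambda: ["T1566.004", "T1656"])

    def escalation_deadline(self) -> Optional[datetime]:
        """Latest time by which escalation should have happened, or None."""
        if self.moved_at is None:
            return None
        return self.moved_at + RECALL_WINDOW


# Placeholder date: the scenario only says "Tuesday".
report = DeepfakeCallReport(
    received_at=datetime(2026, 5, 12, 11, 47),
    claimed_identity="CEO",
    pretext="hoarse after flight back from Singapore",
    displayed_caller_id="Private number",
    requested_action="fast-track EUR 420,000 partner payment",
    moved_at=datetime(2026, 5, 12, 11, 53),
)
print(report.escalation_deadline())  # 2026-05-12 12:53:00
```

    In the scenario, escalation started the next morning, long after this window had closed.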

    Defence for IT and policy

    Technical controls

    • Voice-spoofing detection on critical inbound numbers (offered by several carriers in 2026), which flags calls whose caller ID has been spoofed via SS7 or SIP injection
    • Privacy review of executive voice exposure — limit duration and quality of public audio appearances where practical; for many roles this is unavoidable, in which case defences must focus on process
    • Caller-ID display hygiene — apps that flag unknown numbers and label spoofing risk; corporate devices configured to suppress private/withheld numbers from reaching key roles without screening
    • AI-detection software in contact-centre environments — emerging in 2026, still maturing; useful but not yet reliable enough to be sole control
    • Pre-shared identity tokens (PIN codes, codewords) for executive-to-executive sensitive comms — explicitly required by policy and rehearsed
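    The caller-ID screening control can be thought of as a routing decision for calls to key roles. The sketch below assumes your telephony platform exposes the caller ID (or its absence) and, optionally, a spoofing signal from the carrier; `KEY_ROLES`, `carrier_spoof_flag` and `route_call` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Route(Enum):
    CONNECT = "connect"
    SCREEN = "screen"  # to a human screener or voicemail first


# Hypothetical set of roles that should not receive unscreened unknown calls.
KEY_ROLES = {"executive_assistant", "cfo", "accounts_payable"}


@dataclass
class InboundCall:
    caller_id: Optional[str]         # None when private/withheld
    callee_role: str
    carrier_spoof_flag: bool = False  # hypothetical signal from carrier-side detection


def route_call(call: InboundCall, known_numbers: Set[str]) -> Route:
    if call.callee_role not in KEY_ROLES:
        return Route.CONNECT
    if call.carrier_spoof_flag:
        return Route.SCREEN
    if call.caller_id is None or call.caller_id not in known_numbers:
        return Route.SCREEN
    # A known caller ID is connected, but it is NOT proof of identity:
    # caller ID can be spoofed, so callback and codeword rules still apply.
    return Route.CONNECT


known = {"+44 20 7946 0100"}
print(route_call(InboundCall(None, "executive_assistant"), known))  # Route.SCREEN
```

    Screening only reduces exposure. As the comment notes, a matching caller ID still proves nothing, which is why this sits alongside the process controls rather than replacing them.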

    Policy controls

    • Written policy: payment instructions cannot be authorised by phone alone, regardless of who appears to be calling; they must go through the ERP / signed authorisation workflow
    • Documented executive codeword protocol — explicitly required for any high-value or off-process request from an executive over phone or text
    • Family-level guidance for executives and their families — a shared codeword for 'is this really you?' calls (especially for 'I'm in trouble, send money' family-emergency scams)
    • Out-of-band verification requirement for any voice-only request to: reset MFA, change banking details, share credentials, transfer money, share sensitive documents
    • Quarterly tabletop with finance + executive team that rehearses a deepfake-voice scenario
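    The written policies can be expressed as a single gate that lists what is still missing before a request may proceed. This is a deliberately simplified model, assuming requests are tagged with an action, a channel and the checks already completed; `Request`, `SENSITIVE_ACTIONS` and `unmet_requirements` are hypothetical names, and a real ERP workflow would carry far more state.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Actions that need out-of-band verification when requested by voice alone.
SENSITIVE_ACTIONS = {
    "reset_mfa",
    "change_banking_details",
    "share_credentials",
    "transfer_money",
    "share_sensitive_documents",
}
REMOTE_CHANNELS = {"phone", "text"}


@dataclass
class Request:
    action: str
    channel: str                         # "phone", "text", "erp", ...
    from_executive: bool = False
    high_value_or_off_process: bool = False
    out_of_band_verified: bool = False   # callback to a verified number completed
    codeword_confirmed: bool = False
    in_signed_workflow: bool = False     # raised in the ERP / signed authorisation flow
    approvers: Set[str] = field(default_factory=set)  # distinct people who approved


def unmet_requirements(req: Request) -> List[str]:
    """Return the policy requirements not yet met; an empty list means proceed."""
    unmet = []
    if req.action in SENSITIVE_ACTIONS and req.channel == "phone" and not req.out_of_band_verified:
        unmet.append("out-of-band verification")
    if req.action == "transfer_money":
        if not req.in_signed_workflow:
            unmet.append("signed authorisation workflow")
        if len(req.approvers) < 2:
            unmet.append("two-person rule")
    if (req.from_executive and req.high_value_or_off_process
            and req.channel in REMOTE_CHANNELS and not req.codeword_confirmed):
        unmet.append("executive codeword")
    return unmet


# A phone-only executive payment instruction with a single approver.
phone_request = Request(action="transfer_money", channel="phone", from_executive=True,
                        high_value_or_off_process=True, approvers={"executive_assistant"})
print(unmet_requirements(phone_request))
# ['out-of-band verification', 'signed authorisation workflow', 'two-person rule', 'executive codeword']
```

    Storing approvers as a set means the same person approving twice still counts once, which is the point of a two-person rule.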

    Training frequency

    Annual simulated deepfake voice exercise targeting executive assistants, finance leadership and family members of high-value targets. Include a 'what would you have done?' debrief — the goal is to make the verification reflex automatic so the conscious 'this sounded just like them' override cannot occur. Pair training with a one-line internal mnemonic: 'A real voice never minds being called back.'

    Quick check

    Five questions.

    1. A trusted contact's voice on the phone urgently asks you to authorise a wire transfer 'within the next ten minutes'. The voice is unmistakable and the caller ID matches. What is the safest response?

    2. How much audio does an attacker typically need to produce a credible voice clone in 2026?

    3. What is a codeword protocol and why does it work?

    4. If a deepfake voice call has already convinced you and money has moved, what is the most time-critical action?

    5. Which family member is most often targeted in deepfake-voice family-emergency scams?

    This module has been approved by HackersHub in exactly this form, including the watermark. Free under CC-BY-ND 4.0. Want to adapt the content? Then remove our watermark first. (The HackersHub team)