What is a deepfake and how does it work?
Deepfakes are AI-generated videos, images, or audio that replace or fabricate a person's likeness with convincing realism. Here's how they're made, why they matter, and what organizations can do about them.
A deepfake is a synthetic media file — video, image, or audio — in which a person's likeness has been digitally fabricated or replaced using AI. The word combines "deep learning" and "fake." The result can be a video of someone saying something they never said, a photograph of someone who doesn't exist, or a voice clone that passes as the real person. As of 2026, the technology is accessible enough that non-experts can produce convincing deepfakes in minutes with consumer hardware.
This post explains how deepfakes are created, what varieties exist in the wild, and what organizations need to understand to protect themselves.
How deepfakes are made: GANs
One of the foundational deepfake techniques, and still a common one, is based on Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues in 2014. A GAN pits two neural networks against each other:
- The generator tries to produce synthetic images that look real.
- The discriminator tries to tell real images from fake ones.
Over thousands of training iterations, the generator gets better at fooling the discriminator, and the discriminator gets better at spotting fakes. They push each other toward higher and higher realism. When training goes well, the generator ends up producing images that the discriminator can no longer reliably tell apart from the real training data.
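The tug-of-war between the two networks comes down to two loss functions. The sketch below is a minimal illustration, not a training loop: it assumes the discriminator outputs a probability that an image is real, and it uses the non-saturating generator loss that is common in practice. The function names are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs (probability of "real") on real images.
    d_fake: discriminator outputs on generated images.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are rated as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Early in training the discriminator easily spots fakes (low d_fake),
# so the generator's loss is high; it drops as the fakes improve.
early = generator_loss([0.05, 0.10])
late = generator_loss([0.45, 0.50])
```

Each network is updated to lower its own loss, which raises the other's. That opposition is what drives the realism of the output upward.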
For face-swap deepfakes, the model is trained on footage of a target person. It learns the geometry of their face, their skin tone, their expressions. At inference time, it replaces the face in source footage with a synthesized version that mimics the target.
NVIDIA's research on GAN-based image synthesis gives a technical window into how far this architecture has come.
How deepfakes are made: diffusion models
Since 2022, diffusion models have overtaken GANs as the dominant architecture for image generation. Tools like Stable Diffusion, Midjourney, and DALL-E 3 are all diffusion-based.
Diffusion models work differently. During training, Gaussian noise is gradually added to real images until they become pure noise. The model learns to reverse this process — to denoise an image step by step. At generation time, you start with random noise and apply the learned denoising process, guided by a text prompt or reference image, until a coherent picture emerges.
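The noising half of that process has a simple closed form, sketched below on a toy four-pixel "image". This is an illustration under assumptions: the linear noise schedule and its default values follow a common choice in the diffusion literature, and the function names are made up for this example. The learned denoising network, which is the hard part, is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly across the schedule."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Fraction of the original signal variance that survives up to each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to step t: scale the signal down and mix in Gaussian noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in pixels]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
image = [0.8, -0.2, 0.5, 0.1]  # toy pixel values, assumed scaled to [-1, 1]
slightly_noisy = add_noise(image, 10, alpha_bars, rng)
almost_pure_noise = add_noise(image, 999, alpha_bars, rng)
```

During training, the model sees noisy samples like these and learns to predict the noise that was added. Generation runs the schedule in reverse, starting from pure noise.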
Diffusion outputs tend to be harder to fingerprint than GAN outputs, in part because detectors tuned to GAN artifacts often transfer poorly to them. Diffusion models also produce higher-fidelity results across a wider range of subjects and styles. MIT's research on diffusion model detection has shown that even state-of-the-art detectors struggle with diffusion outputs.
Types of deepfakes
Not all deepfakes work the same way. The main categories:
Face swap: The most recognizable type. The faces in a source video are replaced with a target person's likeness. Used heavily in non-consensual deepfakes and political disinformation.
Face reenactment: Rather than swapping the entire face, the target person's facial expressions are driven by a source actor's movements. The target appears to say or do whatever the source does.
Fully synthetic faces: GAN- or diffusion-generated faces of people who never existed. In fraud, these are used to create fake identity documents, false social media profiles, and manufactured references.
Voice cloning: Audio-only deepfakes that replicate a person's voice from as little as a few seconds of reference audio. Used in CEO fraud attacks and phone-based social engineering.
Lip sync manipulation: Real video is edited so that a person's lip movements match a fabricated audio track.
Real-world impacts
The consequences of deepfakes are not hypothetical:
- Identity fraud and KYC bypass: Criminals submit deepfake selfie videos during identity verification flows, fooling systems that rely on liveness checks. Our KYC use case page covers how this attack vector works in practice.
- Legal evidence manipulation: Deepfake images and video are increasingly submitted in litigation — divorce proceedings, employment disputes, criminal cases. See our legal evidence use case for the current landscape.
- Corporate disinformation: Fake executive statements, fabricated earnings calls, and synthetic product announcements have moved markets.
The EU AI Act, which entered into force in August 2024, sets transparency obligations for deepfakes, requiring that AI-generated or manipulated content be disclosed as such.
How detection works
Detecting deepfakes is an adversarial problem: as generation improves, detection must keep pace. Current approaches operate at several levels:
Pixel-level forensics — Deepfake generators leave statistical traces in the pixel distribution. These include GAN fingerprints and diffusion model artifacts. Forensic models trained to recognize these patterns can flag synthetic media even when it looks clean to the human eye.
Biological signal analysis — Videos of real people contain subtle physiological signals: micro-expressions, natural eye blink rates, pulse-driven color changes in skin. Deepfake videos often lack these signals or render them inconsistently.
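As a rough illustration of one such signal, the sketch below checks whether the blink rate in a clip falls within a plausible human range. It assumes an upstream face-landmark model already produces a per-frame eye-openness value between 0 and 1. The function names, the closed-eye threshold, and the rate bounds are illustrative assumptions, not validated forensic parameters.

```python
def count_blinks(eye_openness, closed_threshold=0.2):
    """Count open-to-closed transitions in a per-frame eye-openness series."""
    blinks = 0
    was_closed = False
    for value in eye_openness:
        is_closed = value < closed_threshold
        if is_closed and not was_closed:
            blinks += 1  # a new blink starts when the eye first closes
        was_closed = is_closed
    return blinks


def blink_rate_is_plausible(eye_openness, fps,
                            min_per_minute=5.0, max_per_minute=40.0):
    """Return True if blinks per minute fall inside the assumed human range."""
    if not eye_openness:
        raise ValueError("need at least one frame")
    minutes = len(eye_openness) / fps / 60.0
    rate = count_blinks(eye_openness) / minutes
    return min_per_minute <= rate <= max_per_minute


# One minute of video at 10 fps with a two-frame blink every four seconds.
frames = [1.0] * 600
for start in range(0, 600, 40):
    frames[start + 5] = 0.1
    frames[start + 6] = 0.1
natural_clip_ok = blink_rate_is_plausible(frames, fps=10)
unblinking_clip_ok = blink_rate_is_plausible([1.0] * 600, fps=10)
```

A check like this is weak on its own, since modern generators can reproduce blinking. It is more useful as one input among several.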
Geometric consistency checks — Face geometry, lighting direction, and shadow placement must be physically consistent in a real photograph. Synthesis pipelines sometimes violate these constraints.
Provenance and metadata — C2PA (Coalition for Content Provenance and Authenticity) is an emerging standard that cryptographically signs the capture and edit history of media files.
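The idea behind signed provenance can be shown with a toy example. To be clear, this is not the C2PA data format or API: C2PA relies on certificate-based digital signatures, while this sketch substitutes a shared-secret HMAC from the Python standard library to stay self-contained. The manifest fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_manifest(media_bytes, edit_history, key):
    """Bind a content hash and an edit history together under one signature."""
    manifest = {
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "edit_history": edit_history,
    }
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return manifest, signature


def verify_manifest(media_bytes, manifest, signature, key):
    """Check that neither the manifest nor the media changed after signing."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False  # manifest was altered, or signed with a different key
    return manifest["content_sha256"] == hashlib.sha256(media_bytes).hexdigest()


key = b"demo-key-not-for-production"
photo = b"raw image bytes"
manifest, signature = sign_manifest(photo, ["captured", "cropped"], key)
untouched_ok = verify_manifest(photo, manifest, signature, key)
tampered_ok = verify_manifest(photo + b"!", manifest, signature, key)
```

Provenance works in the opposite direction from forensics: it does not prove a file is fake, it lets an authentic file prove where it came from.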
No single technique is sufficient. Production-grade detection — like the models behind our deepfake detection platform — combines multiple approaches and continuously retrains against emerging generation methods. For a deeper look, see our post Can AI detect deepfakes? What works in 2026.
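One simple way to combine several detectors is a weighted average of their scores, sketched below. This is an assumption-laden illustration, not a description of any particular platform: the detector names, weights, and threshold are hypothetical, and production systems typically learn the combination from data rather than hand-setting it.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-detector scores, each assumed to lie in [0, 1]."""
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no weight configured for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


def classify(scores, weights, threshold=0.5):
    """Turn the fused score into a coarse label for a human reviewer."""
    if fuse_scores(scores, weights) >= threshold:
        return "likely synthetic"
    return "no strong evidence of synthesis"


weights = {"pixel_forensics": 2.0, "biological": 1.0, "geometry": 1.0}
scores = {"pixel_forensics": 0.9, "biological": 0.6, "geometry": 0.3}
fused = fuse_scores(scores, weights)
label = classify(scores, weights)
```

Averaging only over the detectors that actually returned a score lets the same function handle media where some checks do not apply, such as audio without a face.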
What organizations should do
Detection technology is necessary but not sufficient. Organizations should also:
- Audit their verification flows. Liveness checks that rely on video selfies without multi-modal validation are increasingly vulnerable.
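- Train staff to recognize social engineering. Voice clone-based CEO fraud starts with a phone call. Employees need updated protocols, such as confirming urgent payment or access requests through a second, known channel before acting.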
- Establish a media intake policy. Any image or video that will be used in a decision should pass through a documented verification step.
- Monitor for synthetic identity fraud. Fully synthetic identities created with AI-generated faces are harder to catch than traditional document fraud.
- Stay current on regulation. The EU AI Act, emerging US state laws, and sector-specific guidance are moving quickly.
Stanford Internet Observatory's research on synthetic media is one of the best public resources for tracking the policy and technical landscape.
The bottom line
Deepfakes are not a future problem. They are a present operational risk for any organization that relies on visual or audio media for identity, evidence, or communication. The right response is not panic. It's infrastructure: detection tools integrated into workflows, updated verification protocols, and staff who understand what's at stake.
If you want to see how Reality AI's detection platform handles deepfake media at scale, book a demo. We'll walk through your specific use case and show you what the models catch.