The origins
A precursor of what we today call a “deepfake”, the first known example of multimedia content manipulation dates to around 1860, when a portrait of the Southern politician John Calhoun was cleverly altered by replacing his head with that of President Abraham Lincoln for propaganda purposes.
Today, this type of manipulation is carried out by adding (splicing), removing (inpainting) or replicating (copy-move) objects within a single image or between two images. Appropriate post-processing steps, such as scaling, rotation and colour adjustment, are then applied to improve the visual appearance and the consistency of scale and perspective.
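As a minimal illustration of the copy-move operation described above, the hypothetical sketch below duplicates a patch within a tiny greyscale “image” represented as a list of rows. A real forgery would operate on actual image data with a library such as Pillow and would add the post-processing steps mentioned (scaling, rotation, colour adjustment) to hide the seams; the function name and toy data here are purely illustrative.

```python
def copy_move(image, src, dst, size):
    """Copy a size x size patch from src=(row, col) and paste it at dst=(row, col)."""
    rows = [row[:] for row in image]  # work on a copy, leave the original intact
    sr, sc = src
    dr, dc = dst
    # Extract the patch first (slicing copies), so overlapping regions are safe.
    patch = [rows[sr + i][sc:sc + size] for i in range(size)]
    for i in range(size):
        rows[dr + i][dc:dc + size] = patch[i]
    return rows

# 4x4 image with a bright 2x2 "object" in the top-left corner
img = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]

# Replicate the object into the bottom-right corner: it now appears twice.
forged = copy_move(img, src=(0, 0), dst=(2, 2), size=2)
```

Detection methods exploit exactly this structure: a copy-move forgery leaves two regions of the image that are suspiciously similar up to the applied post-processing.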
Apart from these traditional manipulation methods, advances in computer graphics and deep learning (DL) offer automated approaches to digital manipulation with improved semantic consistency. A recent trend is the synthesis of videos from scratch using autoencoders or generative adversarial networks (GANs) for different applications and, more specifically, the photorealistic generation of human faces. Another widespread form of manipulation, called “shallow fakes” or “cheap fakes”, is audiovisual manipulation created with cheaper and more accessible software.
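The face-swap approach popularised by early deepfake tools is commonly described as one shared encoder paired with two person-specific decoders: the encoder learns pose, expression and lighting common to both identities, and each decoder learns to reconstruct one person's face. At inference time, a face of person A is encoded and then decoded with person B's decoder. The sketch below is purely conceptual, with plain-Python stubs standing in for the neural networks; all names are hypothetical.

```python
def shared_encoder(face):
    # A real encoder would compress the face into a latent code capturing
    # pose, expression and lighting; here we simply tag the input frame.
    return {"latent_of": face}

def make_decoder(identity):
    # A real decoder would be trained to reconstruct faces of one specific
    # identity from any latent code produced by the shared encoder.
    def decoder(latent):
        return f"{identity} face with attributes of {latent['latent_of']}"
    return decoder

decoder_a = make_decoder("person_A")
decoder_b = make_decoder("person_B")

# Training phase (not shown): each decoder learns to reconstruct its own
# person's faces from the shared latent space.
# Swap phase: encode a frame of A, decode with B's decoder.
swapped = decoder_b(shared_encoder("person_A_frame_001"))
```

The key design point is the shared latent space: because both decoders consume codes from the same encoder, person B's decoder renders B's identity while preserving A's pose and expression.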
Shallow fakes involve basic editing of a video using selective slowing down, speeding up, cutting and splicing of existing, unaltered footage, which can alter the entire context of the information delivered. In May 2019, a video of US politician Nancy Pelosi was selectively edited to make it appear as if she was slurring her words and was drunk or confused. The video was shared on Facebook and received more than 2.2 million views in 48 hours. Video manipulation for the entertainment industry, specifically in film production, has been practised for decades.
One of the first notable academic projects was the Video Rewrite program for film-dubbing applications, published in 1997. It was the first software capable of automatically reanimating facial movements from an existing video to match a different audio track, and it achieved surprisingly convincing results.
The first deepfake
The first real deepfake appeared online in September 2017, when a Reddit user called “deepfakes” posted a series of computer-generated videos of famous actresses with their faces swapped into pornographic content. Another notorious case was the launch of the DeepNude app, which allowed users to generate fake nude images. It was then that deepfakes gained recognition among a wider community. Today, deepfake technology and applications such as FakeApp, FaceSwap and ZAO are easily accessible, and users without a background in computer engineering can create a fake video in a matter of seconds. In addition, open-source projects on GitHub, such as DeepFaceLab, and related tutorials are readily available on YouTube.
Most of the deepfakes currently present on social platforms such as YouTube, Facebook or Twitter can be considered harmless, entertaining or artistic. However, there are also examples where deepfakes have been used for revenge porn, hoaxes, political or non-political influence, and financial fraud. In 2018, a deepfake video went viral online in which former US president Barack Obama appeared to insult then-president Donald Trump. In June 2019, a fake video of Facebook CEO Mark Zuckerberg was posted on Instagram by the Israeli advertising company “Canny”. More recently, extremely realistic deepfake videos of Tom Cruise posted on the TikTok platform garnered 1.4 million views in just a few days.
New trends
Apart from visual manipulation, audio deepfakes are a new form of cyberattack with the potential to cause serious harm to individuals, enabled by highly sophisticated voice-synthesis techniques such as WaveNet, Tacotron and DeepVoice. Audio-assisted financial scams increased significantly in 2019 as a direct result of progress in speech-synthesis technology.
In August 2019, the CEO of a European company, duped by an audio deepfake, made a fraudulent transfer of $243,000. Voice-mimicking AI software was used to clone the victim's voice patterns by training ML algorithms on audio recordings obtained from the internet. If such techniques were used to mimic the voice of a senior government official or military leader and applied at scale, the security implications could be serious.