Visual media have enjoyed a high level of public trust since the development of photography in the 19th century. Unlike audio recordings, photographs and videos have been widely used as evidence in court cases (Meskin and Cohen, 2008). It is also widely acknowledged that visual media are a particularly potent propaganda tool (Winkler and Dauber, 2014). As a result, there have always been strong incentives to produce fake visual documents. The Soviet Union's extensive use of altered photographs for political purposes, dating back to the 1920s, is well documented (Dickerman, 2000; King, 2014).
On the other hand, since each frame had to be altered separately, video manipulation required highly experienced professionals and a sizable amount of time. Hollywood refined video manipulation technology in the 1990s (Pierson, 1999), but only a select few films fully utilised it because of the expense. As a result, manipulated videos were rarely produced for political propaganda. Only recently has deepfake technology made it possible to alter entire videos with consumer-grade hardware. It uses contemporary artificial intelligence to automate monotonous cognitive tasks, such as recognising a person’s face in every video frame and replacing it with another face, which makes producing such manipulated videos very cheap.
Modification of Individual Images
Retouching is a reasonably simple method for altering individual photographs. The approach was already widely used in the 1920s, most notably in the Soviet Union (King, 2014). A notable example is the removal of Alexander Marchenko from official photographs following his execution in 1930 (Dickerman, 2000; King, 2014). While the possibility of modifying photographs, and this example in particular, has long been well known, the effort required to produce altered images was, until recently, considerable.
Furthermore, such alterations were often detectable by experts. As a result, and unlike audio recordings, photographs generally enjoyed popular trust. Even today, although this confidence appears to be ebbing, the term “photographic evidence” is still used occasionally (Meskin and Cohen, 2008).
The general public has had access to image manipulation since the 1990s. In fact, image manipulation is now so widespread that “to photoshop,” derived from the Adobe Photoshop software (Adobe, 2020), is used as a verb. Typically, this refers to small manipulations, such as making someone conform to a certain standard of beauty, although far larger manipulations are carried out for profit or pleasure (Reddit, 2020). Naturally, such methods can be, and have been, used for propaganda. The alteration depicted in Figure 1, which removes a person from a single photograph, takes a professional user only a few minutes with image-processing software.
Today, artificial intelligence (AI) makes manipulations such as removing a person easier, both for single photographs and for large collections of images. NVIDIA Image Inpainting, for instance, is a straightforward tool for removing people or objects from photographs (NVIDIA, 2020).
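As an illustration of the general idea (not of NVIDIA’s tool itself), the following minimal Python sketch removes a masked region from a photograph using OpenCV’s classical inpainting routine; the file names are hypothetical, and a learning-based inpainter would usually handle large regions better.

```python
import cv2

# Hypothetical input files: the mask is white where the object or person
# to be removed is located and black everywhere else.
image = cv2.imread("photo.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Fill the masked region from its surroundings (Telea's algorithm).
result = cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

cv2.imwrite("photo_inpainted.jpg", result)
```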
Video Manipulation
Traditionally, video editing requires far more time and technical expertise than photo editing. Convincing video manipulation faces high technical hurdles because of the vast number of frames (i.e., images) that must be modified and the requirement that the modifications be consistent across frames. With new machine learning approaches, however, the cost of doing so can be lowered significantly, just as in the case of photo manipulation.
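To make the per-frame nature of the problem concrete, the following hypothetical Python sketch iterates over every frame of a clip, detects faces with OpenCV’s bundled Haar cascade, and applies a placeholder manipulation (a blur) to each detected face; the file names and detector parameters are illustrative only.

```python
import cv2

# Face detector shipped with OpenCV (a classical Haar cascade, not deep learning).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("input.mp4")                      # hypothetical clip
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("output.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

while True:
    ok, frame = cap.read()
    if not ok:                                           # end of the video
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Placeholder manipulation: blur the detected face region.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(
            frame[y:y + h, x:x + w], (31, 31), 0)
    out.write(frame)                                     # write the modified frame

cap.release()
out.release()
```

A deepfake pipeline follows the same loop but replaces the blur with a learned face-generation step, which is what keeps the manipulation consistent across the thousands of frames in a typical video.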
Until recently, direct manipulation of video material was uncommon. False information was more frequently spread with videos that claim to depict something other than what they actually show, or through suggestive editing, which cuts real video content in a way that distorts the scene that was captured (Matatov et al., 2018). The term “shallowfakes” has been proposed to distinguish these methods from deepfakes (European Science Media Hub, 2019). A third scenario is the alteration of real video content at a level that does not require AI; such videos are occasionally referred to as cheapfakes.
Deepfake Technology
In 2017, Reddit users posted videos of celebrities whose faces had been swapped with those of other people. Although the effect was novel, the software that produced these videos relied on technology that had been established a few years earlier.
Deepfake Autoencoders
A detector can be connected to a corresponding generator. Such a system is known as an autoencoder because, during training, it learns an abstract encoding of an image, in this case a human face. Given sufficient training data, the network can detect faces even in noisy images or at unusual angles, and because it represents the face abstractly, the generator can reproduce the face without the noise.
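A minimal sketch of such an autoencoder, assuming small 64×64 RGB face crops and PyTorch, is given below; the layer sizes are illustrative and far smaller than those used in practice.

```python
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    """Toy convolutional autoencoder for 64x64 RGB face crops."""

    def __init__(self, latent_dim=256):
        super().__init__()
        # Encoder ("detector"): compresses a face into an abstract code.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Decoder ("generator"): reconstructs the face from the code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimises the reconstruction error between input faces and outputs,
# which forces the latent code to capture the essential structure of the face.
model = FaceAutoencoder()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

faces = torch.rand(8, 3, 64, 64)   # stand-in for a batch of cropped faces
optimizer.zero_grad()
loss = loss_fn(model(faces), faces)
loss.backward()
optimizer.step()
```

Because the reconstruction must pass through the compact latent code, noise and other incidental variation tend not to survive the round trip, which is why the generator can output a clean face.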
To produce deepfakes for face swapping, two such autoencoders are trained. The first network is trained to recognise the target individual, whose face will replace that of the source person, while the second is trained to detect the source person. A new autoencoder is then assembled by connecting the detector for the source person to the generator for the target person. When applied to a video of the source individual, it renders the target individual’s face in place of the source individual’s face.
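The following sketch (the same toy PyTorch setup as above, with illustrative layer sizes) shows only the swapping step: a frame of the source person is encoded with the network trained on the source person and decoded with the generator trained on the target person. Note that popular deepfake implementations usually share a single encoder between both autoencoders; separate encoders are used here only to mirror the description above.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the detector (encoder) and generator (decoder) networks;
# real deepfake models are much deeper and operate on larger face crops.
def make_encoder(latent_dim=256):
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        nn.Flatten(),
        nn.Linear(128 * 16 * 16, latent_dim))

def make_decoder(latent_dim=256):
    return nn.Sequential(
        nn.Linear(latent_dim, 128 * 16 * 16),
        nn.Unflatten(1, (128, 16, 16)),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
        nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid())  # 32 -> 64

# Autoencoder A is trained on faces of the source person, autoencoder B on
# faces of the target person (training loops omitted for brevity).
encoder_a, decoder_a = make_encoder(), make_decoder()
encoder_b, decoder_b = make_encoder(), make_decoder()

# Face swap: encode a source-person face with A's detector, but decode it with
# B's generator, so the reconstructed face looks like the target person.
source_face = torch.rand(1, 3, 64, 64)   # stand-in for a cropped video frame
swapped_face = decoder_b(encoder_a(source_face))
```

Applied frame by frame, with the swapped face pasted back into the original footage, this yields the face-swapped video.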
Countermeasures
Given their potential threat, numerous countermeasures against deepfakes have been considered. Here, we cover both technical and legal defences.
In response to the threat posed by manipulated visual media, the United States Defense Advanced Research Projects Agency (DARPA) started its media forensics project to develop tools for identifying manipulation in videos and images with a wide range of techniques, including semantic analysis (Darpa, 2020). Along similar lines, Adobe, the company behind the well-known Photoshop software, unveiled a tool that can identify the majority of image manipulations that Photoshop is capable of (Adobe Communications Team, 2018). The technique is the result of extensive research conducted at the University of Maryland (Zhou et al., 2018). In addition, large data sets for training and testing detection systems are now readily available (Dolhansky et al., 2019; Guan et al., 2019; Rossler et al., 2019).
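To give a sense of how such data sets are used, the following hypothetical sketch fine-tunes a standard convolutional network as a binary real-versus-fake frame classifier; this is a generic baseline rather than the method of any of the cited systems, and the tensors shown are random stand-ins for labelled frames from a detection data set.

```python
import torch
import torch.nn as nn
from torchvision import models

# Generic baseline: a ResNet-18 with a two-class head (real vs. fake).
# In practice, one would start from ImageNet-pretrained weights and train on
# face crops taken from a detection data set (e.g., Rossler et al., 2019).
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Random stand-ins for a mini-batch of labelled frames (0 = real, 1 = fake).
frames = torch.rand(16, 3, 224, 224)
labels = torch.randint(0, 2, (16,))

optimizer.zero_grad()
logits = model(frames)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```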