@whatsaiLouis Bouchard
I explain Artificial Intelligence terms and news to non-experts.
Wondering about the best ways to spot a deepfake? In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes.
Watch the video
References
►Read the full article: https://www.louisbouchard.ai/spot-deepfakes
►Test your deepfake detection capacity: https://detectfakes.media.mit.edu/
►DefakeHop: Chen, Hong-Shuo et al., (2021), “DefakeHop: A Light-Weight High-Performance Deepfake Detector.” ArXiv abs/2103.06929
►Saab Transforms: Kuo, C.-C. Jay et al., (2019), “Interpretable Convolutional Neural Networks via Feedforward Design.” J. Vis. Commun. Image Represent.
►OpenFace 2.0: T. Baltrusaitis, A. Zadeh, Y. C. Lim and L. Morency, “OpenFace 2.0: Facial Behavior Analysis Toolkit,” 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 59-66, doi: 10.1109/FG.2018.00019.
Video Transcript
00:00
While they seem like they’ve always been there, the very first realistic deepfake didn’t appear
00:05
until 2017.
00:07
It went from those first barely convincing automatically generated fake images to today’s
00:13
near-identical video copies of someone, complete with sound.
00:16
The reality is that we cannot see the difference between a real video or picture and a deepfake
00:22
anymore.
00:23
How can we tell what’s real from what isn’t?
00:25
How can audio files or video files be used in court as proof if an AI can entirely generate
00:32
them?
00:33
Well, this new paper may provide answers to these questions. And the answer here may again
00:37
be the use of artificial intelligence.
00:40
The saying “I’ll believe it when I see it” may soon change to “I’ll believe it when
00:45
the AI tells me to believe it…”
00:47
I will assume that you’ve all seen deepfakes and know a little about them.
00:51
Which will be enough for this article.
00:53
For more information about how they are generated, I invite you to watch the video I made explaining
00:58
deepfakes just below, as this video will focus on how to spot them.
01:04
More precisely, I will cover a new paper by the US Army’s DEVCOM Army Research Laboratory entitled
01:09
“DefakeHop: A Light-Weight High-Performance Deepfake Detector.”
01:14
Indeed, they can detect deepfakes with over 90% accuracy in all datasets and even reach
01:20
100% accuracy in some benchmark datasets.
01:24
What is even more incredible is the size of their detection model.
01:27
As you can see, this DefakeHop model has merely 40 thousand parameters, whereas the other
01:33
techniques yielding much worse accuracy had around 20 million!
01:37
This means that their model is 500 times smaller while outperforming the previous
01:43
state-of-the-art techniques.
01:44
This allows the model to run quickly on your mobile phone, letting you detect deepfakes
01:49
anywhere.
01:50
You may think that you can tell the difference between a real picture and a fake one, but
01:54
if you remember the study I shared a couple of weeks ago, it clearly showed that around
01:59
50 percent of participants failed.
02:01
It was basically a random guess on whether a picture was fake or not.
02:05
There is a website from MIT where you can test your ability to spot deepfakes if you’d
02:10
like to.
02:11
Having tried it myself, I can say it’s pretty fun to do.
02:14
There are audio files, videos, pictures, etc.
02:17
The link is in the description below.
02:19
If you try it, please let me know how well you do!
02:21
And if you know any other fun apps to test yourself or help research by doing your best
02:26
to spot deepfakes, please link them in the comments.
02:29
I’d love to try them out!
02:31
Now, coming back to the paper, which can detect deepfakes much better than we can, the question
02:36
is: how is this tiny machine learning model able to achieve that when humans can’t?
02:42
DefakeHop works in four steps.
02:44
Step 1:
02:45
At first, they use another model to extract 68 different facial landmarks from each video
02:50
frame.
02:51
These 68 points are used to locate the face, then recenter, orient, and resize
02:57
it for consistency, before extracting specific parts of the face from the
03:02
image.
03:03
These are the “patches” of the image we will send to our network, each containing a specific
03:09
facial feature like the eyes, mouth, or nose.
03:12
This is done using another model called OpenFace 2.0.
03:16
It can accurately perform facial landmark detection, head pose estimation, facial action
03:22
unit recognition, and eye-gaze estimation in real-time.
03:26
These are all tiny patches of 32 by 32 that will all be sent into the actual network one
03:32
by one.
03:33
This makes the model super efficient because it deals with only a handful of tiny images
03:38
instead of the full image.
03:40
More details about OpenFace 2.0 can be found in the references below if you are curious
03:45
about it.
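As a rough Python sketch of this cropping idea (the frame, landmark coordinates, and region names below are hypothetical; in the paper, the 68 landmarks come from OpenFace 2.0, not from this stub):

```python
import numpy as np

def extract_patch(frame, center, size=32):
    """Crop a size x size patch centered on a (row, col) landmark location."""
    half = size // 2
    r, c = center
    return frame[r - half:r + half, c - half:c + half]

# Hypothetical 128x128 aligned grayscale face and landmark-derived centers;
# a real pipeline would get these from OpenFace 2.0's 68-point detector.
frame = np.zeros((128, 128), dtype=np.uint8)
centers = {"left_eye": (45, 40), "right_eye": (45, 88), "mouth": (95, 64)}

patches = {name: extract_patch(frame, c) for name, c in centers.items()}
# Each entry is a 32x32 patch ready to be fed to the network one by one.
```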
03:46
Steps 2 to 4 (left to right: blue, green, orange):
03:47
More precisely, the patches are sent to the first PixelHop++ unit, named Hop-1, as you
03:53
can see.
03:54
This first Hop is shown in blue.
03:55
This unit uses an algorithm called the Saab transform to reduce the dimension.
03:58
It will take the 32 by 32 image and reduce it to a downscaled version of the image but
04:04
with multiple channels representing its response from different filters learned from the Saab
04:10
transform.
04:11
You can see the Saab transform as a convolution process, where the kernels are found using
04:16
the PCA dimension reduction algorithm, replacing the need for backpropagation to learn these
04:21
weights.
04:22
I will come back to the PCA dimension reduction algorithm in a minute as it is repeated in
04:26
the next stage.
04:27
These filters are optimized to represent the different frequencies in the image, basically
04:32
getting activated by varying degrees of details.
04:35
The Saab transform was shown to work well against adversarial attacks compared to basic
04:40
convolutions trained with backpropagation.
04:43
You can also find more information about the Saab transformation in the references below.
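As a rough illustration of that idea (a simplified stand-in, not the paper's exact Saab transform, which also treats the DC/mean component and bias terms separately), the sketch below learns 3 by 3 kernels as the top principal components of an image's patches and applies them like convolution filters:

```python
import numpy as np

def pca_filters(image, k=3, n_filters=4):
    """Learn k x k kernels as the top principal components of all
    k x k patches of the image (a stand-in for the Saab transform)."""
    h, w = image.shape
    patches = np.array([image[i:i + k, j:j + k].ravel()
                        for i in range(h - k + 1)
                        for j in range(w - k + 1)])
    patches = patches - patches.mean(axis=0)        # center, as PCA requires
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, k, k)  # top components as kernels

def convolve_valid(image, kernel):
    """Plain 'valid' 2-D correlation, as convolutions are computed in CNNs."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32))                           # toy 32x32 patch
filters = pca_filters(img)
responses = np.stack([convolve_valid(img, f) for f in filters])
# One output channel per learned filter, with no backpropagation involved.
```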
04:47
If you are not used to how convolutions work, I strongly invite you to watch the video I
04:52
made introducing them:
04:56
I said Saab transforms worked well against adversarial attacks.
05:00
These adversarial attacks happen when we “attack” an image by changing a few pixels or adding
05:06
noise that humans cannot see to change the results of a machine learning model processing
05:11
the image.
05:12
So to simplify, we can basically see this PixelHop++ Unit as a typical 3 by 3 convolution
05:19
here since we do not look at the training process.
05:21
Of course, it works a bit differently, but it will make the explanation much more straightforward
05:26
as the process is comparable.
05:28
Then, the “Hop” step is repeated three times to get smaller and smaller versions of the
05:33
image with concentrated general information and more channels.
05:37
These channels are simply the outputs, or responses, of the input image by filters that
05:42
react differently depending on the level of detail in the image, as I said earlier.
05:48
One new channel per filter used.
05:50
Thus, we obtain various results giving us precise information about what the image contains.
05:55
These results get smaller and smaller, containing fewer spatial details unique to
06:00
the precise image sent into the network, and therefore hold more general and useful information
06:06
with regard to what the image actually contains.
06:09
The first few images are still relatively big, starting at 32 by 32, the initial
06:15
size of the patch, and thus contain all the details.
06:18
Then, it drops to 15 by 15, and finally to 7 by 7 images, meaning that we have close
06:24
to zero spatial information in the end. The 15 by 15 image will just look like a blurry
06:29
version of the initial image but still contains some spatial information, while the 7 by 7
06:35
image will basically be a very general and broad version of the image with close to no
06:40
spatial information at all.
06:43
So just like a convolutional neural network, the deeper we get, the more channels we have,
06:48
meaning that we have more filter responses reacting to different stimuli, but the smaller
06:53
they each are, ending with images of size 3 by 3.
06:56
This allows us to have a broader view in many ways, keeping a lot of unique, valuable information
07:02
even with smaller versions of the image.
07:05
The images get even smaller because each of the PixelHop units is followed by a max-pooling
07:12
step.
07:13
They simply take the maximum value of each square of two by two pixels, reducing
07:17
the number of pixels by a factor of four at each step.
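The pooling step is a few lines of numpy; the 4 by 4 input below is a toy example just to show the 2 by 2 block maxima:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the max of each 2x2 block, quartering the number of pixels."""
    h, w = x.shape
    h2, w2 = h - h % 2, w - w % 2                  # drop an odd edge row/col
    blocks = x[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 "image"
pooled = max_pool_2x2(img)                         # 4x4 -> 2x2
```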
07:20
Then, as you can see in the full model shown above, the outputs from each max-pooling layer
07:24
are sent for further dimension reduction using the PCA algorithm.
07:26
This is the third step, in green.
07:28
The PCA algorithm mainly takes the current dimensions, for example, 15 by 15 here in
07:34
the first step, and reduces that while keeping at least 90% of the energy of the input
07:40
image.
07:41
Here is a very simple example of how PCA can reduce the dimension, where two-dimensional
07:45
points of cats and dogs are reduced to one dimension on a line, allowing us to add a
07:51
threshold and easily build a classifier.
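That cat-and-dog picture can be reproduced with a small numpy sketch; the two 2-D clusters below are synthetic stand-ins for the cat and dog points:

```python
import numpy as np

rng = np.random.default_rng(1)
cats = rng.normal([0.0, 0.0], 0.3, size=(50, 2))   # toy 2-D "cat" points
dogs = rng.normal([3.0, 3.0], 0.3, size=(50, 2))   # toy 2-D "dog" points
points = np.vstack([cats, dogs])

# PCA via SVD: project every point onto the top principal component,
# collapsing two dimensions down to a single number per point.
centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[0]

# A single threshold on that one axis now separates the two clusters.
labels = np.array([0] * 50 + [1] * 50)             # 0 = cat, 1 = dog
pred = projected > 0.0
accuracy = max((pred == labels).mean(), (~pred == labels).mean())
```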
07:54
Each hop gives us respectively 45, 30, and 5 parameters per channel instead of having
08:00
images of size 15 by 15, 7 by 7, and 3 by 3, which would give us in the same order 225,
08:07
49, and 9 parameters.
08:11
This is a much more compact representation while maximizing the quality of information
08:16
it contains.
08:17
All these steps were used to compress the information and make the network super fast.
08:22
You can see this as squeezing all the helpful juice at different levels of details of the
08:27
cropped image to finally decide whether it is fake or not, using both detailed and general
08:32
information in the decision process (step 4 in orange).
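Putting step 4 together: the compact per-hop features (45, 30, and 5 values per channel, per the numbers above) are concatenated into one descriptor per patch and handed to a trained binary classifier. The feature values and the nearest-centroid stand-in below are purely illustrative, not the paper's actual classifier:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-hop feature vectors for one face patch:
# 45, 30, and 5 values per channel after PCA (Hop-1 to Hop-3),
# filled with random numbers purely for illustration.
hop1, hop2, hop3 = rng.random(45), rng.random(30), rng.random(5)

features = np.concatenate([hop1, hop2, hop3])   # one compact 80-value descriptor

# DefakeHop hands such descriptors to a trained classifier; as a toy
# stand-in, compare the descriptor to hypothetical per-class centroids.
real_centroid = np.full_like(features, 0.25)
fake_centroid = np.full_like(features, 0.75)
label = ("fake" if np.linalg.norm(features - fake_centroid)
                 < np.linalg.norm(features - real_centroid) else "real")
```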
08:35
I’m glad to see that research on countering these deepfakes is also advancing, and I’m
08:39
excited to see what will happen in the future with all that.
08:43
Let me know in the comments what you think will be the main consequences and concerns
08:47
regarding deepfakes.
08:48
Is it going to affect law, politics, companies, celebrities, ordinary people?
08:53
Well, pretty much everyone…
08:55
Let’s have a discussion to raise awareness and spread the word that, unfortunately, we
09:00
cannot believe everything we see anymore.
09:02
This is both an incredible and dangerous new technology.
09:06
Please do not abuse this technology, and use it ethically.
09:10
The goal here is to help improve this technology and not to use it for the wrong reasons.
Source: https://hackernoon.com/how-to-spot-a-deepfake-in-2021-yd1v3539?source=rss