High-Resolution Photorealistic Image Translation in Real Time

Louis Bouchard (@whatsai)

I explain Artificial Intelligence terms and news to non-experts.

You can apply any design, lighting, or graphics style to your 4K image in real time using this new machine learning-based approach! If you think this looks interesting, watch the video on this topic and read more about it from the references below 👇

References

►Read the full article: https://www.louisbouchard.ai/4k-image-translation-in-real-time/

►Liang, Jie, Zeng, Hui, and Zhang, Lei (2021), “High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network”, https://export.arxiv.org/pdf/2105.09188.pdf

►Code: https://github.com/csjliang/LPTN

Video Transcript

You’ve all seen these kinds of pictures where a person’s face is “toonified” into an anime character. Many of you must have seen other kinds of image transformations like this, where an image is changed to follow the style of a certain artist. Here, an even more challenging task could be something like this, where an image is transformed into another season or time of the day. What you have not seen yet is the time it takes to produce these results and the actual resolutions of the produced pictures. This new paper is completely transparent about both, because it attacks exactly these problems: compared to most approaches, they translate high-definition 4K images, and they do it in real time.

In this work, they showed their results on season translation, day-and-night translation, and photo retouching, which you’ve been looking at for the last minute. This task is also known as ‘image-to-image translation’, and all the results you see here were produced in 4K. Of course, this video is not in 4K, and the images were taken from their paper, so they might not look that high-quality here. Please look at their paper or try their code if you are not convinced!

These are the most amazing results of this paper. Their technique is called LPTN, which stands for Laplacian Pyramid Translation Network. Look at how much less time it took LPTN to produce the image translations, a task most approaches cannot even complete because this amount of definition is just too computationally demanding. And yes, this is measured in seconds: they could translate 4K images in less than a tenth of a second using a single regular GPU. It is even faster than all these approaches on 480p image translations, and not just eight times faster, but 80 times faster on average! But how is that possible? How can they be so much more efficient and still produce amazing, high-quality results?

This is achieved by exploiting the fact that illumination and color manipulation, which relate to the style of an image, are contained in its low-frequency component, whereas the content details, which we want to keep when translating an image into another style, can be adaptively refined on the high-frequency components. This is where it becomes interesting: these two components can be split into two tasks that the GPU can perform simultaneously. Indeed, they split the image into low-resolution and high-resolution components, use a network to process the low-frequency information, which carries the style of the image, and render the final image by merging this processed style with the refined high-frequency component: the details of the image, adapted by a smaller sub-network to fit the new style. This dodges the heavy computation that would otherwise be unavoidable if the high-resolution components went through the whole network.
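To make that division of labor concrete, here is a minimal sketch of the overall flow in Python. The helper names (`laplacian_decompose`, `translator`, `refiner`, `laplacian_reconstruct`) are hypothetical, not the paper’s API; the decomposition and reconstruction helpers are sketched further down in this transcript.

```python
def translate_4k(image, translator, refiner, levels=3):
    # 1. Split the image into a small low-frequency image (the style)
    #    and per-scale high-frequency residuals (the details).
    highs, low = laplacian_decompose(image, levels)

    # 2. Run the heavy translation network only on the small
    #    low-frequency image, which is cheap at this resolution.
    new_low = translator(low)

    # 3. Lightly adapt each residual to the new style with a
    #    much smaller sub-network.
    new_highs = [refiner(h) for h in highs]

    # 4. Merge the translated style and the refined details back
    #    into a full-resolution result.
    return laplacian_reconstruct(new_highs, new_low)
```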

This decomposition builds on a long-studied, popular technique called the Laplacian pyramid. The main idea of the Laplacian pyramid is to decompose the image into high- and low-frequency bands and reconstruct it afterward. First, we produce an averaged version of the initial image, making it blurry and removing high-frequency components. This is done using a kernel that passes over the whole image, averaging batches of pixels together. For example, a 3-by-3 kernel would slide across the whole image, averaging each 3-by-3 patch and smoothing out any outlying values; it basically blurs the image by softening the edges. Then, the difference between this blurry image and the initial image is saved so it can be used at the end of the algorithm to re-introduce the details, which are the high-frequency components. This is repeated three times with bigger and bigger averaging kernels, producing smaller and smaller low-frequency versions of the image with less and less high-frequency detail.
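As a rough illustration, here is how such a decomposition can be written in Python with OpenCV. Note that `cv2.pyrDown`/`cv2.pyrUp` blur with a fixed 5-by-5 Gaussian and halve the resolution at each step rather than growing the kernel, but the effect relative to the image size is the same. This is a generic Laplacian pyramid sketch, not the paper’s exact code.

```python
import cv2
import numpy as np

def laplacian_decompose(image, levels=3):
    """Split an image into per-scale high-frequency residuals
    plus one small low-frequency base image."""
    current = image.astype(np.float32)
    highs = []
    for _ in range(levels):
        down = cv2.pyrDown(current)  # blur, then halve width and height
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        highs.append(current - up)   # the details lost by the blur at this scale
        current = down               # recurse on the smaller, blurrier image
    return highs, current            # e.g. 3 residuals + a ~480x270 base for 4K
```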

If you remember, these low-frequency versions of the image contain information about the colors and the illumination of the image. Indeed, they are basically just a blurred, low-quality version of our image, which is why the model is so much more efficient. This is convenient: not only are they smaller versions of the image, they also hold exactly the information we are trying to change when translating the image into another style. This means that using these low-frequency versions is much more computationally efficient than using the whole image directly, while staying focused on the information we want to change, which is why the results are so good. This lower-quality version of the image can easily be translated using an encoder-decoder, just like any other image translation technique we previously mentioned, but since it is done on a much smaller, lower-quality image, it is dramatically faster to process.
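For instance, a toy encoder-decoder over the small low-frequency image might look like the PyTorch sketch below. This is an illustrative stand-in: every layer choice here is my own, and it does not match the architecture in the paper or the LPTN repository.

```python
import torch
import torch.nn as nn

class LowFreqTranslator(nn.Module):
    """A deliberately tiny encoder-decoder for the low-frequency image."""
    def __init__(self, width=16):
        super().__init__()
        self.encoder = nn.Sequential(  # downsample twice
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(  # upsample back to the input size
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(width, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# A 4K frame downsampled three times is only about 480x270 pixels
# (padded to 272 here so the layer shapes round-trip),
# so even a modest GPU processes it in milliseconds.
low = torch.randn(1, 3, 272, 480)  # channels-first dummy low-frequency image
translated = LowFreqTranslator()(low)
```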

The best thing is that the quality of the results depends only on the high-frequency versions of the image saved at the start, which are not processed throughout the whole network. This high-frequency information is simply merged with the low-frequency image at the end of the processing to restore the details.
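Pairing with the decomposition sketch above, this merge step is the standard Laplacian pyramid reconstruction. In LPTN the residuals are first refined by the small sub-network; in this generic sketch they are simply added back unchanged.

```python
import cv2
import numpy as np

def laplacian_reconstruct(highs, low):
    """Rebuild the full-resolution image from the (translated)
    low-frequency base and the high-frequency residuals."""
    current = low.astype(np.float32)
    for high in reversed(highs):  # the finest residual was stored first
        # Upsample to this residual's size, then re-introduce the details.
        current = cv2.pyrUp(current, dstsize=(high.shape[1], high.shape[0]))
        current = current + high
    return current

# Round trip with the earlier sketch: highs, low = laplacian_decompose(img)
# then laplacian_reconstruct(highs, low) recovers img up to float rounding.
```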

Basically, it is so much faster because the researchers split the image’s information in two: low-frequency general information and detailed high-frequency information. They send through the network only the computation-friendly part of the image, which is exactly what we want to transform: the blurry, low-quality, general style of the image, or in other words, the low-frequency information. Then, only fast and straightforward transformations are done on the high-frequency parts of the image to resize them and merge them with the blurry, newly-stylized image, improving the results by adding back the details on all the edges in the picture. And voilà! You have your results with a fraction of the time and computational power needed.

This is brilliant, and the code is publicly available if you would like to try it, which is always cool! As always, the links to the complete article and references are in the description of the video. Thank you for watching!


Source: https://hackernoon.com/high-resolution-photorealistic-image-translation-in-real-time-5m4f34im?source=rss
