As I discussed in a previous post, the loss functions currently used to train SR models (MSE loss between the HR and SR images, adversarial training, and the perceptual loss) have problems and are not the fundamental objectives a model must optimize for the best perceptual quality. We will briefly summarize the problems of the currently used losses in measuring SR quality, and I will propose my ideas on how to refine these methods.
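To make the MSE problem concrete, here is a small hypothetical illustration (the arrays are made-up data, not from the post): when several sharp HR images are equally plausible for one LR input, the expected MSE is minimized by their blurry average, which is why MSE-trained models regress toward blur.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two equally plausible "sharp" HR textures for the same LR patch (assumed data).
hr_a = rng.random((8, 8))
hr_b = rng.random((8, 8))
blurry_mean = (hr_a + hr_b) / 2  # what an MSE-optimal model would output

def expected_mse(pred):
    # Expectation over the two equally likely ground truths.
    return 0.5 * np.mean((pred - hr_a) ** 2) + 0.5 * np.mean((pred - hr_b) ** 2)

# The blurry average beats either sharp candidate under expected MSE.
assert expected_mse(blurry_mean) < expected_mse(hr_a)
assert expected_mse(blurry_mean) < expected_mse(hr_b)
```

Neither sharp texture can win under this objective, so the model learns to output the average.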
I would like to note that these are my personal observations and opinions, not a consensus among researchers. Some of the arguments, especially the third, may be disputable. The problems are as follows.
We reviewed the NTIRE 2020 challenge on extreme super-resolution, which uses a very large scaling factor (x16), in this post. The teams that participated in the challenge proposed various model architectures and scaling methods. Some teams made minor modifications to the perceptual loss, replacing the VGG network with a model better suited to SR.
Unfortunately, the problem is far from solved. As the figures below show, some methods successfully reconstruct HR texture information but fail to generate realistic structures with coherent spatial information.
The 2020 NTIRE challenge on extreme super-resolution is about super-resolving an image with a scaling factor of x16. The challenge paper reviews 19 methods that were proposed to solve this problem and compete on perceptual performance. We will give an overview of how the competition was conducted and of the intuitions behind some high-scoring methods proposed by participants of the challenge.
Compared to the active research on SISR at moderate factors such as x4, relatively little research has been done on extreme super-resolution. …
The paper proposes a novel model architecture built from residual-in-residual (RIR) blocks, each with channel attention. The full pipeline is a 400-layer-deep convolutional network. More specifically, the paper proposes
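The channel attention inside each block can be sketched as a squeeze-and-excitation step: global average pooling, two small fully connected layers, and a sigmoid that rescales each channel. This is a minimal numpy sketch under my own assumptions (the function name, weight shapes, and reduction ratio are not taken from the paper's code).

```python
import numpy as np

def channel_attention(x, w1, w2):
    # x: feature map of shape (C, H, W)
    squeeze = x.mean(axis=(1, 2))                # global average pool -> (C,)
    hidden = np.maximum(0, w1 @ squeeze)         # channel-downscaling FC + ReLU
    weights = 1 / (1 + np.exp(-(w2 @ hidden)))   # channel-upscaling FC + sigmoid
    return x * weights[:, None, None]            # rescale each channel in (0, 1)

C, H, W, r = 16, 8, 8, 4                         # reduction ratio r = 4 (assumed)
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

out = channel_attention(x, w1, w2)
assert out.shape == x.shape
```

The output keeps the feature-map shape; only the per-channel scale changes, letting the network emphasize informative channels.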
This paper proposes an autoencoder that learns a discrete latent space, along with a loss function and a method to backpropagate through the non-differentiable quantization step. Yes, the latent representation is genuinely discrete: each encoder output is assigned to a single codebook entry via the argmin function, a one-hot “0 or 1” assignment.
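The quantization step can be sketched as a nearest-neighbour lookup over a learned codebook (the codebook size and embedding dimension below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                        # codebook entries, embedding dim (assumed)
codebook = rng.standard_normal((K, D))
z_e = rng.standard_normal((5, D))  # encoder outputs for 5 latent positions

# Nearest-neighbour lookup: argmin over squared distances to each entry.
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
idx = dists.argmin(axis=1)         # discrete codes, one per position
z_q = codebook[idx]                # quantized latents fed to the decoder

# In an autodiff framework, the straight-through trick writes
# z_q = z_e + stop_gradient(z_q - z_e), so gradients skip the argmin
# and flow straight back into the encoder.
assert z_q.shape == z_e.shape
```

The argmin itself has no gradient, which is exactly why the copied-gradient trick in the comment is needed.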
StyleGAN, initially proposed in 2019, showed amazing performance in creating realistic images with a style-based generator architecture that separates high-level attributes, such as pose and facial expression, from stochastic variation in the images, like hair and freckles. It also aims to disentangle the latent space by introducing an intermediate latent space W, from which the generator draws its styles instead of consuming the classic latent vector directly.
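The mapping from the classic latent z to the intermediate latent w is just a small MLP. A toy sketch of that idea, with depth, width, and initialization chosen by me for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16                                   # latent dimensionality (assumed)
layers = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(3)]

def mapping_network(z):
    # MLP that transforms z into the intermediate latent w.
    w = z
    for weight in layers:
        pre = weight @ w
        w = np.maximum(0.2 * pre, pre)     # leaky ReLU activation
    return w

z = rng.standard_normal(dim)
w = mapping_network(z)                     # the generator consumes w, not z
assert w.shape == z.shape
```

Because w is produced by a learned nonlinear map, its distribution is not tied to the fixed prior on z, which is what gives the generator room to disentangle factors of variation.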
Super-resolution (SR) is the task of recovering high-resolution (HR) images from their low-resolution (LR) counterparts. Recent approaches to SR have shown amazing reconstruction performance, both in qualitative perceptual comparisons and on quantitative benchmarks (PSNR, SSIM). Although further research has resolved many problems of earlier approaches, we still believe current DL-based SR methods carry fundamental problems, especially in their loss functions. We will focus specifically on the problems of current approaches to single image super-resolution (SISR), where a single LR image is given and the goal is to output an HR image.
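PSNR, one of the quantitative benchmarks mentioned above, is just a log-scaled MSE. A minimal implementation, assuming pixel values in [0, 1]:

```python
import numpy as np

def psnr(hr, sr, max_val=1.0):
    # Peak signal-to-noise ratio in decibels: 10 * log10(MAX^2 / MSE).
    mse = np.mean((hr - sr) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

hr = np.zeros((4, 4))
sr = np.full((4, 4), 0.1)   # every pixel off by 0.1 -> MSE = 0.01
assert abs(psnr(hr, sr) - 20.0) < 1e-9
```

Since PSNR is a monotone function of MSE, optimizing for it inherits exactly the averaging problems discussed in this post.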
We will summarize the key concepts of ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) and the methods the paper proposes to improve the perceptual quality of single image super-resolution. The paper proposes the following techniques:
The complete implementation and training of ESRGAN can be found here.
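One technique ESRGAN relies on is a relativistic average discriminator, which estimates whether a real image is more realistic than the average fake rather than classifying each image in isolation. A minimal numpy sketch of that loss (the critic outputs below are made-up numbers for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_ra(c_x, c_other_mean):
    # Probability that x is more realistic than the average of the other class.
    return sigmoid(c_x - c_other_mean)

# Raw critic outputs C(.) for a batch of real (HR) and fake (SR) images (assumed).
c_real = np.array([2.0, 1.5, 1.8])
c_fake = np.array([-1.0, -0.5, -1.2])

# Discriminator loss: real should beat the average fake, and vice versa.
d_loss = (
    -np.mean(np.log(d_ra(c_real, c_fake.mean())))
    - np.mean(np.log(1 - d_ra(c_fake, c_real.mean())))
)

# Generator loss swaps the roles, so the generator also gets gradients
# from real images, not only from fakes.
g_loss = (
    -np.mean(np.log(d_ra(c_fake, c_real.mean())))
    - np.mean(np.log(1 - d_ra(c_real, c_fake.mean())))
)
assert d_loss > 0 and g_loss > 0
```

The symmetric generator loss is the point of the relativistic formulation: both real and generated batches contribute gradients to the generator.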
Refer to this post for the concepts and methods proposed by the paper “LARGE SCALE GAN TRAINING FOR HIGH FIDELITY NATURAL IMAGE SYNTHESIS”.
The original BigGAN model was apparently trained in an environment with enormous computational power and memory. Running the 256x256 BigGAN in Colab crashes no matter what we tried, at least with a batch size larger than 4. Although it is unlikely to be trainable in our local environment, we will review how to implement the techniques and model architecture proposed in the paper.
The complete code for this post is available here, although it produces an OOM (Out of…
A 15-year-old Python enthusiast and Korean student, interested in learning and implementing deep learning and machine learning.