Problems of Current SR

As I discussed in a previous post, the loss functions currently used to measure SR quality (the MSE loss between the HR and SR images, the adversarial loss, and the perceptual loss) have some problems and are not fundamental objectives that the model must achieve for the best perceptual quality. We will briefly summarize the problems of these losses and propose my ideas on how to refine them.

  1. Because the LR->SR mapping…
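The following toy sketch illustrates one commonly cited problem with the pixel-wise MSE loss. When several HR images are consistent with the same LR input, the prediction that minimizes the expected MSE is their pixel-wise average, which is blurrier than any of them. The two 1-D "patches" below are made-up data chosen only for illustration.

```python
import numpy as np

# Two equally plausible HR "patches" for the same LR input (hypothetical toy data):
# a sharp edge located at two different positions.
hr_a = np.array([0.0, 0.0, 1.0, 1.0])
hr_b = np.array([0.0, 1.0, 1.0, 1.0])


def expected_mse(pred, targets):
    """Average MSE of a single prediction against every plausible target."""
    return float(np.mean([np.mean((pred - t) ** 2) for t in targets]))


candidates = {
    "hr_a": hr_a,               # commit to the first sharp edge
    "hr_b": hr_b,               # commit to the second sharp edge
    "mean": (hr_a + hr_b) / 2,  # blurry average: [0, 0.5, 1, 1]
}

for name, pred in candidates.items():
    print(name, expected_mse(pred, [hr_a, hr_b]))
```

The blurry average scores a lower expected MSE (0.0625) than either sharp candidate (0.125). A model trained only on MSE is therefore pushed toward overly smooth outputs.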


As reviewed in this post, the NTIRE 2020 challenge on extreme super-resolution uses a very large scaling factor (x16). The participating teams proposed various model architectures and upscaling methods. Some teams made minor modifications to the loss by using an SR-suited model instead of the VGG network for the perceptual loss.

NTIRE Challenge

The 2020 NTIRE challenge on extreme super-resolution[1] is about super-resolving an image by a scaling factor of x16. The challenge paper reviews 19 methods that were proposed to solve this problem and competed on perceptual quality. We will give an overview of how the competition was conducted and of the intuitions behind some high-scoring methods proposed by the participants.


The paper proposes a novel model architecture made up of residual-in-residual (RIR) blocks, each with channel attention. The full pipeline is a convolutional network about 400 layers deep. More specifically, the paper proposes:

  • A residual block of residual blocks, forming an RIR structure.
  • Long and short skip connections, which convey low-frequency features alongside the computed high-frequency features.
  • A Channel Attention (CA) mechanism in each residual block that weighs channel-wise features differently, using weights computed from the features themselves.
  • Demonstrates that very deep networks with high representational capacity can substantially improve perceptual and quantitative SR performance. …
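The channel attention described in the bullets can be sketched in NumPy as squeeze-and-excitation style gating. Global average pooling produces one descriptor per channel, which then passes through a channel-reducing layer, a ReLU, a channel-restoring layer, and a sigmoid gate that rescales each channel. This is a minimal sketch under stated assumptions: the reduction ratio `r`, the random weights, and the omission of biases are illustrative choices. In the real network the two layers are learned convolutions inside each residual block.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(features, w_down, w_up):
    """Squeeze-and-excitation style channel attention (biases omitted for brevity).

    features: (H, W, C) feature map.
    w_down:   (C, C // r) weights of the channel-reducing layer.
    w_up:     (C // r, C) weights of the channel-restoring layer.
    Returns the feature map with each channel rescaled by a weight in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    descriptor = features.mean(axis=(0, 1))        # (C,)
    # Excitation: reduce, ReLU, restore, sigmoid -> per-channel weights.
    hidden = np.maximum(descriptor @ w_down, 0.0)  # (C // r,)
    weights = sigmoid(hidden @ w_up)               # (C,)
    # Rescale: broadcast the channel weights over both spatial dimensions.
    return features * weights


rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4  # hypothetical sizes for illustration
x = rng.standard_normal((H, W, C))
out = channel_attention(
    x,
    rng.standard_normal((C, C // r)),
    rng.standard_normal((C // r, C)),
)
print(out.shape)
```

Every pixel of a given channel is multiplied by the same weight. The attention only decides how much each channel contributes and does not alter spatial structure.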

Key Concepts

This paper proposes an autoencoder that learns a discrete latent space, together with a loss and a method to backpropagate through the non-differentiable pipeline it introduces. The posterior is a discrete, one-hot "0 or 1" assignment: the argmin function maps each encoder output to its nearest embedding vector.

  • The paper introduces a simple VQ-VAE model that uses discrete latents and, as a result, does not suffer from posterior collapse or large variance issues.
  • The paper proposes a loss with 3 components (reconstruction, Vector Quantization (VQ), and commitment terms); the VQ loss trains the embedding space.
  • The VQ-VAE is comparable to the classic VAE with continuous latent…
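The argmin-based quantization and the three loss terms can be illustrated with a minimal NumPy sketch. NumPy has no automatic differentiation, so stop-gradient `sg[.]` and the straight-through trick are described in comments only. The code computes forward values. Assumptions: the reconstruction term is written as MSE (a Gaussian-likelihood stand-in), `beta=0.25` is an assumed commitment weight, and the codebook and data are random.

```python
import numpy as np


def vector_quantize(z_e, codebook):
    """Map each encoder output to its nearest codebook vector.

    z_e:      (N, D) continuous encoder outputs.
    codebook: (K, D) embedding vectors.
    Returns (indices, z_q) with z_q[i] = codebook[indices[i]].
    In an autodiff framework the decoder would receive
    z_e + stop_gradient(z_q - z_e), so gradients bypass the argmin
    (straight-through estimator).
    """
    # Squared Euclidean distance between every output and every embedding: (N, K).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # the discrete latent: one index per vector
    return indices, codebook[indices]


def vq_vae_loss_terms(x, x_recon, z_e, z_q, beta=0.25):
    """Forward values of the three loss terms (beta is an assumed default)."""
    reconstruction = np.mean((x - x_recon) ** 2)
    # ||sg[z_e] - e||^2: with autodiff, this term would update only the embeddings.
    codebook_loss = np.mean((z_e - z_q) ** 2)
    # beta * ||z_e - sg[e]||^2: with autodiff, this term would update only the encoder.
    commitment = beta * np.mean((z_e - z_q) ** 2)
    return reconstruction, codebook_loss, commitment


rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))  # K=8 embeddings of size D=4
z_e = codebook[[3, 1, 3]] + 0.01 * rng.standard_normal((3, 4))
idx, z_q = vector_quantize(z_e, codebook)

x = rng.standard_normal(10)
x_recon = x + 0.1  # hypothetical decoder output
rec, cb, com = vq_vae_loss_terms(x, x_recon, z_e, z_q)
total = rec + cb + com
print(idx, total)
```

The codebook and commitment terms have the same forward value up to `beta`. They differ only in where the stop-gradient sits, which determines whether the embeddings or the encoder receive that gradient.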

StyleGAN[1], initially proposed in 2019, showed amazing performance in creating realistic images with a style-based generator architecture that separates high-level attributes, such as pose and facial expressions, from stochastic variation in the images, like hair and freckles. It also aims to make the latent space less entangled by introducing an intermediate latent space W: a mapping network transforms the input latent vector into W, and the generator is conditioned on this intermediate latent instead of the classic latent vector.

Key Concepts

  • A learned “style” instead of a latent vector disentangles the complex image space and enables smooth transitions in the latent space.
  • The styles are applied to the generator in various scales of the generation process…
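A minimal NumPy sketch of the two pieces named above follows. A mapping network turns the latent vector z into an intermediate latent w. Adaptive instance normalization (AdaIN), which StyleGAN uses to apply styles, then normalizes each feature channel and rescales and shifts it with a per-channel style derived from w. Assumptions: the depth of 8 layers, the leaky ReLU slope of 0.2, the dimensions, the random weights, and the `1 + style` scale offset are illustrative choices, not a reproduction of the official implementation.

```python
import numpy as np


def mapping_network(z, weights):
    """Map latent z to intermediate latent w with fully connected layers."""
    h = z / np.sqrt(np.mean(z ** 2) + 1e-8)  # normalize the input latent
    for w_mat in weights:
        h = h @ w_mat
        h = np.where(h > 0, h, 0.2 * h)      # leaky ReLU
    return h


def adain(features, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization.

    features: (H, W, C); style_scale, style_bias: (C,)
    Each channel is normalized to zero mean / unit std over its spatial
    positions, then scaled and shifted by the style.
    """
    mean = features.mean(axis=(0, 1), keepdims=True)
    std = features.std(axis=(0, 1), keepdims=True)
    return style_scale * (features - mean) / (std + eps) + style_bias


rng = np.random.default_rng(0)
C, dim = 16, 32  # hypothetical channel count and latent size
layers = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(8)]
w = mapping_network(rng.standard_normal(dim), layers)

# A learned affine transform turns w into per-channel (scale, bias); random here.
affine = rng.standard_normal((dim, 2 * C)) / np.sqrt(dim)
style = w @ affine
scale, bias = 1.0 + style[:C], style[C:]

feat = rng.standard_normal((8, 8, C))
out = adain(feat, scale, bias)
print(out.shape)
```

After AdaIN, each channel's spatial mean equals the style bias and its standard deviation equals the magnitude of the style scale. This is how the same w can impose its style at every resolution where AdaIN is applied.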

Super Resolution

Super-resolution (SR) is the task of recovering high-resolution (HR) images from their low-resolution (LR) counterparts. Recent SR approaches show impressive reconstruction performance, both in qualitative perceptual quality and on quantitative benchmarks (PSNR, SSIM). Although further research has resolved many problems of earlier approaches, we still believe that current DL-based SR methods inherit fundamental problems, especially from their loss functions. We will focus specifically on the problems of current approaches to single image super-resolution (SISR), where we receive a single LR image and aim to output an HR image.
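PSNR, one of the quantitative benchmarks mentioned above, is a log-scaled form of the MSE between the HR and SR images. The NumPy sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE). It assumes pixel values in [0, `max_val`]; benchmark code often also crops borders or evaluates only the luminance channel, which this sketch omits.

```python
import numpy as np


def psnr(hr, sr, max_val=1.0):
    """Peak signal-to-noise ratio in dB between an HR image and its SR estimate."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


hr = np.zeros((4, 4))
sr = np.full((4, 4), 0.1)  # uniform error of 0.1 -> MSE = 0.01
print(psnr(hr, sr))
```

Because PSNR is a monotone function of MSE, a model that minimizes MSE also maximizes PSNR. This is why high PSNR does not guarantee good perceptual quality.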

We will first take a very quick overview of current deep learning-based SR…

Comparison of SR performance with SRGAN

Method Overview

We will summarize the key concepts of ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks)[1] and the methods proposed in the paper to improve the perceptual quality of single image super-resolution. The paper proposes the following techniques:

  • Improves the model architecture using the RRDB (Residual-in-Residual Dense Block) without batch normalization, based on the observations of EDSR[4].
  • Uses the relativistic loss of RaGAN (Relativistic average GAN)[3] instead of the standard cross-entropy adversarial loss.
  • Improves the VGG perceptual loss of SRGAN[2] by comparing VGG features before activation instead of after.
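The relativistic average discriminator estimates whether a real image is more realistic than the fake images on average: D_Ra(x_r, x_f) = σ(C(x_r) − E[C(x_f)]), where C is the raw, pre-sigmoid discriminator output. A minimal NumPy sketch of the resulting discriminator and generator losses follows. It assumes `c_real` and `c_fake` are batches of raw discriminator outputs; in practice they would come from a discriminator network on HR and SR images.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def ragan_losses(c_real, c_fake):
    """Relativistic average GAN losses from raw (pre-sigmoid) outputs C(x).

    D_Ra(x_r, x_f) = sigmoid(C(x_r) - mean(C(x_f)))
    Uses log(1 - sigmoid(t)) == log_sigmoid(-t).
    """
    real_vs_fake = c_real - c_fake.mean()
    fake_vs_real = c_fake - c_real.mean()
    # Discriminator: real should look "more real" than fake, fake "less real" than real.
    d_loss = -np.mean(log_sigmoid(real_vs_fake)) - np.mean(log_sigmoid(-fake_vs_real))
    # Generator: the symmetric objective, so it benefits from gradients of both terms.
    g_loss = -np.mean(log_sigmoid(-real_vs_fake)) - np.mean(log_sigmoid(fake_vs_real))
    return float(d_loss), float(g_loss)


d, g = ragan_losses(np.full(4, 5.0), np.full(4, -5.0))
print(d, g)  # discriminator confidently separates real from fake
```

Unlike the standard GAN generator loss, the relativistic generator loss depends on both real and generated samples. The ESRGAN paper argues that this helps the generator learn sharper edges and more detailed textures.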

The complete implementation and training of ESRGAN can be found here.

Model Architecture

RRDB (Residual-in-Residual Dense Block)
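The RRDB nests dense blocks inside an outer residual connection, uses no batch normalization, and scales each residual by a constant β before adding it back (residual scaling). A minimal NumPy sketch of that structure follows. Assumptions: 1x1 channel-mixing matrices stand in for the paper's 3x3 convolutions to keep the code short, and the layer count (5 per dense block), block count (3), growth channels `g`, and `beta=0.2` reflect my understanding of the default configuration rather than a verified reproduction.

```python
import numpy as np


def conv1x1(x, w):
    # Stand-in for a convolution: mixes channels independently at every pixel.
    return x @ w


def lrelu(x):
    return np.where(x > 0, x, 0.2 * x)


def dense_block(x, weights, beta=0.2):
    """Each layer sees the concatenation of the block input and all previous outputs."""
    feats = [x]
    for i, w in enumerate(weights):
        out = conv1x1(np.concatenate(feats, axis=-1), w)
        if i < len(weights) - 1:
            out = lrelu(out)  # no activation (and no batch norm) on the last layer
        feats.append(out)
    return x + beta * feats[-1]  # residual scaling inside the dense block


def rrdb(x, blocks, beta=0.2):
    """Residual-in-residual: dense blocks in sequence, wrapped by a scaled skip."""
    out = x
    for weights in blocks:
        out = dense_block(out, weights, beta)
    return x + beta * out


def make_block_weights(rng, c, g, n_layers=5, scale=0.1):
    """Random weights: layer i takes c + i*g channels; the last layer outputs c."""
    ws = []
    for i in range(n_layers):
        out_ch = c if i == n_layers - 1 else g
        ws.append(scale * rng.standard_normal((c + i * g, out_ch)))
    return ws


rng = np.random.default_rng(0)
C, G = 8, 4  # hypothetical channel and growth sizes
blocks = [make_block_weights(rng, C, G) for _ in range(3)]
x = rng.standard_normal((6, 6, C))
y = rrdb(x, blocks)
print(y.shape)
```

Residual scaling keeps each block's contribution small at the start of training, which the paper uses to stabilize training of very deep networks without batch normalization.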

Refer to this post for the concepts and methods proposed by the paper “LARGE SCALE GAN TRAINING FOR HIGH FIDELITY NATURAL IMAGE SYNTHESIS”.

Limitations of Machine Specs

Apparently, the original BigGAN model was trained in an environment with enormous computation power and memory. Running the 256x256 BigGAN in Colab crashes no matter what we try, at least with a batch size larger than 4. Although it is unlikely to be trainable in our local environment, we will review how to implement the techniques and model architecture proposed in the paper.

Captured from Reddit


The complete code for this post is available here, although it produces an OOM (Out of…

Sieun Park

15-year-old Python enthusiast and Korean student interested in learning and implementing deep learning and machine learning.
