Reproducing Results of Super-SloMo


Super-SloMo
PyTorch implementation of “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation” by Jiang H., Sun D., Jampani V., Yang M.-H., Learned-Miller E., and Kautz J. GitHub

RIFE
PyTorch implementation of “RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation” by Huang Z., Zhang T., Heng W., Shi B., and Zhou S. GitHub

ViT
TensorFlow implementation of “Vision Transformer” by Dosovitskiy A.†, Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., Uszkoreit J., and Houlsby N.† GitHub
(†) equal advising.


Performance indicators

Super-SloMo

Evaluation metric: interpolation error (IE), defined as the root-mean-squared (RMS) difference between the ground-truth image and the interpolated image.
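
A minimal NumPy sketch of the metric; the function name and array layout are my own choices, not the repo’s API:

```python
import numpy as np

def interpolation_error(gt: np.ndarray, pred: np.ndarray) -> float:
    """IE: root-mean-squared difference between the ground-truth image
    and the interpolated image (H x W x C arrays with the same range)."""
    diff = gt.astype(np.float64) - pred.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```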

RIFE

The RIFE evaluation reports Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) scores for both models.
RIFE and RIFE-Large are shown to outperform Super-SloMo quantitatively on UCF101.
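
Both metrics are available in scikit-image; a minimal sketch (random arrays stand in for real frames, and `channel_axis` assumes scikit-image ≥ 0.19):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder frames; in practice these are the ground-truth and interpolated images.
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
pred = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)                 # dB, higher is better
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)  # in [-1, 1], higher is better
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```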


Inference on pretrained models
Tests on video frame interpolation models

  • Google Colab Super-SloMo
    • Dataset: adobe240fps
    • Inference output: converts a video into a slow-motion (higher-FPS) video
    • The create_dataset.py script uses ffmpeg to extract frames from videos (see the sketch after this list).
  • Google Colab RIFE
    • Model type: HDv2 pretrained model
    • Dataset: HD (Bao et al.), a collection of 11 high-resolution videos for evaluation: four 1080p, three 720p, and four 1280 × 544 videos. The motions in this benchmark are larger than in other benchmarks.
    • inference_video.py: converts a video into a higher-FPS MP4 video
    • inference_img.py: converts images (PNG files) into an MP4 video or GIF
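
The frame-extraction step can be reproduced by calling ffmpeg from Python; in this minimal sketch the file names and flags are illustrative, not necessarily what create_dataset.py uses:

```python
import os
import subprocess

os.makedirs("frames", exist_ok=True)
# -vsync 0 keeps every decoded frame; %04d numbers the extracted PNGs.
subprocess.run(
    ["ffmpeg", "-i", "clip.mp4", "-vsync", "0", "frames/%04d.png"],
    check=True,
)
```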

In these tests, RIFE takes significantly less time than Super-SloMo.

From the RIFE paper:

Current flow-based models usually need to be run twice to get the bi-directional flow. Thus our intermediate flow estimation process runs around 6–30 times faster than previous methods. Thus IFNet provides the possibility of developing a real-time flow-based VFI algorithm.
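
To make the distinction concrete, here is a toy PyTorch sketch; the tiny convolutions are stand-ins, not the papers’ architectures. A bi-directional pipeline runs its flow network once per direction, while an IFNet-style network predicts both intermediate flows in a single pass:

```python
import torch
import torch.nn as nn

class FlowNet(nn.Module):
    """Toy flow network: estimates flow from frame a to frame b (one direction per call)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 2, kernel_size=3, padding=1)

    def forward(self, a, b):
        return self.conv(torch.cat([a, b], dim=1))

class IFNet(nn.Module):
    """Toy IFNet: estimates both intermediate flows in one forward pass."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 4, kernel_size=3, padding=1)

    def forward(self, i0, i1):
        flows = self.conv(torch.cat([i0, i1], dim=1))
        return flows[:, :2], flows[:, 2:]  # F_(t->0), F_(t->1)

i0 = torch.randn(1, 3, 64, 64)  # frame at time 0
i1 = torch.randn(1, 3, 64, 64)  # frame at time 1

# Bi-directional pipeline: the flow network runs twice.
flownet = FlowNet()
f_01 = flownet(i0, i1)  # flow 0 -> 1
f_10 = flownet(i1, i0)  # flow 1 -> 0

# IFNet-style pipeline: one pass yields both intermediate flows.
ifnet = IFNet()
f_t0, f_t1 = ifnet(i0, i1)
```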

Test on Vision Transformer model
Google Colab ViT

The requirements.txt file was modified.


References

Link to paper: Super-SloMo
Link to paper: RIFE
Link to paper: ViT
