Challenge

Motivation

The domain of image compression has traditionally been the focus of venues such as ICASSP and ICIP, and of more specialized forums like PCS, DCC, and the ITU/MPEG expert groups. This workshop and challenge were the first computer-vision event to explicitly focus on the field. Many techniques discussed at computer-vision meetings are relevant to lossy compression. For example, super-resolution and artifact removal can be viewed as special cases of the lossy compression problem in which the encoder is fixed and only the decoder is trained. Inpainting, colorization, optical flow, generative adversarial networks, and other probabilistic models have also been used as parts of lossy compression pipelines. Lossy compression is therefore a topic that can benefit greatly from the broader CVPR community.

Recent advances in machine learning have led to an increased interest in applying neural networks to the problem of compression. At CVPR 2017, for example, one of the oral presentations discussed compression using recurrent convolutional networks, and multiple lossy and lossless compression works have been presented at recent CVPRs. To foster further growth in this area, this workshop not only encourages development but also establishes baselines, educates, and proposes a common benchmark and evaluation protocol. This is crucial: without a benchmark providing a common way to compare methods, it is very difficult to measure progress.

We propose hosting a lossy image and video compression challenge which specifically targets methods that have traditionally been overlooked, with a focus on neural networks (while also welcoming traditional approaches). Such methods typically consist of an encoder subsystem that maps images/videos to representations which are more easily compressed than pixel representations (e.g., a stack of convolutions producing an integer feature map), followed by an arithmetic coder. The arithmetic coder uses a probabilistic model of the integer codes to generate a compressed bit stream, which makes up the file to be stored or transmitted. Decompressing this bit stream requires two additional steps: first, an arithmetic decoder, which shares its probability model with the encoder, losslessly reconstructs the integers produced by the encoder; then another decoder produces a reconstruction of the original images/videos.
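The pipeline above can be sketched in a few lines. This is a toy illustration, not any entrant's method: the "encoder" and "decoder" below are trivial stand-ins for learned networks, and instead of implementing an arithmetic coder we estimate the ideal code length (the empirical entropy of the integer symbols), which is the length a well-matched arithmetic coder approaches.

```python
import numpy as np

def toy_encoder(image):
    # Stand-in for a learned encoder: downsample and scale, then
    # round to integers so the result can be entropy-coded.
    latents = image[::2, ::2].astype(np.float64) / 16.0
    return np.round(latents).astype(np.int64)

def ideal_code_length_bits(symbols):
    # An arithmetic coder with a well-matched probability model
    # approaches the empirical entropy of the symbol stream.
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())

def toy_decoder(latents):
    # Stand-in for a learned decoder: undo the scaling and
    # upsample by pixel repetition (real systems learn this too).
    recon = latents.astype(np.float64) * 16.0
    return np.clip(np.repeat(np.repeat(recon, 2, axis=0), 2, axis=1), 0, 255)
```

The only lossy steps are the encoder's downsampling and rounding; the entropy-coding stage is lossless, which is why the arithmetic decoder can recover the integers exactly.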

In the computer vision community, many authors will be familiar with a multitude of configurations which can act as either the encoder or the decoder, but probably few are familiar with the implementation of an arithmetic coder/decoder. As part of our previous challenge, we therefore released a reference arithmetic coder/decoder in order to allow researchers to focus on the parts of the system for which they are experts. For the 2nd edition, we released a drop-in range coder/decoder in TensorFlow.

While having a compression algorithm is an interesting feat by itself, it does not mean much unless its results compare well against similar algorithms and established baselines on realistic benchmarks. To ensure realism, we have collected a set of images that represents the types of images widely available today far better than the established benchmarks, which rely on images from the Kodak PhotoCD (768x512 resolution) or Tecnick (around 1.44 megapixels). We will also provide the performance of current state-of-the-art compression systems, such as WebP and BPG, as baselines. For the P-frame track, we will use an existing video dataset, which we will provide in preprocessed form to allow for easier training and will also use for evaluation; we will not be creating new video content for this challenge.

Prizes

TBA

We note that the organizers will not participate in the challenge, and other teams from Google, Twitter, and ETH Zurich are not eligible for any prizes.

Discussion forum

TBA

Challenge Tasks

We will be running two challenge tracks: one at “low bit-rate” (similar to last year) and one targeting lossy compression of video P-frames.

Participants will need to submit a decoder executable that can run in the provided Docker environment and is capable of decompressing the submission files. We will impose reasonable limitations on the compute and memory available to the decoder executable.

Low-rate compression

For the low bit-rate track (which is similar to the one we ran at CLIC 2018), contestants will be asked to compress the entire dataset to 0.15 bpp (bits per pixel) or smaller. The contestants with the top entries, as determined by a human perceptual rating task, will give a short talk. We will provide last year’s Professional and Mobile datasets (all splits) as the training data for this challenge track. A new test set will be generated for this year and released as part of the test phase.

Data

We provide the same two training datasets as we did last year: Dataset P (“professional”) and Dataset M (“mobile”). The datasets were collected to be representative of images commonly used in the wild and contain around two thousand images. The challenge will allow participants to train neural networks or other methods on any amount of data (it should be possible to train on the data we provide, but we expect participants to have access to additional data, such as ImageNet).

Participants will need to submit a decoder and a file for each validation or test image. The test dataset will be released at a later point. To ensure that the decoder is not optimized for the test set, we will require the teams to use one of the decoders submitted in the validation phase of the challenge.

The challenge data is released by the Computer Vision Lab of ETH Zurich, and can be downloaded here:

The total size of all compressed images should not exceed 4,722,341 bytes for the validation set for the low-rate track.
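The relationship between a bits-per-pixel target and a byte budget is simple arithmetic. The helper names below are hypothetical (not part of any official tooling), and we assume the rate is measured over every pixel of every image in the set:

```python
def max_total_bytes(total_pixels, bpp=0.15):
    # Byte budget for the whole dataset at the target rate
    # (bpp counts bits over every pixel in every image).
    return total_pixels * bpp / 8.0

def bits_per_pixel(total_bytes, total_pixels):
    # Achieved rate for a set of compressed files.
    return total_bytes * 8.0 / total_pixels
```

For example, a set of files totaling 150 bytes covering 8,000 pixels sits exactly at 0.15 bpp.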

P-frame compression

The P-frame challenge will require entrants to compress a video frame conditioned on the previous frame (P-frame compression). For this track, the dataset will be released before the competition is over. The model size will be factored into the compressed size (bits per pixel). This will encourage not only high compression performance but also small models that cannot overfit to or memorize the available data.

Data

TBA.

Validation and Test phases

During the validation phase, participants are free to develop their method and submit decoders to the server. After the test set has been released, we will require the teams to use one of the decoders submitted in the validation phase of the challenge. This is to ensure that the decoder is not optimized for the test set.

Development kit

TBA

FAQ

Does my model have to reconstruct images in full resolution or can it be cropped?

The decoder has to produce PNG images where each image has the same resolution as the corresponding image in the validation or test set.

How is PSNR calculated?

We compute a single MSE value by averaging across all RGB channels of all pixels of the whole dataset, and from that calculate a PSNR value.
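A sketch of this computation, assuming the originals and reconstructions are 8-bit RGB arrays (the function name is illustrative, not part of the evaluation code):

```python
import numpy as np

def dataset_psnr(originals, reconstructions):
    # One MSE averaged over all RGB channels of all pixels of the
    # whole dataset, then a single PSNR from that MSE (8-bit peak).
    squared_error, count = 0.0, 0
    for orig, rec in zip(originals, reconstructions):
        diff = orig.astype(np.float64) - rec.astype(np.float64)
        squared_error += float((diff ** 2).sum())
        count += diff.size
    mse = squared_error / count
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

Note that this is not the same as averaging per-image PSNR values: pooling the MSE first weights every pixel equally regardless of which image it belongs to.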

The evaluation server gives “ERROR: Missing image IMG_20170114_210112.png”. What am I doing wrong?

The error means that the decoder failed and did not produce all required files. This could have many reasons. If the decoder works locally using our Docker environment but fails on the server, a likely explanation is that it uses too much memory.

In which directory should the decoder save images?

The decoder can save images in the current working directory (.) or in any arbitrary subfolder, such as images.


Important Dates

All deadlines are 23:59:59 PST.
Date Description
November 22nd, 2019 Development phase & announcement. The training part of the dataset is released.
January 7th, 2020 The validation part of the dataset is released; the online validation server is made available.
March 13th, 2020 Final decoders for the challenge are expected to be submitted.
March 16th, 2020 Test set is released for contestants to compress.
March 20th, 2020 Encoded test set submission deadline. The competition is closed at this point.
March 23rd, 2020 Paper and factsheet submission deadline.
April 6th, 2020 Paper decision notification.
Mid April, 2020 Camera-ready deadline for CVPR.
Mid May, 2020 End of human evaluation on both challenges. Results will be released online before the workshop.