There are three challenge tracks. In the image compression track, images need to be compressed to 0.075 bpp, 0.15 bpp, and 0.3 bpp (bits per pixel). In the video compression track, short video clips need to be compressed to around 1 Mbit/s. Finally, in the perceptual metric track, human preferences on pairs of images will have to be predicted. The image pairs will come from the decoders submitted to the image compression track.

Image compression

For the image compression track, contestants will be asked to compress the entire dataset to three different bit-rates, namely 0.075 bpp, 0.15 bpp, and 0.3 bpp. The winners of the competition will be chosen based on a human perceptual rating task in which pairs of decoded images (from different codecs) are presented to the user. The raters compare the images to the uncompressed image and choose the preferred codec. For guidance, several objective metrics will be shown on the leaderboard but not considered for prizes.


We provide training and validation sets of high quality images collected from Unsplash. The test set will contain images of similar quality from potentially different sources. Training on additional data is allowed. Participants will need to submit a decoder and encoded image files. The test dataset is going to be released after the validation phase ends.

The challenge data is hosted by the Computer Vision Lab of ETH Zurich and can be downloaded here ("professional" set of CLIC2020):


The total size of all compressed validation images should not exceed 857,362 bytes (0.075 bpp), 1,714,724 bytes (0.15 bpp), and 3,429,448 bytes (0.3 bpp), respectively. For the test set, these numbers are 1,477,209 bytes (0.075 bpp), 2,954,417 bytes (0.15 bpp), and 5,908,833 bytes (0.3 bpp), respectively. This year we are further reducing the allowed model size, which should not exceed 250 MB.
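As a rough sketch of how these byte budgets can be checked, the following Python snippet sums the sizes of a set of encoded files, compares the total against the validation budget for a target bit-rate, and converts a total size to bits per pixel. The function names are illustrative assumptions and are not part of the official tooling; only the budget values come from the text above.

```python
import os

# Validation-set byte budgets quoted above, keyed by target bit-rate in bpp.
VALID_BUDGET_BYTES = {0.075: 857_362, 0.15: 1_714_724, 0.3: 3_429_448}


def bits_per_pixel(total_bytes, total_pixels):
    """Average bit-rate over a dataset: 8 bits per byte, divided by pixel count."""
    return 8 * total_bytes / total_pixels


def total_size(paths):
    """Sum of the on-disk sizes (in bytes) of the given encoded files."""
    return sum(os.path.getsize(p) for p in paths)


def within_budget(total_bytes, target_bpp, budgets=VALID_BUDGET_BYTES):
    """True if the combined size does not exceed the budget for target_bpp."""
    return total_bytes <= budgets[target_bpp]
```

Because the limit is stated for the total size of all compressed images, a check of this kind operates on the sum over the dataset rather than on each image separately.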

An example submission targeting 0.15 bpp can be downloaded here:

Video compression

The video compression track will require entrants to compress short clips of 60 frames (roughly 2 seconds at 30 fps) at 720p resolution. Instead of splitting the dataset into training and test sets, in this track the entire dataset is released before the test phase.

To discourage overfitting, the model size is added to the compressed dataset size and the sum cannot exceed the target bit-rate of 1 Mbit per 30 frames. That is, participants should try to minimize both the dataset size and the model size. The winner will be determined based on MS-SSIM.


The video dataset consists of 562 videos (314,175 frames) taken from the UGC dataset. Each video is released as a zip file, and each zip file contains PNGs representing the frames of the video. Each frame is represented by 3 PNGs, one for each channel of a YUV encoding. This format was chosen because the Y-channel has twice the resolution of the other two channels. The data is released under a Creative Commons license.

To download (a subset of) the dataset, you may find download.sh of our devkit helpful. The entire dataset contains over 200GB of data.


The file video_targets_valid.txt contains the names of the frames evaluated during the validation phase (about 1.9% of the entire dataset). A submission consists of a decoder and encoded files. The decoder should take the encoded files and reproduce the PNGs. To estimate the combined size for the entire dataset, we use the following formula:

model_size + data_size / 0.019

This number should not exceed 1,309,062,500 bytes (1 Mbit per 30 frames).
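The limit can be reproduced from the figures given above: 314,175 frames at 1 Mbit (taken here as 1,000,000 bits, or 125,000 bytes) per 30 frames. The sketch below recomputes it and applies the estimation formula. The function names are hypothetical, and the decimal interpretation of "Mbit" is an assumption that happens to match the stated number.

```python
TOTAL_FRAMES = 314_175                 # frames in the full dataset, as stated above
BYTES_PER_30_FRAMES = 1_000_000 // 8   # 1 Mbit assumed to be 10^6 bits = 125,000 bytes
VALID_FRACTION = 0.019                 # approximate share of frames used for validation


def total_budget_bytes(frames=TOTAL_FRAMES):
    """Byte budget for the whole dataset at 1 Mbit per 30 frames."""
    return frames * BYTES_PER_30_FRAMES / 30


def estimated_total_bytes(model_size, data_size, fraction=VALID_FRACTION):
    """Estimated combined size: model_size + data_size / 0.019."""
    return model_size + data_size / fraction


def fits_budget(model_size, data_size):
    """True if the estimated combined size stays within the dataset budget."""
    return estimated_total_bytes(model_size, data_size) <= total_budget_bytes()
```

For example, with a 100,000,000-byte model, roughly 22.9 million bytes remain for the encoded validation frames, since (1,309,062,500 - 100,000,000) * 0.019 is about 22,972,187.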

Perceptual metrics

In the perceptual metric track, you will need to design a metric to rank the participants of the image compression task. Given a pair of images (one being the original and the other a distorted image), your metric will need to generate a score. We can compare methods A and B by generating two scores, d(O, A) and d(O, B), where O is the original image. If d(O, A) < d(O, B), the metric prefers method A over method B.

To evaluate your metric, we will compare the metric's preferences with the preferences of human raters. In our image compression evaluation, human raters are presented with three images (O, A, B) and asked to pick one of A or B. The winner of the perceptual metric challenge is chosen based on who predicted the correct preference the largest number of times.
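The evaluation described above can be sketched as a simple agreement count. In the snippet below, mean squared error over flat pixel lists stands in for a real perceptual metric, and the triplet layout (O, A, B, human choice) is an assumed representation chosen for illustration, not the official data format.

```python
def mse(a, b):
    """Placeholder distance: mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def preference_accuracy(metric, triplets):
    """Fraction of (O, A, B, human_choice) triplets where the metric agrees with the rater.

    human_choice is "A" or "B". The metric prefers A when d(O, A) < d(O, B);
    in this sketch a tie counts as a preference for B.
    """
    correct = 0
    for original, img_a, img_b, human_choice in triplets:
        predicted = "A" if metric(original, img_a) < metric(original, img_b) else "B"
        correct += predicted == human_choice
    return correct / len(triplets)
```

A metric that always agrees with the raters would score 1.0 under this measure, while one that guesses at random would be expected to score around 0.5.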

Development Kit

The development kit is provided as an example of what is expected from participants. It is not intended to contain data representative of the final challenge. The final test set will be created from the files uploaded by the participants in the compression challenge, so we cannot provide validation data that matches that distribution.

To get the devkit, use:

git clone https://github.com/fab-jul/clic2021-devkit.git


The full details of the tasks are contained in the README file of the repository.
