Improving Content-Aware Encoding by Adaptation to “True Resolution” of the Content
Originally Aired - Sunday, April 16 | 10:20 AM - 10:40 AM PT
Per-title, content-aware, and context-aware encoding (CAE) techniques have established themselves as effective means of reducing bandwidth and delivery costs in OTT streaming. While initially applied only to VOD content and full sequences, the more recent "per-scene" or "per-shot" variants are now also fairly common. Dynamic resolution encoding (DRE) is effectively a variant of a "per-scene" technique, applying encoder-level resolution changes within a single stream. New codecs such as VVC enable resolution changes natively as part of the syntax capabilities of the elementary streams.
However, dynamic lowering of resolution as an encoder-level optimization technique is not always safe. For example, it may alter the artistic appearance of film grain, background textures, or other fine details. Some encoders may handle such special cases more successfully than others, but fundamentally this raises a question: given a scene in a mezzanine file, can we establish some safe "minimum" or "true" resolution of the content to which it can be reduced without removing essential properties of the content?
As we will show in this presentation, the answer to this question, from a classical signal-processing standpoint, is quite simple, and there are a number of techniques that can be successfully employed to detect the minimum sufficient sampling rate of the signal. Once such a minimum sufficient, or "true", resolution is detected, the rest of the content-aware encoder or content-aware ABR-profile generator becomes trivial: simply limit the top resolution to the "true" resolution level detected by the analysis of the scene. In other words, this yields a safe and separable method of adaptation to "true" resolution without the need to replace the encoder or the CAE profile generator.
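To make the idea concrete, the following is a minimal sketch of one classical signal-processing approach (not necessarily the detector used in the study): estimate the radial spatial-frequency band that holds nearly all of a frame's spectral energy, treat that band edge as the "true" sampling rate, and then cap an ABR ladder accordingly. The function names, the 99% energy threshold, and the per-frame (rather than per-scene) analysis are all illustrative assumptions.

```python
import numpy as np

def true_resolution_fraction(frame, energy_keep=0.99):
    """Estimate the fraction of the full sampling rate needed to retain
    `energy_keep` of the frame's spectral energy -- a simple proxy for
    "true resolution". `frame` is a 2D luma array."""
    # Centered 2D power spectrum of the luma plane.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(frame))) ** 2
    h, w = frame.shape
    cy, cx = h // 2, w // 2
    # Normalized radial frequency for every spectral bin (0 = DC, 1 = Nyquist).
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - cy) / (h / 2), (xx - cx) / (w / 2))
    total = spec.sum()
    # Smallest radius that keeps the requested share of the energy.
    for frac in np.linspace(0.1, 1.0, 19):
        if spec[r <= frac].sum() >= energy_keep * total:
            return frac
    return 1.0  # full-bandwidth content: no safe downsampling detected

def cap_abr_ladder(ladder, src_height, frac):
    """Drop ladder rungs whose height exceeds the detected "true" height."""
    true_height = src_height * frac
    kept = [(w, h) for (w, h) in ladder if h <= true_height]
    # Always keep at least the lowest rung so the ladder stays usable.
    return kept or [min(ladder, key=lambda wh: wh[1])]
```

For oversampled content (e.g. an upscaled SD master delivered in a UHD mezzanine), the detector returns a fraction well below 1.0, and `cap_abr_ladder` then removes the top rungs; for genuinely full-bandwidth content it returns 1.0 and the ladder is left intact, which is what makes the technique safe as a separable pre-analysis step.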
The rest of the paper is dedicated to studying the effectiveness of this technique in practice. To perform this study, we use over 500 hours of real-world video programming, provided as contribution mezzanines, covering over 35 categories of content of various kinds (sports, news, games, movies, mixed broadcast programming, etc.) and representative of different production, post-production, and distribution processes (SD broadcast, HD broadcast, professionally mastered movies in HD and UHD, raw recordings, user-generated content, etc.). We report performance results for each category by first applying traditional CAE encoding and then repeating the encoding with CAE plus the "true resolution" detector. The results include comparisons using several resolution-sensitive quality metrics (SQRI, PQR) in addition to metrics more commonly used in video compression (PSNR, SSIM, VMAF).
The results are quite surprising, showing very significant gains (up to 30% additional reduction in bitrate and storage costs) achievable by the "true resolution" detection technique vs. the reference CAE method. We conjecture that the main reason for this is that a significant percentage of video content used in practice is oversampled.