Compression is broadly used across web and mobile applications, and Snap is no exception. In this post we’ll talk a bit about the fundamentals of compression, and then describe how we’ve recently invested in improving video compression to achieve performant, high-quality results.
Compression can be either lossless (saving storage or network bandwidth at the expense of computation time) or lossy (reducing sizes more aggressively, but producing an imperfect reconstruction).
Lossless compression is used widely at Snap, in particular to reduce bandwidth consumption over the wire and in persistent storage. In practice, the information payloads used by our business logic compress reasonably well even with standard libraries (e.g., gzip, Brotli, and zstd are common and effective). Compressing these payloads is typically a net positive on overall app performance: the computational cost incurred is well worth the savings in bandwidth and storage.
We apply these algorithms with few ‘decisions’ beyond the choice of algorithm, because lossless compression carries one hard requirement: the decompressed stream of bytes must be identical to the source.
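To make that requirement concrete, here is a minimal sketch of a lossless round trip using Python’s standard-library gzip module; the payload is a hypothetical stand-in for the structured payloads described above.

```python
import gzip

# Illustrative payload: repetitive structured data, hypothetical but similar
# in spirit to the business-logic payloads mentioned above.
payload = b'{"user_id": 42, "events": ["open", "open", "close"]}' * 100

compressed = gzip.compress(payload, compresslevel=6)

# The defining property of lossless compression: decompression must
# reproduce the source byte-for-byte.
assert gzip.decompress(compressed) == payload

print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes "
      f"({100 * len(compressed) / len(payload):.1f}% of original)")
```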
Lossy compression has a great deal more nuance, and is most commonly applied to media (video, audio, images). The ideal outcome is content that is perceptually lossless, meaning a user can’t tell the difference between the original uncompressed media and the compressed result. In practice we make trade-offs, since perceptually lossless content can place an undue burden on network bandwidth and can incur significant costs, both for the streaming service and, in many cases, for the end user.
In most practical applications, video compression must be lossy. Transmitting raw, uncompressed HD video would require roughly 1 Gbps of bandwidth, while HD streams are typically served at 1 - 10 Mbps across streaming providers. Lossy video codecs such as H.264/AVC, H.265/HEVC, VP9, and AV1 achieve this by removing redundancy and subtlety from the data stream. Video is a continuous sequence of similar images, so codecs can also exploit the continuity between consecutive frames to remove temporal redundancy. The subtlety corresponds to the “high-frequency components” in signal-processing terms, to which human perception is less sensitive. In other words, the human visual system judges video quality in its own subjective way, not merely by differences in pixel values. Measuring the perceptual quality of compressed video therefore lets us serve our Snapchatters the best quality the current circumstances allow.
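The ~1 Gbps figure follows from simple arithmetic; the sketch below works it out for raw 1080p video in YUV 4:2:0 at two common frame rates (the parameters are assumptions chosen for illustration).

```python
# Back-of-the-envelope bitrate for raw 1080p video in YUV 4:2:0,
# which stores 12 bits per pixel (8 bits luma + 4 bits chroma).
width, height = 1920, 1080
bits_per_pixel = 12

for fps in (30, 60):
    bps = width * height * bits_per_pixel * fps
    print(f"{fps} fps: {bps / 1e9:.2f} Gbps uncompressed")

# 30 fps: 0.75 Gbps; 60 fps: 1.49 Gbps -- hence "~1 Gbps" for raw HD,
# versus the 1 - 10 Mbps at which HD streams are typically served.
```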
At Snap, we use Video Multimethod Assessment Fusion (VMAF) to measure perceptual quality because it correlates highly with human subjective ratings. VMAF compares the video before and after compression and rates quality from 0 (worst) to 100 (best). Since we reduce bits while preserving quality, you may wonder how much we can save. Fig. 1 shows an example of reducing bitrate while keeping perceptual quality intact.
Fig. 1: Comparison of the first frames before and after compression. The left is the original video at 3,968 Kbps. The right is transcoded from the original at 609 Kbps. The bitrate is reduced to 16% of the original with VMAF = 100 (perceptually lossless).
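One common way to compute a VMAF score is FFmpeg’s libvmaf filter; the sketch below wraps it in Python. It assumes an FFmpeg build with libvmaf enabled, and the file names are placeholders rather than anything from our pipeline.

```python
import subprocess

# Compare a transcoded ("distorted") video against its source ("reference")
# using FFmpeg's libvmaf filter. The first input is the distorted video and
# the second is the reference.
cmd = [
    "ffmpeg",
    "-i", "transcoded.mp4",   # distorted input
    "-i", "original.mp4",     # reference input
    "-lavfi", "libvmaf=log_fmt=json:log_path=vmaf.json",
    "-f", "null", "-",        # discard decoded output; we only want the score
]
subprocess.run(cmd, check=True)
# vmaf.json now contains per-frame scores and a pooled VMAF value in [0, 100].
```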
In practice, a major challenge in tuning video compression is balancing the trade-off between performance and quality. If we focused only on reducing bitrate without negatively impacting user engagement, we'd quickly converge to a local optimum: optimized for performance, with quality just good enough that people don’t abandon the playback. Experiments that improve quality beyond this point often come back with seemingly negative results, such as regressions in device performance, network, and battery usage. We know that better quality makes the playback experience more enjoyable, but the effect is not always immediately measurable, so we have to look beyond the obvious results and make principled trade-offs.
In addition to quality, we also have to consider machine processing cost, which adds up when you’re processing billions of videos, and the latency incurred by the transcoding process. Newer codecs such as H.265/HEVC, VP9, and AV1 are generally more efficient at compression, but they require more resources, incurring additional cost and latency.
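To make that trade-off tangible, here is a hedged sketch that encodes the same source with H.264 (libx264) and H.265 (libx265) and compares encode time and output size. The CRF values, presets, and file names are illustrative assumptions, not our production settings, and FFmpeg must be built with both encoders.

```python
import os
import subprocess
import time

# Encode the same source with two codecs and compare wall-clock encode time
# and output size. CRF values are illustrative picks intended to land at
# roughly comparable quality.
encoders = {
    "h264": ["-c:v", "libx264", "-crf", "23", "-preset", "medium"],
    "hevc": ["-c:v", "libx265", "-crf", "28", "-preset", "medium"],
}

for name, args in encoders.items():
    out = f"out_{name}.mp4"
    start = time.monotonic()
    subprocess.run(["ffmpeg", "-y", "-i", "source.mp4", *args, out], check=True)
    elapsed = time.monotonic() - start
    size_mb = os.path.getsize(out) / 1e6
    print(f"{name}: {elapsed:.1f}s, {size_mb:.1f} MB")
```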
We covered elements of this topic in a prior blog post about our GPU transcoding story. The Snap collaboration with NVIDIA continues to deepen and has helped us extract even more value from GPU computing for these workloads. We have held numerous technical deep dives on optimizing our transcoding pipeline for GPU-based transcoding. This work has helped us increase throughput and lower costs compared to CPU-based transcoding. The NVIDIA team has also contributed to the open source project FFmpeg to ensure all the operations we need during transcoding are GPU accelerated, so that video frames don’t have to be copied between CPU and GPU memory. We are also collaborating with NVIDIA and Netflix to GPU-accelerate VMAF measurement, enabling real-time video quality assessment during transcode, with very promising early results.
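As an illustration of keeping the whole pipeline on the GPU, the sketch below decodes, scales, and encodes without copying frames back to CPU memory. It assumes an FFmpeg build with NVDEC/NVENC and the CUDA filters; the file names and encoder parameters are placeholders, not our production configuration.

```python
import subprocess

# Keep decode, scale, and encode on the GPU so frames never round-trip
# through CPU memory.
cmd = [
    "ffmpeg",
    "-hwaccel", "cuda",                # decode on the GPU (NVDEC)
    "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
    "-i", "source.mp4",
    "-vf", "scale_cuda=1280:720",      # scale on the GPU
    "-c:v", "h264_nvenc",              # encode on the GPU (NVENC)
    "-b:v", "2M",
    "output.mp4",
]
subprocess.run(cmd, check=True)
```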
Video compression is one of the core technical considerations at Snap. It enables our users to share and enjoy the moments of their daily lives. We share our laughs, tears, cares, loves, and many, many moments in videos. We cherish these videos and want to present them in the best way, so we are continuously evolving our video transcoding system.
Are you interested in technical challenges like these? The Media Delivery team is looking for talented people around the world to join us in building out our vision. You can also browse all openings at careers.snap.com.