Here are my slides from Demuxed 2017.
This talk aims to shed light on colorspaces - what they are, how and why they work, why we should care about handling edge cases properly. Starting with historical design choices, venturing through current standards such as BT.709, and arriving at modern times with High Dynamic Range, the focus will be on practical applications on the web and in broadcast.
Color me intrigued: A jaunt through color technology in video
1. color me intrigued
a jaunt through color technology in video
Vittorio Giovara
vittorio@vimeo.com
twitter.com/kodabb
koda on freenode irc
2. me and my team
• senior video encoding engineer at Vimeo
• open source contributor
• videolan association member
• ffmpeg/libav developer
• Vimeo is a web streaming platform founded in 2004
• the first one to stream hd videos in 2007
• also the first one to stream color-accurate videos in 2013
3. the problem
• there is no denying colorimetry is a complex topic
• we need some shared background and common expressions to build upon
• unfortunately colorimetry has some obscure terminology, often used wrongly, and as a result there is a lot of confusion around it
• let's start from the very beginning
4. eyes
• color corresponds to the wavelength of light: changing the frequency of the light changes how we see it
• our eyes simply carry light information to our brain, where an image is recomposed
• the pupil is a lens that converges light on the retina
• there are ~126 million cells suited to capturing light and converting it to neural impulses
• they are called rods and cones
http://wolfcrow.com/blog/notes-by-dr-optoglass-the-human-eye-part-i/
5. the RGB model
• there are three types of cone cells that measure different wavelengths of light, as well as luminance and color intensity
✓ L - red light (564-580 nm) + luminance
✓ M - green light (534-545 nm) + first color component
✓ S - blue light (420-440 nm) + second color component
• the first model, defined in 1920, was the RGB model, followed by XYZ in 1931
6. need of a model
• light is theoretically infinite, but it is impractical to use an infinite model; a finite Euclidean 3D field is much better
• so we agree on how to map certain wavelengths of light to certain colors
• most models are based on how the eye works
8. color space
• a chromaticity diagram is a cross-section of a 3D space with constant luminance (Y=0.5 in the example); it represents all colors visible to the human eye on a standard cartesian system
• there are several variations with different properties (e.g. the cie 1976 Yu'v' uniform diagram)
• if color models give us a language, color spaces give us the grammar, that is, the rules for transferring color information in a mathematically precise way
• a color space is a triangle defined by three main coordinates (called primaries) and a white center point
9. color primaries
• the amount of color information that can be carried in a color space is called the gamut
• how do we select a sensible color space?
• there are no three points that form a triangle enclosing the entire gamut of human vision; simply put, the gamut of human vision is not a triangle
• we need some pointers
13. specifications
• color spaces are well defined in a specification
• the most prolific standardization body is the ITU with the BT series, followed by SMPTE, CIE, and DCI
• a color space specification describes three fundamental properties
✓ the color primaries
✓ the matrix coefficients
✓ the transfer function
14. matrix coefficients
• standard RGB is not an efficient transport for analog or digital video -- there are other encoding systems for carrying the same video information
• the most widespread one is Y'CbCr, which encodes the three primary colors as luma (Y') and two color components (Cb and Cr), via matrix multiplication
• the coefficients are constants defined in a specification, converting a given RGB gamut to another encoding system
15. matrix coefficients
• this encoding system may need additional headroom and footroom, limiting the range of available values; however, the coefficients defined in a specification (bt709 in this example) refer to the full range
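The matrix multiplication above can be sketched in a few lines; the BT.709 constants come from the spec, while the function names are my own illustration, not from any library:

```python
# Sketch of BT.709 R'G'B' -> Y'CbCr conversion (full-range math), plus the
# limited-range 8-bit quantization with headroom/footroom described above.

KR, KB = 0.2126, 0.0722          # BT.709 luma coefficients
KG = 1.0 - KR - KB               # 0.7152

def rgb_to_ycbcr(r, g, b):
    """Full-range conversion: inputs in [0,1], Cb/Cr in [-0.5, 0.5]."""
    y  = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr

def quantize_limited_8bit(y, cb, cr):
    """Limited ("video") range: Y in [16,235], chroma in [16,240]."""
    return (round(16 + 219 * y),
            round(128 + 224 * cb),
            round(128 + 224 * cr))

y, cb, cr = rgb_to_ycbcr(1.0, 1.0, 1.0)   # white
print(quantize_limited_8bit(y, cb, cr))    # (235, 128, 128)
```

White ends at Y=235 rather than 255, which is exactly the headroom the slide mentions.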
16. transfer characteristics
• a function introduced to take into account the luminance response of old CRTs to the signal, resulting in more signal being allocated to darker colors
• by pure coincidence, the signal curve of a CRT follows a property of the HVS, which makes our eyes more sensitive to variations in shades of black
• luminance is multiplied with a function describing some sort of curve (whose exponent is aptly called gamma), compressing a linear signal into a non-linear one
• in order to display the correct colors (or convert them), the inverse of the encoding function needs to be applied (gamma expansion)
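The compression/expansion pair can be sketched with a pure power law (real curves such as bt709's add a small linear segment near black, omitted here for simplicity):

```python
# Sketch of gamma compression (encode) and gamma expansion (decode) with a
# pure power law. The exponent 2.4 matches the bt1886 display curve.

GAMMA = 2.4

def gamma_compress(linear):
    """Encode: allocate more code values to darker shades."""
    return linear ** (1.0 / GAMMA)

def gamma_expand(encoded):
    """Decode: invert the encoding to recover linear light."""
    return encoded ** GAMMA

lin = 0.18                      # mid-grey in linear light
enc = gamma_compress(lin)
print(enc)                      # ~0.49: mid-grey lands near the middle of the signal
```

Note how 18% linear grey is pushed to roughly half the code range, which is the whole point of the curve.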
18. transfer characteristics
• no actual standard until 2011 (bt1886)
• traditional functions are calibrated for screens of 100 cd/m2 (or nits), with a standard dynamic range (SDR)
• at around 16 bits there is no need for any non-linearity (Barten threshold)
• unfortunately that is not very practical right now, so research has focused on better curves
19. hdr
• high dynamic range goes past the traditional luminance limitations, expanding the amount of visible colors
• this comes at the cost of upgrading everything in the transmission and delivery chain, from cameras to displays
20. new terminology
• while updating the chain, terminology got updated as well
✓ OETF (opto-electronic transfer function) describes the luminance-to-digital encoding function; the signal represents the light captured by the camera
✓ EOTF (electro-optical transfer function) describes the digital-to-luminance decoding function; the signal represents the light displayed on a screen
✓ OOTF (opto-optical transfer function) describes the overall luminance-to-luminance function, the "rendering intent"
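How the three functions relate can be shown with a toy sketch (pure power laws, exponents chosen only for illustration): the OOTF is what you get by chaining the camera-side OETF with the display-side EOTF.

```python
def oetf(scene_light):          # camera: linear scene light -> signal
    return scene_light ** (1.0 / 2.0)

def eotf(signal):               # display: signal -> linear display light
    return signal ** 2.4

def ootf(scene_light):          # end-to-end: scene light -> display light
    return eotf(oetf(scene_light))

# The chain is deliberately not an identity: the overall "system gamma"
# (2.4 / 2.0 = 1.2 here) compensates for dim viewing environments.
print(ootf(0.5))                # 0.5 ** 1.2, not 0.5
```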
21. specs and standards
• more colors need 10 bit coding
✓ sometimes called WCG (wide color gamut)
✓ offers more colors and better compression, at the cost of additional encoding time and complexity
✓ a lesson learned from broadcasters is deploying new codecs (hevc and/or vp9) at 10 bit only
• several standards emerged; the most widespread ones are based on the HLG or PQ curves, gathered in the bt2100 specification
• HDR to HDR conversion and some generic advice for tonemapping is present in bt2390
22. hlg
• developed by BBC and NHK, standardized as arib std-b67
• scene referred OETF, doesn't have a luminance limit (theoretically; in practice it works well up to 5000 nits) as it can vary its curve to use the maximum luminance available
• backward compatible with the bt1886 EOTF, which is the one bt709 and bt2020 use, at the cost of 20% less brightness
• it supports light adjustment, which helps manage the viewing conditions at home
• mostly used in broadcast and archival
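The hybrid log-gamma curve itself is short enough to sketch directly from the arib std-b67 constants (the function name is my own):

```python
import math

# Sketch of the HLG OETF: square root for the lower part of the range,
# logarithmic for the upper part -- hence "hybrid log-gamma".
# Constants a, b, c are the values published in arib std-b67.

A, B, C = 0.17883277, 0.28466892, 0.55991073

def hlg_oetf(e):
    """Normalized scene light e in [0,1] -> signal in [0,1]."""
    if e <= 1.0 / 12.0:
        return math.sqrt(3.0 * e)
    return A * math.log(12.0 * e - B) + C

# The two branches meet at e = 1/12, where the signal is exactly 0.5
print(hlg_oetf(1.0 / 12.0))   # 0.5
```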
24. hlg - surround illumination
HLG behavior on a 1000 nits display
https://www.lightillusion.com/uhdtv.html
25. pq
• developed by Dolby, standardized in smpte2084
• display referred absolute EOTF, modeled after a theoretical reference display with a luminance of 10000 nits
• each luminance level has an equivalent bit level
https://www.lightillusion.com/uhdtv.html
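The luminance-to-bit-level mapping can be sketched with the inverse EOTF from the smpte2084 constants; the function name is my own, the numbers are the spec's:

```python
# Sketch of the PQ (smpte2084) inverse EOTF: absolute luminance in nits
# mapped to a normalized signal value.

M1 = 2610.0 / 16384.0
M2 = 2523.0 / 4096.0 * 128.0
C1 = 3424.0 / 4096.0
C2 = 2413.0 / 4096.0 * 32.0
C3 = 2392.0 / 4096.0 * 32.0

def pq_inverse_eotf(nits):
    """Luminance in cd/m2 (up to 10000) -> signal value in [0,1]."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

# 1000 nits uses only ~75% of the signal: in a 10 bit full-range signal
# that is code level 769 of 1023
print(round(pq_inverse_eotf(1000.0) * 1023))   # 769
```

This is the same code level 769 quoted on the content light level slide for a 1000 nit display.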
26. hdr10
• there is no real hdr10 standard, just a collection of specifications (usually bt2020 + smpte2084 + smpte2086) statically embedded in the file
• can be streamed with HLS and DASH with the proper tags (VIDEO-RANGE and EssentialProperty)
• since it's not backwards compatible it needs a separate rendition with a tone map
• a variant of this specification is hdr10+
27. hdr10 - mastering display colour volume
• unfortunately there is no screen capable of displaying the amount of colors defined in bt2020
• to work around this, additional metadata is needed, embedding the color primaries of the screen where the video is mastered (and then usually forwarded to the screen via HDMI for correct rendering or tonemapping)
• this essentially describes the OOTF (artistic intent) embedded in the PQ signal
• this generates the concept of a bt2020 container: the actual color space is within the bounds of the nominal bt2020 gamut
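A sketch of how this metadata is packed: smpte2086 stores chromaticities as integers in units of 0.00002 and luminance in units of 0.0001 cd/m2 (the helper names are mine, the units are the spec's):

```python
# Sketch of packing smpte2086 mastering display metadata. The primaries
# recorded here are the mastering display's, not bt2020's -- the bt2020
# "container" merely bounds them.

def pack_chromaticity(x, y):
    """CIE xy chromaticity -> integers in units of 0.00002."""
    return round(x / 0.00002), round(y / 0.00002)

def pack_luminance(nits):
    """Luminance -> integer in units of 0.0001 cd/m2."""
    return round(nits / 0.0001)

# Example: a D65 white point (0.3127, 0.3290) and a 1000 nit peak
print(pack_chromaticity(0.3127, 0.3290))  # (15635, 16450)
print(pack_luminance(1000.0))             # 10000000
```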
29. hdr10 - content light level information
• the same applies to luminance: there is no screen capable of emitting 10000 nits of light (and it would probably be uncomfortable to watch)
• so there is additional metadata that defines the peak light present in the video sequence (as well as an average per frame) to calibrate the display luminosity
• without this setting, the overall image will be clipped
• if this setting is misconfigured, the video will appear desaturated
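A sketch of how these two values (commonly called MaxCLL and MaxFALL) could be computed over decoded frames of linear-light nits; the function is illustrative, not taken from any tool:

```python
# MaxCLL  = brightest single pixel in the whole sequence (the "peak light")
# MaxFALL = highest per-frame average light level

def content_light_levels(frames):
    """frames: iterable of lists of per-pixel luminance values in nits."""
    max_cll = 0.0
    max_fall = 0.0
    for pixels in frames:
        max_cll = max(max_cll, max(pixels))
        max_fall = max(max_fall, sum(pixels) / len(pixels))
    return max_cll, max_fall

frames = [[100.0, 400.0], [50.0, 1200.0]]
print(content_light_levels(frames))   # (1200.0, 625.0)
```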
30. hdr10 - content light level information
• any given PQ based HDR display will only use a subset of the full signal range with the current generation of displays
• i.e. a 1000 nits display will max out at code level 769, clipping 254 possible levels
https://www.lightillusion.com/uhdtv.html
31. dynamic metadata
• standardized in smpte2094, which defines several "applications", such as Dolby Vision
• works in hevc (with a different fourcc tag, dvh1) or in a frankenstein stream: h264 with an hevc enhancement layer
• comes with added patents and unclear license terms
• hard to choose a winner as there is no way to determine visual quality for hdr (yet?)
32. Updated Oct 2017
State of HDR support
http://www.flatpanelshd.com/focus.php?subaction=showfull&id=1505390780
33. codecs
• in order to convey color information, the codec needs to have the means to signal it correctly
• bonus points if the specification correctly explains each property one by one
• these are frame properties, meaning that they can change at every frame, so storing them at the container level is not desirable
• let's see a few examples
34. mpeg/itu
• h264 and hevc are first class citizens for colorimetry
• all properties are listed and described
• grouped together in ISO/IEC 23001-8 and 23091-9 "Coding-independent code points"
• 10 bit present in High10 and Main10
35. google VP9
• 10 bit was defined a bit later, but is present in Profile 2 and Profile 3, available in 2nd generation chipsets
• only one main property, with an unclear interpretation in the specification
• not much space in color_space: only a few values allowed, not at all extensible (e.g. no room for HDR)
36. google VP9
• this is worked around by extending the values in the container
• webm/matroska got new values in the Colour element
• mp4 got custom boxes (SmDm with 0.16 fixed point values and CoLL) for HDR, plus a new sample description entry (vpcC, duplicating most of the colr box values)
• this may lead to potential incompatibility, prevents using dynamic metadata, and is overall messy (codecs="vp09.02.10.10.01.09.16.09.01")
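That codecs string can be unpacked field by field; the ordering below follows the VP9 codec string convention as I understand it, so treat this as a sketch rather than a reference parser:

```python
# Sketch decoding the vp09 codec string: after the sample entry 4CC come
# profile, level, bit depth, chroma subsampling, then the coding-independent
# colour code points and the range flag.

FIELDS = ["profile", "level", "bit_depth", "chroma_subsampling",
          "colour_primaries", "transfer_characteristics",
          "matrix_coefficients", "full_range_flag"]

def parse_vp09(codec_string):
    four_cc, *values = codec_string.split(".")
    assert four_cc == "vp09"
    return dict(zip(FIELDS, (int(v) for v in values)))

info = parse_vp09("vp09.02.10.10.01.09.16.09.01")
# Reading the colour code points: primaries 9 = bt2020, transfer 16 =
# smpte2084 (PQ), matrix 9 = bt2020 non-constant luminance
print(info["profile"], info["bit_depth"], info["transfer_characteristics"])
```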
37. aom AV1
• situation is a little better (thanks Netflix) and evolving
• properties are signaled in the sequence header, no need to tweak the container, and there is room for hdr metadata
• still some peculiar colorimetry choices (i.e. no primaries field, custom value numbering, 0.16 fixed point numbers)
38. browsers
• assuming that the codec correctly tags the video with the proper color parameters, whichever software decodes it needs to export this information and possibly pass it along to the rendering engine
• video players usually apply these properties in OpenGL or DirectX according to the screen capabilities
• do browsers support converting color spaces correctly?
• "it depends"
39. safari (and quicktime)
• Apple Safari is the browser supporting the most color space properties, regardless of platform and architecture
• ... only thanks to QuickTime
• 10 bit is supported for HEVC decoding
• first to get HDR on the desktop (with the right configuration)
40. firefox and chrome
• firefox and chrome have only basic color support, only available since a year ago
• they both took quite a long time to get it right (basically ignoring the issues until YouTube started serving bt709 files)
• chrome implemented it first, but used the wrong coefficients, so until firefox fixed the common library neither browser correctly supported any color conversion
41. firefox and chrome
• chrome has a full 10 bit pipeline for any codec; firefox is (slowly) working on it for VP9
• there are some minor efforts for bt2020 and hdr support
notice i didn't add the apostrophe
cognitive dissonance
crt had the nonlinear property that twice the voltage does not mean twice the brightness (L ∝ V^gamma)
gamma expansion -> linearize
gamma compression -> delinearize
conventionally a gamma corrected signal is denoted by an apostrophe (so Y represents luminance and Y' luma)
Rec 709
• Specifies a standard OETF that video capture devices should use
• This OETF was designed to match the natural EOTF of CRTs (described in Rec 1886)
Rec 1886
• Describes the natural EOTF of CRTs (approximately L = V^2.4)
• Used by flat-panel reference displays to output Rec 709 video
• Intended for flat-panel reference display interoperability with CRT reference displays
• Intended to be employed together with Rec 2035
(using the light sensor present in computers and tvs) to alter the system gamma, as shown for a 1000 nit display
standard behind paywall
no real reason to use either (unless you have vested interests)
to further prove my point