1. Timed Text at Netflix
Rohit Puri (rpuri@netflix.com)
Engineering Manager, Video Systems
Digital Supply Chain, Netflix
February 19, 2016
2. The Netflix Content Processing System
• Video systems team develops cloud-scalable systems and
tools
– Audio/Video: ingestion/inspections and packaging/DRM
• e.g., IMF, QuickTime, and MP4-DASH
– Timed Text: entire processing pipeline
• e.g., W3C TTML, W3C WebVTT
[Diagram: source from content partner -> Ingestion and Inspections -> Trans-coding -> Packaging and DRM -> downloadable to CDN]
3. Acknowledgements
• Dae Kim (dakim@netflix.com)
• Shinjan Tiwary (stiwary@netflix.com)
• Harold Sutherland (hsutherland@netflix.com)
• David Ronca (dronca@netflix.com)
• Glenn Adams (Skynav Inc.) - member W3C
Timed Text Working Group (TTWG)
4. Netflix after January 6, 2016
• > 75 million subscribers
• ~ 190 countries
• > 12,000,000,000 hours streamed in Q4 2015
• 20+ languages
5. Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
7. A Brief History of Timed Text
• Latin Alphabet (2014 and before)
– First subtitles delivered in 2009. Bottom-centered, yellow,
italics and underline options (dfxp-simplesdh)
– Follow-up 2012 offering: multiple colors, background,
outline, generic font family, font size, position information
(dfxp-ls-sdh) (ls => ‘less simple’)
– First WebVTT output profile was created in 2013
• Global Subtitles (2015+)
– TTML2 (Timed Text Markup Language) based Japanese
subtitles in 2015
– Bidirectional text with worldwide launch in January 2016
8. Legacy Workflow (2014 and before)
• Two-step procedure
– source inspection
– source conversion to output
• Sources
– CEA-608 based Scenarist Closed Captions (.scc)
– EBU Subtitling data exchange format (.stl)
– SubRip (.srt)
– Timed Text Markup Language 1 (.ttml, .dfxp)
• Outputs
– feature restricted TTML1 outputs
– feature restricted WebVTT outputs
11. Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
12. Japanese Subtitles
• Essential Features
– vertical text
– ruby annotations
– horizontal-in-vertical
– Unicode
13. Japanese Subtitles Challenges
• New source format - Videotron Lambda (.cap)
– inaccessible specification
– ambiguity in format
– non-interoperable vendor implementations
• TTML1 does not support essential Japanese features,
TTML2 needed
• Netflix device SDK did not support essential rendering
features
– delivery of image-based subtitles
– significant investment in open source software (OSS) tools
14. Japanese Subtitles Conversion Workflow (1-off)
• .cap sources converted to TTML2, and then to image subtitles or
WebVTT (WebVTT specification appears incomplete w.r.t. Japanese)
• Image subtitles archive contains .png images + manifest with timing
and positioning information
• Green modules available as OSS (https://github.com/skynav/ttt)
[Diagram: .cap source -> cap2tt -> ttml2 -> TTX -> ISD -> TTPE -> .png images -> Archiver; ttml2 -> WebVTT writer]
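The slide says only that the image subtitles archive holds .png images plus a manifest with timing and positioning information; it does not give the manifest layout. As a rough illustration of what such a manifest could carry, here is a minimal Python sketch. The `ImageCue` class, its field names, the millisecond and pixel units, and the JSON serialization are all assumptions made for this example, not the Netflix format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ImageCue:
    """One rendered subtitle image. All field names are hypothetical."""
    image: str      # file name of the .png inside the archive
    begin_ms: int   # time the image appears, in milliseconds
    end_ms: int     # time the image disappears, in milliseconds
    x: int          # left edge of the image on the video frame, in pixels
    y: int          # top edge of the image on the video frame, in pixels
    width: int      # image width in pixels
    height: int     # image height in pixels


def build_manifest(cues):
    """Serialize cues to JSON, ordered by start time.

    Raises ValueError for a cue whose end is not after its begin.
    """
    for cue in cues:
        if cue.end_ms <= cue.begin_ms:
            raise ValueError(f"{cue.image}: end must be after begin")
    ordered = sorted(cues, key=lambda c: c.begin_ms)
    return json.dumps({"cues": [asdict(c) for c in ordered]}, indent=2)
```

A player reading a manifest of this kind would only need to look up the cue active at the current media time and draw the referenced image at the stated position, which is why image delivery works on devices without vertical-text or ruby rendering support.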
15. New Workflow: Inspections
[Diagram: input file -> source format and charset detection -> format-specific parser and syntax inspections (STL, SCC, CAP, SRT, or TTML family: TTML1, TTML2, IMSC1) -> TTML2-based canonical model -> semantic inspections -> model serialized to disk]
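The first stage of the inspection workflow, source format and charset detection, can be sketched roughly as follows. This is an illustrative Python sketch and not the Netflix implementation: the function names and the `latin-1` fallback are assumptions, and the signatures checked are the well-known ones (the `Scenarist_SCC` header line, the `STL` disk format code in the EBU STL GSI block, a `tt` root element, and SRT timing lines). No signature is assumed for .cap, so such files come back as `unknown` here.

```python
import codecs
import re

SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def detect_charset(data: bytes) -> str:
    """Guess a text encoding from a byte-order mark, else try UTF-8, else fall back."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # assumption: last-resort single-byte fallback


def detect_format(data: bytes) -> str:
    """Return 'stl', 'scc', 'ttml', 'srt', or 'unknown' from content signatures."""
    # EBU STL is binary: the GSI block carries a disk format code
    # such as "STL25.01" starting at byte offset 3.
    if data[3:6] == b"STL":
        return "stl"
    text = data.decode(detect_charset(data), errors="replace").lstrip()
    if text.startswith("Scenarist_SCC"):
        return "scc"
    # TTML family: an XML document whose root element is tt (optionally prefixed).
    if text.startswith("<") and re.search(r"<(\w+:)?tt[\s>]", text):
        return "ttml"
    if SRT_TIMING.search(text[:2000]):
        return "srt"
    return "unknown"
```

The detected format would then select the matching parser, each of which emits the same canonical model, so the semantic inspections only need to be written once.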
16. New Workflow: Conversions
• Configurable filter-based architecture
• Bank of model-domain filters
• Output writer generates text or images
[Diagram: deserialized model -> model-domain filter chain (Filter1 ... FilterN) -> output generator (output-specific filters -> output writer) -> output file]
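The filter-based conversion architecture can be illustrated with a small Python sketch. Everything named here (`Cue`, `shift`, `drop_empty`, `run_chain`, `write_text`) is hypothetical and chosen for the example; the slide does not list the actual filters, and the real model is TTML2-based rather than a flat cue list.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, List


@dataclass(frozen=True)
class Cue:
    """Toy stand-in for one timed-text event in the canonical model."""
    begin_ms: int
    end_ms: int
    text: str


Model = List[Cue]
Filter = Callable[[Model], Model]


def shift(offset_ms: int) -> Filter:
    """Filter factory: move every cue by a fixed offset."""
    def apply(model: Model) -> Model:
        return [
            replace(c, begin_ms=c.begin_ms + offset_ms, end_ms=c.end_ms + offset_ms)
            for c in model
        ]
    return apply


def drop_empty(model: Model) -> Model:
    """Filter: remove cues whose text is blank."""
    return [c for c in model if c.text.strip()]


def run_chain(model: Model, filters: List[Filter]) -> Model:
    """Apply Filter1..FilterN in order; each filter maps a model to a model."""
    return reduce(lambda m, f: f(m), filters, model)


def write_text(model: Model) -> str:
    """Toy output writer: one line per cue."""
    return "\n".join(f"{c.begin_ms}-{c.end_ms} {c.text}" for c in model)
```

For example, `write_text(run_chain(model, [drop_empty, shift(500)]))` configures a two-filter chain. Because every filter has the same model-in, model-out shape, a new output profile is mostly a matter of choosing a different filter list and writer.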
17. Global Subtitles
• i18n-grade timed text processing workflow
• All languages of this world (+Klingon)
• New TTML2-based output profile (“nflx-ttml-gsdh”)
– vertical text
– horizontal-in-vertical
– ruby annotations
– bidirectional text
• Both text and image delivery options
18. Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
19. Our Experience with Source Formats
• SCC
– limited to Latin alphabet
– non-standard use of SMPTE timecode
• EBU-STL
– dated [1], ambiguous, non-interoperable industry practices
– no support for Asian character set
• SRT
– Too simple - positioning information not used in practice
• LambdaCAP
– ambiguous format - hard to find official specification
• TTML1
– not self-contained - needs “Document Processing Context”
– no support for Japanese rendering features
[1] “The medium for exchange is a 3.5-inch high-density portable magnetic disk (microfloppy). The disk is formatted for 1.44
Mbytes (2 sides, 80 tracks, 18 sectors/track).”
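The slide does not say what is non-standard about timecode use in SCC files. For reference, the conventional interpretation of 29.97 fps SMPTE timecode, where `;` before the frame field marks drop-frame and `:` marks non-drop-frame, can be sketched in Python as below. The function name is hypothetical and the sketch assumes 29.97 fps material only.

```python
import re
from fractions import Fraction

TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})$")
FRAME_DURATION = Fraction(1001, 30000)  # seconds per frame at 29.97 fps


def timecode_to_seconds(tc: str) -> Fraction:
    """Convert HH:MM:SS:FF (non-drop) or HH:MM:SS;FF (drop-frame) to media seconds."""
    m = TIMECODE.match(tc)
    if not m:
        raise ValueError(f"not a timecode: {tc!r}")
    hh, mm, ss, sep, ff = m.groups()
    h, mi, s, f = int(hh), int(mm), int(ss), int(ff)
    # Timecode counts 30 frame numbers per labelled second.
    frames = (h * 3600 + mi * 60 + s) * 30 + f
    if sep == ";":
        # Drop-frame skips frame numbers 00 and 01 at the start of every
        # minute except minutes divisible by ten.
        total_minutes = h * 60 + mi
        frames -= 2 * (total_minutes - total_minutes // 10)
    return frames * FRAME_DURATION
```

The two readings diverge by about 3.6 seconds per hour, so a file whose separator does not match how it was actually authored produces subtitles that drift steadily out of sync.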
20. TTWG Standards
• IMSC1 (Internet Media Subtitles and Captions) is a TTML1-based
W3C Candidate Recommendation; mandatory for IMF
– multiple Netflix sponsored implementations were announced to
TTWG on February 1, 2016
– Netflix plans 100% support for IMF
– Netflix ingest implementation will support IMSC-T (text profile)
• We have multiple TTML2 implementations in development -
will support TTML2-based IMSC2
• Netflix is enthusiastic about HTMLCue
21. Open Source Activity
• regxmllib (https://github.com/sandflow/regxmllib)
– sponsored by Netflix
– tools that provide essential building blocks for authoring of
IMF CPL
• ttt (https://github.com/skynav/ttt)
– sponsored by Netflix
– tools for validation and rendering of TTML1/2
• photon (https://github.com/Netflix/photon) (December 2015)
– developed at Netflix
– complete set of tools for validation of IMF packages
22. Talk Summary
• Subtitles experience core to Netflix business
• Netflix committed to TTWG standards
– multiple IMSC1 and TTML2 implementations declared or
in flight
• Netflix plans to be 100% IMF
• Netflix is actively involved in OSS efforts around
timed text as well as IMF
23. 2015 Tech Emmy for Netflix
• Netflix was a co-recipient of the 2015 Technology and
Engineering Emmy Award for “Standardization and Pioneering
Development of Non-Live Broadcast Captioning”