Timed Text at Netflix
Rohit Puri (rpuri@netflix.com)
Engineering Manager, Video Systems
Digital Supply Chain, Netflix
February 19, 2016
The Netflix Content Processing System
• Video systems team develops cloud-scalable systems and tools
– Audio/Video: ingestion/inspections and packaging/DRM
• e.g., IMF, QuickTime, and MP4-DASH
– Timed Text: entire processing pipeline
• e.g., W3C TTML, W3C WebVTT
[Figure: source from content partner → Ingestion and Inspections → Trans-coding → Packaging and DRM → downloadable to CDN]
Acknowledgements
• Dae Kim (dakim@netflix.com)
• Shinjan Tiwary (stiwary@netflix.com)
• Harold Sutherland (hsutherland@netflix.com)
• David Ronca (dronca@netflix.com)
• Glenn Adams (Skynav Inc.), member of the W3C Timed Text Working Group (TTWG)
Netflix after January 6, 2016
• > 75 million subscribers
• ~ 190 countries
• > 12,000,000,000 hours streamed in Q4 2015
• 20+ languages
Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
A Brief History of Timed Text
• Latin Alphabet (2014 and before)
– First subtitles delivered in 2009: bottom-centered, yellow, with italics and underline options (dfxp-simplesdh)
– Follow-up offering in 2012: multiple colors, background, outline, generic font family, font size, and position information (dfxp-ls-sdh, where "ls" stands for ‘less simple’)
– First WebVTT output profile was created in 2013
• Global Subtitles (2015+)
– Japanese subtitles based on TTML2 (Timed Text Markup Language 2) in 2015
– Bidirectional text support with the worldwide launch in January 2016
Legacy Workflow (2014 and before)
• Two-step procedure
– source inspection
– source conversion to output
• Sources
– CEA-608 based Scenarist Closed Captions (.scc)
– EBU Subtitling data exchange format (.stl)
– SubRip (.srt)
– Timed Text Markup Language 1 (.ttml, .dfxp)
• Outputs
– feature-restricted TTML1 outputs
– feature-restricted WebVTT outputs
Legacy Workflow: Inspections
[Figure: .scc, .stl, .srt, and .ttml sources → SCC, STL, SRT, and TTML1 parsers with syntax inspections → proprietary canonical model → semantic inspections]
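The split between per-format syntax inspections (inside each parser) and format-independent semantic inspections (on the canonical model) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Cue` type and the three checks (positive duration, non-empty text, presentation order) are hypothetical examples, not the checks the Netflix system actually runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """Hypothetical minimal canonical-model cue; times are in seconds."""
    begin: float
    end: float
    text: str


def semantic_inspections(cues: List[Cue]) -> List[str]:
    """Run format-independent checks on cues that already parsed cleanly."""
    problems = []
    for i, cue in enumerate(cues):
        # Hypothetical check: a cue must have a positive duration.
        if cue.end <= cue.begin:
            problems.append(f"cue {i}: end {cue.end} is not after begin {cue.begin}")
        # Hypothetical check: a cue with no visible text is likely an authoring error.
        if not cue.text.strip():
            problems.append(f"cue {i}: empty text")
        # Hypothetical check: cues are expected in presentation order.
        if i > 0 and cue.begin < cues[i - 1].begin:
            problems.append(f"cue {i}: begins before cue {i - 1}")
    return problems


cues = [Cue(1.0, 3.0, "Hello"), Cue(2.5, 2.0, " "), Cue(0.5, 1.0, "Out of order")]
for problem in semantic_inspections(cues):
    print(problem)
```

Because these checks only see the canonical model, the same code applies to every source format once its parser has succeeded.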
Legacy Workflow: Conversions
[Figure: .scc, .stl, .srt, and .ttml sources → SCC, STL, SRT, and TTML1 parsers → proprietary canonical model → TTML writer and WebVTT writer]
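Routing every conversion through a canonical model means each source needs one parser and each output one writer, rather than one converter per source/output pair. Below is a minimal Python sketch of one such path, SRT to WebVTT, assuming a hypothetical millisecond-based `Cue` model that carries no styling or positioning. `parse_srt` and `write_webvtt` are illustrative names, and the sketch does not escape characters WebVTT reserves in cue text (such as `&`, `<`, `-->`, or blank lines).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """Hypothetical canonical-model cue shared by every parser and writer."""
    begin_ms: int
    end_ms: int
    text: str


_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _srt_ms(stamp: str) -> int:
    h, m, s, ms = map(int, _SRT_TIME.fullmatch(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(source: str) -> List[Cue]:
    """SRT parser: sequence number, timing line, then one or more text lines."""
    cues = []
    normalized = source.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Keep only the timestamps; SRT may append coordinates after the end time.
        begin, end = (part.split()[0] for part in lines[1].split("-->"))
        cues.append(Cue(_srt_ms(begin), _srt_ms(end), "\n".join(lines[2:])))
    return cues


def _vtt_time(ms: int) -> str:
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02}.{ms:03}"


def write_webvtt(cues: List[Cue]) -> str:
    """WebVTT writer: works from the canonical model, never from SRT directly."""
    out = ["WEBVTT", ""]
    for cue in cues:
        out += [f"{_vtt_time(cue.begin_ms)} --> {_vtt_time(cue.end_ms)}", cue.text, ""]
    return "\n".join(out)


srt = "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n2\n00:01:00,000 --> 00:01:03,250\nTwo\nlines\n"
print(write_webvtt(parse_srt(srt)))
```

Adding a new source format in this design means writing one more parser into `Cue`; every existing writer then works for it unchanged.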
Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
Japanese Subtitles
• Essential Features
– vertical text
– ruby annotations
– horizontal-in-vertical
– Unicode
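To make these features concrete, the following Python sketch builds a one-cue document with vertical text, a ruby annotation, and a horizontal-in-vertical run. It assumes the TTML2 styling attributes `tts:writingMode` (value `tbrl`), `tts:ruby` (`container`, `base`, `text`), and `tts:textCombine` (`all`). The function name, region id, and sample text are illustrative, and the output is not validated against any Netflix profile.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TT)
ET.register_namespace("tts", TTS)


def _sub(parent, tag, attrs=None, text=None):
    """Append a TTML element (in the TT namespace) with optional text."""
    el = ET.SubElement(parent, f"{{{TT}}}{tag}", attrs or {})
    el.text = text
    return el


def vertical_cue(base, reading, upright, begin, end):
    """Hypothetical helper: one Japanese cue using TTML2 vertical-text features."""
    tt = ET.Element(f"{{{TT}}}tt", {f"{{{XML}}}lang": "ja"})
    layout = _sub(_sub(tt, "head"), "layout")
    # Vertical text: glyphs flow top to bottom, lines advance right to left.
    _sub(layout, "region", {f"{{{XML}}}id": "r1", f"{{{TTS}}}writingMode": "tbrl"})
    p = _sub(_sub(_sub(tt, "body"), "div"), "p",
             {"region": "r1", "begin": begin, "end": end})
    # Ruby annotation: a container holding the base text and its reading.
    ruby = _sub(p, "span", {f"{{{TTS}}}ruby": "container"})
    _sub(ruby, "span", {f"{{{TTS}}}ruby": "base"}, base)
    _sub(ruby, "span", {f"{{{TTS}}}ruby": "text"}, reading)
    # Horizontal-in-vertical: set a short run (e.g. two digits) upright in one cell.
    _sub(p, "span", {f"{{{TTS}}}textCombine": "all"}, upright)
    return ET.tostring(tt, encoding="unicode")


print(vertical_cue("字幕", "じまく", "12", "00:00:01.000", "00:00:03.000"))
```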
Japanese Subtitles Challenges
• New source format: Videotron Lambda (.cap)
– inaccessible specification
– ambiguity in format
– non-interoperable vendor implementations
• TTML1 does not support essential Japanese features; TTML2 is needed
• The Netflix device SDK did not support essential rendering features
– delivery of image-based subtitles
– significant investment in open source software (OSS) tools
Japanese Subtitles Conversion Workflow (One-off)
• .cap sources are converted to TTML2, and then to image subtitles or WebVTT (the WebVTT specification appears incomplete with respect to Japanese)
• The image subtitle archive contains .png images plus a manifest with timing and positioning information
• Modules highlighted in green in the workflow diagram are available as OSS (https://github.com/skynav/ttt)
[Figure: .cap → cap2tt → TTML2 → TTPE/TTX → TTML2 ISDs and .png images → Archiver; TTML2 → WebVTT writer]
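ISD stands for Intermediate Synchronic Document: a TTML2 snapshot of what is presented over an interval in which nothing changes. The Python sketch below covers only the timing half of that idea, on a flat, hypothetical cue list: split the timeline at every begin and end time, and each resulting interval with visible content could then be rendered as one .png and listed in the manifest. Real ISD generation also resolves styles and regions, which this sketch omits; the `Cue` type and `isd_intervals` name are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cue:
    begin: int  # hypothetical units, e.g. milliseconds
    end: int
    text: str


def isd_intervals(cues: List[Cue]) -> List[Tuple[int, int, List[str]]]:
    """Split the timeline at every begin/end time; within each resulting
    interval the set of visible cues is constant, so one image suffices."""
    times = sorted({t for c in cues for t in (c.begin, c.end)})
    intervals = []
    for start, stop in zip(times, times[1:]):
        active = [c.text for c in cues if c.begin <= start < c.end]
        if active:  # gaps with nothing on screen need no image
            intervals.append((start, stop, active))
    return intervals


cues = [Cue(1000, 4000, "A"), Cue(2000, 3000, "B"), Cue(5000, 6000, "C")]
for start, stop, texts in isd_intervals(cues):
    print(start, stop, texts)
```

Overlapping cues are why this step matters for image delivery: cue A above spans three intervals because cue B appears and disappears in the middle of it.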
New Workflow: Inspections
[Figure: input file → source format and charset detection → format-specific parser and syntax inspections (STL, SCC, CAP, SRT, and TTML family: TTML1, TTML2, IMSC1) → TTML2-based canonical model → semantic inspections → model serialized to disk]
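The first stage, source format and charset detection, can be approximated by sniffing leading bytes. Here is a hedged Python sketch. The signatures it uses (the `Scenarist_SCC V1.0` header, the EBU STL disk format code `STL25.01`/`STL30.01` in the GSI block, the TTML namespace URI, and the SRT numbered-cue layout) are common conventions. The function names are hypothetical, CAP is omitted because its specification is hard to obtain, and legacy code-page detection is left out.

```python
import codecs
import re

# Byte-order marks checked before decoding. UTF-32 is deliberately ignored in
# this sketch because its little-endian BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]
_SRT_START = re.compile(r"\d+[ \t]*\r?\n\d{2}:\d{2}:\d{2},\d{3}\s*-->")


def detect_charset(data: bytes) -> str:
    """Pick an encoding from the BOM (default UTF-8) and confirm it decodes."""
    charset = next((name for bom, name in _BOMS if data.startswith(bom)), "utf-8")
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return "unknown"  # a production system would try legacy code pages here
    return charset


def detect_format(data: bytes) -> str:
    # EBU STL is binary: bytes 3 to 10 (0-based) of the GSI block hold the disk format code.
    if data[3:11] in (b"STL25.01", b"STL30.01"):
        return "stl"
    charset = detect_charset(data)
    if charset == "unknown":
        return "unknown"
    text = data.decode(charset).lstrip()
    if text.startswith("Scenarist_SCC V1.0"):
        return "scc"
    if text.startswith("<") and "http://www.w3.org/ns/ttml" in text[:4096]:
        return "ttml"  # TTML1, TTML2, and IMSC1 documents share this namespace
    if _SRT_START.match(text):
        return "srt"
    return "unknown"
```

Checking the binary STL signature before any decoding matters, since an STL file is not valid text in any single encoding.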
New Workflow: Conversions
• Configurable filter-based architecture
• Bank of model-domain filters
• Output writer generates text or images
[Figure: deserialized model → model-domain filter chain (Filter1 … FilterN) → output generator (output-specific filters → output writer) → output file]
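A configurable filter chain of this kind can be sketched in Python as plain functions from model to model, with a per-output configuration choosing which filters run and in what order. Everything here is hypothetical: the `Cue` model, the three example filters (`strip_text`, `shift`, `min_duration`), the `PROFILES` table, and the stand-in text writer illustrate the architecture and are not Netflix's filter bank.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Cue:
    begin_ms: int
    end_ms: int
    text: str


Model = List[Cue]
Filter = Callable[[Model], Model]


def strip_text(model: Model) -> Model:
    """Hypothetical filter: trim surrounding whitespace from cue text."""
    return [replace(c, text=c.text.strip()) for c in model]


def shift(offset_ms: int) -> Filter:
    """Hypothetical filter factory: move every cue by a fixed offset."""
    return lambda model: [
        replace(c, begin_ms=c.begin_ms + offset_ms, end_ms=c.end_ms + offset_ms)
        for c in model
    ]


def min_duration(ms: int) -> Filter:
    """Hypothetical filter factory: extend cues shorter than ms."""
    return lambda model: [replace(c, end_ms=max(c.end_ms, c.begin_ms + ms)) for c in model]


def run_chain(model: Model, filters: List[Filter]) -> Model:
    # Each filter maps model -> model, so filters compose in any configured order.
    for f in filters:
        model = f(model)
    return model


def write_plain(model: Model) -> str:
    """Stand-in output writer; a real one would emit TTML, WebVTT, or images."""
    return "\n".join(f"{c.begin_ms}-{c.end_ms}: {c.text}" for c in model)


# Hypothetical configuration: which filters run before a given writer.
PROFILES: Dict[str, List[Filter]] = {
    "plain": [strip_text, shift(100), min_duration(1000)],
}

model = [Cue(0, 400, "  hi "), Cue(2000, 5000, "there")]
print(write_plain(run_chain(model, PROFILES["plain"])))
```

Because every filter shares the same model-in, model-out signature, a new output profile in this sketch is a configuration entry rather than new conversion code.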
Global Subtitles
• i18n-grade timed text processing workflow
• All the world's languages (plus Klingon)
• New TTML2-based output profile (“nflx-ttml-gsdh”)
– vertical text
– horizontal-in-vertical
– ruby annotations
– bidirectional text
• Both text and image delivery options
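Supporting bidirectional text starts with knowing which cues contain right-to-left script and what each cue's base direction is. Below is a minimal Python sketch using the standard `unicodedata.bidirectional` lookup. The function names are hypothetical, and feeding the result into TTML attributes such as `tts:direction` is an assumption about how a writer might use it, not a description of the nflx-ttml-gsdh profile.

```python
import unicodedata

_RTL = {"R", "AL"}  # strong right-to-left bidi classes (Hebrew, Arabic, ...)


def needs_bidi(text: str) -> bool:
    """True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in _RTL for ch in text)


def base_direction(text: str) -> str:
    """Paragraph direction from the first strong character, in the spirit of
    Unicode Bidirectional Algorithm rules P2/P3 (isolates are not handled)."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in _RTL:
            return "rtl"
        if cls == "L":
            return "ltr"
    return "ltr"  # no strong character: fall back to left-to-right


for line in ["Hello", "שלום, world", "Hello, שלום", "12:30"]:
    print(line, needs_bidi(line), base_direction(line))
```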
Talk Outline
• History and Legacy Workflow
• New Workflow
• Standards Activity and Roadmap
Our Experience with Source Formats
• SCC
– limited to Latin alphabet
– non-standard use of SMPTE timecode
• EBU-STL
– dated¹, ambiguous, non-interoperable industry practices
– no support for Asian character sets
• SRT
– too simple; positioning information is not used in practice
• LambdaCAP
– ambiguous format; an official specification is hard to find
• TTML1
– not self-contained; needs a “Document Processing Context”
– no support for Japanese rendering features
¹ “The medium for exchange is a 3.5-inch high-density portable magnetic disk (microfloppy). The disk is formatted for 1.44 Mbytes (2 sides, 80 tracks, 18 sectors/track).”
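SCC's non-standard use of SMPTE timecode is one reason its timing is hard to interpret. As an illustration only: a convention frequently seen in SCC files is that timecodes count frames at 29.97 fps, with `;` marking drop-frame labels and `:` non-drop-frame labels. The Python sketch below assumes that convention (not necessarily the one Netflix's parser applies) and converts a label to exact seconds with `fractions.Fraction`. The function name is hypothetical.

```python
import re
from fractions import Fraction

_TC = re.compile(r"(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})")


def scc_time_to_seconds(tc: str) -> Fraction:
    """Convert an SCC timecode, assumed to count frames at 29.97 fps, to seconds.

    "HH:MM:SS:FF" is treated as non-drop-frame and "HH:MM:SS;FF" as drop-frame.
    """
    hh, mm, ss, sep, ff = _TC.fullmatch(tc).groups()
    hh, mm, ss, ff = int(hh), int(mm), int(ss), int(ff)
    frames = ((hh * 60 + mm) * 60 + ss) * 30 + ff
    if sep == ";":
        # Drop-frame skips frame labels 0 and 1 each minute, except every tenth minute.
        minutes = hh * 60 + mm
        frames -= 2 * (minutes - minutes // 10)
    # Each frame lasts 1001/30000 s at 29.97 fps.
    return Fraction(frames * 1001, 30000)


print(float(scc_time_to_seconds("00:10:00;00")))  # drop-frame: close to 600 s
print(float(scc_time_to_seconds("00:10:00:00")))  # non-drop: 0.6 s later than its label
```

The second call shows why the distinction matters: a non-drop-frame label counted at 29.97 fps drifts from wall-clock time by about 3.6 seconds per hour.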
TTWG Standards
• IMSC1 (Internet Media Subtitles and Captions) is a TTML1-based W3C Candidate Recommendation and is mandatory for IMF
– multiple Netflix-sponsored implementations were announced to TTWG on February 1, 2016
– Netflix plans 100% support for IMF
– the Netflix ingest implementation will support IMSC-T (text profile)
• We have multiple TTML2 implementations in development, which will support the TTML2-based IMSC2
• Netflix is enthusiastic about HTMLCue
Open Source Activity
• regxmllib (https://github.com/sandflow/regxmllib)
– sponsored by Netflix
– tools that provide essential building blocks for authoring IMF CPLs (Composition Playlists)
• ttt (https://github.com/skynav/ttt)
– sponsored by Netflix
– tools for validation and rendering of TTML1/2
• photon (https://github.com/Netflix/photon) (December 2015)
– developed at Netflix
– complete set of tools for validation of IMF packages
Talk Summary
• The subtitle experience is core to the Netflix business
• Netflix is committed to TTWG standards
– multiple IMSC1 and TTML2 implementations declared or in flight
• Netflix plans to be 100% IMF
• Netflix is actively involved in OSS efforts around timed text as well as IMF
2015 Tech Emmy for Netflix
• Netflix was a co-recipient of the 2015 Technology and Engineering Emmy Award for “Standardization and Pioneering Development of Non-Live Broadcast Captioning”
Questions?
• Rohit Puri (rpuri@netflix.com)
• https://www.linkedin.com/in/rohit-puri-0a13b02
