The iPhone is the best iPod Apple's ever made, and the iPad has replaced the TV for many users. And while developers can use documentation and books to master the media frameworks (AV Foundation, Core Audio, and the rest), there's nothing in Xcode that will keep your audio from dropping out, fix artifacting on video with a lot of motion, or properly balance performance on the most-capable new Retina devices with backwards-compatibility with older ones. This session offers a ground-level intro to what's actually in your iTunes songs and streaming videos, and how best to encode them for the realities of iOS devices, their storage capacities, and the networks they live on. We'll shoot, compress, and stream, all from a MacBook Air, and take a close look and listen at the results.
5. Glitches
• Bitrate too high for network
• Bitrate too low for content
• Keyframe interval too low / encoder error
6. More Glitches
• Audio and Video out of sync
• Media doesn’t play at all
• …or plays on some devices but not others
• Media consumes too much of a resource: battery, filesystem storage, etc.
7. Beating the Glitches
(what we’ll learn today)
• How digital media works: tradeoffs
• Codecs, compression, and containers
• Different approaches for different needs
• iOS / Mac encoding APIs
9. A/V Encoding
• Representing time-based media digitally
• “Show this image at this time”
• “Codec” – from “coder / decoder”
10. Analog media
• Telephone – air pressure against mic / from speaker reproduced as line voltage
• Radio – amplitude of sound wave modulated atop carrier signal
• Film – series of distinct images presented for a fraction of a second each
11. Simple Digital Media
• Captions / Subtitles – series of samples that indicate text / color / location and timing
• PCM audio – audio waveform reproduced as numeric samples
• M-JPEG – series of JPEG frames with timing information
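To make the PCM idea concrete, here's a minimal Python sketch (illustrative only; the 8 kHz rate and 440 Hz tone are arbitrary example choices, not from the talk). It samples a sine wave at regular intervals and packs the samples as 16-bit little-endian integers, the same layout a WAV file's data chunk uses:

```python
import math
import struct

SAMPLE_RATE = 8000   # samples per second (roughly telephone quality)
FREQUENCY = 440.0    # test tone: A above middle C
DURATION = 0.5       # seconds

def pcm_sine(freq=FREQUENCY, rate=SAMPLE_RATE, duration=DURATION):
    """Render a sine tone as 16-bit signed little-endian PCM bytes."""
    count = int(rate * duration)
    samples = [int(32767 * math.sin(2 * math.pi * freq * n / rate))
               for n in range(count)]
    return struct.pack("<%dh" % count, *samples)

data = pcm_sine()
# 4000 samples * 2 bytes each = 8000 bytes for half a second of mono audio
```

Note the tradeoff already visible here: uncompressed CD-quality stereo (44,100 Hz, 16-bit, 2 channels) needs about 176 KB for every second of audio, which is why delivery formats compress.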
12. Compression
• Advanced codecs can reduce bandwidth by:
  • Eliminating redundant information within groups of samples
  • Eliminating data that won’t be missed by human eyes/ears
• “Lossless” codecs reproduce their source media perfectly; “lossy” codecs don’t
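As a toy illustration of eliminating redundancy within a group of samples (this is not any real codec's algorithm), delta encoding stores each sample as its difference from the previous one. Smooth signals produce runs of small numbers that a later stage can compress well, and the transform loses nothing:

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    encoded = samples[:1]  # first sample is stored as-is
    encoded += [cur - prev for prev, cur in zip(samples, samples[1:])]
    return encoded

def delta_decode(deltas):
    """Invert delta_encode by running-summing the differences."""
    samples = deltas[:1]
    for d in deltas[1:]:
        samples.append(samples[-1] + d)
    return samples

ramp = [100, 102, 104, 104, 103]
assert delta_decode(delta_encode(ramp)) == ramp   # lossless round trip
```

A lossy codec would go one step further, e.g. quantizing the small deltas to zero, after which the round trip no longer reproduces the source exactly.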
15. Blu-Ray Author
• Cares most about image quality, and fitting into a specific size range ("bitrate budgeting")
• Does not care about render time, CPU requirements, or expense
• The author's pay may itself be an expense for someone else
16. Streaming Site
• Server-side transcoder cares most about bitrates that work for clients
• Cares somewhat about time for uploaded files; critically important for livestreams
• Cost/CPU/storage/bandwidth may be issues as the site scales, but they're the kinds of problems you want to have
17. Video Editor / Effect Artist
• Cares most about image quality (don't want to degrade with each edit or effect) and CPU (heavily-compressed video is slow to scrub through, composite, etc.)
• Does not care about storage/bandwidth or cost
18. FaceTime Users
• Care most about encoding time (must be real-time) and cost (end-users expect services to be free)
• Care about CPU only to the degree it works at all on their device
• Don't care about image quality; it's expected to scale with available bandwidth
20. One Size Doesn't Fit All
• Editors and artists need uncompressed files, or "mezzanine" codecs that have high quality with light (preferably lossless) compression
• Blu-Ray author will take uncompressed or mezzanine and crunch to a highly-efficient delivery codec
• FaceTime users need something that can be compressed in realtime on consumer devices
21. iOS / Mac video codecs
• For capture / editing / effects: uncompressed or ProRes
• For end-user distribution: H.264
• On iOS, pretty much H.264 for everything
22. Codec Frame Types
• Intra (I) frame – all image data is included in the frame
• Predicted (P) frame – some image data depends on one or more earlier frames
• Bi-directionally predicted (B) frame – some image data depends on one or more earlier and/or later frames
23. I/P/B Frames
[Diagram: a group of pictures showing P and B frames between two I frames]
From http://en.wikipedia.org/wiki/Video_compression_picture_types
24. Codec Configurations
• Bitrate: how much data is consumed per second
  • More bits = higher image quality
• Keyframe interval: can force an I-frame at a specific interval to "clean up" the image
• Image size, frame rate, etc.
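Bitrate budgeting, the Blu-Ray author's concern from earlier, is simple arithmetic on these configuration values: divide the size budget by the running time, then subtract what the audio track needs. A sketch (the 25 GB disc, two-hour runtime, and 192 kbps audio are example figures, not from the talk):

```python
def video_bitrate_budget(total_bytes, duration_seconds, audio_bps):
    """Bits per second left for video after audio, given a total size budget."""
    total_bits = total_bytes * 8
    return total_bits / duration_seconds - audio_bps

# Fit a 2-hour movie plus 192 kbps audio onto a 25 GB single-layer disc:
video_bps = video_bitrate_budget(25e9, 2 * 60 * 60, 192_000)
# leaves roughly 27.6 Mbps for the video track
```

The same arithmetic, run in the other direction, tells a streaming site how big a 10-second HLS segment will be at a given target bitrate.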
25. H.264 “Profiles”
• Define which parts of the H.264 video specs are / aren’t available
• On Apple devices: baseline, main, and high
• Baseline: iPhone 4 and earlier, video iPods
• High: Apple TV 2, iPad 3rd Gen, iPhone 5
• Biggest difference: baseline doesn’t have B-frames
33. More Compression Considerations
• May want to filter media before encoding
  • Audio: dynamic compression and normalization of levels (see "The Levelator")
  • Video: some codecs change your colors and luminance ("crushed blacks"); you can adjust them prior to compression to lessen this effect
38. What Do The Following Have In Common?
• Most “Second Doctor” (Patrick Troughton) episodes of Doctor Who
• Most US soap operas prior to 1970
• Most US game shows prior to 1975
• Nearly all DuMont Network (1946-1956) programming
• Television broadcasts of Super Bowls I & II
• …and much more
40. Loss
• Encoding never makes media better. When image or sound data is lost, it is lost forever
• When master tapes or films are destroyed, they can never be brought back, and copies are inherently inferior
• In previous decades, reuse of video tape (“wiping”) and destruction of film (“junking”) were common practice
41. Case Study: Filmation
• Major US producer of TV/movie animation (Superman, Fat Albert, The Archies, He-Man)
• Bought and shut down in 1989. Archive converted to PAL and films destroyed
• Due to framerate differences, PAL-to-NTSC conversions will always have sped-up audio
• Can never be released in HD
44. Containers
• Allow you to combine and synchronize multiple audio/video/other media streams
• Files: QuickTime (.mov), MPEG-4 (.mp4, .m4v, .m4a, .aac, etc.), Flash (.flv), Windows Media (.wmv), Ogg (.ogg), etc.
• Network streams: Shoutcast, RTSP, HLS, MPEG-2 Transport, etc.
45. QuickTime File Format
• Content agnostic: can handle any kind of codec
• Internal tree structure of "atoms": 4-byte size counter, 4-character code type, and then internals specific to that type
• "moov" atom at the top level contains "trak" atoms, which contain "mdia" atoms, which point to media samples
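The atom layout described above can be walked with a few lines of code. A simplified Python sketch (it handles only the common 32-bit-size case; real files can also use size == 1 for a 64-bit extended size, and size == 0 for "extends to end of file", and atoms like moov contain nested child atoms inside their payloads):

```python
import struct

def parse_atoms(data, offset=0, end=None):
    """List (type, payload_offset, payload_size) for each atom at one level.

    Each atom starts with a big-endian 4-byte size (which includes the
    8-byte header itself) followed by a 4-character type code.
    """
    end = len(data) if end is None else end
    atoms = []
    while offset + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, offset)
        atoms.append((kind.decode("ascii"), offset + 8, size - 8))
        offset += size
    return atoms

# A fabricated two-atom buffer: 'ftyp' with a 4-byte payload, then an
# empty 'moov' (header only) -- just enough to exercise the parser
blob = (struct.pack(">I4s4s", 12, b"ftyp", b"qt  ")
        + struct.pack(">I4s", 8, b"moov"))
```

To descend the moov → trak → mdia hierarchy, you would call `parse_atoms` again on each container atom's payload range.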
46. Editing with QuickTime
• Sample references may or may not be in the same file
• If they are, it's a "self-contained movie", suitable for distribution to end users
• If not, it's a "reference movie", suitable for non-destructive editing
47. Streaming Containers
• Streams can't offer random access like files
• Simple example: Shoutcast
  • Just an endless stream of MP3 data over a socket, rate-controlled by server
  • Metadata (song titles) is inserted periodically in the stream and must be removed by the client before passing to the audio decoder
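That de-multiplexing step can be sketched in a few lines. In the Shoutcast/ICY convention, the server announces a metadata interval in an `icy-metaint` response header; after every that-many bytes of audio comes one length byte (counting 16-byte units) and then that much metadata. A hypothetical Python sketch over an in-memory buffer, not a complete client:

```python
def strip_icy_metadata(stream, metaint):
    """Separate a Shoutcast-style byte stream into audio bytes and titles."""
    audio, titles = bytearray(), []
    pos = 0
    while pos < len(stream):
        audio += stream[pos:pos + metaint]   # metaint bytes of audio data
        pos += metaint
        if pos >= len(stream):
            break
        length = stream[pos] * 16            # length byte counts 16-byte units
        pos += 1
        meta = stream[pos:pos + length].rstrip(b"\x00")
        if meta:
            titles.append(meta.decode("latin-1"))
        pos += length
    return bytes(audio), titles

# Fabricated stream with metaint=4: 4 audio bytes, one 16-byte title
# block, 4 more audio bytes, then a zero-length metadata marker
stream = b"AAAA" + bytes([1]) + b"StreamTitle='x';" + b"BBBB" + bytes([0])
audio, titles = strip_icy_metadata(stream, metaint=4)
```

Only the `audio` bytes go to the MP3 decoder; feeding it the interleaved metadata is exactly the kind of thing that produces audible glitches.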
48. HTTP Live Streaming
• Required format for most iOS streaming
• Not actually a stream, but a series of small (~10 sec) files and a periodically-refreshed playlist of segment files
• Can provide different bitrates via a playlist-of-playlists; client will figure out if it's getting data fast enough and switch up or down as needed
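The playlist-of-playlists is itself just a small text file: each EXT-X-STREAM-INF entry advertises one variant's bandwidth, and the client picks among them. A hypothetical example (the bandwidth figures and paths are made up), with a low-bitrate audio-only variant listed alongside two video variants:

```
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=64000,CODECS="mp4a.40.2"
audio_only/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
low/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1200000
high/prog_index.m3u8
```

Each of the referenced files is an ordinary media playlist listing the ~10-second segments; the client re-fetches it periodically during a live stream.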
50. Consider Your Goals
• Playback: cut scenes, media player
• Capture/editing
• Messaging: VoIP, video chat
• Other: livestreaming, screencasting, etc.
52. Consider Your Constraints
• Device storage / network bandwidth
  • iOS apps over 100MB cannot be downloaded over cellular network
• CPU/GPU performance
• Device support
  • Are you encoding for non-Apple devices too?
55. Encoding APIs
• Core Audio
  • Audio Converter Services, Extended Audio File Services
• AV Foundation
  • AVAssetExportSession, AVAssetWriter
• Video Toolbox (Mac only)
56. Core Audio codecs
• LPCM (uncompressed)
• MP3 (read-only)
• AAC
• iLBC
• Apple Lossless
• Audible
Not a complete list. Not all Mac types available on iOS.
57. AVAssetExportSession
• Used to export an AVAsset (one or more audio/video tracks) to a file
• Takes a “preset” for configurations. Can be:
  • QuickTime at various quality settings
  • QuickTime at various sizes
  • iTunes-compatible .m4a
  • “Pass Through”
58. AVAssetWriter
• Lets you write one of several file formats (.mov, .mp4, Core Audio types) and specify encoding parameters (codec, size, bitrate, keyframe interval, etc.)
• You manually append CMSampleBuffers, usually single frames of video or small audio buffers
60. AVF Video Codecs
• H.264
• JPEG
• ProRes 4:4:4:4 or 4:2:2 (Mac only)
• iFrame – H.264 I-frame-only (as a format received from AVCaptureSession only, iOS only)
The numbers in ProRes codecs refer to the color/alpha fidelity; see http://en.wikipedia.org/wiki/Chroma_subsampling for more information
61. HTTP Live Streaming
• Create with command-line tools or Pro apps (Compressor, FCPX, Motion, etc.)
• Or use a server-side service to do it for you (UStream, Wowza, etc.)
• Use variant playlists to target different devices and network conditions
• Must provide a 64 kbps variant, either audio-only or audio with a single image
62. TN2224
“Best Practices for Creating and Deploying HTTP Live Streaming Media for the iPhone and iPad”
http://developer.apple.com/library/ios/#technotes/tn2224/_index.html
64. Additional Streaming Considerations
• Are you encoding for other platforms?
  • Macs, Roku, Android (4.1+) support HLS
  • Desktops get Flash instead of <video> (but H.264 in .flv works great too!)
65. Takeaways
• Encoding is about tradeoffs: know what matters to you, and what you can compromise on
  • CPU, storage/bandwidth, cost, time, quality
66. Q&A
Slides will be posted to the CocoaConf Glassboard, and announced on my Twitter & app.net (@invalidname)