AV Foundation -- introduced in iOS 4, ported to Lion, and enhanced further in iOS 5 -- delivers a comprehensive framework for audio and video capture and playback. The capture functionality is so good that it's now the preferred option for still photography applications. In this session, we'll focus squarely on AV Foundation as a media capture framework. Attendees will learn:
* How to get the most out of the device for still photography by using AV Foundation to access the flash, white balance, and image resolution
* How to capture audio and video to the file system
* How to process incoming audio and video capture buffers in memory, to create real-time effects or pick out interesting parts of the scene on the fly
Capturing Stills, Sounds, and Scenes with AV Foundation
1. Capturing Stills, Sounds, and Scenes with AV Foundation
Chris Adamson • @invalidname
Voices That Matter: iOS Developer Conference
Nov. 12, 2011 • Boston, MA
2. Road Map
• Media capture technologies in iOS
• AV Foundation capture concepts
• Device-specific concerns
• Doing stuff with captured media
3. Capture?
• Digital media encoding of some real-world source, such as still images, moving images, and/or sound
• Contrast with synthetic media: musical synthesizers, CG animation
• Not the same as "recording", which implies storage
• Capture devices include cameras and microphones
5. Accessing Capture Devices
• Simple shoot-and-save: UIImagePickerController
• Core Audio: low-level capture and real-time processing
• More info in my talk tomorrow
• AV Foundation
6. AV Foundation
• Introduced in iPhone OS 2.2 as an Obj-C wrapper for Core Audio playback; capture added in 3.0
• Repurposed in iOS 4 as an audio/video capture, editing, export, and playback framework
• Ported to OS X in Lion; heir apparent to QuickTime
8. Core Media
• C-based helper framework for AVF
• Structures to represent media buffers and queues of buffers, media times and time ranges
• Low-level conversion and calculation functions
• Does not provide capture, editing, or playback functionality
9. AV Foundation
• Editing / Playback classes
• Assets, compositions, and tracks. Player and player layer. Asset readers and writers
• Capture classes
• Devices, inputs, outputs, and the session
22. AVCaptureSession
• Coordinates the flow of capture from inputs to outputs
• Create, add inputs and outputs, start running
captureSession = [[AVCaptureSession alloc] init];
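A minimal sketch of that whole flow in one place: create the session, attach an input and an output, and start it running. The default-camera and movie-file-output choices here are illustrative assumptions, not the only options:

AVCaptureSession *captureSession = [[AVCaptureSession alloc] init];

// Wrap the default camera in an input and attach it
AVCaptureDevice *camera =
    [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
NSError *error = nil;
AVCaptureDeviceInput *cameraInput =
    [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
if (cameraInput && [captureSession canAddInput:cameraInput]) {
    [captureSession addInput:cameraInput];
}

// Attach an output (a movie file output, purely as an example)
AVCaptureMovieFileOutput *movieFileOutput =
    [[AVCaptureMovieFileOutput alloc] init];
if ([captureSession canAddOutput:movieFileOutput]) {
    [captureSession addOutput:movieFileOutput];
}

[captureSession startRunning];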
23. AVCaptureDevice
• Represents a device that can perform media capture (cameras, microphones)
• Could be connected as external accessory or Bluetooth
• You cannot make assumptions based on device model
24. Discovering Devices
• AVCaptureDevice class methods: devices, deviceWithUniqueID:, devicesWithMediaType:, defaultDeviceWithMediaType:
• Media types include audio, video, muxed (audio and video in one stream), plus some outliers (timecode, etc.)
25. Inspecting Devices
• position (property): is the camera on the front or the back?
• supportsAVCaptureSessionPreset: allows you to inspect whether it can capture at one of several predefined image resolutions
26. Photo traits
• Focus & exposure
• isFocusModeSupported:, focusMode, focusPointOfInterestSupported, focusPointOfInterest, adjustingFocus
• isExposureModeSupported:, exposureMode, exposurePointOfInterestSupported, etc.
• White balance
• isWhiteBalanceModeSupported:, whiteBalanceMode, adjustingWhiteBalance
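A sketch of putting these to work: setting a focus point of interest, assuming the device supports it. Note that any change to these traits must be bracketed by lockForConfiguration: / unlockForConfiguration:

NSError *error = nil;
if ([camera isFocusModeSupported:AVCaptureFocusModeAutoFocus] &&
    camera.focusPointOfInterestSupported &&
    [camera lockForConfiguration:&error]) {
    // Point of interest uses a normalized (0,0)-(1,1) coordinate space
    camera.focusPointOfInterest = CGPointMake(0.5, 0.5);
    camera.focusMode = AVCaptureFocusModeAutoFocus;
    [camera unlockForConfiguration];
}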
27. Light up
• Flash and Torch
• hasFlash, isFlashModeSupported:, flashMode, flashActive, flashAvailable
• hasTorch, isTorchModeSupported:, torchMode, torchLevel, torchAvailable
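For instance, a sketch of switching the torch on where available (the same configuration-lock rule applies):

NSError *error = nil;
if (camera.hasTorch &&
    [camera isTorchModeSupported:AVCaptureTorchModeOn] &&
    [camera lockForConfiguration:&error]) {
    camera.torchMode = AVCaptureTorchModeOn;
    [camera unlockForConfiguration];
}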
28. AVCaptureSession sessionPreset
• Constants for video capture quality. Allows you to inspect capabilities, trade performance/framerate for resolution
• Default is AVCaptureSessionPresetHigh
• For still photos: AVCaptureSessionPresetPhoto
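A one-liner sketch, with the session-side support check (canSetSessionPreset: complements the device-side supportsAVCaptureSessionPreset:):

if ([captureSession canSetSessionPreset:AVCaptureSessionPresetPhoto]) {
    captureSession.sessionPreset = AVCaptureSessionPresetPhoto;
}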
29. iFrame
• Session presets for use when capturing video intended for subsequent editing
• AVCaptureSessionPresetiFrame960x540, AVCaptureSessionPresetiFrame1280x720
• No P- or B-frames; files are much larger than typical H.264
http://en.wikipedia.org/wiki/Video_compression_picture_types
30. Capture inputs
• Connect a device to the capture session
• Instances of AVCaptureDeviceInput
• Create with -initWithDevice:error: or +deviceInputWithDevice:error:
32. Capture preview
• AVCaptureVideoPreviewLayer: a CALayer that shows what's currently being captured from video input
• Remember: CALayer, not UIView
• videoGravity property determines how it will deal with preview that doesn't match bounds: aspect, fill, or resize
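A sketch of wiring a preview layer into a view hierarchy (previewView is a hypothetical UIView):

AVCaptureVideoPreviewLayer *previewLayer =
    [AVCaptureVideoPreviewLayer layerWithSession:captureSession];
previewLayer.frame = self.previewView.bounds;
previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[self.previewView.layer addSublayer:previewLayer];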
34. Capture Outputs
• File output: AVCaptureMovieFileOutput and AVCaptureAudioFileOutput
• Photo output: AVCaptureStillImageOutput
• Image processing: AVCaptureVideoDataOutput and AVCaptureAudioDataOutput
• More on these later…
35. AVCaptureFileOutput
• startRecordingToOutputFileURL:recordingDelegate:
• The delegate must be set and must implement two callbacks:
• captureOutput:didStartRecordingToOutputFileAtURL:fromConnections:
• captureOutput:didFinishRecordingToOutputFileAtURL:fromConnections:error:
• Then connect to capture session
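A sketch of both halves: starting a recording, then the finish callback from AVCaptureFileOutputRecordingDelegate. The temporary-directory URL is a placeholder:

NSURL *outputURL = [NSURL fileURLWithPath:
    [NSTemporaryDirectory() stringByAppendingPathComponent:@"capture.mov"]];
[movieFileOutput startRecordingToOutputFileURL:outputURL
                             recordingDelegate:self];

// Called when the file has been written out (or recording failed)
- (void)captureOutput:(AVCaptureFileOutput *)captureOutput
didFinishRecordingToOutputFileAtURL:(NSURL *)outputFileURL
      fromConnections:(NSArray *)connections
                error:(NSError *)error {
    NSLog(@"finished recording to %@ (error: %@)", outputFileURL, error);
}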
37. Cranking it up
• -[AVCaptureSession startRunning] starts capturing from all connected inputs
• If you have a preview layer, it will start getting updated
• File outputs do not start writing to filesystem until you call startRecording on them
38. Demo
AVRecPlay
http://dl.dropbox.com/u/12216224/conferences/vtm10/mastering-media-with-av-foundation/VTM_AVRecPlay.zip
39. Orientation issues
• Default orientation of an iOS device is portrait
• The AVCaptureConnections between the device inputs and the session have a read-write videoOrientation property
• Capture layer's orientation property should match
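A sketch of aligning an output connection with the interface orientation (connectionWithMediaType: is the iOS 5 way to find the connection; the landscape value is an arbitrary example):

AVCaptureConnection *connection =
    [movieFileOutput connectionWithMediaType:AVMediaTypeVideo];
if ([connection isVideoOrientationSupported]) {
    connection.videoOrientation = AVCaptureVideoOrientationLandscapeRight;
}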
40. Capture Processing
• Analyzing or manipulating capture data as it comes in
• Audio: real-time effects ("I Am T-Pain"), oscilloscopes, etc.
• May make more sense to use Audio Units
• Video: bar code readers, face-finders, etc.
41. Data Outputs
• Connects your code to the capture session via a delegate callback
• Delegate callback occurs on a serial GCD queue that you provide (can be dispatch_get_main_queue(), should not be dispatch_get_current_queue(), must not be NULL)
42. Creating the data output
AVCaptureVideoDataOutput *captureOutput =
    [[AVCaptureVideoDataOutput alloc] init];
captureOutput.alwaysDiscardsLateVideoFrames = YES;
[captureOutput setSampleBufferDelegate:self
                                 queue:dispatch_get_main_queue()];
43. Configuring the data output
NSString *key = (NSString *)kCVPixelBufferPixelFormatTypeKey;
NSNumber *value =
    [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA];
NSDictionary *videoSettings =
    [NSDictionary dictionaryWithObject:value forKey:key];
[captureOutput setVideoSettings:videoSettings];
44. Analyzing the data
• You get the callback captureOutput:didOutputSampleBuffer:fromConnection:
• Second parameter is a CMSampleBufferRef, Core Media's opaque type for sample buffers
• Could be video… could be audio… (but you can tell from the connection and its input and output ports)
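The callback's shape, as a sketch; checking the connection's input ports is one way to tell audio from video:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection {
    for (AVCaptureInputPort *port in connection.inputPorts) {
        if ([port.mediaType isEqualToString:AVMediaTypeVideo]) {
            // this buffer holds a video frame
        }
    }
}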
45. Analyzing frames with Core Video
CVImageBufferRef imageBuffer =
    CMSampleBufferGetImageBuffer(sampleBuffer);
/* Lock the image buffer */
CVPixelBufferLockBaseAddress(imageBuffer, 0);
/* Get information about the image */
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);
This example is from the ZXing barcode reader
http://code.google.com/p/zxing/
46. Demo
ZXing
http://code.google.com/p/zxing/
47. Audio considerations
• Can process CMSampleBufferRef by using CMSampleBufferGetAudioStreamPacketDescriptions() and CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer()
• Then use Core Audio calls that take these types
• May make more sense to just capture in Core Audio in the first place, especially if you're playing captured data through an audio queue or audio units
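A sketch of pulling an AudioBufferList out of a sample buffer. The single stack-allocated AudioBufferList assumes interleaved audio in one buffer; non-interleaved data needs a larger, sized list:

AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer = NULL;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
    sampleBuffer,
    NULL,                     // don't need the required-size out-param
    &audioBufferList,
    sizeof(audioBufferList),
    NULL, NULL,               // default allocators for the block buffer
    kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
    &blockBuffer);
// audioBufferList.mBuffers[0].mData now points at the samples
CFRelease(blockBuffer);       // we own the retained block buffer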
48. Face Finding in iOS 5
• iOS 5 introduces Core Image, which allows us to chain effects on images
• Also includes some interesting image processing classes
49. CIDetector
• Core Image class to find features in a Core Image buffer
• Only supported detector type in iOS 5 is CIDetectorTypeFace
• featuresInImage: returns an NSArray of all detected features in the image
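A sketch of creating the detector and running it over a CIImage (the low-accuracy option is an illustrative choice for real-time use; CIDetectorAccuracyHigh is the slower, better alternative):

NSDictionary *options = [NSDictionary
    dictionaryWithObject:CIDetectorAccuracyLow
                  forKey:CIDetectorAccuracy];
CIDetector *detector = [CIDetector detectorOfType:CIDetectorTypeFace
                                          context:nil
                                          options:options];
NSArray *features = [detector featuresInImage:ciImage];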
50. Convert CM to CV to CI
CVPixelBufferRef cvPixelBuffer =
    CMSampleBufferGetImageBuffer(sampleBuffer);
CFDictionaryRef attachmentsDict =
    CMCopyDictionaryOfAttachments(kCFAllocatorSystemDefault,
                                  sampleBuffer,
                                  kCMAttachmentMode_ShouldPropagate);
CIImage *ciImage = [[CIImage alloc]
    initWithCVPixelBuffer:cvPixelBuffer
                  options:(__bridge NSDictionary *)attachmentsDict];
52. Demo
VTMFaceFinder
http://dl.dropbox.com/u/12216224/conferences/vtm11/VTMFaceFinder.zip
53. Boxing the faces
for (CIFaceFeature *faceFeature in self.facesArray) {
    CGRect boxRect = CGRectMake(
        faceFeature.bounds.origin.x * self.scaleToApply,
        faceFeature.bounds.origin.y * self.scaleToApply,
        faceFeature.bounds.size.width * self.scaleToApply,
        faceFeature.bounds.size.height * self.scaleToApply);
    CGContextSetStrokeColorWithColor(cgContext,
        [UIColor yellowColor].CGColor);
    CGContextStrokeRect(cgContext, boxRect);
}
54. CIFaceFeature
• Inherits bounds from CIFeature
• Adds CGPoint properties leftEyePosition, rightEyePosition, and mouthPosition (with "has" properties for each of these)
55. Image Processing on the fly
• New CVOpenGLESTextureCache makes it possible to render Core Video buffers in real time
• These are what you get in the callback
• See ChromaKey example from WWDC 2011 session 419. Requires mad OpenGL ES skillz.
57. Recap
• Start with an AVCaptureSession
• Discover devices and create inputs
• Create and configure outputs
• Start the session
• Start recording or wait to start handling callbacks
58. Recap: Easy parts
• Basic capture apps (preview-only or record to file) will require little or no Core Media or other C APIs
• Default devices are usually the ones you want (back megapixel camera on the iPhone, best available microphone, etc.)
• Capture API is pretty easy to understand and remember (compare to the editing API)
59. Recap: Hard parts
• Core Media calls require a high comfort level with C, Core Foundation, functions that take 8 or more parameters, etc.
• Lots of bit-munging when you parse a CV buffer (pixel formats, strides)
• Callbacks do not have an infinite amount of time or resources to finish their work
60. Resources
• devforums.apple.com
• No mailing list at lists.apple.com
• WWDC session videos and slides (four in 2011, three in 2010)
• Stack Overflow
61. Q&A
Watch my blog for updated sample code:
http://www.subfurther.com/blog
@invalidname