1. PhD Thesis Defense – 9th April, 2013
Novel Applications for Emerging Markets
Using Television as a Ubiquitous Device
CTIF, Department of Electronic Systems
Arpan Pal
Principal Scientist and Head of Research
TCS Innovation Labs, Kolkata
Supervisor
Prof. Ramjee Prasad
2. 1
Motivation - TV as a Ubiquitous Computing Device
“Ubiquitous computing enhances computer use by making many computers available throughout
the physical environment, but making them effectively invisible to the user” – Mark Weiser
Can TV be the Ubiquitous Screen for Home?
• It is available
• It is easy to use
Market Facts (India) Source – IMRB, ITU, IMAI (2009, 2010, 2011)
PC – Low penetration - cost, skill and usability issues
Mobile screen - good for social networking - not for
Information-heavy content
Tablets and large-screen smartphones – still costly
Television - low cost of ownership, large screen real-estate, familiarity of usage
Device Penetration (%) Internet (%)
PC 6.1% 4.2% (Home)
Mobile 61.4% 3.2%
Television 60% (Household) ??
Internet Browser
Media Player (Audio, Video, Image)
Video Chat, SMS
Remote Healthcare
Distance Education
Pilot Deployment in India and Philippines
Home Infotainment Platform
3. 2
User Study
User survey with HIP (pilot version of "Dialog", launched by Tata Teleservices Ltd.) was conducted among
urban users in India - by TNS
Sample - 50 middle-class and lower-middle-class families, comprising 50 working adults and 50 students (aged 12-18).
Confidence score measure - % of respondents
responding with a score of 4 or 5
4. 3
User Study Findings
Slow to very-slow internet connection affecting the user experience
• Video chat could not be tested – how to do acceptable quality video chat under low-QoS networks?
Browser - Ease of use Issue and not much liking for TV-Internet blending
• How to make experience acceptable through TV-Internet Mash-ups?
Preference of remote control for non-computer-savvy users and dislike of the QWERTY
layout for on-screen keyboard
• How to design a better text entry method using remote control?
Additional multimedia content security requirement for remote healthcare and distance
education applications
• How to have low computational complexity Access Control and DRM schemes capable of
running on the constrained low-cost platform?
Problem Statement
Improving the user experience of TV as low-cost Internet-access device
through
QoS-aware Video Transmissions
Low-complexity Video Security
TV context-aware Intelligent TV-Internet Mash-ups
TV Remote-based on-screen keyboard for text entry
Challenges
Resource Constrained
Platform
Non-Computer-Savvy
Users
Low-QoS Network
5. Video Chat over low-QoS Networks
[Section diagram: challenges - low-QoS network, resource-constrained platform; context - low-cost infotainment platform, ICT infrastructure limitations, user experience requirements]
6. 5
Problem Definition
Video Chat
• Bandwidth-hungry with real-time packet delivery requirement
2G wireless networks (CDMA1xRTT / GPRS)
• Poor bandwidth and high latency – fluctuates with time and place
Need for an adaptive-rate-control-based video chat system that gives preference to audio
State-of-the-Art Analysis
Network Condition Estimation and Adaptation on WiFi and IP networks, not on
2G wireless where the latency is higher
Adequate work not reported on low computational complexity rate control
algorithms for streaming video
Adequate work not reported for optimal design for audio/video streaming
systems addressing the latency and perceptual quality issue
7. 6
Proposed System
Contribution
Network Sensing
• An experimental, heuristics-based mapping of probe-packet delay to effective bandwidth
Adaptation
• A low complexity video rate
control algorithm for H.264 CBR -
automatic switching between
frame/MB as basic units for
quantization based on video
complexity
• An adaptive fragmentation
scheme with frame-level
sequencing that minimizes
packet-delay based discards,
prioritizes audio and improves
perceived video quality
[Architecture diagram: the Video Chat Application sits on a Middleware Framework - rate control on the encode side (video/audio encode → fragmentation) and the decode side (re-assembly → audio/video decode), with network sensing, all over the underlying UDP/IP network stack]
8. 7
Results
[Plots: PSNR (in dB) vs. frame number for Akiyo @ 25 kbps and Grandma @ 30 kbps]
Network Sensing
Network Type   Mean (kbps)   Stdev (kbps)
ADSL-ADSL      596.14        203.45
Modem-Modem    26.96         19.23
Modem-ADSL     18.13         3.21
Adaptive Rate Control (QCIF, 5fps)
Adaptive Fragmentation
Feedback from 20 users about
overall experience using
a) Standard RTP based system
b) Proposed system
100% reported better audio
quality and better perceived
video quality
9. Low Complexity Video Security
Watermarking for DRM (Education)
[Section diagram: challenge - resource-constrained platform; context - low-cost infotainment platform, socially value-adding apps, user experience requirements]
10. 9
Problem Definition
Digital Watermarking
Requirement
Needs to be imperceptible and robust at the same time
Needs to have low-computational complexity
State-of-the-Art
Initial and classical works are on MPEG2 and not on H.264
Reported H.264 watermarking systems have high computational overhead
Reported works lack perceptual quality analysis and attack-robustness
analysis
Video Encryption
Requirement
Needs to have low-computational complexity, yet adequate security
State-of-the-Art
Uncompressed domain encryption - high decryption computational overhead
Reported H.264 video encryption works – no focus on computational
complexity
No work reported on Video Quality assessment after encryption/decryption
11. 10
Digital Watermarking – Proposed System
Contribution
Robust, imperceptible yet low-
computational-complexity H.264
watermarking system
Hash based integrity check
Embed watermark reusing the
quantizer
Embedding location carefully chosen
for imperceptibility
Evaluation methodology for
watermark attack and perceptual video
quality
Peak-Signal-to-Noise-Ratio (PSNR)
based Imperceptibility
Evaluation against 10 known attacks
Mean-Opinion-Score (MoS) based
post-attack measurement of Video
Quality, Retrieved Image Quality and
Retrieved Text Quality
[Block diagram: H.264 encode loop - input YUV 4:2:0 video goes through intra (I) / inter (P) prediction with best prediction mode and block-size selection against the reference frame F'n-1, transform (T) and quantization (Q), then reordering and entropy encoding to NAL; the inverse path (Q-1, T-1) and the de-blocking filter reconstruct F'n. The watermark (text and image) is embedded around the quantizer and extracted on the decode path, yielding the retrieved text and image]
12. 11
Digital Watermarking – Results
Operation No. of Operations per GOP
ADD 2779
MULTIPLY 3564
DIVIDE 1980
MODULO 3564
CONDITIONAL 7524
MEMORY I/O 1584
Function CPU Mega Cycles taken per
GOP (1 second)
Watermark Embedding 6.8
Watermark Extraction 3.8
Computational Complexity
Video Quality after Watermark Attack
Attack Video Quality Image Quality Text Quality Overall Performance against Attack
AA Excellent Excellent Excellent Excellent
CAA Poor Medium Bad Attack Degrades Video Quality
FFA Poor Poor Bad Attack Degrades Video Quality
GCA Poor Good Good Attack Degrades Video Quality
GA Bad Bad Bad Attack Degrades Video Quality
HEA Poor Good Poor Attack Degrades Video Quality
LEA Poor Good Bad Attack Degrades Video Quality
NLFA Poor Medium Bad Attack Degrades Video Quality
RsA Excellent Excellent Excellent Excellent
RoA Poor Excellent Excellent Attack Degrades Video Quality
Perceptual Video Quality after Attacks + Retrieved Image / Text Quality (14 streams, 20 users)
13. Context-aware Intelligent TV-Internet Mash-ups
TV Channel Identity as Context
Textual Context in Static TV pages
Textual Context embedded in Broadcast Video
[Section diagram: challenges - resource-constrained platform, non-computer-savvy users; context - low-cost infotainment platform, television as the ubiquitous access device, user experience requirements]
14. 13
Requirement (Analog TV Context)
[System diagram: a DTH / cable set-top box feeds RF in and A/V in to the HIP, which performs video capture & context extraction; an information mash-up engine pulls related content from the Internet and alpha-blends graphics over the video on the A/V out to the television]
Channel Identity - EPG, viewership rating
Text on Static Pages - return path on DTH
Text on Broadcast Video - news mash-up
15. 14
Problem Definition
TV Channel Identity
Audio watermarking and audio signature based - need content modification or
non-real-time offline processing
TV channel logo detection based
Reported works use PCA and ICA - computationally intensive, and they work only
on static, opaque and rectangular logos, not on non-rectangular /
transparent / dynamic logos
Text on Broadcast Video
Challenge is in identifying the text against dynamically changing video background
Pixel domain and Compressed domain
• Region Based (Connected Component Based (CC) and Edge Based (EB))
Different methods work for varying kinds of texts against varying video
backgrounds
• Need for Hybrid Approach (Region and Texture, CC and EB)
• Text Area Localization and pre-processing remain the biggest challenge
Text on Static Pages
Noisy data with fixed fonts.
Efficient pre-processing techniques are the main challenge.
16. 15
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
• Channel logos with very few pixels are missed in 1% of cases.
• The remaining 3% of misses are due to moving or changed channel logos.
• 3% false positives from small logos - removed from the template set.
• 2% false positives due to highly transparent logos.
TV Channel Identity Recall and Precision
Text Detection on Static Pages
Textual Context from Broadcast TV
• 20 News Channels (5 min. duration each)
• Recall of 100% and Precision of 78% for the text
localization and OCR
• Precision Improves to 88.57% after Heuristics-based
keyword spotting
• Almost 100% precision after Google Dictionary
based post-processing
18. 17
Problem Definition
Requirements
• To provide cost-effective and easy-to-use text-entry mechanism for accessing
services like internet, email and short message services (SMS) from television.
• Full-fledged separate wireless keyboard (Bluetooth or RF) is costly
• Explore option of using infra-red remotes with on-screen keyboard on TV screen
State-of-the-Art
Traditional “QWERTY” on-screen keyboards require a large number of keystrokes
to navigate which makes it cumbersome to use.
Available on-screen keyboards do not address the usability aspect for a non-
computer-savvy user.
Most of them are designed for cursor-based systems and not for traditional key-
based infra-red remotes which have relatively slow response time on key press.
Insufficient user study and modeling for on-screen layouts
19. 18
Proposed System
Contribution
A novel formulation of on-screen layout
• Significantly reduces the number of
keystrokes while typing (19 for
QWERTY vs. 9 for the proposed layout)
Formal methodology for user study
evaluation
• Popular Keystroke-Level Model
(KLM) and Goals-Operators-Methods-Selection
rules (GOMS) model used for formal
evaluation
Extend the standard KLM operator set
to model the remote based operations
and hierarchical layouts.
• A finger movement operator
replacing standard pointing device
operator
[Figure: proposed hierarchical on-screen keyboard layout]
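The extended-KLM prediction described above can be sketched as a simple sum of per-operator times over the keystroke sequence. The operator set (K = remote key press, F = the finger-movement operator that replaces the standard pointing-device operator) follows the slide, but the operator durations below are illustrative placeholders, not the calibrated values from the user study.

```python
# Illustrative extended-KLM sketch: total predicted entry time is the sum
# of per-operator times over the operator sequence for a layout.
# K = remote key press, F = finger movement between remote keys.
OPERATOR_TIMES = {"K": 0.28, "F": 0.40}  # seconds; assumed, not measured

def predicted_entry_time(operator_sequence):
    """operator_sequence: e.g. 'KFKFK' for alternating presses and moves."""
    return sum(OPERATOR_TIMES[op] for op in operator_sequence)
```

Comparing two layouts then reduces to generating each layout's operator sequence for the same target text and comparing the predicted times, which is how the KLM-GOMS predictions in the next slide's tables would be produced.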
20. 19
Results -User Study 1 (Basic Benchmarking)
Users
25 users (diverse age / keyboard exposure)
Tasks
Users were asked to type - “The quick brown fox jumps over a lazy
dog” on SMS/Email and “www.google.com” as URL
[Screenshots: QWERTY, Layout 1 and Layout 2 on-screen keyboards, and the QWERTY vs. Layout 1 comparison chart]

Layouts                  % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY     44.23                        45.75
Layout 2 over QWERTY     45                           46.75
Layout 2 over Layout 1   2                            1.84

Layouts                  % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY     42.2                         35.23
Layout 2 over QWERTY     43.4                         37.2
Layout 2 over Layout 1   3.18                         2
21. 20
Conclusions
Motivation – Why Television
Validated through Market Data from India
Background – TCS Home Infotainment Platform (HIP)
Internet Browser, Media Player, SMS, Video Chat, Remote Healthcare, Distance
Education
Requirement Analysis – Field Study
Challenges - Slow Internet Speed, Non-computer-savvy users, Resource
Constrained Platform
How to Improve user experience of TV as low-cost Internet-access device
Network-adaptive Video Rate Control and Packet Fragmentation Protocols for
improved Video Chat experience
Low-computational-complexity Video Watermarking and Encryption for secure
multimedia content sharing
TV context-aware Intelligent TV-Internet Mash-ups for improving the Internet
Browsing Experience on TV
Remote-based on-screen keyboard for improved text entry on TV using remote
control
22. 21
Publications
1. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous
Access Device for Masses”, Proceedings on Ubiquitous Computing and Multimedia Applications (UCMA),
Miyazaki, Japan, March 2010.
2. Dhiman Chattopadhyay, Aniruddha Sinha, T. Chattopadhyay, Arpan Pal, “Adaptive Rate Control for H.264
Based Video Conferencing Over a Low Bandwidth Wired and Wireless Channel”, IEEE International
Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, May 2009.
3. Arpan Pal and T. Chattopadhyay, “A Novel, Low-Complexity Video Watermarking Scheme for H.264”,
Texas Instruments Developers Conference, Dallas, Texas, March 2007.
4. T. Chattopadhyay and Arpan Pal, “Two fold video encryption technique applicable to H.264 AVC”, IEEE
International Advance Computing Conference (IACC), Patiala, India, March 2009.
5. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, Debabrata Pradhan, Soumali Roychowdhury,
“Recognition of Channel Logos From Streamed Videos for Value Added Services in Connected TV”, IEEE
International Conference on Consumer Electronics (ICCE), Las Vegas, USA, January 2011.
6. T. Chattopadhyay, Arpan Pal, Utpal Garain, “Mash up of Breaking News and Contextual Web Information:
A Novel Service for Connected Television”, Proceedings of 19th International Conference on Computer
Communications and Networks (ICCCN), Zurich, Switzerland, August 2010.
7. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, “TV Video Context Extraction”, IEEE Trends and
Developments in Converging Technology towards 2020 (TENCON 2011), Bali, INDONESIA, November
21-24, 2011.
8. Arpan Pal, Chirabrata Bhaumik, Debnarayan Kar, Somnath Ghoshdastidar, Jasma Shukla, “A Novel On-
Screen Keyboard for Hierarchical Navigation with Reduced Number of Key Strokes”, IEEE International
Conference on Systems, Man and Cybernetics (SMC), San Antonio, Texas, October 2009.
9. Arpan Pal, Debatri Chatterjee, Debnarayan Kar, “Evaluation and Improvements of on-screen keyboard for
Television and Set-top Box”, IEEE International Symposium for Consumer Electronics (ISCE), Singapore,
June 2011.
23. 22
Publications (contd…)
10. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous
Access Device for Masses”, Book Chapter in Springer Communications in Computer and Information
Science, Volume 75, 2010, Pages 11-19. DOI: 10.1007/978-3-642-13467-8.
11. Arpan Pal, Ramjee Prasad, Rohit Gupta, “A low-cost Connected TV platform for Emerging Markets–
Requirement Analysis through User Study”, Engineering Science and Technology: An International
Journal (ESTIJ), ISSN: 2250-3498, Vol.2, No.6, December 2012.
12. T. Chattopadhyay and Arpan Pal, “Watermarking for H.264 Video”, EE Times Design, Signal Processing
Design Line, November 2007.
13. Arpan Pal, Aniruddha Sinha and Tanushyam Chattopadhyay, “Recognition of Characters from Streaming
Videos”, Book Chapter in book: Character Recognition, Edited by Minoru Mori, Sciyo Publications, ISBN:
978-953-307-105-3, September 2010.
14. Arpan Pal, Tanushyam Chattopadhyay, Aniruddha Sinha and Ramjee Prasad, “The Context-aware
Television using Logo Detection and Character Recognition”, (Submitted) Springer Journal of Pattern
Analysis and Applications
15. Debatri Chatterjee, Aniruddha Sinha, Arpan Pal, Anupam Basu, “An Iterative Methodology to Improve TV
Onscreen Keyboard Layout Design Through Evaluation of User Study”, Journal of Advances in
Computing, Vol.2, No.5, October 2012, Scientific and Academic Publishing (SAP), p-ISSN: 2163-2944,
e-ISSN: 2163-2979.
24. 23
Future Work
Adaptive Video Chat
Network sensing via IP statistics, Support multicast
Low-complexity Video Security
Extending security analysis, Audio Watermarking
Internet-TV mash-up
Extend to second screens (mobiles, tablets), seamless transfer of streaming content
between TV and mobile/tablet, integration of social media into broadcast TV.
On-screen Keyboard
Predictive keyboard integration, incorporating voice interfaces and gesture controls.
Explore using mobile phones as the remote control for TV.
Generic
Optimizing the conflicting triad of requirements - cost, features and
performance - in the face of ever-improving hardware capabilities.
25. 24
Learning from the CTIF-GISFI PhD Program
Access to excellent and state-of-the-art technical course content from Aalborg
University
Very useful Research Soft Skill courses
Problem based Learning
Professional Networking
Qualitative Research Approaches
Innovation Methodology
Writing Scientific Papers
Technical Support from CTIF faculty
Culture of Practical Problem based Research
Attending relevant conferences and EU FP7 related programs
Networking with experts in related areas
28. 27
Introduction
Objective
Improving the user experience of TV as a low-cost Internet-access device for
masses in emerging markets like India
• Motivation
• Background
• User Study Based Requirement Analysis
• Contributions
1) Improve the Quality of Experience (QoE) of Video Chat under poor network
conditions using network sensing
2) Provide computationally efficient yet sufficiently secure algorithms for
efficient access control and digital rights management (DRM) for sensitive
multimedia content
3) Improve the experience of browsing the Internet on TV through intelligently
understanding the context of the TV program being watched and blending
related information from the Internet (known as TV-Internet mash-up)
4) Improve the experience of text entry on TV using the remote control through
a novel design of an on-screen keyboard layout
29. 28
Thesis Organization
[Thesis organization map: ICT infrastructure limitations (low-QoS network), the resource-constrained platform and non-computer-savvy users define the challenges; television as a ubiquitous display device, the low-cost infotainment platform and applications of social value-add define the requirements and user experience. Scientific contributions: QoS-aware video transmission, low-complexity video encryption and watermarking algorithms, TV context-aware intelligent TV-Internet mash-ups, and an improved on-screen keyboard layout for text entry on TV using the remote (Chapters 2-6). Engineering contribution: a multimedia framework for quick application development (Appendix A).]
30. 29
Background - Home Infotainment Platform (HIP )
[Photo: HIP device with A/V in and A/V out connections]
Internet Browser
Media Player (Audio, Video, Image)
Video Chat, SMS
Remote Healthcare
Distance Education
Pilot Deployment in India and Philippines
31. 30
HIP Deployment
Tata Teleservices Ltd. launched HIP under the brand name “Dialog” in Tamil Nadu and West Bengal.
In the Philippines, it was launched through Smart Communications under the brand name “SmartBro SurfTV”.
TCS’ Home Infotainment Platform launched successfully in India and the Philippines.
32. 31
HIP Applications
Chipset - TI DaVinci DM6446 (297 MHz ARM9 core and 594 MHz DSP)
RAM - 256MB DDR
Flash - 64MB NAND
OS - Embedded Linux 2.6.x
Browser - Opera for Devices 9.6 with Flash
Multimedia Codecs
Video - H.264, MPEG1, MPEG2, MPEG4
Audio - MP3, AMR-NB, AAC, OGG, FLAC
Image - JPEG
33. 32
Socially Value-adding Apps – Healthcare and Education
[Diagram: ECG, blood pressure monitor, pulse oximeter and digital stethoscope connect to the HIP at a health center / home; patient records are shared over the network with an expert doctor]
34. 33
HIP Application Development Framework
[Framework diagram: SRC (network, microphone, camera, A/V in, storage, USB) → PROC (compress, decompress, multiplex, demux, render, blend) → SINK (network, VGA, TV video, TV audio, headphone, storage), with control APIs exposed to applications]

Application                              SRC                     PROC                                  SINK
Video from Internet                      Network                 Demux - Decompress                    TV Video / Audio
Media player                             Storage                 Demux - Decompress                    TV Video / Audio
Video Chat (Far View)                    Network                 Demux - Decompress                    TV Video / Headphone
Video Chat (Near View)                   Camera and Microphone   Compress - Multiplex                  Network
Internet Browser                         Network                 Render                                TV Video
Remote Healthcare                        USB                     Compress - Multiplex                  Network
Distance Education - Lecture Recording   A/V in                  Demux - Decompress, Remux - Compress  Storage
Lecture Playback                         Storage                 Demux - Decompress                    TV Video / Audio
QA and Course Guide                      Network                 Demux Parser                          TV Video / Audio
35. 34
Sensing of Network Condition
T = average(RTT(P1, t1i), RTT(P2, t2i))
Heuristics-based mapping to effective bandwidth, derived from experiments on a real
network (CDMA1xRTT using Tata Docomo Photon) at different times of the day and at different places
[Diagram: the transmitter end sends probe packets P1(t1i) and P2(t2i) and measures RTT(P1, t1i) and RTT(P2, t2i)]
T (msec) BWeff (kbps)
T < 300 50
300 < T < 800 13
800 < T < 1600 4
1600 < T < 1900 2
T > 1900 1
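The sensing step above reduces to averaging the two probe RTTs and looking the result up in the experimentally derived table. A minimal sketch, with the thresholds taken from the table and an illustrative function name:

```python
# Probe-based network sensing sketch: average the round-trip times of the
# two probe packets and map the mean delay T to an effective bandwidth
# BWeff using the heuristic table above.

def effective_bandwidth_kbps(rtt1_ms: float, rtt2_ms: float) -> int:
    """Map T = average(RTT(P1, t1i), RTT(P2, t2i)) in msec to BWeff in kbps."""
    t = (rtt1_ms + rtt2_ms) / 2.0
    if t < 300:
        return 50
    elif t < 800:
        return 13
    elif t < 1600:
        return 4
    elif t < 1900:
        return 2
    return 1
```

The returned BWeff then drives both the audio/video bit-rate selection and the fragment transmission interval on the following slides.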
36. 35
Rate Control in Audio and Video
Audio Codec (AMR-NB)
• If Effective Bandwidth > 4 kbps, bit rate = 5.15 kbps, else bit rate = 4.75 kbps
Video Codec (H.264)
• For complex scenes, only frame level control exhausts bit budget at the initial
frames as MAD increases.
• MB level control makes sure that bit budget is not exhausted because the whole
frame is not complex throughout.
Estimation of the complexity of the video scene -
• Bit rate [equation shown as an image in the slide]
• Mean Absolute Difference (MAD) prediction model [equation shown as an image in the slide]
• Threshold for frame T (n) is defined as 80% of MADavg(n). Threshold selection
done based on 20 different test sequences of different resolutions using classical
decision theory.
• If MADcb > T(n) for at least one MB in a frame, the frame is declared
complex and MB is chosen as the basic unit for that frame.
• If MADcb <= T(n) for all MBs, then the frame is chosen as the basic unit.
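The frame/MB basic-unit switch above is a simple thresholded decision. A standalone sketch of just the decision rule, with the per-MB MAD values assumed to come from the encoder's MAD prediction model:

```python
# Basic-unit selection sketch: a frame is declared complex (and MB-level
# rate control used) as soon as any macroblock's predicted MAD exceeds
# T(n) = 80% of MADavg(n); otherwise frame-level control suffices.

def choose_basic_unit(mad_per_mb, mad_avg):
    """mad_per_mb: predicted MAD per macroblock; mad_avg: MADavg(n)."""
    threshold = 0.8 * mad_avg  # T(n)
    if any(mad > threshold for mad in mad_per_mb):
        return "MB"      # complex frame: per-macroblock quantization
    return "frame"       # simple frame: frame-level quantization
```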
37. 36
Adaptive Packetization of Encoded Video/Audio
• Fragment Size N = (fr+H) = 1440 bytes (optimal size through experimentation)
• Transmission interval dt = (fr +H) / (1000*BWeff)
• A 9-byte header is added to each video fragment.
Frame type (1 byte) – I (Independent) or P (Predictive) frame.
Total sub-sequence number (1 byte) – total number of fragments in a video frame.
Sub sequence number (1 byte) – fragment number.
Sequence number (4 bytes) – video frame number.
Video payload size (2 bytes) – video bytes in the current fragment.
• Drop a frame iff a newer frame fragment arrives before all the fragments of the
current frame (improvement over RTP)
• Do not discard packets based on transit delay - probability of receiving good
packets on a slow network is increased (improvement over TCP).
• 20 msec AMR_NB audio frames - M (=10, matching 5 fps video) audio frames
aggregated and sent in a single UDP fragment.
• DTX (Discontinuous Transmission) enabled in the AMR-NB encode (VAD) - If the
silence period is more than D seconds then the audio transmission is
discontinued.
For a good channel (BWeff > 4 kbps), D > 10 seconds
For a bad channel ((BWeff <= 4 kbps), D = 3 to 5 seconds
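The 9-byte fragment header above can be packed directly with Python's struct module. Field order and widths follow the slide; the big-endian (network) byte order is an assumption, since the slides do not specify one.

```python
# Sketch of the 9-byte video fragment header: frame type (1 byte),
# total sub-sequence number (1), sub-sequence number (1),
# sequence number (4), video payload size (2).
import struct

HEADER_FMT = "!BBBIH"  # '!' = network byte order, no padding; 1+1+1+4+2 = 9 bytes

def pack_fragment_header(frame_type, total_subseq, subseq, seq, payload_size):
    """frame_type: 0 = I (independent), 1 = P (predictive) - an assumed encoding."""
    return struct.pack(HEADER_FMT, frame_type, total_subseq, subseq, seq, payload_size)

def unpack_fragment_header(data):
    return struct.unpack(HEADER_FMT, data[:9])
```

On the receive side, the sequence and sub-sequence fields are what allow the "drop a frame iff a newer frame fragment arrives" rule to be applied without any transit-delay-based discard.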
39. 38
Watermarking Algorithm Flow
[Flowchart: non-IDR frames continue unchanged; for each IDR frame the previous GOP is hashed and the message size checked, and, depending on whether the IDR is even, either the hash of the previous GOP or the actual message (image + data) is embedded at the chosen location]
40. 39
Digital Watermarking – Algorithm Details
Embed information in the corresponding coefficient (10th or 15th,
depending on the bit index being odd or even)
[Decision diagram: depending on whether the message is an image or data, the watermark is embedded in diagonal or ab-diagonal sub-blocks; other sub-blocks are skipped]
• HxW logo image in binary format and K bytes of text data.
• Total number of bits to embed N = HxW + K*8 - stored in an N-byte
binary array (called wn).
• wn is quantized using the same quantization parameter (qp) used in
H.264 - the quantized values are stored in an array (wqn).
• For each wqn, find the location of embedding inside the image - the
image-location-mapped wqn is denoted M(u,v), where (u,v) is the
position in the DCT domain.
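The embedding-location rule can be sketched as below. The real embedding happens inside the H.264 quantizer; this only shows the coefficient-index selection, with a flat list of 16 values standing in for a 4x4 sub-block's quantized DCT coefficients. Which parity maps to which coefficient is an assumption, as the slide does not say.

```python
# Location-selection sketch: each quantized watermark value wq_n is written
# into the 10th or the 15th coefficient of the chosen 4x4 sub-block,
# depending on the parity of its bit index (parity mapping assumed).

def embed_bit(coeffs, bit_index, wq):
    """coeffs: 16 quantized DCT coefficients of a 4x4 sub-block (modified in place).
    Returns the 0-based position used."""
    position = 9 if bit_index % 2 == 1 else 14  # 10th / 15th coefficient, 0-based
    coeffs[position] = wq
    return position
```

Because the same quantizer (qp) is reused for the watermark values, extraction on the decoder side reads the same positions back without any extra inverse-quantization pass, which is where the low computational complexity comes from.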
41. 40
Digital Watermarking - Attacks
[Evaluation pipeline: the watermarked .264 stream is decoded to .YUV without the WM detector, an attack is applied, the attacked .YUV is re-encoded without WM embedding, then decoded with the WM detector; the retrieved image / text is measured against the original, the attacked video is compared against the original for quality, and a report is generated]
1. Averaging attack (AA)
2. Circular averaging attack (CAA)
3. Rotate attack (RoA)
4. Resize attack (RsA)
5. Frequency filtering attack (FFA)
6. Non-linear filtering attack (NLFA)
7. Gaussian attack (GA)
8. Gamma correction attack (GCA)
9. Histogram equalization attack (HEA)
10. Laplacian attack (LEA)
42. 41
Watermarking Perceptual Video Quality after Attack
• Ten Quality Measures - Average Absolute Difference (AAD), Mean Square Error (MSE),
Normalised Mean Square Error (NMSE), Laplacian Mean Square Error (LMSE), Signal to
Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Image Fidelity (IF), Structural
Content (SC), Global Sigma Signal to Noise Ratio (GSSNR), Histogram Similarity (HS)
• Three pairs of videos – a) Two identical videos (high quality), b) Two completely different
videos (poor quality) and c) original video and compressed / decompressed video (average
quality)
• Weighted Quality Metric
W_VAL = ((AAD+GSSNR+LMSE+MSE+PSNR)*3+HS+IF+NMSE+SC+SNR)
• 14 test video streams were taken, subjected to the different attacks, and W_VAL
calculated; 20 users were asked to judge the attacked and the original watermarked
videos based on perception.
• Judgement based on human vision psychology (HVS) - converted to a fuzzy Mean Opinion
Score (MOS)-based parameter Cqual:
IF (W_VAL >= 90), Cqual = Excellent
ELSEIF (W_VAL >= 80), Cqual = Good
ELSEIF (W_VAL >= 75), Cqual = Average
ELSEIF (W_VAL >= 70), Cqual = Bad
ELSE Cqual = Poor
43. 42
Watermarking Retrieved Image and Text Quality
Image
• Normalized deviation parameter (de) from the Euclidean distance d
• Bit error (be) - % of bits differing between the retrieved and original binary images
• Crossing-count error (ce) - from c, the difference in 0-to-1 crossing counts
between the original and retrieved binary images
• Final error e in the retrieved image
• Based on MOS, decision logic for the quality of the retrieved image (Cimg)
IF e < 0.5 Cimg = Excellent
IF 5 > e > 0.5 Cimg = Good
IF 10 > e > 5 Cimg = Medium
IF 15 > e > 10 Cimg = Bad
ELSE Cimg = Poor
Text
• Compute the mean error (te) of the Hamming distance and the Levenshtein distance
• MOS-based retrieved text quality Ctxt -
IF te< 0.5 Ctxt = Excellent
IF 1 > te >0.5 Ctxt = Good
IF 3> te > 1 Ctxt = Medium
IF 5> te > 3 Ctxt = Bad
ELSE Ctxt = Poor
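The retrieved-text scoring above can be sketched end to end: te is the mean of the Hamming and Levenshtein distances between the original and retrieved text, then mapped to the MOS label. The Levenshtein routine is a standard dynamic-programming implementation written here for self-containment; extending the Hamming distance to unequal lengths by counting the length difference is an assumption.

```python
# Retrieved-text quality sketch: te = mean(Hamming, Levenshtein), mapped
# to the MOS labels with the thresholds from the slide.

def levenshtein(a: str, b: str) -> int:
    """Standard DP edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def text_quality(original: str, retrieved: str) -> str:
    # Hamming distance over aligned positions, plus the length difference (assumed extension)
    h = sum(x != y for x, y in zip(original, retrieved)) + abs(len(original) - len(retrieved))
    te = (h + levenshtein(original, retrieved)) / 2.0
    if te < 0.5:
        return "Excellent"
    if te < 1:
        return "Good"
    if te < 3:
        return "Medium"
    if te < 5:
        return "Bad"
    return "Poor"
```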
44. 43
Watermarking Perceptual Video Quality - Results
Attack W_VAL Cqual
AA 100 Excellent
CAA 52 Poor
FFA 25 Poor
GCA 27 Poor
GA 71 Bad
HEA 27 Poor
LEA 28 Poor
NLFA 25 Poor
RsA 100 Excellent
RoA 37 Poor
[Images: original logo, original video frame, and retrieved logos after the AA, FFA, GCA, NLFA, HEA, LEA, RoA and RsA attacks]
45. 44
Watermarking Decision Logic on Perceptual
Quality after Attacks
Video Quality Retrieved
Image Quality
Retrieved Text
Quality
Overall Measure of Goodness
Excellent or Good Excellent Excellent Excellent
Excellent or Good Excellent Good Good
Excellent or Good Good Excellent Good
Excellent or Good Good Good Good
Excellent or Good Medium Medium Medium
Excellent or Good Bad or Poor Medium Bad
Excellent or Good Medium Bad or Poor Bad
Excellent or Good Bad or Poor Bad or Poor Poor
Medium, Bad or Poor Any Any Attack degrades video quality
46. 45
Watermarking Results on Retrieved Quality after
Attacks
Attack be ce de e Image Quality
(CImg)
AA 0.000 0.000 0.000 0.000 Excellent
CAA 5.469 9.896 3.448 6.271 Medium
FFA 5.469 10.938 55.172 23.860 Poor
GCA 0.781 1.563 3.448 1.931 Good
GA 4.948 9.896 24.138 12.994 Bad
HEA 1.563 1.563 3.448 2.191 Good
LEA 1.823 2.083 0.000 1.302 Good
NLFA 5.729 10.417 13.793 9.980 Medium
RsA 0.000 0.000 0.000 0.000 Excellent
RoA 0.781 0.521 0.000 0.434 Excellent
Attack L H te Text Quality
(Ctxt)
AA 0 0 0.000 Excellent
CAA 6 1 3.5 Bad
FFA 6 1 3.5 Bad
GCA 0 1 0.5 Good
GA 5 1 3 Bad
HEA 6 7 6.5 Poor
LEA 4 5 4.5 Bad
NLFA 6 1 3.5 Bad
RsA 0 0 0.000 Excellent
RoA 0 0 0.000 Excellent
47. 46
Encryption – Proposed System
Contribution
Low-computational-complexity
two-stage H.264 video encryption
algorithm
Separate Header Encryption
Reuse of the flexible macro-block
re-ordering (FMO) of H.264/AVC
as the encryption operator
Analysis of the effect of the
encryption-decryption chain on the
video quality
Important from end-user
experience perspective.
PSNR used as Quality Measure
[Flowchart of the encoder loop: each slice is encoded macroblock by macroblock, with FMO supplying the next MB number from a key-based look-up that modifies the MB ordering; the loop advances through MBs, slices and frames until the end of the sequence]
48. 47
Encryption – Two Stage Algorithm
[Diagram: each NALU is read and branched by type - control data (SPS, PPS) versus video data (IDR and P macroblocks)]
Header Encryption
• First encrypt the SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and IDR
• Encode the first frame using a conventional H.264 encoder and take a 16-bit key (KU)
• Take the length of the IDR (lIDR) - a 16-bit number for QCIF resolution
• Define the encryption key value KP using a hash function of lIDR and KU
Modify macro-block ordering using key-based look-up
• Use KP as a seed to generate a random sequence Le (between 0 and 97). This is used
for the first GOP. For subsequent GOPs, the KP of the previous GOP is used as the KU for
a new KP.
• MBs in an IDR frame are encoded in the order specified by the look-up table Le.
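The key-driven re-ordering can be sketched as follows: KP seeds a PRNG that yields a permutation of the macroblock indices 0..97 (the range used by the slide's security analysis, although a QCIF picture strictly has 99 MBs), and IDR macroblocks are encoded in that order. `random.Random` and SHA-256 are stand-ins; the thesis's actual 16-bit hash and generator are not specified here.

```python
# Sketch of the FMO-style, key-based macroblock re-ordering and the
# per-GOP key chaining (previous GOP's KP becomes the next GOP's KU).
import hashlib
import random

def derive_kp(l_idr: int, ku: int) -> int:
    """Placeholder for the 16-bit hash of lIDR and KU (SHA-256 truncation assumed)."""
    digest = hashlib.sha256(f"{l_idr}:{ku}".encode()).digest()
    return int.from_bytes(digest[:2], "big")

def mb_order(kp: int, num_mbs: int = 98):
    """Key-determined permutation of MB indices 0..num_mbs-1 (the table Le)."""
    order = list(range(num_mbs))
    random.Random(kp).shuffle(order)  # deterministic for a given seed
    return order
```

Decryption regenerates the same permutation from the shared key material and inverts it, so no per-pixel cryptographic work is needed on the constrained platform.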
49. 48
Encryption – Results
Security Analysis
Brute force attack - with 98 macroblocks there are 98! ≈ 9.42 × 10^153 possible orderings
In practice the search is restricted by the 32-bit key used for generating the MB order - the
order can be generated in 2^32 ways, requiring on average half that number of attempts to decrypt
Key changed every GOP - the proposed method is robust in comparison to
other reported similar methods
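The key-space figures above can be checked directly: 98! possible macroblock orders in theory, but the 32-bit seed limits an attacker to 2^32 candidate orders, roughly half of which must be tried on average.

```python
# Quick arithmetic check of the security-analysis numbers.
import math

orders = math.factorial(98)   # theoretical orderings, ≈ 9.42e153
seed_space = 2 ** 32          # orders actually reachable from a 32-bit key
avg_attempts = seed_space // 2
```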
Operation # per GOP % Increase
ADD 2*24*97 + 3 = 4659 0.004
MULTIPLICATION 5*24*97 = 11,640 0.200
DIVISION 5 0
MODULO 4*24*97 + 4 = 9316 0
Resolution (wxh) Picture size in
MBs
Memory (bytes)
QCIF (176x144) 99 198
CIF (352x288) 396 792
VGA (640x480) 1200 2400
SDTV-525 (720x480) 1350 2700
SDTV-625 (720x576) 1620 3240
Video
Sequence
Size / frame (in bytes) PSNR (of Y component)
Without encryption With encryption Without encryption With encryption
Claire 155.125 158.16 39.03 39.03
Foreman 668.975 700.3 35.14 35.16
Hall monitor 264.82 268.705 36.97 36.96
Computational Complexity Analysis
Video Quality Analysis
50. 49
Textual Context from Broadcast TV –
Requirement
Example: on-screen breaking news “ANDHRA CM’S MISSING CHOPPER MYSTERY” → keyword spotting yields “Andhra, CM, Missing, Chopper” → search for an RSS feed containing related information, or search through any engine for related information → display the related information (e.g. “Missing Chopper Found, CM Dead”) on TV.
51. 50
Textual Context in Static Pages – Proposed System for DTH
Contribution
Pre-processing and Enhancement
Noise Removal through low-pass
filtering on Y
Resolution Enhancement through
Interpolation-based zooming
Binarization and Touching Character
Segmentation
Adaptive Thresholding based
Binarization
Touching character segmentation
using width outlier detection
Use standard OCR tools like GOCR and
Tesseract
A-priori ROI Mapping
Pre-processing for noise removal and image
enhancement
Binarization and Touching Character
Segmentation
OCR using standard engines
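The adaptive-thresholding binarization step in the pipeline above can be sketched with a standard local-mean scheme: each pixel is compared to the mean of its neighborhood. The window size and offset below are illustrative, not the thesis's tuned values.

```python
# Adaptive-thresholding binarization sketch: a pixel becomes foreground (1)
# when it exceeds the mean of its local window by `offset`.

def adaptive_binarize(img, window=3, offset=0):
    """img: 2D list of grayscale values; returns a 2D list of 0/1 values."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # local mean over a window clipped at the image borders
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[yy][xx] for yy in ys for xx in xs]
            mean = sum(vals) / len(vals)
            row.append(1 if img[y][x] > mean + offset else 0)
        out.append(row)
    return out
```

The binarized page then goes to touching-character segmentation (width-outlier detection) before being handed to GOCR or Tesseract.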
52. 51
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
Recall and Precision Measures
Original Channel | Detected As
Zee Trendz | DD Ne
Zee Punjabi | TV9 Gujarati
DD News | DD Ne
Nick | DD Ne
Nepal 1 | Zee Cinema

Module | Time (msec)
YUV to HSV | 321.09
ROI mapping | 0.08
Mean SAD matching | 293.65
Correlation | 847.55
Changed Logo Examples False Positive Examples
Computational Complexity
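The recall and precision figures follow the standard definitions; the TP/FN/FP counts below are illustrative, chosen only to reproduce r = 0.96 and p = 0.95 over the 110-channel test:

```python
def recall_precision(tp, fn, fp):
    """Recall r = TP / (TP + FN), precision p = TP / (TP + FP).

    The counts passed in below are assumed for illustration, not taken
    from the thesis.
    """
    return tp / (tp + fn), tp / (tp + fp)

r, p = recall_precision(tp=96, fn=4, fp=5)
assert round(r, 2) == 0.96 and round(p, 2) == 0.95
```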
53. 52
Textual Context from Broadcast TV – Proposed
System for News Mash-up from Internet
Breaking News Heuristics
• Breaking news always appears in capital letters.
• The font size of breaking news is larger than
that of the ticker text.
• It tends to appear in the central to
central-bottom part of the screen.
Localization of suspected text regions
Text region confirmation using Temporal Consistency
Binarization
Text Recognition
Keyword Selection
Contribution
An improved method for text region
localization and screen layout
segmentation
Pre-processing techniques on the
text region (same as previous
section)
Heuristics-based keyword spotting
algorithm followed by Google’s in-built
dictionary-based correction
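The three heuristics can be sketched as a simple predicate over a detected text box. The numeric thresholds (1.2x font-size ratio, 0.5–0.9 screen band) are assumed for illustration; the slides state only the qualitative rules:

```python
def is_breaking_news(text, box_height, ticker_height, y_center, frame_height):
    """Score a detected text line against the breaking-news heuristics.

    Assumptions: the 1.2x font-size ratio and the 0.5-0.9 vertical band
    are illustrative thresholds standing in for tuned values.
    """
    all_caps = text.isupper()                       # always in capital letters
    larger_font = box_height > 1.2 * ticker_height  # larger than ticker text
    central_bottom = 0.5 <= y_center / frame_height <= 0.9
    return all_caps and larger_font and central_bottom

assert is_breaking_news("CHOPPER MYSTERY", 40, 24, 400, 576)
assert not is_breaking_news("sports update", 40, 24, 400, 576)
```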
54. 53
Image | Output of GOCR | Output of Tesseract | After Proposed Algorithms (GOCR) | After Proposed Algorithms (Tesseract)
(a) | Sta_ring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda, Rcema Sen, Raipal Yadav, Om Puri. | Starring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda. Reema Sen, Raipal Yadav. Om Puri.
(b) | _____ ___ ___ _________ ____ __ __ | Pluww SMS thu fnlluwmg (adn In 56633 | ___ SMS th_ folIcmng cod_ to S__ | Planta SMS tha Iullmmng mda tn 56633
(c) | SmS YR SH to | SMS YR SH in 56633 | SmS YR SH to _____ | SMS YR SH to 56533
(d) | _m_ BD to _____ | SMS BD to 56633 | SMS BD to S____ | SMS BD to 56633
(e) | AM t___o_,_b _q____ | AM 2048eb 141117 | AM tOa_gb _q____ | AM 2048eb 141117
(f) | _M_= _ _A___ to Sd___ | SMS: SC 34393 tn 56533 | _M_= _ _A___ to Sd___ | SMS: SC34393 tn 56633
(g) | _W _ ' _b _ Ib_lb _a | W6.} 048abl;lbwzIb1a | ___ __Y_b yIbw_Ib_a | WP 2048ab Mlbwzlb 1 a
(h) | ADD Ed_J to S____ | ADD Eau to $6633 | ADD Ed_J to S____ | ADD Edu to 56633
(i) | AIC STAlUSlS/OUO_ t_;OS;t_ | AIC STATUS25/02/09 19:05:14 | mlC S_ATUSlS/OUO_ t_;OS=tA | A/C STATUS 25/02/09 19:05:14
(j) | _ ________'__ | Sub ID 1005681893 | WbID_OOS_B_B__ | Sub ID 1005681893
(Figures: screenshots of candidate ROIs; accuracy of text detection)
OCR Example Results
Source: Tata Sky DTH Service in India
Textual Context in Static Pages – Results
55. 54
Textual Context from Broadcast TV - Text Region
Localization
• Filter out low-contrast components by intensity-based thresholding (output Vcont).
• Count the number of black pixels in each row of Vcont; let the count in the ith row be cntblack(i).
• Compute the average number of black pixels per row as
  avgblack = ( Σ_{i=1..ht} cntblack(i) ) / ht,
  where ht is the height of the frame.
• Compute the absolute variation av(i) in the number of black pixels in a row from avgblack as
  av(i) = abs(cntblack(i) − avgblack)
• Compute the average absolute variation as
  aav = ( Σ_{i=1..ht} av(i) ) / ht
• Compute the threshold for marking the textual region as
  TH_txt_reg = avgblack + aav
• Mark all pixels in the ith row of Vcont as white if
  cntblack(i) < TH_txt_reg
Confirmation Of The Text Regions Using Temporal Consistency
• Assumes that breaking-news text persists on screen for some time.
• Vcont sometimes contains noise caused by high-contrast regions in the video frame.
• In a typical 30 FPS video sequence, one frame is displayed for 33 msec.
• Assuming breaking news persists for at least 2 seconds, all regions not
persistently present for more than 2 seconds (about 60 frames) can be filtered out.
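The row-projection rule above reduces to a few lines once the per-row black-pixel counts of Vcont are available; a minimal sketch:

```python
def text_rows(black_counts):
    """Mark text rows of Vcont by the row-projection rule above.

    black_counts[i] = cntblack(i), the black-pixel count of row i of the
    thresholded frame Vcont; rows below TH_txt_reg are whitened out.
    """
    ht = len(black_counts)
    avg_black = sum(black_counts) / ht
    aav = sum(abs(c - avg_black) for c in black_counts) / ht
    th = avg_black + aav                      # TH_txt_reg
    return [c >= th for c in black_counts]    # True = kept as a text row

# A mostly empty frame with a dense text band in rows 6-8:
counts = [2, 1, 3, 2, 1, 2, 90, 95, 92, 2]
mask = text_rows(counts)
assert mask == [False] * 6 + [True] * 3 + [False]
```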
56. 55
Textual Context from Broadcast TV – Post-processing Heuristics
• Operate the OCR in upper case only.
• If the number of words in a text line exceeds a heuristically obtained
threshold, the line is considered a candidate text region.
• If multiple such text lines are obtained, choose a line near the bottom.
• Remove the stop words (a, an, the, for, of, etc.) and correct the remaining
words using a dictionary.
• Concatenate the remaining words to generate the search string for an internet
search engine.
The selected keywords can be given to internet search engines via Web APIs to
fetch related news, which can be blended on top of the TV video to create a
mash-up between TV and the Web.
Since search engines like Google already provide word correction, the
dictionary-based correction of keywords can be skipped.
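The stop-word removal and concatenation steps can be sketched as follows (the stop-word list is a small illustrative subset, not the full list used in the thesis):

```python
# Illustrative subset of stop words; the actual list would be larger.
STOP_WORDS = {"A", "AN", "THE", "FOR", "OF", "TO", "IN"}

def search_string(ocr_line):
    """Build the search string from an OCR'd breaking-news line.

    Stop words are removed and the remaining words concatenated; the
    dictionary correction step is left to the search engine, as noted
    above.
    """
    words = [w for w in ocr_line.split() if w.upper() not in STOP_WORDS]
    return " ".join(words)

assert search_string("THE MYSTERY OF ANDHRA CM") == "MYSTERY ANDHRA CM"
```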
57. 56
Textual Context from Broadcast TV – Results
Accuracy Of Text Localization
• Experimental results show a recall
rate of 100% and a precision of 78%.
• The precision is low because the
parameters and threshold values are
tuned to minimize the probability of
false negatives (misses).
• The final precision performance can
only be judged after applying the
text recognition and keyword
selection algorithms.
Accuracy Of Text Recognition
• In false-positive regions, the OCR output contains
many special characters, so candidate texts with a
special-character/alphabet ratio > 1 are discarded.
• Moreover, since the proposed keyword detection
method concentrates on capital letters, only words
in all capitals are kept under consideration.
• With these filters, the character-level accuracy of
the selected OCR improves to 86.57%.
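The two filters (special-character/alphabet ratio > 1, all-capitals only) can be sketched as:

```python
def keep_candidate(text, max_ratio=1.0):
    """Filter an OCR candidate per the heuristics above.

    Returns None when the special-character/alphabet ratio exceeds 1
    (likely a false positive); otherwise keeps only all-capital words.
    """
    alpha = sum(ch.isalpha() for ch in text)
    special = sum(not ch.isalnum() and not ch.isspace() for ch in text)
    if alpha == 0 or special / alpha > max_ratio:
        return None
    return " ".join(w for w in text.split() if w.isupper())

assert keep_candidate("_____ ___ SMS") is None       # mostly underscores
assert keep_candidate("SMS BD to 56633") == "SMS BD" # all-caps words kept
```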
Accuracy of information retrieval
• The limitations of the OCR module could be overcome with a strong dictionary or
language model.
• In the proposed method this constraint is bypassed, as the Google search engine
itself has such a module.
• The OCR output is simply given to the Google search engine, which in turn offers
the option with the actual text.
58. 57
Proposed Algorithm for Keyboard Layout
Algorithm
• Total No. of Character Cells = T
• Total no. of rows of key-blocks = R
• Total no. of columns of key-blocks = C
• Total no. of Cells in a key-block = 4
• T = R x C x 4, Max. Keystrokes in worst
case K = (R+C+1) keystrokes.
• Hence, the desired solution boils down to
finding R and C for which K is minimum.
Flowchart:
1. Input T.
2. numCharsSqrt = sqrt(T); sqRootInt = ceiling(numCharsSqrt).
3. If sqRootInt is even, N = sqRootInt; otherwise N = sqRootInt + 1.
4. N2 = N − 2; C = N / 2.
5. If N × N2 ≥ T, then R = N2 / 2; otherwise R = C.
6. Output R, C.
Example
• “QWERTY” – T = 54: 4 rows, 14 columns;
(14 + 4 = 18) keystrokes max.
• “PROPOSED” – T = 54: sqRootInt = 8, N = 8,
N2 = 6, C = 4; N×N2 = 48 < 54, so R = C = 4.
Max 9 keystrokes.
• Final layout used – T = 48: sqRootInt = 7, N = 8,
N2 = 6, C = 4; N×N2 = 48 = T, so R = N2/2 = 3.
Finally, R = 3 and C = 4. Max 8 keystrokes.
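The flowchart reduces to a few lines; the sketch below reproduces both worked examples:

```python
import math

def layout_dims(total_chars):
    """Compute key-block rows R and columns C for T character cells.

    Follows the flowchart: N = ceiling(sqrt(T)) rounded up to even,
    N2 = N - 2, C = N/2; R = N2/2 if N*N2 >= T, else R = C.
    Worst-case keystrokes per character: K = R + C + 1.
    """
    n = math.ceil(math.sqrt(total_chars))
    if n % 2:                 # force N even
        n += 1
    n2 = n - 2
    c = n // 2
    r = n2 // 2 if n * n2 >= total_chars else c
    return r, c, r + c + 1    # (R, C, worst-case keystrokes K)

assert layout_dims(54) == (4, 4, 9)  # proposed layout, T = 54: max 9 keystrokes
assert layout_dims(48) == (3, 4, 8)  # final layout, T = 48: max 8 keystrokes
```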
59. 58
Onscreen Keyboard Results – User Study 1
QWERTY vs. Layout-1 (after practice)
QWERTY vs. Layout-1 (before practice)
1. Does the on-screen keyboard provide
enough assistance?
2. How is the ease of use of the on-
screen keyboard?
60. 59
Results – User Study 2 (KLM-GOMS Modelling)
Simple Text Entry
Layouts | % improvement (Experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 44.23 | 45.75
Layout 2 over QWERTY | 45 | 46.75
Layout 2 over Layout 1 | 2 | 1.84

Complete Email Typing and Sending Task
Layouts | % improvement (Experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 42.2 | 35.23
Layout 2 over QWERTY | 43.4 | 37.2
Layout 2 over Layout 1 | 3.18 | 2

• A total of 20 users.
• Simple text entry and email sending tasks.
• Six phrase sets selected randomly from the standard MacKenzie phrase set.
• Users were given an initial familiarization phrase and then asked to enter six phrases in one go.
• The time taken by each user and the number of keystrokes required to type each phrase were recorded.
61. 60
Onscreen Keyboard Results – User Study 2
P – redefined as the total time taken to find a key and move the focus to
select the block containing that particular key:
Layout 1 – 1.77 sec, Layout 2 – 1.73 sec, QWERTY – 1.10 sec
H – not used
New parameter F – the time required for finger movement: 0.22 sec
Operator | Description | Time (sec)
P | Pointing with a pointing device | 1.10
K | Key or button press | 0.20
H | Move from mouse to keyboard and back | 0.40
M | Mental preparation and thinking time | 1.35

Operation | Layout-1 (sec) | Layout-2 (sec) | QWERTY (sec)
Open/close onscreen keyboard layout | 0.4 | 0.4 | 0.4
Find any key | 1.07 | 1.03 | 1.1
Move focus to select a key | 0.7 | 0.7 | 2.6
Move finger to the corner keys | 0.2 | 0.2 | 0.2
Enter a character using keyboard | 2.17 | 2.13 | 4.0

Sub-goal | Layout-1 (sec) | Layout-2 (sec) | QWERTY (sec)
Open browser | 0.5 | 0.5 | 0.5
Open gmail server & login | 45.3 | 44.5 | 82.1
Compose mail | 68.0 | 65.8 | 115.1
Dispatch | 0.4 | 0.4 | 0.4
KLM-GOMS Operators
KLM-GOMS sub-goals for Email Task
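As a consistency check on the per-operation rows, the "Enter a character" times for Layout-1 and Layout-2 are the sums of the listed sub-operations plus a key press (K = 0.20 sec):

```python
def char_entry_time(find, move_focus, finger, key_press=0.20):
    """Sum the sub-operation times for entering one character.

    Uses the operator times from the KLM-GOMS tables; K (key or button
    press) is 0.20 sec.
    """
    return find + move_focus + finger + key_press

# Layout-1: find 1.07 + move focus 0.7 + finger 0.2 + K 0.2 = 2.17 sec
assert round(char_entry_time(1.07, 0.7, 0.2), 2) == 2.17
# Layout-2: 1.03 + 0.7 + 0.2 + 0.2 = 2.13 sec
assert round(char_entry_time(1.03, 0.7, 0.2), 2) == 2.13
```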