PhD Thesis Defense – 9th April, 2013
Novel Applications for Emerging Markets
Using Television as a Ubiquitous Device
CTIF, Department of Electronic Systems
Arpan Pal
Principal Scientist and Head of Research
TCS Innovation Labs, Kolkata
Supervisor
Prof. Ramjee Prasad
1
Motivation - TV as a Ubiquitous Computing Device
“Ubiquitous computing enhances computer use by making many computers available throughout
the physical environment, but making them effectively invisible to the user” – Mark Weiser
Can TV be the Ubiquitous Screen for Home?
• It is available
• It is easy to use
Market Facts (India) – Source: IMRB, ITU, IMAI (2009, 2010, 2011)
 PC – low penetration due to cost, skill and usability issues
 Mobile screen – good for social networking, not for information-heavy content
 Tablets and large-screen smartphones – still costly
 Television – low cost of ownership, large screen real estate, familiarity of usage
Device        Penetration (%)     Internet (%)
PC            6.1%                4.2% (Home)
Mobile        61.4%               3.2%
Television    60% (Household)     ??
 Internet Browser
 Media Player (Audio, Video, Image)
 Video Chat, SMS
 Remote Healthcare
 Distance Education
Pilot Deployment in India and Philippines
Home Infotainment Platform
2
User Study
A user survey with HIP (the pilot version of “Dialog” launched by Tata Teleservices Ltd.) was conducted by TNS among city users in India.
Sample - 50 middle-class and lower-middle-class families, involving 50 working adults and 50 students (12-18 years of age).
Confidence score measure - % of respondents responding with a score of 4 or 5
3
User Study Findings
 Slow to very-slow internet connection affecting the user experience
• Video chat could not be tested – how to do acceptable quality video chat under low-QoS networks?
 Browser - Ease of use Issue and not much liking for TV-Internet blending
• How to make experience acceptable through TV-Internet Mash-ups?
 Preference of remote control for non-computer-savvy users and dislike of the QWERTY
layout for on-screen keyboard
• How to design a better text entry method using remote control?
 Additional multimedia content security requirement for remote healthcare and distance
education applications
• How to have low computational complexity Access Control and DRM schemes capable of
running on the constrained low-cost platform?
Problem Statement
Improving the user experience of TV as a low-cost Internet-access device through
 QoS-aware Video Transmissions
 Low-complexity Video Security
 TV context-aware Intelligent TV-Internet Mash-ups
 TV Remote-based on-screen keyboard for text entry
Challenges
 Resource Constrained Platform
 Non-Computer-Savvy Users
 Low-QoS Network
Video Chat over low-QoS Networks
[Figure: position of this topic in the thesis map. Labels: Requirements, Challenges, Low-cost Infotainment Platform, ICT Infrastructure Limitations, User Experience, Low-QoS Network, Resource Constrained Platform]
5
Problem Definition
 Video Chat
• Bandwidth-hungry with real-time packet delivery requirement
 2G wireless networks (CDMA1xRTT / GPRS)
• Poor bandwidth and high latency – fluctuates with time and place
Need for an adaptive rate-control-based video chat system that gives preference to audio
State-of-the-Art Analysis
 Network condition estimation and adaptation is reported for WiFi and IP networks, but not for 2G wireless, where the latency is higher
 Adequate work is not reported on low-computational-complexity rate control algorithms for streaming video
 Adequate work is not reported on optimal design of audio/video streaming systems that addresses the latency and perceptual quality issues
6
Proposed System
Contribution
 Network Sensing
• An experimental, heuristics-based mapping of effective bandwidth to probe packet delay
 Adaptation
• A low-complexity video rate control algorithm for H.264 CBR, with automatic switching between frame and MB as the basic unit for quantization based on video complexity
• An adaptive fragmentation scheme with frame-level sequencing that minimizes packet-delay-based discards, prioritizes audio and improves perceived video quality
[Figure: system architecture. The Video Chat Application sits on a Middleware Framework containing Rate Control (encode and decode), Video Encode, Audio Encode, Audio Decode, Video Decode, Network Sensing, Fragmentation and Re-assembly, all over the underlying network stack (UDP/IP)]
7
Results
Adaptive Rate Control (QCIF, 5 fps)
[Figures: PSNR (in dB) plots for the Grandma sequence at 30 kbps and the Akiyo sequence at 25 kbps]
Network Sensing
Network Type    Mean (kbps)    Stdev (kbps)
ADSL-ADSL       596.14         203.45
Modem-Modem     26.96          19.23
Modem-ADSL      18.13          3.21
Adaptive Fragmentation
Feedback from 20 users about overall experience using a) a standard RTP-based system and b) the proposed system: 100% reported better audio quality and better perceived video quality
Low Complexity Video Security
Watermarking for DRM (Education)
[Figure: position of this topic in the thesis map. Labels: Requirements, Challenges, Low-cost Infotainment Platform, Socially Value-Adding Apps, User Experience, Resource Constrained Platform]
9
Problem Definition
Digital Watermarking
Requirement
 Needs to be imperceptible and robust at the same time
 Needs to have low-computational complexity
State-of-the-Art
 Initial and classical works are on MPEG2 and not on H.264
 Reported H.264 watermarking systems have high computational overhead
 Reported works lack perceptual quality analysis and attack-robustness
analysis
Video Encryption
Requirement
 Needs to have low-computational complexity, yet adequate security
State-of-the-Art
 Uncompressed domain encryption - high decryption computational overhead
 Reported H.264 video encryption works – no focus on computational
complexity
 No work reported on Video Quality assessment after encryption/decryption
10
Digital Watermarking – Proposed System
Contribution
Robust, imperceptible yet low-computational-complexity H.264 watermarking system
 Hash-based integrity check
 Embed watermark reusing the quantizer
 Embedding location carefully chosen for imperceptibility
Evaluation methodology for watermark attack and perceptual video quality
 Peak Signal-to-Noise Ratio (PSNR) based imperceptibility
 Evaluation against 10 known attacks
 Mean Opinion Score (MOS) based post-attack measurement of Video Quality, Retrieved Image Quality and Retrieved Text Quality
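[Figure: H.264 encoder block diagram with watermark embedding and extraction. Blocks: input video (YUV 4:2:0 format), reference frame F'n-1, reconstructed frame F'n, inter (P) / intra (I) prediction with best prediction mode and block size selection, transform and quantization (T, Q), embed watermark (text and image), reorder, entropy encoder, NAL, inverse quantization and transform (Q-1, T-1), extract watermark (retrieved text and image), de-blocking filter]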
11
Digital Watermarking – Results
Computational Complexity
Operation      No. of Operations per GOP
ADD            2779
MULTIPLY       3564
DIVIDE         1980
MODULO         3564
CONDITIONAL    7524
MEMORY I/O     1584
Function                CPU Mega Cycles taken per GOP (1 second)
Watermark Embedding     6.8
Watermark Extraction    3.8
[Figure: Video Quality after Watermark]
Perceptual Video Quality after Attacks + Retrieved Image / Text Quality (14 streams, 20 users)
Attack   Video Quality   Image Quality   Text Quality   Overall Performance against Attack
AA       Excellent       Excellent       Excellent      Excellent
CAA      Poor            Medium          Bad            Attack Degrades Video Quality
FFA      Poor            Poor            Bad            Attack Degrades Video Quality
GCA      Poor            Good            Good           Attack Degrades Video Quality
GA       Bad             Bad             Bad            Attack Degrades Video Quality
HEA      Poor            Good            Poor           Attack Degrades Video Quality
LEA      Poor            Good            Bad            Attack Degrades Video Quality
NLFA     Poor            Medium          Bad            Attack Degrades Video Quality
RsA      Excellent       Excellent       Excellent      Excellent
RoA      Poor            Excellent       Excellent      Attack Degrades Video Quality
Context-aware Intelligent TV-Internet Mash-ups
TV Channel Identity as Context
Textual Context in Static TV pages
Textual Context embedded in Broadcast Video
[Figure: position of this topic in the thesis map. Labels: Requirements, Challenges, Low-cost Infotainment Platform, Television as the Ubiquitous Access Device, User Experience, Resource Constrained Platform, Non-Computer-savvy Users]
13
Requirement (Analog TV Context)
[Figure: HIP placed between the DTH / cable set-top box and the television. Blocks: RF in, A/V in, Video Capture & Context Extraction, Information Mash-up Engine (connected to the Internet), Video and Graphics ALPHA Blending, A/V Out]
• Channel Identity – EPG, Viewership Rating
• Text on Static Pages – Return path on DTH
• Text on Broadcast Video – News Mash-up
14
Problem Definition
TV Channel Identity
 Audio watermarking and audio signature based - need content modification or
non-real-time offline processing
 TV channel logo detection based
Reported works use PCA and ICA, which are computationally intensive and work only on static, opaque and rectangular logos, not on non-rectangular / transparent / dynamic logos
Text on Broadcast Video
Challenge is in identifying the text against dynamically changing video background
 Pixel domain and Compressed domain
• Region Based (Connected Component Based (CC) and Edge Based (EB))
 Different methods work for varying kinds of texts against varying video
backgrounds
• Need for Hybrid Approach (Region and Texture, CC and EB)
• Text Area Localization and pre-processing remain the biggest challenge
Text on Static Pages
 Noisy data with fixed fonts
 Efficient pre-processing is the main challenge
15
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
• Channel logos with a very small number of pixels are missed in 1% of cases.
• The remaining 3% of misses are due to moving or changed channel logos.
• 3% false positives come from small-size logos (removed from the template).
• 2% false positives are due to highly transparent logos.
TV Channel Identity Recall and Precision
Text Detection on Static Pages
Textual Context from Broadcast TV
• 20 News Channels (5 min. duration each)
• Recall of 100% and Precision of 78% for the text
localization and OCR
• Precision Improves to 88.57% after Heuristics-based
keyword spotting
• Almost 100% precision after Google Dictionary
based post-processing
Novel On-screen Keyboard
[Figure: position of this topic in the thesis map. Labels: Requirements, Challenges, Low-cost Infotainment Platform, Television as the Ubiquitous Access Device, User Experience, Non-Computer-savvy Users]
17
Problem Definition
Requirements
• To provide cost-effective and easy-to-use text-entry mechanism for accessing
services like internet, email and short message services (SMS) from television.
• Full-fledged separate wireless keyboard (Bluetooth or RF) is costly
• Explore option of using infra-red remotes with on-screen keyboard on TV screen
State-of-the-Art
 Traditional “QWERTY” on-screen keyboards require a large number of keystrokes
to navigate which makes it cumbersome to use.
 Available on-screen keyboards do not address the usability aspect for a non-
computer-savvy user.
 Most of them are designed for cursor-based systems and not for traditional key-
based infra-red remotes which have relatively slow response time on key press.
 Insufficient user study and modeling for on-screen layouts
18
Proposed System
Contribution
 A novel formulation of on-screen layout
• Significantly reduces the number of keystrokes while typing (19 for QWERTY, 9 for proposed)
 Formal methodology for user study evaluation
• The popular Keystroke-Level Model (KLM) and Goals, Operators, Methods and Selection rules (GOMS) model are used for formal evaluation
 Extend the standard KLM operator set to model remote-based operations and hierarchical layouts
• A finger movement operator replaces the standard pointing device operator
[Figure: proposed on-screen keyboard layout]
19
Results -User Study 1 (Basic Benchmarking)
Users
25 users (diverse age / keyboard exposure)
Tasks
Users were asked to type - “The quick brown fox jumps over a lazy
dog” on SMS/Email and “www.google.com” as URL
[Figures: QWERTY, Layout 1 and Layout 2 on-screen keyboards; QWERTY vs. Layout-1]
Layouts                   % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY      44.23                        45.75
Layout 2 over QWERTY      45                           46.75
Layout 2 over Layout 1    2                            1.84

Layouts                   % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY      42.2                         35.23
Layout 2 over QWERTY      43.4                         37.2
Layout 2 over Layout 1    3.18                         2
20
Conclusions
Motivation – Why Television
 Validated through Market Data from India
Background – TCS Home Infotainment Platform (HIP)
 Internet Browser, Media Player, SMS, Video Chat, Remote Healthcare, Distance
Education
Requirement Analysis – Field Study
 Challenges - Slow Internet Speed, Non-computer-savvy users, Resource
Constrained Platform
How to Improve user experience of TV as low-cost Internet-access device
 Network-adaptive Video Rate Control and Packet Fragmentation Protocols for
improved Video Chat experience
 Low-computational-complexity Video Watermarking and Encryption for secure
multimedia content sharing
 TV context-aware Intelligent TV-Internet Mash-ups for improving the Internet
Browsing Experience on TV
 Remote-based on-screen keyboard for improved text entry on TV using remote
control
21
Publications
1. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous
Access Device for Masses”, Proceedings on Ubiquitous Computing and Multimedia Applications (UCMA),
Miyazaki, Japan, March 2010.
2. Dhiman Chattopadhyay, Aniruddha Sinha, T. Chattopadhyay, Arpan Pal, “Adaptive Rate Control for H.264
Based Video Conferencing Over a Low Bandwidth Wired and Wireless Channel”, IEEE International
Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, May 2009.
3. Arpan Pal and T. Chattopadhyay, “A Novel, Low-Complexity Video Watermarking Scheme for H.264”,
Texas Instruments Developers Conference, Dallas, Texas, March 2007.
4. T. Chattopadhyay and Arpan Pal, “Two fold video encryption technique applicable to H.264 AVC”, IEEE
International Advance Computing Conference (IACC), Patiala, India, March 2009.
5. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, Debabrata Pradhan, Soumali Roychowdhury,
“Recognition of Channel Logos From Streamed Videos for Value Added Services in Connected TV”, IEEE
International Conference for Consumer Electronics (ICCE), Las Vegas, USA, January 2011.
6. T. Chattopadhyay, Arpan Pal, Utpal Garain, “Mash up of Breaking News and Contextual Web Information:
A Novel Service for Connected Television”, Proceedings of 19th International Conference on Computer
Communications and Networks (ICCCN), Zurich, Switzerland, August 2010.
7. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, “TV Video Context Extraction”, IEEE Trends and
Developments in Converging Technology towards 2020 (TENCON 2011), Bali, Indonesia, November 21-24, 2011.
8. Arpan Pal, Chirabrata Bhaumik, Debnarayan Kar, Somnath Ghoshdastidar, Jasma Shukla, “A Novel On-
Screen Keyboard for Hierarchical Navigation with Reduced Number of Key Strokes”, IEEE International
Conference on Systems, Man and Cybernetics (SMC), San Antonio, Texas, October 2009.
9. Arpan Pal, Debatri Chatterjee, Debnarayan Kar, “Evaluation and Improvements of on-screen keyboard for
Television and Set-top Box”, IEEE International Symposium for Consumer Electronics (ISCE), Singapore,
June 2011.
22
Publications (contd…)
10. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous
Access Device for Masses”, Book Chapter in Springer Communications in Computer and Information
Science, Volume 75, 2010, Pages 11-19. DOI: 10.1007/978-3-642-13467-8.
11. Arpan Pal, Ramjee Prasad, Rohit Gupta, “A low-cost Connected TV platform for Emerging Markets–
Requirement Analysis through User Study”, Engineering Science and Technology: An International
Journal (ESTIJ), ISSN: 2250-3498, Vol.2, No.6, December 2012.
12. T. Chattopadhyay and Arpan Pal, “Watermarking for H.264 Video”, EE Times Design, Signal Processing
Design Line, November 2007.
13. Arpan Pal, Aniruddha Sinha and Tanushyam Chattopadhyay, “Recognition of Characters from Streaming
Videos”, Book Chapter in book: Character Recognition, Edited by Minoru Mori, Sciyo Publications, ISBN:
978-953-307-105-3, September 2010.
14. Arpan Pal, Tanushyam Chattopadhyay, Aniruddha Sinha and Ramjee Prasad, “The Context-aware
Television using Logo Detection and Character Recognition”, (Submitted) Springer Journal of Pattern
Analysis and Applications
15. Debatri Chatterjee, Aniruddha Sinha, Arpan Pal, Anupam Basu, “An Iterative Methodology to Improve TV
Onscreen Keyboard Layout Design Through Evaluation of user Study”, Journal of Advances in
Computing, Vol.2, No.5, October 2012, Scientific and Academic Publishing (SAP), p-ISSN:2163-2944,
e-ISSN:2163-2979.
23
Future Work
Adaptive Video Chat
 Network sensing via IP statistics, Support multicast
Low-complexity Video Security
 Extending security analysis, Audio Watermarking
Internet-TV mash-up
 Extend to second-screens (mobile, tablets), seamless transfer of streaming content
to/from TV from/to mobile/tablets, Integrating social media into broadcast TV.
On-screen Keyboard
 Predictive keyboard integration, incorporating voice interfaces and gesture controls.
 Explore using mobile phones as the remote control for TV.
Generic
 Optimizing the conflicting triad of requirements (cost, features and performance) in the face of ever-improving hardware functionality.
24
Learning from the CTIF-GISFI PhD Program
 Access to excellent and state-of-the-art technical course content from Aalborg
University
 Very useful Research Soft Skill courses
 Problem based Learning
 Professional Networking
 Qualitative Research Approaches
 Innovation Methodology
 Writing Scientific Papers
 Technical Support from CTIF faculty
 Culture of Practical Problem based Research
 Attending relevant conferences and EU FP7 related programs
 Networking with experts in related areas
Thank You
Questions?
arpan.pal@tcs.com
Supporting Slides
27
Introduction
Objective
 Improving the user experience of TV as a low-cost Internet-access device for
masses in emerging markets like India
• Motivation
• Background
• User Study Based Requirement Analysis
• Contributions
1) Improve the Quality of Experience (QoE) of Video Chat under poor network
conditions using network sensing
2) Provide computationally efficient yet sufficiently secure algorithms for
efficient access control and digital rights management (DRM) for sensitive
multimedia content
3) Improve the experience of browsing the Internet on TV through intelligently
understanding the context of the TV program being watched and blending
related information from the Internet (known as TV-Internet mash-up)
4) Improve the experience of text entry on TV using the remote control through
a novel design of an on-screen keyboard layout
28
Thesis Organization
[Figure: thesis organization chart linking requirements, challenges and contributions to chapters.
Requirements: Low-cost Infotainment Platform; Television as a Ubiquitous Display Device; Applications of Social Value-add.
Challenges: ICT Infrastructure Limitations (Resource Constrained Platform, Low-QoS Network); User Experience (Non computer-savvy users).
Scientific Contributions: QoS-aware Video Transmission; Low complexity Video Encryption and Watermarking Algorithms; TV context-aware Intelligent TV-Internet Mash-ups; Improved on-screen keyboard layout for text entry on TV using remote.
Engineering Contribution: Multimedia Framework for Quick Application Development.
Chapter labels shown: Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Appendix A]
29
Background - Home Infotainment Platform (HIP)
[Figure: HIP box with A/V in and A/V out connections]
 Internet Browser
 Media Player (Audio, Video, Image)
 Video Chat, SMS
 Remote Healthcare
 Distance Education
Pilot Deployment in India and Philippines
30
HIP Deployment
TCS’ Home Infotainment Platform was launched successfully in India and the Philippines.
 Tata Teleservices Ltd. launched HIP under the brand name “Dialog” in Tamil Nadu and West Bengal.
 In the Philippines, it was launched through Smart Communications under the brand name “SmartBro SurfTV”.
31
HIP Applications
Chipset - TI DaVinci DM6446 (297 MHz ARM9 core and 594 MHz DSP)
RAM - 256MB DDR
Flash - 64MB NAND
OS - Embedded Linux 2.6.x
Browser - Opera for devices 9.6 with Flash
Multimedia Codecs
Video - H.264, MPEG1, MPEG2, MPEG4
Audio - MP3, AMR-NB, AAC, OGG, FLAC
Image - JPEG
32
Socially Value-adding Apps – Healthcare and Education
[Figure: remote healthcare set-up. Labels: ECG, Blood Pressure Monitor, Pulse Oximeter, Digital Stethoscope, HIP, Health Center / Home, Patient Records, Expert Doctor]
33
HIP Application Development Framework
[Figure: SRC - PROC - SINK framework exposed to Applications through Control APIs.
SRC: Network, Microphone, Camera, A/V in, Storage, USB
PROC: Compress, Decompress, Multiplex, Demux, Render, Blend
SINK: Network, VGA, TV Video, TV Audio, Headphone, Storage]
Application                               SRC                      PROC                                    SINK
Video from Internet                       Network                  Demux – Decompress                      TV Video / Audio
Media player                              Storage                  Demux – Decompress                      TV Video / Audio
Video Chat (Far View)                     Network                  Demux – Decompress                      TV Video / Headphone
Video Chat (Near View)                    Camera and Microphone    Compress – Multiplex                    Network
Internet Browser                          Network                  Render                                  TV Video
Remote Healthcare                         USB                      Compress – Multiplex                    Network
Distance Education - Lecture Recording    A/V in                   Demux – Decompress, Remux – Compress    Storage
Lecture Playback                          Storage                  Demux – Decompress                      TV Video / Audio
QA and Course Guide                       Network                  Demux Parser                            TV Video / Audio
34
Sensing of Network Condition
Heuristic-based mapping of the effective bandwidth of the network, derived from experimentation on the real network (CDMA1xRTT using Tata Docomo Photon) at different times of the day and at different places
[Figure: the transmitter end sends probe packets P1 (at t1i) and P2 (at t2i) and measures RTT(P1, t1i) and RTT(P2, t2i)]
T = average (RTT (P1, t1i), RTT (P2, t2i))
T (msec)           BWeff (kbps)
T < 300            50
300 < T < 800      13
800 < T < 1600     4
1600 < T < 1900    2
T > 1900           1
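The lookup above can be sketched as a small function. This is a minimal illustration, not the thesis implementation: the function name is hypothetical, and because the table uses strict inequalities, assigning an exact boundary value (for example T = 300) to the slower band is an assumption.

```python
def effective_bandwidth_kbps(rtt_p1_ms: float, rtt_p2_ms: float) -> int:
    """Map the mean round-trip time of two probe packets to BWeff (kbps).

    Thresholds are taken from the table; boundary values fall into the
    lower-bandwidth band (an assumption, the table leaves them open).
    """
    t = (rtt_p1_ms + rtt_p2_ms) / 2.0  # T = average(RTT(P1), RTT(P2))
    if t < 300:
        return 50
    if t < 800:
        return 13
    if t < 1600:
        return 4
    if t < 1900:
        return 2
    return 1
```

For instance, probes with round-trip times of 200 ms and 300 ms give T = 250 ms, which maps to 50 kbps.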
35
Rate Control in Audio and Video
Audio Codec (AMR-NB)
• If Effective Bandwidth > 4 kbps, bit rate = 5.15 kbps, else bit rate = 4.75 kbps
Video Codec (H.264)
• For complex scenes, only frame level control exhausts bit budget at the initial
frames as MAD increases.
• MB level control makes sure that bit budget is not exhausted because the whole
frame is not complex throughout.
Estimation of complexity of the video scene –
• Bit rate:
• Mean Absolute Difference (MAD) prediction model –
• Threshold for frame T (n) is defined as 80% of MADavg(n). Threshold selection
done based on 20 different test sequences of different resolutions using classical
decision theory.
• If in a frame, MADcb > T(n), for at least one MB in a frame, the frame is declared
as complex and MB is chosen as the basic unit for that frame.
• If in a frame, MADcb <= T(n), for all MBs, then basic unit is chosen as frame.
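The two decisions on this slide can be sketched as follows. The function names are hypothetical, and MADavg(n) is assumed to be supplied by the MAD prediction model, whose formula is not reproduced here.

```python
def amr_nb_bitrate_kbps(bw_eff_kbps: float) -> float:
    """Pick the AMR-NB bit rate from the sensed effective bandwidth."""
    return 5.15 if bw_eff_kbps > 4 else 4.75


def basic_unit(mad_per_mb, mad_avg: float) -> str:
    """Choose the basic unit for quantization in one frame.

    mad_per_mb: MAD value of every macroblock in the frame (MADcb).
    mad_avg:    MADavg(n), assumed to come from the prediction model.
    The frame is treated as complex, and "MB" is returned, if at least
    one macroblock exceeds T(n) = 80% of MADavg(n).
    """
    threshold = 0.8 * mad_avg
    return "MB" if any(m > threshold for m in mad_per_mb) else "frame"
```

With MADavg(n) = 5 the threshold is 4, so a frame containing a macroblock with MAD 9 switches to MB-level control, while a frame whose macroblocks all stay at or below 4 keeps frame-level control.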
36
Adaptive Packetization of Encoded Video/Audio
• Fragment Size N = (fr+H) = 1440 bytes (optimal size through experimentation)
• Transmission interval dt = (fr +H) / (1000*BWeff)
• A 9-byte header is added to each video fragment.
 Frame type (1 byte) – I (Independent) or P (Predictive) frame.
 Total sub-sequence number (1 byte) – total number of fragments in a video frame.
 Sub sequence number (1 byte) – fragment number.
 Sequence number (4 bytes) – video frame number.
 Video payload size (2 bytes) – video bytes in the current fragment.
• Drop a frame iff a fragment of a newer frame arrives before all the fragments of the current frame have arrived (improvement over RTP)
• Do not discard packets based on transit delay, which increases the probability of receiving good packets on a slow network (improvement over TCP).
• 20 msec AMR-NB audio frames: M (=10, matching 5 fps video) audio frames are aggregated and sent in a single UDP fragment.
• DTX (Discontinuous Transmission) is enabled in the AMR-NB encoder (VAD): if the silence period is more than D seconds, the audio transmission is discontinued.
 For a good channel (BWeff > 4 kbps), D > 10 seconds
 For a bad channel (BWeff <= 4 kbps), D = 3 to 5 seconds
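A minimal sketch of the 9-byte fragment header and the frame-drop rule. The wire layout is an assumption (fields packed in the order listed, big-endian, no padding), the class and function names are hypothetical, and sequence-number wrap-around is ignored.

```python
import struct

# Assumed layout: frame type (1), total fragments (1), fragment number (1),
# frame sequence number (4), payload size (2) = 9 bytes.
HEADER = struct.Struct("!BBBIH")
FRAGMENT_SIZE = 1440                        # N = fr + H, in bytes
PAYLOAD_SIZE = FRAGMENT_SIZE - HEADER.size  # fr = 1431 bytes


def fragment_frame(frame_type: int, seq_no: int, data: bytes):
    """Split one encoded video frame into header-prefixed fragments."""
    chunks = [data[i:i + PAYLOAD_SIZE]
              for i in range(0, len(data), PAYLOAD_SIZE)] or [b""]
    if len(chunks) > 255:
        raise ValueError("frame needs more fragments than a 1-byte count allows")
    return [HEADER.pack(frame_type, len(chunks), idx, seq_no, len(chunk)) + chunk
            for idx, chunk in enumerate(chunks)]


class Reassembler:
    """Rebuilds frames; an incomplete frame is dropped only when a
    fragment of a newer frame arrives, never because of transit delay."""

    def __init__(self):
        self.seq = None     # frame number currently being assembled
        self.parts = {}     # fragment number -> payload
        self.done = False   # current frame already delivered

    def push(self, packet: bytes):
        _ftype, total, idx, seq, size = HEADER.unpack_from(packet)
        if self.seq is None or seq > self.seq:
            # Newer frame: discard whatever was collected for the old one.
            self.seq, self.parts, self.done = seq, {}, False
        elif seq < self.seq or self.done:
            return None     # late fragment of a dropped or finished frame
        self.parts[idx] = packet[HEADER.size:HEADER.size + size]
        if len(self.parts) == total:
            self.done = True
            return b"".join(self.parts[i] for i in range(total))
        return None
```

`push` returns the reassembled frame once its last fragment arrives and `None` otherwise, so a receiver loop can simply feed every UDP payload into it.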
38
Watermarking Algorithm Flow
[Flowchart: for each frame, test "Is the frame IDR?"; if not, CONTINUE. For an IDR frame, test "Is it an even IDR?". One branch hashes the previous GOP, finds the location for embedding (data) and embeds the watermark (hash number of previous GOP). The other branch checks the message size, finds the location for embedding (image + data) and embeds the watermark (actual message).]
39
Digital Watermarking – Algorithm Details
• H×W logo image in binary format and K bytes of text data.
• Total number of bits to embed N = H×W + K*8, stored in an N-byte binary array (called wn).
• wn is quantized using the same quantization parameter (qp) used in H.264; the quantized values are stored in an array (wqn).
• For each wqn, find the location of embedding inside the image. The image location mapped to wqn is denoted M(u,v), where (u,v) is the position in the DCT domain.
• Embed the information in the corresponding coefficient (10th or 15th bit depending on the bit-index being odd or even).
[Flowchart: depending on whether the message is image or data, a "Diagonal SB?" or "Ab-Diagonal SB?" test decides between embedding and SKIP]
40
Digital Watermarking - Attacks
[Figure: attack evaluation chain. Blocks: .264 with WM; H.264 Decoder (without WM detector); .YUV; Attack; .YUV; H.264 Encoder (without WM embedding); .264; H.264 Decoder (with WM detector); .YUV and Retrieved Image / Text; Video Quality Comparison; Measure of quality for retrieved Image / Text against Original Image / Text; Report Generation]
1. Averaging attack (AA)
2. Circular averaging attack (CAA)
3. Rotate attack (RoA)
4. Resize attack (RsA)
5. Frequency filtering attack (FFA)
6. Non-linear filtering attack (NLFA)
7. Gaussian attack (GA)
8. Gamma correction attack (GCA)
9. Histogram equalization attack (HEA)
10. Laplacian attack (LEA)
41
Watermarking Perceptual Video Quality after Attack
• Ten Quality Measures - Average Absolute Difference (AAD), Mean Square Error (MSE),
Normalised Mean Square Error (NMSE), Laplacian Mean Square Error (LMSE), Signal to
Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Image Fidelity (IF), Structural
Content (SC), Global Sigma Signal to Noise Ratio (GSSNR), Histogram Similarity (HS)
• Three pairs of videos – a) Two identical videos (high quality), b) Two completely different
videos (poor quality) and c) original video and compressed / decompressed video (average
quality)
• Weighted Quality Metric
W_VAL = ((AAD+GSSNR+LMSE+MSE+PSNR)*3+HS+IF+NMSE+SC+SNR)
• 14 test video streams were subjected to different kinds of attacks and W_VAL was calculated. 20 users were asked to judge the attacked and original watermarked videos based on perception.
• Judgement based on human vision psychology (HVS), converted to a fuzzy Mean Opinion Score (MOS) based parameter Cqual
IF (W_VAL >= 90), Cqual = Excellent
ELSEIF (W_VAL >= 80), Cqual = Good
ELSEIF (W_VAL >= 75), Cqual = Average
ELSEIF (W_VAL >= 70), Cqual = Bad
ELSE Cqual = Poor
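The weighted metric and its mapping to Cqual can be sketched as below. The function names are hypothetical, and the sketch assumes each of the ten measures has already been normalised to a common score (the normalisation itself is not given on the slide).

```python
HEAVY = ("AAD", "GSSNR", "LMSE", "MSE", "PSNR")   # weighted by 3
LIGHT = ("HS", "IF", "NMSE", "SC", "SNR")         # weighted by 1


def w_val(scores: dict) -> float:
    """W_VAL = (AAD+GSSNR+LMSE+MSE+PSNR)*3 + HS+IF+NMSE+SC+SNR.

    scores: per-measure values, assumed to be normalised beforehand.
    """
    return 3 * sum(scores[k] for k in HEAVY) + sum(scores[k] for k in LIGHT)


def c_qual(w: float) -> str:
    """Map W_VAL to the fuzzy MOS-based class Cqual."""
    if w >= 90:
        return "Excellent"
    if w >= 80:
        return "Good"
    if w >= 75:
        return "Average"
    if w >= 70:
        return "Bad"
    return "Poor"
```

As a check against the reported results, a W_VAL of 71 (Gaussian attack) maps to Bad and a W_VAL of 52 (circular averaging attack) maps to Poor.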
42
Watermarking Retrieved Image and Text Quality
Image
• Normalized deviation parameter (de) from the Euclidean distance d
• Bit error (be) - % of bits differing between retrieved and original binary image
• Crossing count error (ce) - difference in the 0-to-1 crossing counts of the original and retrieved binary images
• Final error in retrieved image
• Based on MOS, decision logic for the quality of the retrieved image (Cimg)
IF e < 0.5 Cimg = Excellent
IF 5 > e > 0.5 Cimg = Good
IF 10 > e > 5 Cimg = Medium
IF 15 > e > 10 Cimg = Bad
ELSE Cimg = Poor
Text
• Compute mean error (te) of the Hamming distance and Levenshtein distance
• MOS based retrieved text quality Ctxt -
IF te< 0.5 Ctxt = Excellent
IF 1 > te >0.5 Ctxt = Good
IF 3> te > 1 Ctxt = Medium
IF 5> te > 3 Ctxt = Bad
ELSE Ctxt = Poor
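A sketch of the two classifiers. Two points are assumptions rather than statements from the slide: the final image error e is taken as the mean of be, ce and de (this reproduces every row of the reported results table), and a value that falls exactly on a threshold is placed in the worse class (this matches the reported rows with te = 0.5 and te = 3).

```python
def image_error(be: float, ce: float, de: float) -> float:
    """Final image error e, assumed to be the mean of the three errors."""
    return (be + ce + de) / 3.0


def c_img(e: float) -> str:
    """Quality class of the retrieved image."""
    if e < 0.5:
        return "Excellent"
    if e < 5:
        return "Good"
    if e < 10:
        return "Medium"
    if e < 15:
        return "Bad"
    return "Poor"


def text_error(levenshtein: float, hamming: float) -> float:
    """Mean error te of the Levenshtein and Hamming distances."""
    return (levenshtein + hamming) / 2.0


def c_txt(te: float) -> str:
    """Quality class of the retrieved text."""
    if te < 0.5:
        return "Excellent"
    if te < 1:
        return "Good"
    if te < 3:
        return "Medium"
    if te < 5:
        return "Bad"
    return "Poor"
```

For the circular averaging attack, be = 5.469, ce = 9.896 and de = 3.448 give e of about 6.271, which is classed as Medium, in line with the reported result.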
43
Watermarking Perceptual Video Quality - Results
Attack   W_VAL   Cqual
AA       100     Excellent
CAA      52      Poor
FFA      25      Poor
GCA      27      Poor
GA       71      Bad
HEA      27      Poor
LEA      28      Poor
NLFA     25      Poor
RsA      100     Excellent
RoA      37      Poor
[Figures: original video and original logo, with results shown for the AA, FFA, GCA, NLFA, HEA, LEA, RoA and RsA attacks]
44
Watermarking Decision Logic on Perceptual Quality after Attacks
Video Quality          Retrieved Image Quality   Retrieved Text Quality   Overall Measure of Goodness
Excellent or Good      Excellent                 Excellent                Excellent
Excellent or Good      Excellent                 Good                     Good
Excellent or Good      Good                      Excellent                Good
Excellent or Good      Good                      Good                     Good
Excellent or Good      Medium                    Medium                   Medium
Excellent or Good      Bad or Poor               Medium                   Bad
Excellent or Good      Medium                    Bad or Poor              Bad
Excellent or Good      Bad or Poor               Bad or Poor              Poor
Medium, Bad or Poor    Any                       Any                      Attack degrades video quality
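The decision table can be expressed as a direct lookup. This is a sketch with hypothetical names; combinations that the table does not list (for example an Excellent image with Medium text) are returned as "Unspecified" rather than guessed.

```python
GOOD_VIDEO = {"Excellent", "Good"}

# (image band, text band) -> overall measure, exactly as listed in the table.
RULES = {
    ("Excellent", "Excellent"): "Excellent",
    ("Excellent", "Good"): "Good",
    ("Good", "Excellent"): "Good",
    ("Good", "Good"): "Good",
    ("Medium", "Medium"): "Medium",
    ("Bad or Poor", "Medium"): "Bad",
    ("Medium", "Bad or Poor"): "Bad",
    ("Bad or Poor", "Bad or Poor"): "Poor",
}


def _band(quality: str) -> str:
    """Group Bad and Poor together, as the table does."""
    return "Bad or Poor" if quality in {"Bad", "Poor"} else quality


def overall_goodness(video: str, image: str, text: str) -> str:
    """Overall measure of goodness after an attack."""
    if video not in GOOD_VIDEO:
        return "Attack degrades video quality"
    return RULES.get((_band(image), _band(text)), "Unspecified")
```

For the rotate attack, video quality is Poor, so the result is "Attack degrades video quality" even though the retrieved image and text are both Excellent.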
45
Watermarking Results on Retrieved Quality after Attacks
Attack   be      ce       de       e        Image Quality (CImg)
AA       0.000   0.000    0.000    0.000    Excellent
CAA      5.469   9.896    3.448    6.271    Medium
FFA      5.469   10.938   55.172   23.860   Poor
GCA      0.781   1.563    3.448    1.931    Good
GA       4.948   9.896    24.138   12.994   Bad
HEA      1.563   1.563    3.448    2.191    Good
LEA      1.823   2.083    0.000    1.302    Good
NLFA     5.729   10.417   13.793   9.980    Medium
RsA      0.000   0.000    0.000    0.000    Excellent
RoA      0.781   0.521    0.000    0.434    Excellent
Attack   L   H   te      Text Quality (Ctxt)
AA       0   0   0.000   Excellent
CAA      6   1   3.5     Bad
FFA      6   1   3.5     Bad
GCA      0   1   0.5     Good
GA       5   1   3       Bad
HEA      6   7   6.5     Poor
LEA      4   5   4.5     Bad
NLFA     6   1   3.5     Bad
RsA      0   0   0.000   Excellent
RoA      0   0   0.000   Excellent
46
Encryption – Proposed System
Contribution
Low-computational-complexity two-stage H.264 video encryption algorithm
 Separate header encryption
 Reuse of the flexible macro-block re-ordering (FMO) of H.264/AVC as the encryption operator
Analysis of the effect of the encryption-decryption chain on the video quality
 Important from the end-user experience perspective
 PSNR used as quality measure
[Flowchart: encoder loop. FMO gives the next MB number, with the MB ordering modified using a key-based look-up; proceed to the next macroblock and encode the next MB until end of slice, encode the next slice until end of frame, encode the next frame until end of sequence, then End]
47
Encryption – Two Stage Algorithm
[Figure: NALU parsing flow (Read NALU; if NALU type = Control, read control data; otherwise read video data and read macroblock), with example NAL unit sequences made of SPS, PPS, IDR and P units]
Header Encryption
• First encrypt the SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and IDR
• Encode the first frame using a conventional H.264 encoder and take a 16-bit key (KU)
• Take the length of the IDR (lIDR), which is a 16-bit number for QCIF resolution
• Define the encryption key value KP using a hash function of lIDR and KU
Modify macro-block ordering using key based look-up
• Use KP as the seed to generate a random sequence Le (values between 0 and 97). This is used for the first GOP. For subsequent GOPs, the KP of the previous GOP is used as KU to derive a new KP.
• MBs in an IDR frame are encoded in the order specified by the look-up table Le.
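The key chaining and key-based look-up can be illustrated as below. This is a sketch under stated assumptions, not the scheme from the thesis: the hash function is not named on the slide, so SHA-256 truncated to 32 bits stands in for it; Python's seeded `random.Random` stands in for the random sequence generator and is not cryptographically secure; all function names are hypothetical.

```python
import hashlib
import random

MB_COUNT = 98  # Le takes values 0..97 on the slide


def derive_kp(l_idr: int, ku: int) -> int:
    """KP = Hash(lIDR, KU); SHA-256 truncated to 32 bits is an assumption."""
    material = l_idr.to_bytes(4, "big") + ku.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(material).digest()[:4], "big")


def mb_order(kp: int, n: int = MB_COUNT):
    """Look-up table Le: a permutation of MB numbers seeded by KP."""
    order = list(range(n))
    random.Random(kp).shuffle(order)  # stand-in generator, deterministic per seed
    return order


def inverse_order(order):
    """Table a decoder holding the same key would use to undo the ordering."""
    inv = [0] * len(order)
    for position, mb in enumerate(order):
        inv[mb] = position
    return inv


# First GOP uses the user key KU; each later GOP chains from the previous KP.
kp_first = derive_kp(l_idr=1234, ku=0xBEEF)     # example values only
kp_second = derive_kp(l_idr=1500, ku=kp_first)
```

Because the permutation depends only on KP, an encoder and a decoder that share KU and observe the same IDR lengths derive identical tables for every GOP.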
48
Encryption – Results
Security Analysis
 Brute force attack – 98 possible macroblock positions give 98! = 9.42 x 10^153 possible orders to attempt
 In practice, this is restricted by the 32-bit key used for generating the MB order, which can be generated in 2^32 ways, requiring on average half that number of attempts to decrypt
 The key is changed every GOP, so the proposed method is robust enough in comparison to other reported similar methods
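The two figures quoted above can be checked with a few lines of arithmetic. The variable names are illustrative only.

```python
import math

orderings = math.factorial(98)     # number of possible MB orders
keyspace = 2 ** 32                 # orders reachable from a 32-bit key
expected_attempts = keyspace // 2  # average brute-force effort on the key

# Express both search spaces in bits to make the gap visible.
ordering_bits = math.log2(orderings)
key_bits = math.log2(keyspace)
```

The comparison shows why the key, not the permutation count, bounds the attack effort: the ordering space is worth more than 500 bits, while the key space is 32 bits.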
Computational Complexity Analysis
Operation         # per GOP               % Increase
ADD               2*24*97 + 3 = 4659      0.004
MULTIPLICATION    5*24*97 = 11,640        0.200
DIVISION          5                       0
MODULO            4*24*97 + 4 = 9316      0
Resolution (w x h)     Picture size in MBs   Memory (bytes)
QCIF (176x144)         99                    198
CIF (352x288)          396                   792
VGA (640x480)          1200                  2400
SDTV-525 (720x480)     1350                  2700
SDTV-625 (720x576)     1620                  3240
Video Quality Analysis
Video Sequence   Size / frame (in bytes)                   PSNR (of Y component)
                 Without encryption   With encryption      Without encryption   With encryption
Claire           155.125              158.16               39.03                39.03
Foreman          668.975              700.3                35.14                35.16
Hall monitor     264.82               268.705              36.97                36.96
49
Textual Context from Broadcast TV – Requirement
[Figure: news mash-up flow]
1. Text on broadcast video: “ANDHRA CM’S MISSING CHOPPER MYSTERY”
2. Keyword spotting: Andhra, CM, Missing Chopper
3. Search for an RSS feed containing related information, or search through any engine for related information
4. Display related information on TV: “Missing Chopper Found, CM Dead”
50
Textual Context in Static Pages – Proposed System for DTH
Contribution
Pre-processing and Enhancement
➢ Noise removal through low-pass filtering on Y
➢ Resolution enhancement through interpolation-based zooming
Binarization and Touching Character Segmentation
➢ Adaptive thresholding based binarization
➢ Touching character segmentation using width outlier detection
Use standard OCR tools like GOCR and Tesseract
Processing flow (from the slide diagram):
• A-priori ROI mapping
• Pre-processing for noise removal and image enhancement
• Binarization and touching character segmentation
• OCR using standard engines
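As an illustration of touching character segmentation by width outlier detection, the sketch below flags connected components that are much wider than the median character width and splits them into equal parts. The slides do not give the actual outlier rule, so the 1.5x median threshold, the equal-width split and the function name `split_touching` are assumptions.

```python
from statistics import median


def split_touching(boxes, factor=1.5):
    """Split components whose width is an outlier relative to the median.

    boxes: list of (x, width) pairs for the connected components of one text
    line, in left-to-right order. The threshold factor is an assumed value.
    """
    med = median(w for _, w in boxes)
    result = []
    for x, w in boxes:
        if w > factor * med:
            # Estimate how many characters are fused, then cut at equal steps.
            parts = max(2, round(w / med))
            cuts = [x + (k * w) // parts for k in range(parts + 1)]
            result.extend((cuts[k], cuts[k + 1] - cuts[k]) for k in range(parts))
        else:
            result.append((x, w))
    return result


# Illustrative line: the third component is about two characters wide.
line = [(0, 10), (12, 10), (24, 21), (47, 9)]
segments = split_touching(line)
```

Fixed fonts on static pages make a median-based width estimate plausible, which is why this kind of rule suits DTH pages better than free-form video text.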
51
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
• Channel logos with a very small number of pixels are missed in 1% of cases.
• The remaining 3% of misses are due to a moving or changed channel logo.
• 3% false positives come from small-size logos – removed from the template
• 2% false positives are due to highly transparent logos

Recall and Precision Measures (chart)

Changed Logo Examples (images)

False Positive Examples
Original Channel   Detected As
Zee Trendz         DD Ne
Zee Punjabi        TV9 Gujarati
DD News            DD Ne
Nick               DD Ne
Nepal 1            Zee Cinema

Computational Complexity
Module              Time (msec)
YUV to HSV          321.09
ROI mapping         0.08
Mean SAD matching   293.65
Correlation         847.55
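For reference, the recall and precision figures quoted for channel identification follow the standard definitions sketched below. The counts used are illustrative round numbers chosen to match the reported percentages (4% misses, 5% false positives); they are not the raw counts from the experiment.

```python
def recall(true_pos, false_neg):
    """Fraction of actual logo occurrences that were detected."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of reported detections that were correct."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts: 4 misses per 100 occurrences, 5 false alarms per 100 detections.
r = recall(true_pos=96, false_neg=4)
p = precision(true_pos=95, false_pos=5)
print(f"r = {r:.2f}, p = {p:.2f}")
```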
52
Textual Context from Broadcast TV – Proposed System for News Mash-up from Internet
Breaking News Heuristics
• Breaking news always appears in capital letters.
• The font size of breaking news is larger than that of the ticker text.
• It tends to appear in the central to central-bottom part of the screen.
Processing flow (from the slide diagram):
• Localization of suspected text regions
• Text region confirmation using temporal consistency
• Binarization
• Text recognition
• Keyword selection
Contribution
➢ An improved method for text region localization and screen layout segmentation
➢ Pre-processing techniques on the text region (same as previous section)
➢ Heuristics-based keyword spotting algorithm followed by Google's in-built dictionary-based correction
53
Textual Context in Static Pages – Results
Source: Tata Sky DTH Service in India
Figures: Screenshots of candidate ROIs; Accuracy of Text Detection
OCR Example Results
Columns for each image: Output of GOCR | Output of Tesseract | GOCR after applying proposed algorithms | Tesseract after applying proposed algorithms
(a) Sta_ring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda, Rcema Sen, Raipal Yadav, Om Puri. | Starring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda. Reema Sen, Raipal Yadav. Om Puri.
(b) _____ ___ ___ _________ ____ __ __ | Pluww SMS thu fnlluwmg (adn In 56633 | ___ SMS th_ folIcmng cod_ to S__ | Planta SMS tha Iullmmng mda tn 56633
(c) SmS YR SH to | SMS YR SH in 56633 | SmS YR SH to _____ | SMS YR SH to 56533
(d) _m_ BD to _____ | SMS BD to 56633 | SMS BD to S____ | SMS BD to 56633
(e) AM t___o_,_b _q____ | AM 2048eb 141117 | AM tOa_gb _q____ | AM 2048eb 141117
(f) _M_= _ _A___ to Sd___ | SMS: SC 34393 tn 56533 | _M_= _ _A___ to Sd___ | SMS: SC34393 tn 56633
(g) _W _ ' _b _ Ib_lb _a | W6.} 048abl;lbwzIb1a | ___ __Y_b yIbw_Ib_a | WP 2048ab Mlbwzlb 1 a
(h) ADD Ed_J to S____ | ADD Eau to $6633 | ADD Ed_J to S____ | ADD Edu to 56633
(i) AIC STAlUSlS/OUO_ t_;OS;t_ | AIC STATUS25/02/09 1 9:05:1 4 | mlC S_ATUSlS/OUO_ t_;OS=tA | A/C STATUS 25/02/09 1 9:05:14
(j) _ ________'__ | Sub ID 1005681893 | WbID_OOS_B_B__ | Sub ID 1005681893
54
Textual Context from Broadcast TV - Text Region Localization
• Filter out low-contrast components using intensity-based thresholding (output V_cont).
• Count the number of black pixels in each row of V_cont.
• Let the number of black pixels in the i-th row be defined as cnt_black(i)
• Compute the average number of black pixels in a row (avg_black) as
$$\mathrm{avg}_{black} = \frac{1}{ht} \sum_{i=1}^{ht} \mathrm{cnt}_{black}(i)$$
where ht is the height of the frame.
• Compute the absolute variation av(i) in the number of black pixels in a row from avg_black as
$$av(i) = \left| \mathrm{cnt}_{black}(i) - \mathrm{avg}_{black} \right|$$
• Compute the average absolute variation (aav) as
$$aav = \frac{1}{ht} \sum_{i=1}^{ht} av(i)$$
• Compute the threshold for marking the textual region as
$$TH_{txt\_reg} = \mathrm{avg}_{black} + aav$$
• Mark all pixels in the i-th row of V_cont as white if
$$\mathrm{cnt}_{black}(i) < TH_{txt\_reg}$$
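The row-wise steps above can be sketched in a few lines. This is a minimal illustration that assumes the thresholded frame is a 2D list with 0 for black and 1 for white; that pixel encoding and the function name `localize_text_rows` are assumptions, not part of the slides.

```python
def localize_text_rows(v_cont):
    """Whiten every row whose black-pixel count is below avg_black + aav.

    v_cont: 2D list of pixels, 0 = black, 1 = white (assumed encoding).
    Returns the filtered frame and the threshold that was applied.
    """
    ht = len(v_cont)
    cnt_black = [row.count(0) for row in v_cont]
    avg_black = sum(cnt_black) / ht
    av = [abs(c - avg_black) for c in cnt_black]  # absolute variation per row
    aav = sum(av) / ht                            # average absolute variation
    th_txt_reg = avg_black + aav
    filtered = [
        [1] * len(row) if c < th_txt_reg else list(row)
        for row, c in zip(v_cont, cnt_black)
    ]
    return filtered, th_txt_reg


# Tiny illustrative frame: only the second row is dense enough to survive.
frame = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 1, 1, 1, 1],
]
filtered, th = localize_text_rows(frame)
```

In this toy frame the row counts are 0, 5, 4 and 1, so avg_black is 2.5, aav is 2.0 and the threshold is 4.5.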
Confirmation of the Text Regions Using Temporal Consistency
• Assumption: text in breaking news persists for some time.
• V_cont sometimes contains noise because of high-contrast regions in the video frame.
• In a typical video sequence at 30 FPS, each frame is displayed for about 33 msec.
• Assuming breaking news persists for at least 2 seconds (about 60 frames at 30 FPS), all regions that are not
persistently present for at least 2 seconds can be filtered out.
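One way to express the temporal consistency check is to count, for every candidate region, how many consecutive frames it has been present. The sketch below works on rows as the unit of region, which is a simplifying assumption, and the function name `persistent_rows` is hypothetical.

```python
def persistent_rows(row_masks, fps=30, min_seconds=2.0):
    """Return indices of rows flagged as text in enough consecutive frames.

    row_masks: one list of booleans per frame, True where the row is a
    candidate text row. At 30 FPS and 2 seconds this requires 60 frames.
    """
    needed = int(round(fps * min_seconds))
    run_length = {}
    confirmed = set()
    for mask in row_masks:
        for i, flagged in enumerate(mask):
            run_length[i] = run_length.get(i, 0) + 1 if flagged else 0
            if run_length[i] >= needed:
                confirmed.add(i)
    return confirmed


# Row 0 is present in all 60 frames; row 1 drops out once at frame 30.
frames = [[True, k != 30] for k in range(60)]
stable = persistent_rows(frames)
```

Short-lived high-contrast noise resets its own counter and never reaches the 60-frame requirement.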
55
Textual Context from Broadcast TV - Post-processing Heuristics
• Operate the OCR only in upper case
• If the number of words in a text line is above a heuristically obtained
threshold value, the line is considered a candidate text region.
• If multiple such text lines are obtained, choose a line near the bottom
• Remove the stop words (such as a, an, the, for, of) and correct the words
using a dictionary.
• Concatenate the remaining words to generate the search string for the Internet
search engine
➢ The selected keywords can be given to Internet search engines using Web
APIs to fetch related news, which can be blended on top of the TV video to
create a mash-up between TV and Web.
➢ Search engines like Google already provide word correction,
which eliminates the need for dictionary-based correction of
keywords.
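The post-processing heuristics can be sketched as a small pipeline from OCR lines to a search string. The word-count threshold, the choice of the bottom-most candidate as "a line near the bottom", the short stop-word list and the function names are all assumptions for illustration, and the sample headline is made up.

```python
STOP_WORDS = {"A", "AN", "THE", "FOR", "OF"}  # the slide's examples; a real list is longer


def select_line(lines, word_threshold=2):
    """Pick the bottom-most line with more words than the threshold.

    lines: OCR text lines ordered from top to bottom of the screen.
    """
    candidates = [ln for ln in lines if len(ln.split()) > word_threshold]
    return candidates[-1] if candidates else None


def search_string(line):
    """Upper-case the line, drop stop words and join the rest."""
    words = [w.strip(".,:;!?") for w in line.upper().split()]
    return " ".join(w for w in words if w and w not in STOP_WORDS)


# Made-up OCR output for one frame, top to bottom.
ocr_lines = ["SPORTS", "THE MYSTERY OF THE MISSING CHOPPER"]
query = search_string(select_line(ocr_lines))
```

The resulting string would then be passed to a search engine, which handles residual spelling errors itself.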
56
Textual Context from Broadcast TV – Results
Accuracy Of Text Localization
• Experimental results show a recall rate of 100% and a precision of 78%
• Precision is low because the parameters and threshold values are tuned so
that the probability of false negatives (misses) is minimized.
• The final precision can only be seen after applying the text recognition and keyword
selection algorithms
Accuracy Of Text Recognition
• In case of false positives, a number of special characters appear in the OCR output.
• So candidate texts with a special character to alphabet ratio > 1 are discarded.
• Moreover, the proposed keyword detection method suggests concentrating on capital letters.
• So only words in all capitals are kept under consideration.
• The character-level accuracy of the selected OCR improves to 86.57% in those cases.
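The two filters described above can be sketched as follows. What counts as a "special character" is not defined on the slide, so the sketch assumes it means any character that is neither alphanumeric nor whitespace; the function names are hypothetical and the sample strings are illustrative.

```python
def special_to_alpha_ratio(text):
    """Ratio of special characters to alphabetic characters in OCR output."""
    alpha = sum(ch.isalpha() for ch in text)
    special = sum(not ch.isalnum() and not ch.isspace() for ch in text)
    return special / alpha if alpha else float("inf")


def keep_candidate(text):
    """Discard candidates whose special/alphabet ratio exceeds 1."""
    return special_to_alpha_ratio(text) <= 1


def capital_words(text):
    """Keep only purely alphabetic words written in all capitals."""
    return [w for w in text.split() if w.isalpha() and w.isupper()]


noisy = "_____ ___ ___ _________"
clean = "SMS BD to 56633"
kept = [t for t in (noisy, clean) if keep_candidate(t)]
words = capital_words(clean)
```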
Accuracy of information retrieval
• Limitations of the OCR module can be overcome by having a strong dictionary or language model.
• In the proposed method this constraint is bypassed, as the Google search engine itself
has one such strong module.
• So the OCR output is simply given to the Google search engine, which in turn offers the
actual text as an option
57
Proposed Algorithm for Keyboard Layout
Algorithm
• Total No. of Character Cells = T
• Total no. of rows of key-blocks = R
• Total no. of columns of key-blocks = C
• Total no. of Cells in a key-block = 4
• T = R x C x 4, Max. Keystrokes in worst
case K = (R+C+1) keystrokes.
• Hence, the desired solution boils down to
finding R and C for which K is minimum.
Flowchart steps
• Start; input T
• numCharsSqrt = square root of T
• sqRootInt = ceiling of numCharsSqrt
• If sqRootInt is an even number, N = sqRootInt; otherwise N = sqRootInt + 1
• N2 = N - 2
• C = N / 2
• If (N * N2) >= T, R = N2 / 2; otherwise R = C
• Output R, C; stop
Example
• "QWERTY" - T=54: 4 rows, 14 columns, (14+4=18) keystrokes max.
• "PROPOSED" - T=54, sqRootInt = 8, N=8, N2=6, C=4, (N*N2=48 < 54), so R = C = 4.
Max 9 keystrokes
• Final layout used - T=48, sqRootInt = 7, N=8, N2=6, C=4, (N*N2=48 == T=48), so R =
N2/2 = 3. Finally, R=3 and C=4. Max 8 keystrokes
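The layout algorithm and its two worked examples can be reproduced with the short sketch below. It follows the flowchart steps directly; the function names `keyboard_layout` and `max_keystrokes` are hypothetical.

```python
import math


def keyboard_layout(t):
    """Return (R, C), the rows and columns of 4-cell key-blocks for T cells."""
    sq_root_int = math.ceil(math.sqrt(t))
    n = sq_root_int if sq_root_int % 2 == 0 else sq_root_int + 1
    n2 = n - 2
    c = n // 2
    r = n2 // 2 if n * n2 >= t else c
    return r, c


def max_keystrokes(r, c):
    """Worst-case keystrokes K = R + C + 1, as defined on the slide."""
    return r + c + 1


for t in (54, 48):
    r, c = keyboard_layout(t)
    print(f"T={t}: R={r}, C={c}, max keystrokes={max_keystrokes(r, c)}")
```

For T=54 this gives R=4, C=4 with at most 9 keystrokes, and for T=48 it gives R=3, C=4 with at most 8, matching the examples above.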
58
Onscreen Keyboard Results – User Study 1
Charts: QWERTY vs. Layout-1 (after practice); QWERTY vs. Layout-1 (before practice)
Survey questions:
1. Does the on-screen keyboard provide enough assistance?
2. How is the ease of use of the on-screen keyboard?
59
Results – User Study 2 (KLM-GOMS Modelling)
• A total of 20 users
• Simple Text Entry and Email Sending Task.
• Six phrase sets selected randomly from standard MacKenzie's phrase set.
• Users were given initial familiarization phrase and then asked to enter six phrases at one go.
• Time taken by each user and the number of keystrokes required to type the phrase were recorded.

Simple Text Entry
Layouts                  % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY     44.23                        45.75
Layout 2 over QWERTY     45                           46.75
Layout 2 over Layout 1   2                            1.84

Complete Email Typing and Sending Task
Layouts                  % improvement (Experiment)   % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY     42.2                         35.23
Layout 2 over QWERTY     43.4                         37.2
Layout 2 over Layout 1   3.18                         2
60
Onscreen Keyboard Results – User Study 2
• P - redefined as the total time taken in finding a key and moving the focus to select
the block containing that particular key. Layout 1 – 1.77 sec, Layout 2 – 1.73 sec
and QWERTY – 1.10 sec
• H – not used
• New parameter F - the time required for finger movement – 0.22 sec

KLM-GOMS Operators
Operator   Description                             Time in sec
P          Pointing a pointing device              1.10
K          Key or button press                     0.20
H          Move from mouse to keyboard and back    0.40
M          Mental preparation and thinking time    1.35

Operations                            Time for Layout-1 in sec   Time for Layout-2 in sec   Time for QWERTY in sec
Open/close onscreen keyboard layout   0.4                        0.4                        0.4
Find any key                          1.07                       1.03                       1.1
Move focus to select a key            0.7                        0.7                        2.6
Move finger to the corner keys        0.2                        0.2                        0.2
Enter a character using keyboard      2.17                       2.13                       4.0

KLM-GOMS sub-goals for Email Task
Sub-goals                   Time for Layout 1 in sec   Time for Layout 2 in sec   Time for QWERTY in sec
Open browser                0.5                        0.5                        0.5
Open gmail server & login   45.3                       44.5                       82.1
Compose mail                68.0                       65.8                       115.1
Dispatch                    0.4                        0.4                        0.4
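As a worked check, the predicted improvements reported for simple text entry can be derived from the per-character times in the operations table ("Enter a character using keyboard": 2.17, 2.13 and 4.0 sec). The helper name `improvement` is hypothetical.

```python
def improvement(slow, fast):
    """Percentage reduction in time when moving from the slow to the fast layout."""
    return (slow - fast) / slow * 100


# Per-character entry times in seconds, taken from the operations table.
per_char = {"Layout 1": 2.17, "Layout 2": 2.13, "QWERTY": 4.0}

predicted = {
    "Layout 1 over QWERTY": improvement(per_char["QWERTY"], per_char["Layout 1"]),
    "Layout 2 over QWERTY": improvement(per_char["QWERTY"], per_char["Layout 2"]),
    "Layout 2 over Layout 1": improvement(per_char["Layout 1"], per_char["Layout 2"]),
}
for name, pct in predicted.items():
    print(f"{name}: {pct:.2f}%")
```

This prints 45.75%, 46.75% and 1.84%, which are the KLM-GOMS predictions quoted for the simple text entry task.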
Thesis arpan pal_gisfi

  • 1. 0PhD. Thesis Defense – 9th April, 2013 Novel Applications for Emerging Markets Using Television as a Ubiquitous Device CTIF, Department of Electronic Systems Arpan Pal Principal Scientist and Head of Research TCS Innovation Labs, Kolkata Supervisor Prof. Ramjee Prasad
  • 2. 1 Motivation - TV as a Ubiquitous Computing Device “Ubiquitous computing enhances computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user” – Mark Weiser Can TV be the Ubiquitous Screen for Home? • It is available • It is easy to use Market Facts (India) Source – IMRB, ITU, IMAI (2009, 2010, 2011)  PC – Low penetration - cost, skill and usability issues  Mobile screen - good for social networking - not for Information-heavy content  Tablets and large-screen smartphones – still costly  Television - low-cost of ownership, large screen real-estate , familiarity of usage Device Penetration (%) Internet (%) PC 6.1% 4.2% (Home) Mobile 61.4% 3.2% Television 60% (Household) ??  Internet Browser  Media Player (Audio, Video, Image)  Video Chat, SMS  Remote Healthcare  Distance Education Pilot Deployment in India and Philippines Home Infotainment Platform
  • 3. 2 User Study User survey with HIP (Pilot version of “Dialog” launched by Tata Teleservices Ltd.) was conducted among the city users in India – by TNS Sample - 50 middle-class and lower middle-class families involving 50 working adults and 50 students (12-18 years age). Confidence score measure - % of respondents responding with a score of 4 or 5
  • 4. 3 User Study Findings  Slow to very-slow internet connection affecting the user experience • Video chat could not be tested – how to do acceptable quality video chat under low-QoS networks?  Browser - Ease of use Issue and not much liking for TV-Internet blending • How to make experience acceptable through TV-Internet Mash-ups?  Preference of remote control for non-computer-savvy users and dislike of the QWERTY layout for on-screen keyboard • How to design a better text entry method using remote control?  Additional multimedia content security requirement for remote healthcare and distance education applications • How to have low computational complexity Access Control and DRM schemes capable of running on the constrained low-cost platform? Problem Statement Improving the user experience of TV as low-cost Internet-access device through  QoS-aware Video Transmissions  Low-complexity Video Security  TV context-aware Intelligent TV-Internet Mash-ups  TV Remote-based on-screen keyboard for text entry Challenges  Resource Constrained Platform  Non-Computer-Savvy Users  Low-QoS Network
  • 5. Video Chat over low-QoS Networks Low-QoS Network Resource Constrained Platform Low-cost Infotainment Platform ICT Infrastructure Limitations User Experience Requirements Challenges
  • 6. 5 Problem Definition  Video Chat • Bandwidth-hungry with real-time packet delivery requirement  2G wireless networks (CDMA1xRTT / GPRS) • Poor bandwidth and high latency – fluctuates with time and place Need for a adaptive rate control based video chat system with preference to audio State-of-the-Art Analysis  Network Condition Estimation and Adaptation on WiFi and IP networks, not on 2G wireless where the latency is higher  Adequate work not reported on low computational complexity rate control algorithms for streaming video  Adequate work not reported for optimal design for audio/video streaming systems addressing the latency and perceptual quality issue
  • 7. 6 Proposed System Contribution  Network Sensing • An experimental heuristics based mapping of effective bandwidth to probe packet delay  Adaptation • A low complexity video rate control algorithm for H.264 CBR - automatic switching between frame/MB as basic units for quantization based on video complexity • An adaptive fragmentation scheme with frame-level sequencing that minimizes packet-delay based discards, prioritizes audio and improves perceived video quality Video Chat Application Middleware Framework Rate Control - Encode Rate Control - Decode Video Encode Audio Encode Audio Decode Video Decode Network Sensing Fragmentation Re-assembly Underlying Network Stack (UDP/IP)
  • 8. 7 Results Grandma @ 30kbpsAkiyo @ 25kbps PSNR(indB) PSNR(indB) Network Type Mean (kbps) Stdev (kbps) ADSL-ADSL 596.14 203.45 Modem- Modem 26.96 19.23 Modem-ADSL 18.13 3.21 Network Sensing Adaptive Rate Control (QCIF, 5fps) Adaptive Fragmentation Feedback from 20 users about overall experience using a) Standard RTP based system b) Proposed system 100% reported better audio quality and better perceived video quality
  • 9. Low Complexity Video Security Watermarking for DRM (Education) Resource Constrained Platform Low-cost Infotainment Platform Socially Value- Adding Apps User Experience Requirements Challenges
  • 10. 9 Problem Definition Digital Watermarking Requirement  Needs to be imperceptible and robust at the same time  Needs to have low-computational complexity State-of-the-Art  Initial and classical works are on MPEG2 and not on H.264  Reported H.264 watermarking systems have high computational overhead  Reported works lack perceptual quality analysis and attack-robustness analysis . Video Encryption Requirement  Needs to have low-computational complexity, yet adequate security State-of-the-Art  Uncompressed domain encryption - high decryption computational overhead  Reported H.264 video encryption works – no focus on computational complexity  No work reported on Video Quality assessment after encryption/decryption
  • 11. 10 Digital Watermarking – Proposed System Contribution Robust, imperceptible yet low- computational-complexity H.264 watermarking system  Hash based integrity check  Embed watermark reusing the quantizer  Embedding location carefully chosen for imperceptibility Evaluation methodology for watermark attack and perceptual video quality  Peak-Signal-to-Noise-Ratio (PSNR) based Imperceptibility  Evaluation against 10 known attacks  Mean-Opinion-Score (MoS) based post-attack measurement of Video Quality, Retrieved Image Quality and Retrieved Text Quality Input video (YUV 4:2:0 format) F/ n-1 (Reference) F/ n (Reconstructe d) De-blocking Filter Entropy Encoder ReorderQT Inter (P) Intra (I) Q-1T-1 NAL + + + + Extract watermark Embed watermark Best Predictio n Mode and block size selection Text And Image Retrieved Text And Image
  • 12. 11 Digital Watermarking – Results Operation No. of Operations per GOP ADD 2779 MULTIPLY 3564 DIVIDE 1980 MODULO 3564 CONDIONAL 7524 MEMORY I/O 1584 Function CPU Mega Cycles taken per GOP (1 second) Watermark Embedding 6.8 Watermark Extraction 3.8 Computational Complexity Video Quality after Watermark Attack Video Quality Image Quality Text Quality Overall Performance against Attack AA Excellent Excellent Excellent Excellent CAA Poor Medium Bad Attack Degrades Video Quality FFA Poor Poor Bad Attack Degrades Video Quality GCA Poor Good Good Attack Degrades Video Quality GA Bad Bad Bad Attack Degrades Video Quality HEA Poor Good Poor Attack Degrades Video Quality LEA Poor Good Bad Attack Degrades Video Quality NLFA Poor Medium Bad Attack Degrades Video Quality RsA Excellent Excellent Excellent Excellent RoA Poor Excellent Excellent Attack Degrades Video Quality Perceptual Video Quality after Attacks + Retrieved Image / Text Quality (14 streams, 20 users)
  • 13. Context-aware Intelligent TV-Internet Mash-ups
      - TV channel identity as context
      - Textual context in static TV pages
      - Textual context embedded in broadcast video
      [Diagram labels: Television as the Ubiquitous Access Device; Low-cost Infotainment Platform; User Experience; Resource-Constrained Platform; Non-Computer-savvy Users; Requirements; Challenges]
  • 14. Requirement (Analog TV Context)
      [Diagram labels: DTH / cable set-top box; HIP (RF in, A/V in, A/V out); video capture and context extraction; information mash-up engine; Internet; video and graphics alpha blending; television]
      Context and its use
        - Channel identity: EPG, viewership rating
        - Text on static pages: return path on DTH
        - Text on broadcast video: news mash-up
  • 15. Problem Definition
      TV Channel Identity
        - Audio-watermarking and audio-signature based methods need content modification or non-real-time offline processing
        - TV channel logo detection based methods: reported works use PCA and ICA, which are computationally intensive and work only on static, opaque, rectangular logos, not on non-rectangular / transparent / dynamic logos
      Text on Broadcast Video
        The challenge is identifying the text against a dynamically changing video background
        - Pixel-domain and compressed-domain methods
            • Region based (connected-component based (CC) and edge based (EB))
        - Different methods work for different kinds of text against different video backgrounds
            • Need for a hybrid approach (region and texture, CC and EB)
            • Text-area localization and pre-processing remain the biggest challenge
      Text on Static Pages
        - Noisy data with fixed fonts
        - Efficient pre-processing techniques are the main challenge
  • 16. TV Channel Identity – Results
      TV Channel Identity
        110 channels tested: recall r = 0.96 and precision p = 0.95
        • Channel logos with a very small number of pixels are missed in 1% of cases
        • The remaining 3% of misses are due to a moving or changed channel logo
        • 3% false positives from small logos (removed from the template)
        • 2% false positives due to highly transparent logos
        [Chart: TV channel identity recall and precision]
      Text Detection on Static Pages
        [Chart]
      Textual Context from Broadcast TV
        • 20 news channels (5 min. duration each)
        • Recall of 100% and precision of 78% for text localization and OCR
        • Precision improves to 88.57% after heuristics-based keyword spotting
        • Almost 100% precision after Google dictionary-based post-processing
  • 17. Novel On-screen Keyboard
      [Diagram labels: Television as the Ubiquitous Access Device; Low-cost Infotainment Platform; User Experience; Non-Computer-savvy Users; Requirements; Challenges]
  • 18. Problem Definition
      Requirements
        • Provide a cost-effective and easy-to-use text-entry mechanism for accessing services like Internet, email and short message service (SMS) from the television
        • A full-fledged separate wireless keyboard (Bluetooth or RF) is costly
        • Explore the option of using infra-red remotes with an on-screen keyboard on the TV screen
      State of the art
        - Traditional "QWERTY" on-screen keyboards require a large number of keystrokes to navigate, which makes them cumbersome to use
        - Available on-screen keyboards do not address usability for a non-computer-savvy user
        - Most are designed for cursor-based systems, not for traditional key-based infra-red remotes, which respond relatively slowly to key presses
        - Insufficient user study and modeling for on-screen layouts
  • 19. Proposed System
      Contribution
        - A novel formulation of the on-screen layout
            • Significantly reduces the number of keystrokes while typing (19 for QWERTY, 9 for the proposed layout)
        - Formal methodology for user-study evaluation
            • The widely used Keystroke-Level Model (KLM) and GOMS (Goals, Operators, Methods and Selection rules) model for formal evaluation
        - Extension of the standard KLM operator set to model remote-based operations and hierarchical layouts
            • A finger-movement operator replaces the standard pointing-device operator
      [Figure: proposed on-screen keyboard layout]
  • 20. Results - User Study 1 (Basic Benchmarking)
      Users: 25 users (diverse age / keyboard exposure)
      Tasks: users were asked to type "The quick brown fox jumps over a lazy dog" as SMS/email text and "www.google.com" as a URL
      [Figures: QWERTY, Layout 1 and Layout 2 keyboards; QWERTY vs. Layout-1 chart]

        Layouts                 | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
        Layout 1 over QWERTY    | 44.23                      | 45.75
        Layout 2 over QWERTY    | 45                         | 46.75
        Layout 2 over Layout 1  | 2                          | 1.84

        Layouts                 | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
        Layout 1 over QWERTY    | 42.2                       | 35.23
        Layout 2 over QWERTY    | 43.4                       | 37.2
        Layout 2 over Layout 1  | 3.18                       | 2
  • 21. Conclusions
      Motivation – Why Television
        - Validated through market data from India
      Background – TCS Home Infotainment Platform (HIP)
        - Internet browser, media player, SMS, video chat, remote healthcare, distance education
      Requirement Analysis – Field Study
        - Challenges: slow Internet speed, non-computer-savvy users, resource-constrained platform
      How to improve the user experience of TV as a low-cost Internet-access device
        - Network-adaptive video rate control and packet fragmentation protocols for an improved video chat experience
        - Low-computational-complexity video watermarking and encryption for secure multimedia content sharing
        - TV context-aware intelligent TV-Internet mash-ups for improving the Internet browsing experience on TV
        - Remote-based on-screen keyboard for improved text entry on TV using the remote control
  • 22. Publications
      1. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Proceedings on Ubiquitous Computing and Multimedia Applications (UCMA), Miyazaki, Japan, March 2010.
      2. Dhiman Chattopadhyay, Aniruddha Sinha, T. Chattopadhyay, Arpan Pal, “Adaptive Rate Control for H.264 Based Video Conferencing Over a Low Bandwidth Wired and Wireless Channel”, IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, May 2009.
      3. Arpan Pal and T. Chattopadhyay, “A Novel, Low-Complexity Video Watermarking Scheme for H.264”, Texas Instruments Developers Conference, Dallas, Texas, March 2007.
      4. T. Chattopadhyay and Arpan Pal, “Two fold video encryption technique applicable to H.264 AVC”, IEEE International Advance Computing Conference (IACC), Patiala, India, March 2009.
      5. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, Debabrata Pradhan, Soumali Roychowdhury, “Recognition of Channel Logos From Streamed Videos for Value Added Services in Connected TV”, IEEE International Conference for Consumer Electronics (ICCE), Las Vegas, USA, January 2011.
      6. T. Chattopadhyay, Arpan Pal, Utpal Garain, “Mash up of Breaking News and Contextual Web Information: A Novel Service for Connected Television”, Proceedings of 19th International Conference on Computer Communications and Networks (ICCCN), Zurich, Switzerland, August 2010.
      7. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, “TV Video Context Extraction”, IEEE Trends and Developments in Converging Technology towards 2020 (TENCON 2011), Bali, Indonesia, November 21-24, 2011.
      8. Arpan Pal, Chirabrata Bhaumik, Debnarayan Kar, Somnath Ghoshdastidar, Jasma Shukla, “A Novel On-Screen Keyboard for Hierarchical Navigation with Reduced Number of Key Strokes”, IEEE International Conference on Systems, Man and Cybernetics (SMC), San Antonio, Texas, October 2009.
      9. Arpan Pal, Debatri Chatterjee, Debnarayan Kar, “Evaluation and Improvements of on-screen keyboard for Television and Set-top Box”, IEEE International Symposium for Consumer Electronics (ISCE), Singapore, June 2011.
  • 23. Publications (contd…)
      10. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Book Chapter in Springer Communications in Computer and Information Science, Volume 75, 2010, Pages 11-19. DOI: 10.1007/978-3-642-13467-8.
      11. Arpan Pal, Ramjee Prasad, Rohit Gupta, “A low-cost Connected TV platform for Emerging Markets – Requirement Analysis through User Study”, Engineering Science and Technology: An International Journal (ESTIJ), ISSN: 2250-3498, Vol. 2, No. 6, December 2012.
      12. T. Chattopadhyay and Arpan Pal, “Watermarking for H.264 Video”, EE Times Design, Signal Processing Design Line, November 2007.
      13. Arpan Pal, Aniruddha Sinha and Tanushyam Chattopadhyay, “Recognition of Characters from Streaming Videos”, Book Chapter in book: Character Recognition, Edited by Minoru Mori, Sciyo Publications, ISBN: 978-953-307-105-3, September 2010.
      14. Arpan Pal, Tanushyam Chattopadhyay, Aniruddha Sinha and Ramjee Prasad, “The Context-aware Television using Logo Detection and Character Recognition”, (Submitted) Springer Journal of Pattern Analysis and Applications.
      15. Debatri Chatterjee, Aniruddha Sinha, Arpan Pal, Anupam Basu, “An Iterative Methodology to Improve TV Onscreen Keyboard Layout Design Through Evaluation of user Study”, Journal of Advances in Computing, Vol. 2, No. 5, October 2012, Scientific and Academic Publishing (SAP), p-ISSN: 2163-2944, e-ISSN: 2163-2979.
  • 24. Future Work
      Adaptive Video Chat
        - Network sensing via IP statistics; multicast support
      Low-complexity Video Security
        - Extended security analysis; audio watermarking
      Internet-TV Mash-up
        - Extension to second screens (mobiles, tablets); seamless transfer of streaming content between TV and mobiles/tablets; integration of social media into broadcast TV
      On-screen Keyboard
        - Predictive keyboard integration; voice interfaces and gesture controls
        - Exploring mobile phones as the remote control for TV
      Generic
        - Optimizing the conflicting triad of requirements (cost, features and performance) in the face of ever-improving hardware functionality
  • 25. Learning from the CTIF-GISFI PhD Program
      - Access to excellent, state-of-the-art technical course content from Aalborg University
      - Very useful research soft-skill courses
          • Problem-based learning
          • Professional networking
          • Qualitative research approaches
          • Innovation methodology
          • Writing scientific papers
      - Technical support from CTIF faculty
      - Culture of practical, problem-based research
      - Attending relevant conferences and EU FP7 related programs
          • Networking with experts in related areas
  • 28. Introduction
      Objective
        - Improving the user experience of TV as a low-cost Internet-access device for the masses in emerging markets like India
      Outline
        • Motivation
        • Background
        • User-study based requirement analysis
        • Contributions
            1) Improve the Quality of Experience (QoE) of video chat under poor network conditions using network sensing
            2) Provide computationally efficient yet sufficiently secure algorithms for access control and digital rights management (DRM) of sensitive multimedia content
            3) Improve the experience of browsing the Internet on TV by intelligently understanding the context of the TV program being watched and blending in related information from the Internet (known as TV-Internet mash-up)
            4) Improve the experience of text entry on TV using the remote control through a novel on-screen keyboard layout
  • 29. Thesis Organization
      [Diagram relating requirements and challenges to contributions and chapters]
      Base layers of the diagram: Television as a Ubiquitous Display Device; Low-cost Infotainment Platform; Applications of Social Value-add
      Requirements and challenges: User Experience; ICT Infrastructure Limitations; Low-QoS Network; Resource-Constrained Platform; Non-computer-savvy Users
      Scientific contributions:
        - QoS-aware video transmission
        - Low-complexity video encryption and watermarking algorithms
        - TV context-aware intelligent TV-Internet mash-ups
        - Improved on-screen keyboard layout for text entry on TV using the remote
      Engineering contribution:
        - Multimedia framework for quick application development
      Chapter labels shown on the diagram: Chapters 2 to 6 and Appendix A
  • 30. Background - Home Infotainment Platform (HIP)
      [Figure: HIP box with A/V in and A/V out]
      - Internet browser
      - Media player (audio, video, image)
      - Video chat, SMS
      - Remote healthcare
      - Distance education
      Pilot deployment in India and the Philippines
  • 31. HIP Deployment
      TCS' Home Infotainment Platform launched successfully in India and the Philippines.
      - In India, Tata Teleservices Ltd. launched HIP under the brand name "Dialog" in Tamil Nadu and West Bengal.
      - In the Philippines, it was launched through Smart Communications under the brand name "SmartBro SurfTV".
  • 32. HIP Applications
      Chipset: TI DaVinci DM6446 (297 MHz ARM9 core and 594 MHz DSP)
      RAM: 256 MB DDR
      Flash: 64 MB NAND
      OS: Embedded Linux 2.6.x
      Browser: Opera for Devices 9.6 with Flash
      Multimedia codecs
        Video: H.264, MPEG1, MPEG2, MPEG4
        Audio: MP3, AMR-NB, AAC, OGG, FLAC
        Image: JPEG
  • 33. Socially Value-adding Apps – Healthcare and Education
      [Diagram: medical sensors (ECG, blood pressure monitor, pulse oximeter, digital stethoscope) connect to the HIP at the health center / home; patient records are shared with an expert doctor]
  • 34. HIP Application Development Framework
      Framework blocks (exposed to applications through control APIs)
        SRC:  Network, Microphone, Camera, A/V in, Storage, USB
        PROC: Compress, Decompress, Multiplex, Demux, Render, Blend
        SINK: Network, VGA, TV Video, TV Audio, Headphone, Storage

      Application                              | SRC                   | PROC                                 | SINK
      Video from Internet                      | Network               | Demux – Decompress                   | TV Video / Audio
      Media player                             | Storage               | Demux – Decompress                   | TV Video / Audio
      Video Chat (Far View)                    | Network               | Demux – Decompress                   | TV Video / Headphone
      Video Chat (Near View)                   | Camera and Microphone | Compress – Multiplex                 | Network
      Internet Browser                         | Network               | Render                               | TV Video
      Remote Healthcare                        | USB                   | Compress – Multiplex                 | Network
      Distance Education – Lecture Recording   | A/V in                | Demux – Decompress, Remux – Compress | Storage
      Lecture Playback                         | Storage               | Demux – Decompress                   | TV Video / Audio
      QA and Course Guide                      | Network               | Demux, Parser                        | TV Video / Audio
  • 35. Sensing of Network Condition
      The transmitter end sends probe packets P1 (at time t1i) and P2 (at time t2i) and measures their round-trip times:
        T = average(RTT(P1, t1i), RTT(P2, t2i))
      T is mapped heuristically to the effective bandwidth of the network, based on experimentation on a real network (CDMA 1xRTT using Tata Docomo Photon) at different times of the day and at different places.

        T (msec)         | BWeff (kbps)
        T < 300          | 50
        300 < T < 800    | 13
        800 < T < 1600   | 4
        1600 < T < 1900  | 2
        T > 1900         | 1
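The RTT-to-bandwidth table on the network-sensing slide can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and because the slide does not say which band the exact boundary values (300, 800, 1600, 1900 msec) belong to, the sketch assumes they fall into the slower band.

```python
def mean_rtt_ms(rtt_p1_ms: float, rtt_p2_ms: float) -> float:
    """T = average of the round-trip times of the two probe packets."""
    return (rtt_p1_ms + rtt_p2_ms) / 2.0


def effective_bandwidth_kbps(t_ms: float) -> int:
    """Map mean RTT T (msec) to BWeff (kbps) using the table on the slide.

    Assumption: boundary values are assigned to the slower band.
    """
    if t_ms < 300:
        return 50
    if t_ms < 800:
        return 13
    if t_ms < 1600:
        return 4
    if t_ms < 1900:
        return 2
    return 1


if __name__ == "__main__":
    t = mean_rtt_ms(900.0, 1100.0)
    print(t, effective_bandwidth_kbps(t))
```

A real deployment would re-measure T periodically, since the mapping was obtained empirically for one network at particular times and places.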
  • 36. Rate Control in Audio and Video
      Audio Codec (AMR-NB)
        • If effective bandwidth > 4 kbps, bit rate = 5.15 kbps; otherwise bit rate = 4.75 kbps
      Video Codec (H.264)
        • For complex scenes, frame-level control alone exhausts the bit budget in the initial frames as MAD increases.
        • MB-level control ensures the bit budget is not exhausted, because the frame is not complex throughout.
        Estimation of the complexity of the video scene
        • Bit rate: [equation on slide not captured in the transcript]
        • Mean Absolute Difference (MAD) prediction model: [equation on slide not captured in the transcript]
        • The threshold for frame n, T(n), is defined as 80% of MADavg(n). The threshold was selected using classical decision theory on 20 test sequences of different resolutions.
        • If MADcb > T(n) for at least one MB in a frame, the frame is declared complex and the MB is chosen as the basic unit for that frame.
        • If MADcb <= T(n) for all MBs in a frame, the frame is chosen as the basic unit.
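The two decisions on the rate-control slide (the AMR-NB bit rate and the choice of basic unit for H.264 rate control) can be sketched as follows. The function names are hypothetical, and `mad_avg` is taken as an input because the slide's MAD prediction equation is not available in the transcript.

```python
from typing import Sequence


def audio_bit_rate_kbps(bw_eff_kbps: float) -> float:
    """AMR-NB bit rate: 5.15 kbps above 4 kbps effective bandwidth, else 4.75 kbps."""
    return 5.15 if bw_eff_kbps > 4 else 4.75


def basic_unit(mad_per_mb: Sequence[float], mad_avg: float) -> str:
    """Choose the rate-control basic unit for one frame.

    T(n) is 80% of MADavg(n). If any macroblock's MAD exceeds T(n),
    the frame is complex and the macroblock becomes the basic unit;
    otherwise the whole frame is the basic unit.
    """
    threshold = 0.8 * mad_avg
    return "MB" if any(mad > threshold for mad in mad_per_mb) else "frame"


if __name__ == "__main__":
    print(audio_bit_rate_kbps(13))            # good channel
    print(basic_unit([2.0, 3.5, 9.0], 10.0))  # one MB above T(n) = 8.0
    print(basic_unit([2.0, 3.5, 7.9], 10.0))  # all MBs at or below T(n)
```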
  • 37. Adaptive Packetization of Encoded Video/Audio
      • Fragment size N = (fr + H) = 1440 bytes (optimal size found through experimentation)
      • Transmission interval dt = (fr + H) / (1000 * BWeff)
      • A 9-byte header is added to each video fragment:
          - Frame type (1 byte): I (independent) or P (predictive) frame
          - Total sub-sequence number (1 byte): total number of fragments in the video frame
          - Sub-sequence number (1 byte): fragment number
          - Sequence number (4 bytes): video frame number
          - Video payload size (2 bytes): video bytes in the current fragment
      • Drop a frame if and only if a fragment of a newer frame arrives before all fragments of the current frame have been received (improvement over RTP).
      • Do not discard packets based on transit delay; this increases the probability of receiving good packets on a slow network (improvement over TCP).
      • AMR-NB audio frames are 20 msec long; M audio frames (M = 10, matching 5 fps video) are aggregated and sent in a single UDP fragment.
      • DTX (Discontinuous Transmission) is enabled in the AMR-NB encoder (VAD): if the silence period exceeds D seconds, audio transmission is discontinued.
          - For a good channel (BWeff > 4 kbps), D > 10 seconds
          - For a bad channel (BWeff <= 4 kbps), D = 3 to 5 seconds
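The 9-byte fragment header described on the packetization slide maps naturally onto a packed binary struct. The sketch below is a minimal illustration under stated assumptions: network byte order, the numeric codes for I and P frames, and all function names are hypothetical, since the slide gives only the field sizes and their order.

```python
import struct

# Field order from the slide: frame type (1), total fragments (1),
# fragment number (1), frame number (4), payload size (2) = 9 bytes.
HEADER = struct.Struct("!BBBIH")          # assumption: network byte order
FRAGMENT_SIZE = 1440                      # fr + H, from the slide
MAX_PAYLOAD = FRAGMENT_SIZE - HEADER.size
FRAME_TYPE_I, FRAME_TYPE_P = 0, 1         # hypothetical codes


def fragment_frame(frame_type: int, frame_number: int, data: bytes) -> list:
    """Split one encoded frame into fragments of at most 1440 bytes each."""
    chunks = [data[i:i + MAX_PAYLOAD] for i in range(0, len(data), MAX_PAYLOAD)] or [b""]
    if len(chunks) > 255:
        raise ValueError("fragment count does not fit the 1-byte header field")
    return [
        HEADER.pack(frame_type, len(chunks), index, frame_number, len(chunk)) + chunk
        for index, chunk in enumerate(chunks)
    ]


def parse_fragment(packet: bytes) -> tuple:
    """Return (frame_type, total, index, frame_number, payload)."""
    frame_type, total, index, frame_number, size = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + size]
    return frame_type, total, index, frame_number, payload


if __name__ == "__main__":
    frame = bytes(3000)
    packets = fragment_frame(FRAME_TYPE_P, 42, frame)
    print(len(packets), [len(p) for p in packets])
```

On the receiving side, the total-fragments and frame-number fields are what allow the drop rule on the slide: a frame is abandoned only when a fragment with a newer frame number arrives before all fragments of the current frame are in.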
  • 39. Watermarking Algorithm Flow
      [Flowchart; nodes in transcript order]
        - Is the frame IDR? (Y / N)
        - Is it an even IDR? (Y / N)
        - CONTINUE
        - Hash the previous GOP
        - Check for message size
        - Embed watermark (hash number of previous GOP)
        - Find location for embedding (image + data)
        - Embed watermark (actual message)
        - Find location for embedding (data)
  • 40. Digital Watermarking – Algorithm Details
      • The message consists of an H×W logo image in binary format and K bytes of text data.
      • Total number of bits to embed N = H×W + K*8, stored in an N-byte binary array (called wn).
      • wn is quantized using the same quantization parameter (qp) used in H.264; the quantized values are stored in array wqn.
      • For each wqn, find the embedding location inside the image; the location mapped to wqn is denoted M(u,v), where (u,v) is the position in the DCT domain.
      • Embed the information in the corresponding coefficient (10th or 15th bit, depending on whether the bit index is odd or even).
      [Flowchart nodes: Message is image or data? (Image / Data); Diagonal SB? (Y / N); Ab-Diagonal SB? (Y / N); SKIP]
  • 41. Digital Watermarking - Attacks
      Attacks evaluated
        1. Averaging attack (AA)
        2. Circular averaging attack (CAA)
        3. Rotate attack (RoA)
        4. Resize attack (RsA)
        5. Frequency filtering attack (FFA)
        6. Non-linear filtering attack (NLFA)
        7. Gaussian attack (GA)
        8. Gamma correction attack (GCA)
        9. Histogram equalization attack (HEA)
        10. Laplacian attack (LEA)
      [Diagram: attack evaluation chain. Labels: .264 with WM; H.264 decoder (without WM detector); .YUV; attack; .YUV; H.264 encoder (without WM embedding); .264; H.264 decoder (with WM detector); .YUV; retrieved image / text; original image / text; measure of quality for retrieved image / text; video quality comparison; report generation]
  • 42. Watermarking Perceptual Video Quality after Attack
      • Ten quality measures: Average Absolute Difference (AAD), Mean Square Error (MSE), Normalised Mean Square Error (NMSE), Laplacian Mean Square Error (LMSE), Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Image Fidelity (IF), Structural Content (SC), Global Sigma Signal to Noise Ratio (GSSNR), Histogram Similarity (HS)
      • Three pairs of videos: a) two identical videos (high quality), b) two completely different videos (poor quality) and c) an original video and its compressed / decompressed version (average quality)
      • Weighted quality metric: W_VAL = (AAD + GSSNR + LMSE + MSE + PSNR)*3 + HS + IF + NMSE + SC + SNR
      • 14 test video streams were subjected to the different kinds of attacks and W_VAL was calculated; 20 users were asked to judge the attacked and the original watermarked video based on perception.
      • Judgement based on human vision psychology (HVS), converted to a fuzzy Mean Opinion Score (MOS) based parameter Cqual:
          IF (W_VAL >= 90), Cqual = Excellent
          ELSEIF (W_VAL >= 80), Cqual = Good
          ELSEIF (W_VAL >= 75), Cqual = Average
          ELSEIF (W_VAL >= 70), Cqual = Bad
          ELSE Cqual = Poor
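The weighted metric and the Cqual decision ladder on the perceptual-quality slide translate directly into code. This is a sketch only: the function names are hypothetical, and the slide does not state how the ten individual measures are normalised before being combined, so `w_val` simply applies the stated weights to whatever scores it is given.

```python
HEAVY = ("AAD", "GSSNR", "LMSE", "MSE", "PSNR")   # weighted by 3 on the slide
LIGHT = ("HS", "IF", "NMSE", "SC", "SNR")         # weighted by 1 on the slide


def w_val(scores: dict) -> float:
    """W_VAL = (AAD+GSSNR+LMSE+MSE+PSNR)*3 + HS+IF+NMSE+SC+SNR."""
    return 3 * sum(scores[name] for name in HEAVY) + sum(scores[name] for name in LIGHT)


def c_qual(value: float) -> str:
    """Map W_VAL to the fuzzy MOS-based video quality class Cqual."""
    if value >= 90:
        return "Excellent"
    if value >= 80:
        return "Good"
    if value >= 75:
        return "Average"
    if value >= 70:
        return "Bad"
    return "Poor"


if __name__ == "__main__":
    # W_VAL values taken from the results slide of this deck.
    for attack, value in {"AA": 100, "GA": 71, "CAA": 52}.items():
        print(attack, value, c_qual(value))
```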
  • 43. Watermarking Retrieved Image and Text Quality
      Image
        • Normalized deviation parameter (de), derived from the Euclidean distance d
        • Bit error (be): % of bits differing between the retrieved and the original binary image
        • Crossing count error (ce), derived from c, the difference in the count of 0-to-1 crossings between the original and the retrieved binary image
        • Final error in the retrieved image, e (the values on the results slide are consistent with e being the mean of be, ce and de)
        • Based on MOS, decision logic for the quality of the retrieved image (Cimg):
            IF e < 0.5, Cimg = Excellent
            IF 5 > e > 0.5, Cimg = Good
            IF 10 > e > 5, Cimg = Medium
            IF 15 > e > 10, Cimg = Bad
            ELSE Cimg = Poor
      Text
        • Compute the mean error (te) of the Hamming distance and the Levenshtein distance
        • MOS-based retrieved text quality (Ctxt):
            IF te < 0.5, Ctxt = Excellent
            IF 1 > te > 0.5, Ctxt = Good
            IF 3 > te > 1, Ctxt = Medium
            IF 5 > te > 3, Ctxt = Bad
            ELSE Ctxt = Poor
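The image and text decision ladders above can be sketched as two small grading functions. Two points are assumptions rather than statements from the slide: the combined image error is taken as the mean of be, ce and de (which reproduces the e column on the results slide), and boundary values are assigned to the worse class (the results slide grades te = 0.5 as Good and te = 3 as Bad, which is consistent with that choice). Function names are hypothetical.

```python
def image_error(be: float, ce: float, de: float) -> float:
    """Combined image error e; assumed to be the mean of the three errors."""
    return (be + ce + de) / 3.0


def text_error(levenshtein: float, hamming: float) -> float:
    """Mean error te of the Levenshtein and Hamming distances."""
    return (levenshtein + hamming) / 2.0


def _grade(value: float, limits: tuple) -> str:
    labels = ("Excellent", "Good", "Medium", "Bad")
    for limit, label in zip(limits, labels):
        if value < limit:
            return label
    return "Poor"


def c_img(e: float) -> str:
    """Retrieved image quality class from e (limits 0.5, 5, 10, 15)."""
    return _grade(e, (0.5, 5, 10, 15))


def c_txt(te: float) -> str:
    """Retrieved text quality class from te (limits 0.5, 1, 3, 5)."""
    return _grade(te, (0.5, 1, 3, 5))


if __name__ == "__main__":
    e = image_error(5.469, 9.896, 3.448)   # CAA row of the results slide
    print(round(e, 3), c_img(e))
    print(c_txt(text_error(6, 1)))         # CAA text row
```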
  • 44. Watermarking Perceptual Video Quality - Results
        Attack | W_VAL | Cqual
        AA     | 100   | Excellent
        CAA    | 52    | Poor
        FFA    | 25    | Poor
        GCA    | 27    | Poor
        GA     | 71    | Bad
        HEA    | 27    | Poor
        LEA    | 28    | Poor
        NLFA   | 25    | Poor
        RsA    | 100   | Excellent
        RoA    | 37    | Poor
      [Figures: original video and original logo, with results after AA, FFA, GCA, NLFA, HEA, LEA, RoA and RsA]
  • 45. Watermarking Decision Logic on Perceptual Quality after Attacks
        Video quality       | Retrieved image quality | Retrieved text quality | Overall measure of goodness
        Excellent or Good   | Excellent               | Excellent              | Excellent
        Excellent or Good   | Excellent               | Good                   | Good
        Excellent or Good   | Good                    | Excellent              | Good
        Excellent or Good   | Good                    | Good                   | Good
        Excellent or Good   | Medium                  | Medium                 | Medium
        Excellent or Good   | Bad or Poor             | Medium                 | Bad
        Excellent or Good   | Medium                  | Bad or Poor            | Bad
        Excellent or Good   | Bad or Poor             | Bad or Poor            | Poor
        Medium, Bad or Poor | Any                     | Any                    | Attack degrades video quality
  • 46. Watermarking Results on Retrieved Quality after Attacks
        Attack | be    | ce     | de     | e      | Image quality (Cimg)
        AA     | 0.000 | 0.000  | 0.000  | 0.000  | Excellent
        CAA    | 5.469 | 9.896  | 3.448  | 6.271  | Medium
        FFA    | 5.469 | 10.938 | 55.172 | 23.860 | Poor
        GCA    | 0.781 | 1.563  | 3.448  | 1.931  | Good
        GA     | 4.948 | 9.896  | 24.138 | 12.994 | Bad
        HEA    | 1.563 | 1.563  | 3.448  | 2.191  | Good
        LEA    | 1.823 | 2.083  | 0.000  | 1.302  | Good
        NLFA   | 5.729 | 10.417 | 13.793 | 9.980  | Medium
        RsA    | 0.000 | 0.000  | 0.000  | 0.000  | Excellent
        RoA    | 0.781 | 0.521  | 0.000  | 0.434  | Excellent

        Attack | L | H | te    | Text quality (Ctxt)
        AA     | 0 | 0 | 0.000 | Excellent
        CAA    | 6 | 1 | 3.5   | Bad
        FFA    | 6 | 1 | 3.5   | Bad
        GCA    | 0 | 1 | 0.5   | Good
        GA     | 5 | 1 | 3     | Bad
        HEA    | 6 | 7 | 6.5   | Poor
        LEA    | 4 | 5 | 4.5   | Bad
        NLFA   | 6 | 1 | 3.5   | Bad
        RsA    | 0 | 0 | 0.000 | Excellent
        RoA    | 0 | 0 | 0.000 | Excellent
  • 47. Encryption – Proposed System
      Contribution
        Low-computational-complexity two-stage H.264 video encryption algorithm
          - Separate header encryption
          - Reuse of the flexible macroblock ordering (FMO) of H.264/AVC as the encryption operator
        Analysis of the effect of the encryption-decryption chain on video quality
          - Important from the end-user experience perspective
          - PSNR used as the quality measure
      [Flowchart nodes: Encode next frame; Encode next slice; Encode next MB; FMO to get next MB number (modify MB ordering using key-based look-up, get next MB number); Proceed to next macroblock; End of slice? (Y / N); End of frame? (Y / N); End of sequence? (Y / N); End]
  • 48. Encryption – Two Stage Algorithm
      [Flowchart nodes: Read NALU; NALU type = Control? (Y: read control data; N: read video data, read macroblock)]
      [Diagram: NALU stream layout with SPS, PPS, IDR and P units]
      Header Encryption
        • First encrypt the SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and IDR
        • Encode the first frame using a conventional H.264 encoder and take a 16-bit key (KU)
        • Take the length of the IDR (lIDR); it is a 16-bit number for QCIF resolution
        • Define the encryption key value KP using a hash function of lIDR and KU
      Modify macroblock ordering using key-based look-up
        • Use KP as the seed to generate a random sequence Le (values between 0 and 97). This is used for the first GOP. For subsequent GOPs, the KP of the previous GOP is used as KU to derive a new KP.
        • MBs in an IDR frame are encoded in the order specified by the look-up table Le.
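The key chaining and key-seeded macroblock ordering described on the two-stage encryption slide can be illustrated roughly as below. Everything concrete here is an assumption made for illustration: the slide does not name the hash function or the random generator, so SHA-256 truncated to 16 bits and Python's `random.Random` stand in for them, and all function names are hypothetical. Python's `random` module is not cryptographically secure, so this is a structural sketch, not a usable cipher.

```python
import hashlib
import random

NUM_MBS = 98  # look-up table values run from 0 to 97 on the slide


def derive_kp(l_idr: int, ku: int) -> int:
    """KP = Hash(lIDR, KU); SHA-256 truncated to 16 bits is an assumed stand-in."""
    material = (l_idr & 0xFFFF).to_bytes(2, "big") + (ku & 0xFFFF).to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(material).digest()[:2], "big")


def mb_order(kp: int) -> list:
    """Look-up table Le: a key-seeded permutation of the macroblock numbers."""
    order = list(range(NUM_MBS))
    random.Random(kp).shuffle(order)
    return order


def keys_per_gop(idr_lengths: list, user_key: int) -> list:
    """Chain the keys: the KP of one GOP is used as KU for the next GOP."""
    keys, ku = [], user_key
    for l_idr in idr_lengths:
        kp = derive_kp(l_idr, ku)
        keys.append(kp)
        ku = kp
    return keys


if __name__ == "__main__":
    keys = keys_per_gop([1520, 1498, 1611], user_key=0xBEEF)
    print(keys)
    print(mb_order(keys[0])[:10])
```

A decoder holding the same user key and seeing the same IDR lengths regenerates the same tables and can therefore restore the original macroblock order.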
  • 49. Encryption – Results
      Security Analysis
        - Brute-force attack: the 98 macroblock positions can be ordered in 98! ≈ 9.42 x 10^153 ways
        - In practice this is limited by the 32-bit key used for generating the MB order: the order can be generated in 2^32 ways, requiring on average half that number of attempts to decrypt
        - The key changes every GOP, so the proposed method is robust enough in comparison with other reported similar methods

      Computational Complexity Analysis
        Operation      | # per GOP           | % increase
        ADD            | 2*24*97 + 3 = 4659  | 0.004
        MULTIPLICATION | 5*24*97 = 11,640    | 0.200
        DIVISION       | 5                   | 0
        MODULO         | 4*24*97 + 4 = 9316  | 0

        Resolution (w x h)  | Picture size in MBs | Memory (bytes)
        QCIF (176x144)      | 99                  | 198
        CIF (352x288)       | 396                 | 792
        VGA (640x480)       | 1200                | 2400
        SDTV-525 (720x480)  | 1350                | 2700
        SDTV-625 (720x576)  | 1620                | 3240

      Video Quality Analysis
        Video sequence | Size / frame (bytes)                | PSNR of Y component
                       | without encryption | with encryption | without encryption | with encryption
        Claire         | 155.125            | 158.16          | 39.03              | 39.03
        Foreman        | 668.975            | 700.3           | 35.14              | 35.16
        Hall monitor   | 264.82             | 268.705         | 36.97              | 36.96
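The figures in the security and memory tables above can be checked with a few lines of arithmetic. The sketch assumes 16x16 macroblocks (standard for H.264) and reads the memory column as two bytes per macroblock, which is an observation from the table rather than something the slide states; the function names are hypothetical.

```python
import math


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks in a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def lookup_table_bytes(width: int, height: int) -> int:
    """Look-up table size, assuming two bytes per macroblock entry."""
    return 2 * macroblocks(width, height)


def brute_force_orders(num_positions: int = 98) -> int:
    """Number of possible macroblock orderings (factorial of the position count)."""
    return math.factorial(num_positions)


def average_key_attempts(key_bits: int = 32) -> int:
    """Expected brute-force attempts against the key: half the key space."""
    return 2 ** key_bits // 2


if __name__ == "__main__":
    print(f"{brute_force_orders():.3e}")
    print(average_key_attempts())
    for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288), "VGA": (640, 480)}.items():
        print(name, macroblocks(w, h), lookup_table_bytes(w, h))
```

The comparison makes the slide's point concrete: the effective security is bounded by the 2^32 key space, not by the far larger number of possible orderings.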
  • 50. Textual Context from Broadcast TV – Requirement
      Example flow
        1. Text on broadcast video: "ANDHRA CM'S MISSING CHOPPER MYSTERY"
        2. Keyword spotting: Andhra, CM, Missing Chopper
        3. Search for an RSS feed containing related information, or search through any engine for related information
        4. Display the related information on TV: "Missing Chopper Found, CM Dead"
  • 51. Textual Context in Static Pages – Proposed System for DTH
      Contribution
        Pre-processing and enhancement
          - Noise removal through low-pass filtering on Y
          - Resolution enhancement through interpolation-based zooming
        Binarization and touching-character segmentation
          - Adaptive-thresholding based binarization
          - Touching-character segmentation using width outlier detection
        Use of standard OCR tools like GOCR and Tesseract
      Pipeline: a-priori ROI mapping → pre-processing for noise removal and image enhancement → binarization and touching-character segmentation → OCR using standard engines
  • 52. TV Channel Identity – Results
      Recall and Precision Measures
        110 channels tested: recall r = 0.96 and precision p = 0.95
        • Channel logos with a very small number of pixels are missed in 1% of cases
        • The remaining 3% of misses are due to a moving or changed channel logo
        • 3% false positives from small logos (removed from the template)
        • 2% false positives due to highly transparent logos

      False Positive Examples
        Original channel | Detected as
        Zee Trendz       | DD Ne
        Zee Punjabi      | TV9 Gujarati
        DD News          | DD Ne
        Nick             | DD Ne
        Nepal 1          | Zee Cinema

      [Figure: changed logo examples]

      Computational Complexity
        Module            | Time (msec)
        YUV to HSV        | 321.09
        ROI mapping       | 0.08
        Mean SAD matching | 293.65
        Correlation       | 847.55
  • 53. Textual Context from Broadcast TV – Proposed System for News Mash-up from Internet
      Breaking News Heuristics
        • Breaking news always appears in capital letters
        • The font size of breaking news is larger than that of the ticker text
        • It tends to appear in the central to central-bottom part of the screen
      Pipeline: localization of suspected text regions → text region confirmation using temporal consistency → binarization → text recognition → keyword selection
      Contribution
        - An improved method for text-region localization and screen-layout segmentation
        - Pre-processing techniques on the text region (same as the previous section)
        - Heuristics-based keyword spotting algorithm followed by Google's built-in dictionary-based correction
  • 54. Textual Context in Static Pages – Results
      Source: Tata Sky DTH service in India
      [Figures: screenshots of candidate ROIs; accuracy of text detection chart]
      OCR Example Results (per image: raw GOCR output, raw Tesseract output, then GOCR and Tesseract output after applying the proposed algorithms)
        (a) GOCR: Sta_ring Govind_. Reem_ _n. RajpaI Yadav. Om Puri.
            Tesseract: Starring Guvinda, Rcema Sen, Raipal Yadav, Om Puri.
            GOCR (proposed): Starring Govind_. Reem_ _n. RajpaI Yadav. Om Puri.
            Tesseract (proposed): Starring Guvinda. Reema Sen, Raipal Yadav. Om Puri.
        (b) GOCR: _____ ___ ___ _________ ____ __ __
            Tesseract: Pluww SMS thu fnlluwmg (adn In 56633
            GOCR (proposed): ___ SMS th_ folIcmng cod_ to S__
            Tesseract (proposed): Planta SMS tha Iullmmng mda tn 56633
        (c) GOCR: SmS YR SH to
            Tesseract: SMS YR SH in 56633
            GOCR (proposed): SmS YR SH to _____
            Tesseract (proposed): SMS YR SH to 56533
        (d) GOCR: _m_ BD to _____
            Tesseract: SMS BD to 56633
            GOCR (proposed): SMS BD to S____
            Tesseract (proposed): SMS BD to 56633
        (e) GOCR: AM t___o_,_b _q____
            Tesseract: AM 2048eb 141117
            GOCR (proposed): AM tOa_gb _q____
            Tesseract (proposed): AM 2048eb 141117
        (f) GOCR: _M_= _ _A___ to Sd___
            Tesseract: SMS: SC 34393 tn 56533
            GOCR (proposed): _M_= _ _A___ to Sd___
            Tesseract (proposed): SMS: SC34393 tn 56633
        (g) GOCR: _W _ ' _b _ Ib_lb _a
            Tesseract: W6.} 048abl;lbwzIb1a
            GOCR (proposed): ___ __Y_b yIbw_Ib_a
            Tesseract (proposed): WP 2048ab Mlbwzlb 1 a
        (h) GOCR: ADD Ed_J to S____
            Tesseract: ADD Eau to $6633
            GOCR (proposed): ADD Ed_J to S____
            Tesseract (proposed): ADD Edu to 56633
        (i) GOCR: AIC STAlUSlS/OUO_ t_;OS;t_
            Tesseract: AIC STATUS25/02/09 1 9:05:1 4
            GOCR (proposed): mlC S_ATUSlS/OUO_ t_;OS=tA
            Tesseract (proposed): A/C STATUS 25/02/09 1 9:05:14
        (j) GOCR: _ ________'__
            Tesseract: Sub ID 1005681893
            GOCR (proposed): WbID_OOS_B_B__
            Tesseract (proposed): Sub ID 1005681893
  • 55. Textual Context from Broadcast TV - Text Region Localization
      Text Region Localization
        • Filter out low-contrast components using intensity-based thresholding (output $V_{cont}$).
        • Count the number of black pixels in each row of $V_{cont}$.
        • Let the number of black pixels in the $i$-th row be $cnt_{black}(i)$.
        • Compute the average number of black pixels in a row as $avg_{black} = \frac{1}{ht}\sum_{i=1}^{ht} cnt_{black}(i)$, where $ht$ is the height of the frame.
        • Compute the absolute variation of each row from the average as $av(i) = |cnt_{black}(i) - avg_{black}|$.
        • Compute the average absolute variation as $aav = \frac{1}{ht}\sum_{i=1}^{ht} av(i)$.
        • Compute the threshold for marking the textual region as $TH_{txt\_reg} = avg_{black} + aav$.
        • Mark all pixels in the $i$-th row of $V_{cont}$ as white if $cnt_{black}(i) < TH_{txt\_reg}$.
      Confirmation of the Text Regions Using Temporal Consistency
        • Assumption: text in breaking news persists for some time.
        • $V_{cont}$ sometimes contains noise because of high-contrast regions in the video frame.
        • In a typical video sequence at 30 FPS, one frame is displayed for 33 msec.
        • Assuming breaking news persists for at least 2 seconds, all regions that are not persistently present for more than 2 seconds can be filtered out.
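The row-wise thresholding steps on the localization slide can be sketched in plain Python as follows. The representation is an assumption made for the example: the contrast-filtered frame is a list of rows with 0 for black and 1 for white, and the function names are hypothetical.

```python
BLACK, WHITE = 0, 1  # assumed pixel encoding for the binary frame V_cont


def text_region_threshold(v_cont: list) -> float:
    """TH_txt_reg = avg_black + aav, computed over all rows of the frame."""
    ht = len(v_cont)
    cnt_black = [row.count(BLACK) for row in v_cont]
    avg_black = sum(cnt_black) / ht
    aav = sum(abs(count - avg_black) for count in cnt_black) / ht
    return avg_black + aav


def keep_text_rows(v_cont: list) -> list:
    """Whiten every row whose black-pixel count is below the threshold."""
    threshold = text_region_threshold(v_cont)
    return [
        [WHITE] * len(row) if row.count(BLACK) < threshold else list(row)
        for row in v_cont
    ]


def min_persistent_frames(fps: int = 30, seconds: float = 2.0) -> int:
    """Frames a region must persist to pass the temporal-consistency check."""
    return int(fps * seconds)


if __name__ == "__main__":
    frame = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 0, 1, 1, 1, 1],   # isolated dark pixel (noise)
        [0, 0, 1, 0, 0, 0, 1, 0],   # dense dark pixels (candidate text row)
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    for row in keep_text_rows(frame):
        print(row)
```

Rows that survive this step are only candidates; the temporal check then discards regions that do not persist for the assumed two seconds (60 frames at 30 FPS).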
  • 56. Textual Context from Broadcast TV - Post-processing
      Heuristics
        • Operate the OCR only in upper case.
        • If the number of words in a text line is above a heuristically obtained threshold, the line is considered a candidate text region.
        • If multiple such text lines are obtained, choose a line near the bottom.
        • Remove the stop words (such as a, an, the, for, of) and correct the words using a dictionary.
        • Concatenate the remaining words to generate the search string for the Internet search engine.
      - The selected keywords can be passed to Internet search engines through Web APIs to fetch related news, which can be blended on top of the TV video to create a mash-up between TV and Web.
      - Search engines like Google already provide word correction, which eliminates the need for dictionary-based correction of keywords.
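The stop-word removal and search-string step in the heuristics above can be sketched as below. The stop-word set contains only the examples named on the slide, the punctuation stripping is an added assumption, dictionary correction is left out (the slide notes the search engine can handle it), and the function name is hypothetical.

```python
STOP_WORDS = {"A", "AN", "THE", "FOR", "OF"}  # examples from the slide only


def build_search_string(text_line: str) -> str:
    """Keep all-capital words, drop stop words, and join the rest."""
    words = (word.strip(".,:;!?") for word in text_line.split())
    kept = [
        word for word in words
        if word and word.isupper() and word not in STOP_WORDS
    ]
    return " ".join(kept)


if __name__ == "__main__":
    print(build_search_string("THE MYSTERY OF A MISSING CHOPPER"))
    print(build_search_string("ANDHRA CM'S MISSING CHOPPER MYSTERY"))
```

The resulting string would then be handed to a search engine's Web API; which API and how results are blended onto the video is outside the scope of this sketch.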
  • 57. Textual Context from Broadcast TV – Results
      Accuracy of Text Localization
        • Experimental results show a recall rate of 100% and a precision of 78%.
        • The precision is low because the parameters and threshold values are tuned to minimize the probability of false negatives (misses).
        • The final precision can only be seen after applying the text recognition and keyword selection algorithms.
      Accuracy of Text Recognition
        • For false positives, the OCR outputs a large number of special characters.
        • Candidate texts with a special-character to alphabet ratio > 1 are therefore discarded.
        • The proposed keyword detection method also concentrates on capital letters, so only words in all capitals are kept under consideration.
        • For those cases, the character-level accuracy of the selected OCR improves to 86.57%.
      Accuracy of Information Retrieval
        • Limitations of the OCR module can be overcome with a strong dictionary or language model.
        • In the proposed method this constraint is bypassed, because the Google search engine itself has such a module.
        • The OCR output is simply given to the Google search engine, which in turn suggests the actual text.
  • 58. Proposed Algorithm for Keyboard Layout
      Algorithm
        • Total number of character cells = T
        • Total number of rows of key-blocks = R
        • Total number of columns of key-blocks = C
        • Total number of cells in a key-block = 4
        • T = R x C x 4; maximum keystrokes in the worst case K = (R + C + 1)
        • Hence the desired solution boils down to finding the R and C for which K is minimum.
      Flowchart steps
        1. Input T
        2. numCharsSqrt = square root of T
        3. sqRootInt = ceiling of numCharsSqrt
        4. If sqRootInt is even, N = sqRootInt; otherwise N = sqRootInt + 1
        5. N2 = N - 2
        6. C = N / 2
        7. If (N * N2) >= T, R = N2 / 2; otherwise R = C
        8. Output R, C
      Example
        • "QWERTY": T = 54, 4 rows, 14 columns; (14 + 4 = 18) keystrokes max
        • "PROPOSED": T = 54, sqRootInt = 8, N = 8, N2 = 6, C = 4, (N*N2 = 48 < 54), R = C = 4; max 9 keystrokes
        • Final layout used: T = 48, sqRootInt = 7, N = 8, N2 = 6, C = 4, (N*N2 = 48 == T = 48), R = N2/2 = 3; finally R = 3 and C = 4, max 8 keystrokes
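The flowchart on the keyboard-layout slide is small enough to write out in full. The sketch below follows the steps as listed and reproduces both worked examples from the slide; the function names are hypothetical.

```python
import math


def keyboard_grid(total_cells: int) -> tuple:
    """Return (R, C): rows and columns of 4-cell key-blocks for T character cells."""
    sq_root_int = math.ceil(math.sqrt(total_cells))
    n = sq_root_int if sq_root_int % 2 == 0 else sq_root_int + 1
    n2 = n - 2
    cols = n // 2
    rows = n2 // 2 if n * n2 >= total_cells else cols
    return rows, cols


def worst_case_keystrokes(rows: int, cols: int) -> int:
    """K = R + C + 1, the worst-case keystroke count from the slide."""
    return rows + cols + 1


if __name__ == "__main__":
    for t in (54, 48):
        r, c = keyboard_grid(t)
        print(t, r, c, worst_case_keystrokes(r, c))
```

Dropping from 54 to 48 character cells is what lets the final layout shed a row of key-blocks, taking the worst case from 9 to 8 keystrokes.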
  • 59. Onscreen Keyboard Results – User Study 1
      Survey questions
        1. Does the on-screen keyboard provide enough assistance?
        2. How is the ease of use of the on-screen keyboard?
      [Charts: QWERTY vs. Layout-1 (before practice); QWERTY vs. Layout-1 (after practice)]
  • 60. Results – User Study 2 (KLM-GOMS Modelling)
      • A total of 20 users
      • Simple text entry and email sending task
      • Six phrases selected randomly from MacKenzie's standard phrase set
      • Users were given an initial familiarization phrase and then asked to enter six phrases in one go
      • The time taken by each user and the number of keystrokes required to type each phrase were recorded

      Simple Text Entry
        Layouts                 | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
        Layout 1 over QWERTY    | 44.23                      | 45.75
        Layout 2 over QWERTY    | 45                         | 46.75
        Layout 2 over Layout 1  | 2                          | 1.84

      Complete Email Typing and Sending Task
        Layouts                 | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
        Layout 1 over QWERTY    | 42.2                       | 35.23
        Layout 2 over QWERTY    | 43.4                       | 37.2
        Layout 2 over Layout 1  | 3.18                       | 2
  • 61. Onscreen Keyboard Results – User Study 2
      KLM-GOMS Operators
        Operator | Description                          | Time (sec)
        P        | Pointing a pointing device           | 1.10
        K        | Key or button press                  | 0.20
        H        | Move from mouse to keyboard and back | 0.40
        M        | Mental preparation and thinking time | 1.35
      Modifications to the operator set
        - P: redefined as the total time taken to find a key and move the focus to select the block containing that key. Layout 1: 1.77 sec, Layout 2: 1.73 sec, QWERTY: 1.10 sec
        - H: not used
        - New parameter F: the time required for finger movement, 0.22 sec

        Operations                           | Layout 1 (sec) | Layout 2 (sec) | QWERTY (sec)
        Open/close on-screen keyboard layout | 0.4            | 0.4            | 0.4
        Find any key                         | 1.07           | 1.03           | 1.1
        Move focus to select a key           | 0.7            | 0.7            | 2.6
        Move finger to the corner keys       | 0.2            | 0.2            | 0.2
        Enter a character using keyboard     | 2.17           | 2.13           | 4.0

      KLM-GOMS sub-goals for Email Task
        Sub-goals                 | Layout 1 (sec) | Layout 2 (sec) | QWERTY (sec)
        Open browser              | 0.5            | 0.5            | 0.5
        Open gmail server & login | 45.3           | 44.5           | 82.1
        Compose mail              | 68.0           | 65.8           | 115.1
        Dispatch                  | 0.4            | 0.4            | 0.4
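The predicted improvements reported for simple text entry follow from the per-character entry times in the operations table (2.17, 2.13 and 4.0 sec). The sketch below shows that calculation; treating the relative reduction in per-character time as the predicted improvement is an inference from the numbers, and the function name is hypothetical.

```python
# "Enter a character using keyboard" times (sec) from the operations table.
CHAR_TIME = {"Layout 1": 2.17, "Layout 2": 2.13, "QWERTY": 4.0}


def improvement_percent(new_layout: str, baseline: str) -> float:
    """Relative reduction in per-character entry time, in percent."""
    old, new = CHAR_TIME[baseline], CHAR_TIME[new_layout]
    return round((old - new) / old * 100, 2)


if __name__ == "__main__":
    print(improvement_percent("Layout 1", "QWERTY"))    # 45.75
    print(improvement_percent("Layout 2", "QWERTY"))    # 46.75
    print(improvement_percent("Layout 2", "Layout 1"))  # 1.84
```

These three values match the "predicted from KLM-GOMS" column for simple text entry; the email task predictions involve additional sub-goals and are not reproduced by this simple ratio.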