Elettronica: Multimedia Information Processing in Smart Environments by Alessandro Neri
1. COMLAB
Multimedia Arts & Technologies
Patrizio CAMPISI, Marco CARLI, Emanuele MAIORANA, Federica BATTISTI, Anna Maria VEGNI, Veronica PALMA, Marco LEO, Mauro UGOLINI, Marina SALATINO, Elena MAMMI, Paolo SITA’, Luca COSTANTINI, Daria LA ROCCA
MULTIMEDIA INFORMATION PROCESSING IN SMART ENVIRONMENTS
Alessandro Neri
Engineering Department
University of “Roma Tre”,
Via della Vasca Navale 84, 00146 Roma, Italy
neri@uniroma3.it
2. Outline
• Introduction
• Smart Environments
• Feature Extraction
• Object recognition
• Distributed Video coding for multiple sources
• New Imaging Techniques
• Conclusions
3. SMART ENVIRONMENT
SMART ENVIRONMENT
a set of technologies based on a tight integration of
• sensing devices,
• distributed processing systems,
• communication technologies,
which gives rise to environments (home, office, etc.) whose services
adapt to ambient conditions and which, being able to react
appropriately to the presence of people, can produce stimuli and
interact with them proactively, that is, by anticipating their wishes
without conscious mediation, in order to improve quality of life.
4. SMART ENVIRONMENT
INFORMATION PROCESSING CHAIN
Filtering & Denoising → Parameter estimation → Feature extraction → Semantic Analysis
5. Image Analysis
• Need for
– an efficient and parsimonious representation of the various relevant
components of a natural scene, such as edges and textures (not
achievable by means of a single, non-redundant system).
• Approach
– Adaptation of the basis to the local image content, by selecting the
elements from a highly redundant set (waveform dictionary)
• Critical elements
– dictionary setup
– construction of the best local representation (Minimum Description
Length).
• Objective
– a local expansion efficiently approximated by a few waveforms based on
specific patterns of visual relevance (edges, lines, crosses, etc.) whose
scale, position, and orientation can be varied parametrically
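As a minimal sketch of selecting a few elements from a redundant dictionary, the following uses greedy matching pursuit. This is a swapped-in illustration, not the Minimum Description Length criterion named on the slide; the tiny dictionary (spikes, one step edge, one line) and the function names are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse approximation of `signal` with unit-norm atoms."""
    residual = list(signal)
    chosen = []  # (atom index, coefficient) pairs, in selection order
    for _ in range(n_atoms):
        # Correlate the residual with every atom and keep the strongest match.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        chosen.append((best, coeff))
        # Remove the selected component from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return chosen, residual

# Hypothetical redundant dictionary for length-4 patches (6 atoms, 4 dimensions).
dictionary = [unit(v) for v in (
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],  # spikes
    [1, 1, -1, -1],                                          # step edge
    [-1, 1, 1, -1],                                          # line
)]

patch = [3.0, 3.0, -3.0, -3.0]  # an ideal step edge
atoms, residual = matching_pursuit(patch, dictionary, n_atoms=1)
```

With a dictionary that contains an edge-shaped atom, a single coefficient is enough to represent the edge patch, whereas a spike-only basis would need four.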
6. Gauss-Laguerre Wavelets
[Figure: real and imaginary parts of the Gauss-Laguerre filters, in polar coordinates (r, θ), for n = 1, 2, 3, 4 with k = 0 (gray scale from 0.0 to 1.0); filter outputs on a test image: edges, lines, Y-crosses, X-crosses.]
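The sketch below evaluates a Gauss-Laguerre circular harmonic filter of order n with k = 0, where the Laguerre polynomial factor equals 1. The normalization constant is omitted and a unit scale is assumed for illustration, so this is not necessarily the exact filter definition used in the slides. It checks the property that makes these filters useful for orientation estimation: rotating the sampling point by an angle alpha only multiplies the filter value by exp(j·n·alpha).

```python
import cmath
import math

def gl_filter(x, y, n):
    """Unnormalized Gauss-Laguerre circular harmonic of order n, with k = 0."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    # Radial profile r^n * exp(-r^2) times the angular harmonic exp(j*n*theta).
    return (r ** n) * math.exp(-r * r) * cmath.exp(1j * n * theta)

def rotate(x, y, alpha):
    """Rotate the point (x, y) counter-clockwise by alpha radians."""
    return (x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha))

alpha = math.pi / 5   # arbitrary rotation angle
x, y = 0.7, -0.4      # arbitrary sampling point
errors = []
for n in (1, 2, 3, 4):
    original = gl_filter(x, y, n)
    rotated = gl_filter(*rotate(x, y, alpha), n)
    # Deviation from the expected pure phase shift exp(j*n*alpha).
    errors.append(abs(rotated - original * cmath.exp(1j * n * alpha)))
```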
7. Surround Inhibition
[Figure: input image, desired output, Canny edge detector output.]
• Natural images may contain both texture and noise
• Local luminance changes: strong on texture, weak on contours
• Task: suppression of edges due to noise only
• Human Visual System (HVS) easily discriminates between texture, noise and
contours
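One common way to imitate this behaviour is to subtract from each edge response a weighted average of the responses in a surrounding ring, so that edges embedded in many similar edges are attenuated while isolated contours survive. The 1-D sketch below is only an illustration under that assumption; the ring radii `inner` and `outer`, the weight `alpha`, and the sample values are hypothetical and not taken from the slides.

```python
def surround_inhibition(strength, inner=1, outer=3, alpha=1.0):
    """Subtract the mean edge strength of a surrounding ring from each sample."""
    n = len(strength)
    out = []
    for i, s in enumerate(strength):
        # Ring: samples farther than `inner` but no farther than `outer` from i.
        ring = [strength[j] for j in range(i - outer, i + outer + 1)
                if 0 <= j < n and abs(j - i) > inner]
        surround = sum(ring) / len(ring) if ring else 0.0
        out.append(max(0.0, s - alpha * surround))
    return out

# Gradient magnitude along a scan line: an isolated contour, then a textured patch.
strength = [0, 0, 0, 8, 0, 0, 0, 0, 6, 7, 6, 7, 6, 7, 6]
inhibited = surround_inhibition(strength)
```

The isolated response at index 3 passes unchanged, while the responses inside the textured patch are reduced because their surround is also active.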
8. Multiscale Contour Detector
[Figure: output of the Canny edge detector for different scales; figure labels: destroyed junction, restored.]
• Morphological dilation
• Superposition and logic AND
Fine scale (small σ) vs. coarse scale (large σ), compared on:
• Texture residuals
• Well detailed contours
• Preserved junctions
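A minimal sketch of how dilation and a logic AND can combine two scales: the coarse-scale edge map is dilated to form a tolerance band, and only fine-scale edges falling inside that band are kept. The exact combination rule used in the slides is not stated, so this ordering, the square structuring element, and the toy edge maps are assumptions.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if mask[i][j]:
                for di in range(-radius, radius + 1):
                    for dj in range(-radius, radius + 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < rows and 0 <= jj < cols:
                            out[ii][jj] = 1
    return out

def combine_scales(fine, coarse, radius=1):
    """Keep fine-scale edge pixels that lie near a coarse-scale edge."""
    support = dilate(coarse, radius)
    return [[f & s for f, s in zip(frow, srow)]
            for frow, srow in zip(fine, support)]

# Fine scale: a well-localized contour (column 2) plus one texture residual.
fine = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
# Coarse scale: the same contour, displaced by one pixel, and no texture.
coarse = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
result = combine_scales(fine, coarse)
```

The result keeps the fine-scale localization of the contour and drops the isolated texture pixel, which has no coarse-scale support.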
12. Object Recognition- Video Browsing
[Block diagram]
Image Storing → Features Extraction → Image DB and Features DB
Query Image Submission → Features Extraction → Similarity Measurement (against the Features DB) → Ranked Image Collection
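The two paths of the diagram, storing and querying, can be sketched as follows. A normalized gray-level histogram and the Euclidean distance stand in for the feature extractor and the similarity measure; both are placeholders chosen for brevity, not the features used in the slides, and all names are hypothetical.

```python
import math

def extract_features(image, bins=4, max_value=256):
    """Normalized gray-level histogram used as a stand-in feature vector."""
    pixels = [p for row in image for p in row]
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    return [h / len(pixels) for h in hist]

def distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def store(collection):
    """Storing path: build the features DB from the image collection."""
    return {name: extract_features(img) for name, img in collection.items()}

def query(features_db, query_image):
    """Query path: rank stored images by increasing feature distance."""
    q = extract_features(query_image)
    return sorted(features_db, key=lambda name: distance(features_db[name], q))

collection = {
    "dark":   [[10, 20], [30, 40]],
    "bright": [[200, 220], [240, 250]],
    "mixed":  [[10, 250], [20, 240]],
}
features_db = store(collection)
ranking = query(features_db, [[15, 25], [35, 45]])
```

Because features are extracted once at storing time, a query only costs one feature extraction plus a comparison against the stored vectors.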
13. Multiview Analysis
Key points extraction
Key point matching (invariant with respect to scale, rotation, and perspective changes)
[Figure: axes x, y, and log2 σ.]
L. Sorgi, A. Neri. Keypoints Selection in the Gauss-Laguerre Transformed Domain. BMVC 2006.
14. KEYPOINTS SELECTION: SYSTEM OUTLINE
[Block diagram]
Stages: Pre-processing, Keypoints location, Keypoints descriptors
Steps: Smoothing and color conversion → Scalogram building → Scalogram scale-space inspection → Descriptors construction → Descriptors normalization
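A simple way to inspect a scalogram for keypoints is to look for strict local maxima jointly over scale and position. The sketch below does that on a toy 3-D array indexed as [scale][y][x]; the selection criterion, the threshold, and the data are assumptions for illustration, since the slide does not state the rule actually used.

```python
def scale_space_maxima(scalogram, threshold=0.0):
    """Return (scale, y, x) indices of strict local maxima above a threshold."""
    n_s, n_y, n_x = len(scalogram), len(scalogram[0]), len(scalogram[0][0])
    keypoints = []
    for s in range(n_s):
        for y in range(n_y):
            for x in range(n_x):
                value = scalogram[s][y][x]
                if value <= threshold:
                    continue
                # Collect the (up to 26) neighbours in scale and position.
                neighbours = [
                    scalogram[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                    and 0 <= s + ds < n_s
                    and 0 <= y + dy < n_y
                    and 0 <= x + dx < n_x
                ]
                if all(value > v for v in neighbours):
                    keypoints.append((s, y, x))
    return keypoints

# Toy scalogram: 3 scales of a 4x4 response map with one dominant peak.
scalogram = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
scalogram[1][2][1] = 5.0   # peak at the middle scale
scalogram[0][2][1] = 2.0   # weaker response at the same position, finer scale
keypoints = scale_space_maxima(scalogram, threshold=1.0)
```

The weaker response at the finer scale is discarded because its neighbour across scale is larger, so each keypoint is returned with a single characteristic scale.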
15. Image features
• 2D Patterns: based on Zernike polynomial expansion.
• Texture: histograms of Laguerre-Gauss local expansions
• Edge: relative phase of Laguerre-Gauss expansions
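For reference, the Zernike basis functions used in such expansions can be evaluated directly from the standard closed form of the radial polynomial. The sketch below is a plain implementation of that textbook formula on the unit disk; the function names are illustrative and it does not reproduce the expansion procedure of the slides.

```python
import cmath
from math import factorial

def zernike_radial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho) for 0 <= rho <= 1."""
    m = abs(m)
    # R_n^m is identically zero unless n - |m| is even and |m| <= n.
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * factorial(n - s)
        den = (factorial(s)
               * factorial((n + m) // 2 - s)
               * factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total

def zernike(n, m, rho, theta):
    """Complex Zernike function V_n^m = R_n^m(rho) * exp(j*m*theta)."""
    return zernike_radial(n, m, rho) * cmath.exp(1j * m * theta)

# R_2^0(rho) = 2*rho^2 - 1 and R_3^1(rho) = 3*rho^3 - 2*rho
r20 = zernike_radial(2, 0, 0.5)
r31 = zernike_radial(3, 1, 0.5)
```

Like the Gauss-Laguerre functions, each Zernike function has an angular factor exp(j·m·θ), so a rotation of the pattern changes only the phase of its expansion coefficients.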
16. Position, orientation, and scale estimation
• Extensive retrieval experiments, using quadtree decomposition combined
with Gauss-Laguerre circular harmonic functions (CHFs) as well as Zernike
CHFs, have been performed on the Corel-1000-A database.
• The average fraction of relevant images recovered is greater than 0.96,
while the other methods attain at most 0.87 (global search).
18. Experimental results
“Breakdancer” multiview sequence.
Source: Veronica Palma, PhD Thesis
[Figure: PSNR (dB, axis from 30 to 50) versus bit rate (kbit/s, ticks at 80, 200, 300, 800) for MDVC_Zernike, H.264/AVC, and Encoder driven fusion [1].]
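The quality measure on the vertical axis, PSNR, follows from the mean squared error between a reference frame and its decoded version. A minimal sketch for 8-bit gray-level frames (peak value 255) is given below; the sample frames are made up for illustration and are not data from the experiment.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized gray-level frames."""
    errors = [(r - d) ** 2
              for ref_row, dec_row in zip(reference, decoded)
              for r, d in zip(ref_row, dec_row)]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2x2 frames that differ by one gray level at every pixel (MSE = 1).
reference = [[100, 110], [120, 130]]
decoded = [[101, 109], [121, 129]]
value = psnr(reference, decoded)
```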
[1] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Multiview distributed video coding with encoder driven fusion”. In EUSIPCO Proceedings, 2007.
[2] M. Ouaret, F. Dufaux, and T. Ebrahimi, “Recent advances in multi-view distributed video coding”. In SPIE Mobile Multimedia/Image Processing for Military and Security Applications, April 2007.