1. A Generalized Multidimensional Index Structure
for Multimedia Data to Support Content-Based
Similarity Searches in a Collaborative
Environment
Kasturi Chatterjee
Distributed Multimedia Information Systems Laboratory
School of Computing and Information Sciences
Florida International University
2. Committee Members
• Dr. Shu-Ching Chen (Advisor)
• Dr. Jainendra K. Navlakha
• Dr. Xudong He
• Dr. Keqi Zhang
• Dr. Mei-Ling Shyu
3. Acknowledgment
School of Computing and Information Sciences
Continuing Graduate Assistantship (GA, RA)
Awards recognizing research
Florida International University
Dissertation Year Fellowship
Travel Grants (GSA)
Members of DMIS Lab
SCIS staff
Special thanks to Olga
4. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
5. What is so special about
multimedia data?
i. Expressive
ii. Attractive
Which medium is more
helpful?!
6. Everything comes at a price
i. Multidimensional
Representation
ii. Perception Subjectivity
iii. Semantic Gap
Very different from
traditional data!
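7. Multidimensional
Representation
Image → apply feature extraction (HSV color space) → feature vector:
(0.1602,0.0818,0.0405,0.0536,0.0685,0.0667,0,0,0.0287,0,0,0)
Color dimensions: black, white, red, red-yellow, yellow, yellow-green, green, green-blue, blue, blue-purple, purple, purple-red
[Figure: the image shown as a single point, e.g. (3.5, 0, 8), in a three-dimensional X-Y-Z feature space]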
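A minimal sketch of how a 12-dimensional color feature vector of this kind could be computed. The bin names follow the slide; the black/white thresholds, the hue centers, and the function names are illustrative assumptions, not the extraction method actually used in the dissertation.

```python
import colorsys

# Bin order follows the slide: black, white, then ten hue bins.
BINS = ["black", "white", "red", "red-yellow", "yellow", "yellow-green",
        "green", "green-blue", "blue", "blue-purple", "purple", "purple-red"]

# Assumed hue centers (degrees) for the ten hue bins; hypothetical values.
HUE_CENTERS = [0, 30, 60, 90, 120, 180, 240, 270, 300, 330]


def _hue_gap(a, b):
    """Circular distance between two hue angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def color_histogram(pixels, v_black=0.15, s_white=0.15):
    """Return a normalized 12-bin color histogram.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    v_black, s_white: illustrative thresholds (assumptions).
    """
    counts = [0] * len(BINS)
    total = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if v < v_black:
            idx = 0                      # too dark to carry a hue
        elif s < s_white:
            idx = 1                      # bright but unsaturated
        else:
            hue = h * 360.0
            # Assign the pixel to the nearest assumed hue center.
            idx = 2 + min(range(len(HUE_CENTERS)),
                          key=lambda k: _hue_gap(hue, HUE_CENTERS[k]))
        counts[idx] += 1
        total += 1
    return [c / total for c in counts] if total else [0.0] * len(BINS)
```

Each image then becomes one point in a 12-dimensional feature space, which is the representation the index structure organizes.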
10. Semantic Gap
Similar feature
representation
Very different
semantic
information
11. Are existing DBMS frameworks
able to handle Multimedia Data?
A Typical Query
Traditional alpha-numeric queries
SELECT studentName FROM table WHERE studentAge > 20 AND studentMajor = 'Computer Science';
Multimedia queries
SELECT image FROM table WHERE red 'is-close-to' 0.245 AND black 'is-close-to' 0.356 AND red-yellow 'is-close-to' 0.5672 AND …….. AND semanticInterpretation = 'something'….etc.
12. What is missing?
i. Suitable data organization (index structure)
ii. Suitable query handling
iii. Suitable handling of semantic contents
[Figure: architecture of a traditional DBMS. Application Front Ends and SQL Interface; Communication Manager; SQL Compiler/Interpreter; Query Evaluation Engine (Query Optimizer, Query Processor, Query Evaluator); Catalog Manager; Transaction Manager; Lock Manager; Buffer Manager; Recovery Manager; Access Structure Manager; Storage Manager; Index Structure; Index Access]
13. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
14. Generalized Index Structure
GeM-Tree [chat09c]
Expectations
i. Provide a single framework to
manage different types of
multimedia data
separate index structures for different
data types are inefficient to embed
into the database kernel
15. Generalized Index Structure
GeM-Tree
Expectations
ii. Accommodate varied
Multidimensional
Representation
• existing index structures for database kernels are mostly single-dimensional
• existing multidimensional index structures cannot handle retrieval requirements of multimedia data
• plethora of feature representations call for a flexible structure
16. Generalized Index Structure
GeM-Tree
Expectations
iii. Accommodate CBR of individual data
type along with concept retrievals
involving cross-similarity between
multimedia data
• query handling needs to consider low-level features & semantic information
• existing index structures cannot handle such retrieval approaches
17. What has been done so far
First generation index structures
B-Tree [1]
• tree-based index structure
• single-dimensional
• currently used in relational databases
Multi-dimensional index structures
Feature-Based
• feature space indexed based on feature dimension
• KDB-Tree [2], R-Tree [5], Hybrid-Tree [6]
Distance-Based
• metric-space formed from the distances between data objects is indexed
• M-Tree [4], VP-Tree [3]
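18. KDB-Tree
[Figure: a two-dimensional data space holding points A-T, split into regions 1-8, next to the corresponding KDB-Tree. The root covers regions 12345678 and splits into 1234 and 5678, then into 12, 34, 56 and 78; the leaf pages hold DE, ABC, FGH, IJ, ST, PQR, KLM and NO.]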
19. VP-Tree
Partition of the Data Space for VP-Tree (vantage point V)
• (A,B,C,D) closest to V
• (E,F,G,H) next close
• (I,J,K) farthest
[Figure: data space with points A-K placed around V]
20. Issues?
Semantic Information during CBR
• Feature-Based Indexes: low-level feature values correlated to semantics
• Distance-Based Indexes: no existing semantics capturing model embedded into search queries
Different data types
• none designed for handling videos/documents
Seamless solution
• none designed to handle multiple data types from a single framework
21. GeM-Tree
how does it accomplish the goals?
Expectation I
Provide a single framework to manage
different types of multimedia data
Using a data-signature to
represent multimedia data
objects
Image part: FA = {x1, x2, ……., xi}
Video part: FB = {y1, y2, ……., yj}
Ids: FC = {object_id, v_id, s_id}
$$F_{image} = \langle \underbrace{(x_1, x_2, \ldots, x_i)}_{F_A},\ \underbrace{(0, 0, 0, \ldots, 0)}_{F_B},\ \underbrace{(1, 0, 0)}_{F_C},\ 1 \rangle$$
$$F_{shot} = \langle \underbrace{(z_1, z_2, \ldots, z_i)}_{F_A},\ \underbrace{(y_1, y_2, \ldots, y_j)}_{F_B},\ \underbrace{(1, 1, 0)}_{F_C},\ 1 \rangle$$
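The data-signature could be modelled roughly as follows. This is only a sketch: the class name, field names, and helper functions are hypothetical, and the layout (image part, video part, ids, weight) is taken from the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataSignature:
    """One signature layout shared by images and video units (sketch)."""
    image_part: Tuple[float, ...]   # F_A: color/texture features
    video_part: Tuple[float, ...]   # F_B: video-only features, zeros for images
    ids: Tuple[int, int, int]       # F_C = (object_id, v_id, s_id)
    weight: float = 1.0             # W: distribution weight


def image_signature(object_id, color_texture, video_dims):
    """An image has no video features and no containment relationship."""
    return DataSignature(tuple(color_texture), (0.0,) * video_dims,
                         (object_id, 0, 0))


def shot_signature(object_id, v_id, color_texture, video_features):
    """A shot carries both feature parts and the id of its parent video."""
    return DataSignature(tuple(color_texture), tuple(video_features),
                         (object_id, v_id, 0))


def is_video_unit(sig):
    """A non-zero v_id marks an object that belongs to a video."""
    return sig.ids[1] != 0
```

For example, `image_signature(1, [0.16, 0.08], 3)` yields ids `(1, 0, 0)` and an all-zero video part, while `shot_signature(1, 1, [0.2, 0.1], [0.5, 0.3, 0.9])` yields ids `(1, 1, 0)`, matching the two signatures on the slide.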
22. GeM-Tree
how does it accomplish the
goals?
Expectation II
Accommodate varied Multidimensional
Representation
Using Earth Mover's Distance (EMD) to calculate (dis)similarity
• Derived from the Monge-Kantorovich transportation problem
• Calculates distance between 2 distributions
• Distributions can be of variable lengths
Given two distributions $x = (X, w) \in D^{K,n}$ and $y = (Y, u) \in D^{K,m}$, a flow between $x$ and $y$ is a matrix $F = (f_{ij}) \in \mathbb{R}^{m \times n}$. Find a flow that minimizes the overall work,
$$\mathrm{Work}(x, y, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} f_{ij}$$
where $d_{ij}$ is the ground distance between the two points joined by flow $f_{ij}$.
EMD is calculated by:
$$\mathrm{EMD}(x, y) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} f_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}$$
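In general, computing the EMD means solving a transportation linear program. As an illustration only, the sketch below covers a special case where the optimum has a closed form: points on a line, ground distance |a - b|, and weights normalized so both distributions carry total mass 1. The function name and argument layout are assumptions, not part of GeM-Tree.

```python
def emd_1d(xs, ws, ys, us):
    """EMD between two weighted 1-D point sets of possibly different lengths.

    xs, ys: point positions; ws, us: positive weights.
    Weights are normalized, so the total flow is 1 and EMD equals the work.
    """
    total_w, total_u = float(sum(ws)), float(sum(us))
    # Mass of x counts as positive, mass of y as negative.
    events = [(p, w / total_w) for p, w in zip(xs, ws)]
    events += [(p, -u / total_u) for p, u in zip(ys, us)]
    events.sort()

    work = 0.0
    surplus = 0.0            # mass that still has to travel to the right
    prev = events[0][0]
    for pos, delta in events:
        # Carrying |surplus| units of mass over the gap costs |surplus| * gap.
        work += abs(surplus) * (pos - prev)
        surplus += delta
        prev = pos
    return work
```

The two inputs may have different numbers of points, which mirrors the variable-length property listed on the slide. For instance, `emd_1d([0, 2], [1, 1], [1], [1])` moves half of the mass one unit from each side, giving 1.0.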
23. GeM-Tree
how does it accomplish the
goals?
Expectation III
Accommodate CBR of individual data type along
with concept retrievals involving cross-similarity
between multimedia data
data-signature + EMD + Affinity Relationship [8][9]
• a stochastic construct called Markov Model Mediator [8]
• extended into HMMM for videos
• determines the closeness of two multimedia objects (affinity) by following the access patterns
“the more frequently two objects are accessed together, the greater is their semantic closeness/affinity”
24. How GeM-Tree supports CBR
Range Search: select all the appropriate
database objects within a given range
from the query
k-NN Search: search the entire
database to select k database objects
most similar to the query
if ((d(F_index_object, F_query) <= d_k) && (A(data, query) >= affinity_k))
    add index_object to priority queue;
    update d_k and affinity_k;
else
    check next index_object from priority queue;
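The admission test in the pseudocode above can be sketched as a linear scan. This is one possible reading, not the GeM-Tree traversal itself: here d_k is taken to be the largest distance among the current k results and affinity_k the smallest affinity among them, and `distance` and `affinity` are caller-supplied functions (all of these are assumptions).

```python
import heapq
import math


def knn_with_affinity(candidates, query, k, distance, affinity):
    """Keep the k nearest candidates that also satisfy the affinity bound."""
    heap = []                    # max-heap on distance via negated keys
    d_k = math.inf               # admission radius until k results exist
    affinity_k = -math.inf       # affinity floor until k results exist
    for tie, obj in enumerate(candidates):
        d = distance(obj, query)
        a = affinity(obj, query)
        if d <= d_k and a >= affinity_k:
            # tie is unique, so the objects themselves are never compared
            heapq.heappush(heap, (-d, tie, a, obj))
            if len(heap) > k:
                heapq.heappop(heap)          # evict the farthest result
            if len(heap) == k:
                d_k = -heap[0][0]
                affinity_k = min(entry[2] for entry in heap)
        # otherwise: skip this object and check the next one
    ordered = sorted(heap, key=lambda entry: -entry[0])
    return [entry[3] for entry in ordered]
```

With a constant affinity the function reduces to a plain k-NN search; a candidate that is close in feature space but has low affinity to the query is rejected once k results have been collected.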
25. How GeM-Tree supports cross-
multimedia similarity search
Low-level Similarity
• Euclidean distance between F of data objects takes care of the image and video components
High-level Similarity
• HMMM [9] framework is traversed (upwards/downwards) according to the information gathered from the FC part
• FC = {object_id, v_id, s_id}
26. Performance of GeM-Tree
AH: index structure handling only images
HAH: index structure handling only videos
Query        | # of Distance Computations | Accuracy
             | GeM   AH    HAH   Seq      | GeM   AH    HAH   Seq
Only Image   | 98    80    X     147      | 90%   93%   X     98%
Only Video   | 63    X     50    147      | 90%   X     91%   95%
Mixed Types  | 80    X     X     147      | 80%   X     X     90%
27. Performance of GeM-Tree
Capability of handling variable-length features and
supporting queries such as region-based/object-based queries
Distance Computing during Developing Index Structure
Data Type      GeM-Tree
Only Images    145
Only Videos    240
Both           960
28. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
29. What is Query Refinement
To Alleviate….
• Semantic Gap
• Perception Subjectivity
• Fuzziness of multimedia query
i. Number of queries in each iteration increases
ii. High-level semantic requirement of the user is modified
30. Where do we stand?
Existing Query Refinement Models
for Index Structures [7]
attempts to capture user requirements by
ONLY adjusting the inter and intra-level
feature-weights
31. Query Refinement in GeM-Tree
Requirement I
Number of queries in each iteration
increases
i. Introduces the concept of multi-point
query
ii. Modifies the (dis)similarity computation
approach
$$\mathrm{DIST}_{MULTI}(Q, O) = \sum_{i=1}^{n} W_i \, |C - F_i|^2 \le r$$
32. Query Refinement in GeM-Tree
Requirement II
High-level semantic requirement of the user is
modified
i. Introduces affinity update method
aff m , n t 1 x1 x ( access t 1
1)
ii. Embeds semantic information into the
index structure considering multi-point
query
$$\max_{i=1}^{n} \Big( \max(\mathit{affinity}_{a,q_i}, \mathit{affinity}_{b,q_i}),\ \max(\mathit{affinity}_{a,q_{i+1}}, \mathit{affinity}_{b,q_{i+1}}) \Big)$$
33. Evaluation
Evaluation score proposed to compare the
utility of different multimedia data management
frameworks
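$$\mathrm{Model\_Score} = \left(1 - \frac{T - T_{min}}{3 \times \left(\sum_{i=1}^{n} (T_i - T_{min})^2\right) / n}\right) \times \left(1 - \left|\frac{F - F_{max}}{3 \times \left(\sum_{i=1}^{n} (F_i - F_{max})^2\right) / n}\right|\right)$$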
• Compares based on both computation time and
accuracy
• One can be improved at the cost of other
• A balance is necessary
36. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
37. Why?
Collaborative Environment
• Explosion of social network applications (~ 400 million users *)
• Multimedia Data an important communication medium shared
• Data management no longer an isolated task
[Figure: YouTube video]
The way a multimedia data is used in a
social network can be used to generate
A Multimedia Data Network
* http://www.facebook.com/press/info.php?statistics
38. Multimedia Data Network
Multimedia Data shared/accessed
among a particular user group can
form a social network
Each data object acts as
an actor (node)
Their relationship the
link (edge)
[Figure: example network with nodes and edges labelled]
39. What kind of relationship?
The edges defining the relationships
vary with applications
Want to utilize information for
customizing Multimedia Database
Management strategies
User behavior collected for over 5 years
using Multimedia Retrieval Application
developed at DMIS for COREL Dataset
having 10,000 images
Used semantic similarity, as
perceived and reported by users, as
the relationship
40. Multimedia Data Network for
10,000 images
How the relationship information was presented
before
affinity.txt (a 10,000 x 10,000 matrix of affinity values)
          1     2     …   10000
1         24    34    …   0
2         12    0     …   45
3         …
.         …
10000     …
41. Multimedia Data Network
Characteristics of the generated
network structure
A weighted Disconnected Graph Structure
Large Size
Visual Interpretation/Analysis becomes
challenging
42. Graph Preview
Solution Approach
• Reduce number of nodes
• Maintain network characteristics
• Maximize similarity between original and represented networks
43. Existing Approaches
Clustered Graph Layouts
• Using semantic information associated with data (content-based)
• Using structural information of data (structure-based)
• Discovering groupings/classes in data
• Identifying disjoint clusters
• Represent clusters as glyphs or compound graph
• Use node metrics
44. Issues with Clustered Graph
representations
Determining the cluster size
Preserving overall structural
similarity/equivalence
Determining the representative
nodes
Preserving the network
characteristics
45. Proposed Approach
Step 1: Node Filtering. Pick nodes based on network structure/user choice
Step 2: Determine Metric. Calculate structural and semantic metric
Step 3: Similarity Calculation. Calculate structural & semantic similarity
Step 4: Node Assignment. Assign filtered nodes to original nodes to maximize overall similarity
Step 5: Graph Layout. Generate the representative graph
46. Detailed Algorithm
Step 1: Node Filtering
Pick nodes based on network structure/user choice
• Sample nodes to capture overall network characteristics
• Select nodes representing different groups in the network
• Random sampling approaches which preserve the distribution
47. Detailed Algorithm
Step 2: Determine Node Metric
Calculate structural and semantic metric
Structural metrics
• Adjacency Matrices: edge source & edge terminus
Semantic metrics
• A matrix of scores of different centrality values
48. Detailed Algorithm
Step 3: Similarity Calculation
Calculate structural & semantic similarity
Structural similarity
• Coupled node-edge score [11]
$$y_{ij}(k) = x_{s(i)s(j)}(k-1) + x_{t(i)t(j)}(k-1)$$
$$x_{ij}(k) = \sum_{t(k)=i,\ t(l)=j} y_{kl}(k-1) + \sum_{s(k)=i,\ s(l)=j} y_{kl}(k-1)$$
Semantic metrics
• Euclidean distance between semantic values
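The coupled node-edge updates above can be sketched as a small iteration over two directed graphs. In this sketch, x holds node-to-node scores and y holds edge-to-edge scores, s(.) and t(.) are the source and terminus of an edge, and each round is rescaled to unit norm to keep the values bounded. The function names, the all-ones start, and the fixed iteration count are assumptions.

```python
import math


def _normalize(matrix):
    """Scale a matrix to unit Frobenius norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row))
    if norm == 0:
        return matrix
    return [[v / norm for v in row] for row in matrix]


def coupled_node_edge_scores(n_a, edges_a, n_b, edges_b, iterations=20):
    """Iterate the coupled node-edge similarity between graphs A and B.

    edges_a, edges_b: lists of (source, terminus) node index pairs.
    Returns (x, y): x[i][j] scores node i of A against node j of B,
    y[p][q] scores edge p of A against edge q of B.
    """
    x = [[1.0] * n_b for _ in range(n_a)]
    y = [[1.0] * len(edges_b) for _ in edges_a]
    for _ in range(iterations):
        # Edge score: similarity of the two sources plus the two termini.
        new_y = [[x[sa][sb] + x[ta][tb] for (sb, tb) in edges_b]
                 for (sa, ta) in edges_a]
        # Node score: sum over edge pairs that start or end at the node pair.
        new_x = [[0.0] * n_b for _ in range(n_a)]
        for p, (sa, ta) in enumerate(edges_a):
            for q, (sb, tb) in enumerate(edges_b):
                new_x[sa][sb] += y[p][q]
                new_x[ta][tb] += y[p][q]
        x, y = _normalize(new_x), _normalize(new_y)
    return x, y
```

Comparing the path 0 → 1 → 2 with itself, the middle node pair receives the highest score, since it is matched both as a source and as a terminus.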
49. Detailed Algorithm
Step 4: Node Assignment
Assign filtered nodes to original nodes to maximize overall similarity
Hungarian Algorithm
• Pick up m nodes from the set of n nodes which maximizes the total similarity score between the original graph and the sub-graph formed
• Assignment Problem applying Munkres Algorithm
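To make the assignment step concrete, the sketch below finds the same optimum by brute force. It is a stand-in for the Munkres (Hungarian) algorithm, which solves the problem in polynomial time; exhaustive search is only workable for very small inputs. The function name and the matrix layout (m filtered nodes as rows, n original nodes as columns, m ≤ n) are assumptions.

```python
from itertools import permutations


def best_assignment(sim):
    """Return (columns, score) maximizing the total similarity.

    sim[i][j]: similarity between filtered node i and original node j.
    columns[i] is the original node assigned to filtered node i; each
    original node is used at most once.
    """
    m, n = len(sim), len(sim[0])
    best_score, best_cols = float("-inf"), None
    # Try every way of giving the m rows distinct columns.
    for cols in permutations(range(n), m):
        score = sum(sim[i][c] for i, c in enumerate(cols))
        if score > best_score:
            best_score, best_cols = score, cols
    return list(best_cols), best_score
```

For `sim = [[0.9, 0.1, 0.3], [0.8, 0.7, 0.2]]` the best choice assigns filtered node 0 to original node 0 and filtered node 1 to original node 1, even though node 1 also scores highly against original node 0.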
50. Detailed Algorithm
Step 5: Graph Layout
Generate the representative graph
Shortest Path Approach
• Connect node i and j with $edge_{i,j}$ if $SP_{i,j} / \mathrm{Max}(SP_{i,j}^{k})$ passes the threshold
• Preserve the ties between nodes
• Consider the overall reach/strength of each node
51. Evaluation
• Overall structural comparison
• Degree of similarity between connected nodes (dyads)
• Using Euclidean distance between the centrality values
What is Centrality? [10]
• Centrality measures the power/importance of a node with respect to the entire network it belongs to
• Measure of holistic behavior of a node
$$E_c = 1 - \frac{\sqrt{\sum_{k=1}^{M} (c_{ik} - c_{jk})^2}}{\max \sqrt{\sum_{k=1}^{M} (c_{ik} - c_{jk})^2}}$$
53. How is the Multimedia Data
Network utilized ?
• identify mutual relationships and role of a particular multimedia data object in a database
• design decisions of operations of the index structures
Index structure is built on ONLY the low-level
features
Semantic relationship was introduced during
querying
No existing insertion policies consider the
semantic information stored in a data object
54. Insertion policies
Use degree centrality
degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has)
For a Multimedia Data Network, degree centrality
identifies the power/importance of a particular data
object in the entire network
[Figure: an image to be inserted is compared against node 1 and node 2 and inserted at the node with higher centrality]
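A small sketch of how degree centrality could be read off an affinity matrix and used to choose between candidate nodes. The matrix layout follows the affinity.txt slide (row m, column n holds the affinity between objects m and n); counting only non-zero entries in a node's own row, and the function names, are assumptions rather than the policy implemented in the dissertation.

```python
def degree_centrality(affinity, node):
    """Number of other objects tied to `node` by a non-zero affinity."""
    return sum(1 for other, weight in enumerate(affinity[node])
               if other != node and weight > 0)


def choose_insertion_node(affinity, candidates):
    """Pick the candidate node with the higher degree centrality."""
    return max(candidates, key=lambda n: degree_centrality(affinity, n))
```

Given two candidate nodes, the new image would be routed to the one that is tied to more objects in the Multimedia Data Network.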
55. Deletion policies
Current Status
Any delete request from the users is entertained
That the user and hence the data might belong to a
collaborative environment is not considered
56. Deletion policies
Use betweenness centrality
betweenness centrality is defined as the number of vertices that connect via a particular node
For a delete request, if betweenness centrality of the
node is high, ask the user to reconsider
57. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
58. Assumptions and Limitations
• Assumed that features used for indexing represent
the multimedia data well
• Accuracy calculations are not quantitative and it
may vary from person to person
• Can handle only Numeric Data
• Only Soccer videos were used as test bed, other
domains were not checked
59. Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia
Semantic Relationships in
Collaborative Environments
iii. Discussions
iv. Future Direction
60. Future Direction
• Intelligent multimedia index structure optimizer
• Document indexing
• Support traditional alpha-numeric data
• Query optimizer for multimedia database
• Multimedia data management framework for
Collaborative Applications
61. Publications
Journals & Book Chapters
i. [chat10] Kasturi Chatterjee, Shixia Liu, Shu-Ching Chen, “Social Network Preview using Graph
Similarity,” (submitted to ACM Transactions on Information Systems), 2010.
ii. [chat09a] Kasturi Chatterjee, S. Masoud Sadjadi, Shu-Ching Chen, “A Distributed Multimedia
Data Management over Grid,” Multimedia Services in Intelligent Environments – Integrated
Systems, 2009 (in press).
iii. [chat09b] Kasturi Chatterjee, Shu-Ching Chen, “HAH-tree: Towards a Multidimensional Index
Structure Supporting Different Video Modeling Approaches in a Video Database Management
System,” IJIDS, vol. 2, no. 2, pp. 188-207, 2010.
iv. [chat09c] Kasturi Chatterjee, Shu-Ching Chen, “A Multimedia Data Management Approach
with GeM-Tree,” JMM, 2010 (in press).
v. [chat09d] Shu-Ching Chen, Min Chen, Na Zhao, Shahid Hamid, Kasturi Chatterjee, and Michael
Armella, “Florida Public Hurricane Loss Model: Research in Multi-Disciplinary System
Integration Assisting Government Policy Making,” Special Issue on Building the Next
Generation Infrastructure for Digital Government, Government Information Quarterly, Volume
26, Issue 2, pp. 285-294, April 2009.
vi. [chat07a] Kasturi Chatterjee and Shu-Ching Chen, “A Novel Indexing and Access Mechanism
using Affinity Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases,”
International Journal of Semantic Computing (IJSC), Vol. 1, Issue 2, pp. 147-170, June 2007.
62. Publications
Conferences
i. [chat09d] Yudan Li, Kasturi Chatterjee, Shu-Ching Chen, and Keqi Zhang, “A 3-D Traffic
Animation System with Storm Surge Response,” accepted for publication, IEEE International
Symposium on Multimedia (ISM2009), 2009.
ii. [chat08a] Kasturi Chatterjee and Shu-Ching Chen, “GeM-Tree: Towards a Generalized
Multidimensional Index Structure Supporting Image and Video Retrieval,” the Fourth IEEE
International Workshop on Multimedia Information Processing and Retrieval (MIPR2008), in
conjunction with IEEE International Symposium on Multimedia (ISM2008), 2008.
iii. [chat08c] Kasturi Chatterjee and Shu-Ching Chen, “Hierarchical Affinity-Hybrid Tree: A
Multidimensional Index Structure to Organize Videos and Support Content-Based Retrievals,”
Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration
(IEEE IRI-08), 2008.
iv. [chat08d] Shu-Ching Chen, Min Chen, Na Zhao, Shahid Hamid, Khalid Saleem, and Kasturi
Chatterjee, “Florida Public Hurricane Loss Model (FPHLM): Research Experience in System
Integration,” the 9th Annual International Conference on Digital Government Research, 2008.
63. Publications
Conferences
v. [chat08e] Kasturi Chatterjee, Shixia Liu, and Shu-Ching Chen, “Using Graph Similarity for
Social Network Analysis,” in 6th LA Grid Summit, (First Place), 2008.
vi. [chat06a] Kasturi Chatterjee and Shu-Ching Chen, “Affinity Hybrid Tree: An Indexing
Technique for Content-Based Image Retrieval in Multimedia Databases,” in proceedings of
IEEE International Symposium on Multimedia (ISM2006), (Best Paper Award), 2006.
vii. [chat06b] Kasturi Chatterjee, Khalid Saleem, Na Zhao, Min Chen, Shu-Ching Chen, and Shahid
Hamid, “Modeling Methodology for Component Reuse and System Integration for Hurricane
Loss Projection Application,” in proceedings of IEEE International Conference on Information
Reuse and Integration (IEEE IRI-2006), 2006.
65. References
[1] R. Bayer, “Binary B-Trees for Virtual Memory,” in ACM-SIGFIDET Workshop, San
Diego, California, Session 5B, pp. 219-235, 1971.
[2] J. Robinson, “The k-d-b-tree: A search structure for large multidimensional dynamic indexes,” in
Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, Ann
Arbor, United States, pp. 10–18, 1981.
[3] P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric
spaces,” in Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pp. 311-
321, 1993.
[4] P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An efficient access method for similarity search in metric spaces,” in
Proceedings of 23rd VLDB, pp. 426-435, 1997.
[5] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” in Proc. 1984 ACM
SIGMOD International Conference on Management of Data, pp. 47-57, 1984.
[6] K. Chakrabarti, S. Mehrotra, “The Hybrid Tree: An Index Structure for High Dimensional Feature
Spaces,” in ICDE 1999, pp. 440-447, 1999.
[7] K. Chakrabarti, et al., “Efficient Query Refinement in Multimedia Databases,” in Proc. International
Conference on Data Engineering, pp. 196-200, 2000.
[8] M-L. Shyu, S-C. Chen, M. Chen, C. Zhang, and C-M. Shu, "MMM: A Stochastic Mechanism for
Image Database Queries," Proceedings of the IEEE Fifth International Symposium on Multimedia
Software Engineering (MSE2003), pp. 188-195, December 10-12, 2003, Taichung, Taiwan, ROC.
66. References
[9] Shu-Ching Chen, Na Zhao, and Mei-Ling Shyu, "Modeling Semantic Concepts and User
Preferences in Content-Based Video Retrieval," International Journal of Semantic Computing
(IJSC), Vol. 1, Issue 3, pp. 377-402, September 2007.
[10] L. C. Freeman, “Centrality in Social Networks: Conceptual Clarification,” Social
Networks, vol. 1, no. 3, pp. 215-239, 1979.
[11] L. A. Zager, et al., “Graph Similarity Scoring and Matching,” Applied Mathematics
Letters, vol. 21, no. 1, pp. 86-94, 2007.
Editor's Notes
Good afternoon everybody and welcome to my Ph.D. defense. In my dissertation, I proposed “A Generalized Multidimensional Index Structure for Multimedia Data to Support Content-Based Similarity Searches in a Collaborative Environment”. Yes, the title is a bit long, but this dissertation addresses some important issues related to Multimedia Data Management and I wanted to emphasize them. I have been working on this topic for the past 5.5 years with Dr. Shu-Ching Chen at DMIS Lab.
Before starting my presentation, I wish to sincerely thank my committee members for agreeing to be a part of this.
I would also like to thank SCIS for their continuing generous support, which enabled me to concentrate on my research without worrying about my financial security. Also, the numerous awards that I received from the department, acknowledging my work, motivated and encouraged me. I also want to thank FIU for the DYF and the travel grants I received for attending two conferences. Apart from that, I sincerely want to thank my lab-mates for making my workplace enjoyable and for always being willing to help me. And I definitely want to thank the SCIS staff, especially Olga, who made my life here in school a little easier.
Today’s presentation is going to follow this Outline. I will go over the motivation of this research in detail, as I sincerely believe that it is the fundamental portion of Ph.D. research. Unless you are clear and convinced why you want to work on something and why it is important to the greater scientific community, it is very difficult to remain focused for a long stretch of time. Thus a solid motivation is necessary to remain motivated! Then I will go over the contributions of this research. There are three major contributions: a Generalized Index Structure, specially designed to organize multimedia data; a query refinement framework; and a technique to visualize and analyze the semantic relationships of multimedia data in a collaborative environment. Along with the discussion of each contribution, I will go over the existing work in each field. It will be followed by a discussion of the limitations and assumptions of the proposed framework. I will finish off this presentation with a brief discussion of the future direction of this research. So, first the motivation.
What is so special about multimedia data? For the past couple of decades, it has gained immense popularity and has become the preferred medium of communication. WHY? Check this example. On the left is a handwritten recipe card of your grandma's legendary cake that has been passed on for generations. Now, you are a novice in cooking and all you have got is this card. On the other hand, let's imagine your grandma is pretty tech savvy and has convinced your grandpa to record all her specialty cooking with special instructions. She then shares those videos with all your family members (in an effort to keep up the cooking talent that she is convinced this family possesses!!). Which would you prefer?? O.K., for sentimental and keepsake reasons, the recipe card might be useful. But for someone who has never done any baking, it is pretty daunting. So, multimedia data is way more expressive and thus more attractive than traditional alpha-numeric data.
But as with all things in this world, the special qualities possessed by multimedia data come at a price. The three main characteristics which make it useful yet complicated are 1) the Multidimensional Representation, 2) the Perception Subjectivity and 3) the Semantic Gap. I will explain what each characteristic stands for in a bit. Overall, we can conclude that it is very different from traditional data.
Multimedia data is made up of low-level features such as color, texture, sound, etc., which make it so attractive. A feature extraction technique extracts the low-level features and represents a multimedia data object as a multidimensional feature vector. For example, applying color feature extraction to an image in the HSV color space yields the following feature vector (made of multiple dimensions). If projected into a multidimensional space, the multimedia data object is reduced to a point. Here, an example is shown with three dimensions, as anything greater than three is rather difficult to represent and comprehend. It is worth mentioning that it is this feature vector that is used to organize the data in the database.
A video is even more expressive than an image. Naturally, it carries more varied content and hence has a more complex representation. A video can be modeled as a hierarchical structure, with each video comprising a number of sequentially related shots and each shot comprising a number of sequentially related frames. Also, a video is represented as a multi-modal feature vector, as a single mode is unable to capture all its nuances. Thus, along with the color features (as used in image representation), it has additional features such as audio features, shot-level features, etc. Any unit of the video can be stored separately with its feature representation in the multidimensional feature space, as described in the previous slide. After click, the animation starts. Explain the shot-level multi-modal features.
The second important characteristic of Multimedia Data is Perception Subjectivity. A single multimedia data object can represent several concepts and each user might have a different interpretation. Also, the same user might think differently at different points in time, based on circumstances or cognitive mindset. Take these two images. Each can carry a different perception. It can communicate the idea of Togetherness, Baking, Family, Quality Time, etc. The same is true of the picture on the right. One can think of it as a sunset, others as dolphins, or a third person might think of something completely abstract which I couldn’t comprehend!
Multimedia data broadly contains two types of content: the low-level content (the features with which you represent it) and the high-level content (semantics/perception). Frequently, there remains a gap between these two, which makes their management a huge challenge. Basically, for storage purposes, we use the low-level content (it is a quantitative, definitive measure), but during retrieval users are more interested in the high-level content (semantics). Thus, when the gap is large, such organization strategies fail miserably. For example, check these two images. Both have similar low-level color content but semantically represent two hugely different concepts.
So, the next question is: are the existing DBMS frameworks able to handle these three atypical characteristics of multimedia data? Let's take the example of a simple query that you ordinarily issue to a database: “Select…….” Now, if we want to use the relational database to organize Multimedia Data, the query issued to retrieve a particular image might look something like: “Selec……” Obviously you can see that the existing query frameworks do not NATURALLY accommodate the requirements of multimedia retrievals. Of course, they can be made to store multimedia data in an ad hoc manner, as is done today, but that will definitely not meet the quality of service that is expected. Please note that I have not used “NOT POSSIBLE”, but rather “NOT SUITABLE”. Explain “how multimedia data can be accommodated by the existing relational DBMS in an AD HOC manner”.
What's missing from the existing database management frameworks? Suitable data organization (index structure). Suitable query handling (Content-Based Retrievals). Suitable handling of the semantic information carried by the multimedia data. Here is the architecture of a traditional database management framework. It can be concluded that almost all the components need to be tailored to meet the multimedia data requirements. In this dissertation, I mostly deal with the Index Structure, as it can be considered the most pivotal part of a successful DBMS framework.
So, let's go into the details of the contributions. First, the index structure:
There are three major expectations from an index structure that is to manage multimedia data successfully. First: to provide a single seamless framework for different types of multimedia data. As you know, multimedia data can be of various types: images, videos, documents (even a web page), and each one of them has a different representation and different retrieval requirements. Having separate index structures for individual data types is not practical. Whatever index structures you have need to be embedded into the database kernel, and other components such as the query optimizer, query processor, etc. need to be tuned according to the index structure. If you have multiple index structures, there might be conflicting tuning issues. Moreover, for answering context-based queries, where users might be interested in finding both images and videos pertaining to a particular context, cross-similarity between images and videos needs to be determined.
The second expectation is to accommodate varied multidimensional representations. Why do we need it? Most of the existing index structures embedded into the database kernel are single-dimensional, hence cannot be used for multidimensional data types. Even if there are a few multidimensional index structures, they are not capable of handling the query requirements of MM data as they do not consider semantics at all. There are a plethora of feature representations. Hence you need a flexible structure that can accommodate the varied types. Otherwise the utility of an index structure will be severely limited.
The third and final expectation is that it should be able to accommodate the particular query methodology of MM data, i.e. content-based retrieval. For this, the query handling needs to consider both the low-level and the high-level contents, which the existing index structures are not designed to do.
Before going into the details of how the proposed GeM-Tree meets these expectations, let's quickly go over what has been done so far in this area. The first-generation index structure, which is still widely used in most database management frameworks, is the B-Tree and its variants. It is a single-dimensional, tree-based index structure. Then came the genre of multidimensional index structures. They can be broadly divided into two categories: Feature-Based and Distance-Based. Feature-Based index structures such as KDB-Tree, R-Tree and more recently Hybrid-Tree index the multidimensional feature space that is used to represent the multimedia data. Distance-Based index structures such as M-Tree and VP-Tree index the metric space formed from the similarity measurement between pairs of multimedia data objects. This genre of index structure is useful because it can be used even if the feature values of data objects are not available and only their mutual (dis)similarity measurement is provided. Both these genres of index structures are useful depending on the dataset and retrieval requirements of a particular application.
So, if we already have so many index structures, why do we need another one? There are a few issues with the existing ones, and they are pretty serious where multimedia data is concerned. First, let's see how each multidimensional index structure performs when handling semantic relationships during CBR. For feature-based index structures, there needs to be a direct correlation between the low-level features and the semantic information. Thus, if there is a semantic gap for a particular dataset, this kind of query handling is not useful. For distance-based index structures, there is no existing mechanism for capturing semantics. Finally, none of the existing index structures can handle different data types, let alone handle them from one seamless framework.
GeM-Tree is a distance-based index structure. Let's now see how it addresses each of the expectations. (i) Provide a single framework to manage different types of multimedia data: I propose a very flexible data signature to represent different data objects. It has three parts: an image part, a video part and an identification part. We have used only images and videos as the datasets, so there are only two media parts, but other data types can be represented with such a signature as well. The image part stores the color/texture values, the video part stores the pixel-change and audio features, and the IDs store information about the hierarchical relationship of the data (if there is any). Let's look at examples. For an image, this is how the data signature looks: the image part holds the values, the video part is all 0s, and the IDs hold only the identification of the particular object. As there is no hierarchical/containment relationship, v_id and s_id are zero. W is the distribution weight; in the next slide, we will see what the weight signifies. A video has the image part (as color/texture are common to both types) and, additionally, the video part as well as the containment relationship. Here, a shot is represented. If you recall, a video shot is contained within a video. Thus the IDs are 1, 1, 0, specifying that this particular shot has ID 1 and belongs to a video with ID 1. If we were to represent a frame, it would have been 1,1,1….. Thus, we can see that both images and videos can be represented with a single data signature.
(ii) Accommodate varied multidimensional representation: To accommodate varied representations, we use the earth mover's distance (EMD) as the similarity measurement metric between multimedia data objects. It calculates the distance between two distributions, and the distributions can be of variable lengths. Thus, you are no longer required to represent every multimedia data object with feature distributions of the same form. This is particularly useful for multimedia retrieval strategies such as region-based/object-based retrieval, where each image/video unit is represented by a varying number of regions/objects. EMD is based on the transportation problem, in which the amount of work needed to transform one distribution into the other is minimized.
(iii) Accommodate CBR of individual data types along with concept retrievals involving cross-similarity between multimedia data: GeM-Tree addresses the third expectation of supporting CBR with the help of the data signature and the distance function, used along with a high-level semantics capturing mechanism. The high-level semantics between multimedia data is captured using a construct called the Affinity Relationship, which determines the closeness of two multimedia objects by following the access patterns. It should be pointed out here that this semantics capturing does not rely on feature-level similarity and hence performs well in cases of semantic gap.
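To make the variable-length point concrete, here is a minimal sketch of EMD for the special case of one-dimensional signatures. It is not the implementation used in GeM-Tree. It assumes each signature is a list of (position, weight) pairs, that weights are normalized to a total mass of 1, and that the ground distance is the absolute difference of positions. The function name emd_1d is hypothetical. In one dimension the transportation problem has a closed form, so no linear-programming solver is needed.

```python
def emd_1d(sig_a, sig_b):
    """Earth mover's distance between two 1-D signatures.

    Each signature is a list of (position, weight) pairs. The two lists
    may have different lengths. Weights are normalized to total mass 1
    (an assumption of this sketch).
    """
    def normalize(sig):
        total = sum(w for _, w in sig)
        return [(p, w / total) for p, w in sig]

    a, b = normalize(sig_a), normalize(sig_b)
    # Mass supplied by a counts as positive, mass demanded by b as negative.
    events = sorted([(p, w) for p, w in a] + [(p, -w) for p, w in b])

    work = 0.0
    carried = 0.0  # surplus mass that must still be moved to the right
    for (pos, mass), (next_pos, _) in zip(events, events[1:]):
        carried += mass
        work += abs(carried) * (next_pos - pos)
    return work


# A two-entry signature compared against a one-entry signature.
two_regions = [(0.0, 2.0), (2.0, 2.0)]
one_region = [(1.0, 1.0)]
print(emd_1d(two_regions, one_region))
```

For multidimensional features the same idea requires solving the general transportation problem, which this sketch does not attempt.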
Thus, we see that GeM-Tree covers the three expectations quite successfully. Let's now go into a little detail on how GeM-Tree introduces retrieval techniques into its framework. Basically, a multidimensional index structure answers queries following two strategies: range search and k-NN search. For range search, the database is searched for objects that lie within a given range of the query object. For k-NN search, the entire database is searched to retrieve the k objects most similar to the query object. Of course, k-NN search is a more natural fit for CBR, as you cannot really expect a user to specify a particular range in the form of a numerical value. It is more convenient to search the entire database for the most similar objects. Introducing CBR into the k-NN search makes for a pretty complex algorithm, because you need to make sure that, while you are considering both the low-level and high-level similarity, the properties of the underlying metric space (viz. positivity, symmetry and the triangle inequality) are not violated. However, as a simple representation, the main step of the k-NN search implementing CBR is as follows, where both the low-level similarity in the form of ‘d’ (Euclidean distance) and the high-level similarity (affinity) between an indexed database object and the query object are considered.
Additionally, GeM-Tree supports cross-multimedia similarity search. The data signature is designed such that the Euclidean distance between two signatures can determine the similarity between two different multimedia data types. This is proved in a lemma in the dissertation and is beyond the scope of this presentation. The high-level similarity between different types of multimedia data is determined from the ID representation of the data signature along with the HMMM model used to capture the semantic relationship. For example, suppose you want to find the high-level similarity between a video shot and a frame. We traverse up the HMMM hierarchy to find the video shot to which the frame belongs and compare the similarity between the two shots.
Here we present the performance of GeM-Tree in terms of distance computations and accuracy. It is compared with an index structure developed only for images and with another developed only for videos. We also compare it with a framework having no index structure (Seq). It can be seen that the computation overhead of GeM-Tree is slightly higher than that of the dedicated image/video index structures. This is because GeM-Tree needs to manage two types of media; hence the candidate pool is more varied, and more elimination is necessary to reach the desired objects. However, it has the added functionality of answering mixed-type queries with a reasonable computation overhead. The accuracy of GeM-Tree is also acceptable, though slightly lower than that of the other two index structures. Seq has the highest accuracy, as it scans the entire dataset to provide the query results. This high accuracy comes at the cost of very high computation overhead.
Here, we demonstrate the capability of GeM-Tree in handling variable-length features. This shows the distance computations needed while building the tree with variable-length features. We could not compare it with any other tree-based index structure, as there is practically none that can do so.
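The difference between the two query strategies can be sketched with a plain linear scan. This is only an illustration of what each query returns. It does not show the tree traversal or pruning that an index structure performs, and it leaves out the affinity term. The function names and the tuple representation of objects are assumptions made for the example.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(objects, query, radius, dist=euclidean):
    """Return every object whose distance to the query is within radius."""
    return [obj for obj in objects if dist(obj, query) <= radius]


def knn_search(objects, query, k, dist=euclidean):
    """Return the k objects closest to the query, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda obj: dist(obj, query))


database = [(0, 0), (1, 0), (3, 4), (10, 10)]
print(range_search(database, (0, 0), radius=1.0))
print(knn_search(database, (0, 0), k=3))
```

With range search the user has to pick the radius, while k-NN only asks how many results are wanted, which is why it suits CBR better.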
Next I will go over the second contribution, the query refinement.
What is query refinement? Query refinement is necessary for multimedia data management frameworks to alleviate three major problem areas: (1) the semantic gap, (2) perception subjectivity, and (3) the fuzziness of users' expression. This is a multimedia retrieval application (it does not have database management). Users submit a query, and the system gives back a set of results. Not all results are related to the user query. The user is then given a chance to refine the requirement by marking result images as positive or negative. The user then resubmits the query, and in the next iteration the system returns results that take the feedback into account. Thus two things happen here. First, the number of query points increases in each iteration as the user marks positives; for each subsequent iteration, the system should consider the originally submitted query along with the positives. Second, the semantic requirement of the user is redefined. Because both the query representation and the semantic requirements change, the index structure needs to accommodate these dynamic changes.
Are there any existing query refinement techniques implemented by multidimensional index structures? Yes, there are query refinement models for feature-based index structures, which try to capture the user requirements by adjusting the intra- and inter-feature weights. That is, they again try to find a correlation between low-level features and high-level semantics. The approach has two major drawbacks. First, if there is a semantic gap, it remains. Second, it cannot be utilized for distance-based index structures, as it has been seen that such inter- and intra-feature weights violate the metric property of the triangle inequality.
GeM-Tree handles the first requirement, that is, the increase in the number of query points in each iteration, by introducing the concept of a multipoint query and modifying the distance function (necessary for calculating similarity) accordingly.
In order to handle the refinement of the high-level semantic information, it introduces an affinity update mechanism and brings the affinity into the index structure for the multipoint query, as shown. All these equations are established with detailed lemmas in the dissertation. The basic idea for the first part is that when two data objects are marked as similar by the user, the access value for that pair is increased by one.
In order to evaluate the performance of a retrieval framework, two factors need to be considered: how fast it provides the result and how accurate the provided result is. One can be improved at the cost of the other, so a balance is necessary. You can increase the accuracy by evaluating more database objects (i.e., at a higher computation cost). Conversely, you can provide some sort of result by considering only a few objects (lowering the computation cost at the expense of accuracy). I proposed a score based on the computation cost (T) and the accuracy (computed with the F-score).
We compared the accuracy of the proposed system, here AH-Tree refine (blue line), with three frameworks: (1) a distance-based multidimensional index structure without a refinement model (yellow line), (2) a feature-based multidimensional index structure with a refinement model (pink line), and (3) a naïve method (no index structure, but considering user feedback through relevance feedback). We see that the naïve method has the best accuracy (as it considers all the objects), followed by our refinement model, then the one without refinement, and finally the feature-based refinement. Next we evaluate the computation time…….
Thus, there are several values, and it is a little difficult to evaluate the overall goodness of a particular model. Here, the evaluation score comes in handy. Computing the proposed evaluation score, we find that our proposed system performs the best. We call the model AH-Tree refine instead of GeM-Tree because we considered only images.
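The counting idea can be sketched as follows. This covers only the access-count bookkeeping, not the affinity equations from the dissertation. The class and method names are hypothetical, and the sketch assumes that pairs are unordered and that each object marked positive is paired with the query object.

```python
from collections import defaultdict


class AccessCounts:
    """Counts how often a pair of objects has been marked as similar."""

    def __init__(self):
        self._counts = defaultdict(int)

    @staticmethod
    def _key(a, b):
        # Assumption of this sketch: the pair (a, b) equals the pair (b, a).
        return (a, b) if a <= b else (b, a)

    def mark_similar(self, a, b):
        """Increase the access value of the pair by one."""
        self._counts[self._key(a, b)] += 1

    def record_feedback(self, query_id, positive_ids):
        """Pair the query with every result the user marked positive."""
        for obj_id in positive_ids:
            self.mark_similar(query_id, obj_id)

    def access(self, a, b):
        """Current access value of the pair (0 if never marked)."""
        return self._counts.get(self._key(a, b), 0)


counts = AccessCounts()
counts.record_feedback(query_id=7, positive_ids=[3, 12])
counts.record_feedback(query_id=7, positive_ids=[3])
print(counts.access(3, 7), counts.access(7, 12), counts.access(3, 12))
```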
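For reference, the accuracy half of the score can be computed as below. This is the standard balanced F-score (the harmonic mean of precision and recall). How it is combined with the computation cost T in the proposed score is not reproduced here. The function name and the use of object ID lists are assumptions for the example.

```python
def f_score(retrieved, relevant):
    """Balanced F-score of a result list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)  # fraction of results that are relevant
    recall = hits / len(relevant)      # fraction of relevant objects found
    return 2 * precision * recall / (precision + recall)


# Four results returned, two of them among the three relevant objects.
print(f_score(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6]))
```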
Next we discuss the third contribution, visualizing and Analyzing Multimedia…..
Why do we need to consider a collaborative environment? With the explosion of social network applications, and with multimedia data being the popular medium of communication, there needs to be a proper way to manage this data that considers the dynamic and evolving relationships. Thus, the…….. Let's consider a Facebook page: users frequently share videos from YouTube with a particular group of users. With the increasing number of users on Facebook, we could use this information to manage the data on YouTube to provide easy, cheap access based not only on annotations but also on contents and user behavior.
For the multimedia data network, each data object acts as an actor (node), and the relationships between data objects act as the ties (edges).
Now, the relationship considered is an interesting factor. It varies with the application considered. To utilize such information in the database management framework, we took the semantic relationship, as perceived and reported by the users, as the relationship. In our case, the application is a multimedia retrieval framework in a collaborative environment. User behavior….
Previously, we had this information as a text file looking something like this. It is pretty difficult to form any overall idea about the data relationships from such a presentation. So we form a data network such as this. It represents the semantic relationships among the different data objects in the database.
The generated data network is a weighted graph and is disconnected in nature. It is pretty large, and thus visual interpretation is challenging. To overcome the problem, a preview generation technique is proposed, which represents a large graph structure with fewer nodes but preserves the overall characteristics.
Thus, the Graph Preview has the following approach as just discussed.
In the literature, there has so far been only one frequently used approach: clustered graph layouts.
But it has the following issues: 1. Determining the …
It has the following 5 steps:
Since we need to represent the original graph with fewer nodes, we first filter nodes using different sampling techniques.
Next we determine the node metrics. Two types of metrics are identified: structural metrics and semantic metrics.
In step 3, the structural similarity between the original nodes and the sampled nodes is determined using a Graph Similarity approach as shown in the following equation. It considers both the nodes as well as the edges.
In step 4, node assignment is done. That is, the sampled nodes are assigned to some of the original nodes in such a way that the total similarity between them is maximized. An assignment-problem approach, the Hungarian algorithm, is used to do so.
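The objective of the assignment step can be illustrated with a small brute-force search. This is deliberately not the Hungarian algorithm: it tries every possible assignment and is only practical for a handful of nodes, whereas the Hungarian algorithm reaches the same optimum in polynomial time. The function name and the similarity values are made up for the example.

```python
from itertools import permutations


def best_assignment(similarity):
    """Assign each sampled node to a distinct original node.

    similarity[i][j] is the similarity of sampled node i to original
    node j. Returns (pairs, total) for the assignment with the largest
    total similarity, found by exhaustive search.
    """
    n_sampled = len(similarity)
    n_original = len(similarity[0])
    best_total, best_cols = float("-inf"), None
    for cols in permutations(range(n_original), n_sampled):
        total = sum(similarity[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return list(enumerate(best_cols)), best_total


# Two sampled nodes, three original nodes (illustrative values).
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.10, 0.20],
]
pairs, total = best_assignment(sim)
print(pairs, total)
```

Note that picking the best match for each sampled node in turn would pair node 0 with original node 0 and leave node 1 with a poor match. Maximizing the total avoids that, which is why an assignment-problem formulation is used.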
At the last step, in order to form the representative graph, the sampled and assigned nodes need to be connected. It is done using the shortest path approach.
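As a reminder of what a shortest-path computation on a weighted graph looks like, here is a minimal sketch of Dijkstra's algorithm. It assumes non-negative edge weights and a dictionary-of-dictionaries graph representation. Both are assumptions of this example, and how the resulting paths are turned into edges of the representative graph is not shown.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbor: weight}}.

    Returns (cost, path). If target cannot be reached, which can happen
    in a disconnected network, returns (inf, []).
    """
    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, weight in graph.get(node, {}).items():
            if nbr not in visited:
                heapq.heappush(heap, (cost + weight, nbr, path + [nbr]))
    return float("inf"), []


network = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1},
    "c": {"a": 4, "b": 1},
    "d": {},  # isolated node: the network is disconnected
}
print(shortest_path(network, "a", "c"))
print(shortest_path(network, "a", "d"))
```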
To evaluate the representative graph against the original graph, an overall structural comparison is done using centrality measurements. Centrality is basically a measurement of the power/importance of an individual node. Thus, it provides information about the connectivity of the individual nodes with respect to the entire network (holistic behavior). The Ec value gives a score describing the structure of a particular graph. We can then find the deviation of the scores between the original and representative graphs to determine the effectiveness of the result. The denominator is the maximum possible value of the numerator. Ec has a minimum value of 0 (star configuration) and a maximum value of 1 (circle configuration).
OK, so now we have a multimedia data network that can be visualized and analyzed with ease. How do we utilize it in our multimedia data management framework, and especially in the index structure? It should be recalled here that the index structure was built based ONLY on the low-level features. Semantic information (in the form of the affinity value) was introduced during the query. The information obtained from the analysis of the multimedia data network can be used to introduce semantic relationships into the indexed metric space without violating any of its properties.
We analyze the multimedia data network using analysis techniques from social networks. For insertion, we find the degree centrality of each element (node) of the multimedia data network. The degree centrality is defined as the number of links incident upon a node, that is, the number of ties it has. It helps to identify the power/importance of a particular node in the entire network. For example, let's assume we have these two images in two nodes. ……..
Currently, any delete operation requested by the user is carried out without considering what effect it might have on the subsequent quality of the query results. Let's take this example:
Deletion policies are based on the betweenness centrality measurement of the network. What is betweenness centrality? It is defined as the number of shortest paths between other pairs of nodes that pass through a particular node. If the betweenness centrality of the node in a delete request is high, the system asks the user to reconsider, stating the effect. Other similar decisions can be formed as well by carefully analyzing the data network.
Next, let's see what the limitations and assumptions of the proposed work are (here comes the Achilles' heel):
I assumed that the features used for indexing the multimedia data are sufficient. The framework has a plug-in type approach: you can feed in any feature representation or any high-level semantics capturing mechanism, and it will generate results accordingly. The framework ensures that the quality of the input information is reflected in the output results. Accuracy calculations are, of course, subjective. It can handle only numeric data and no nominal data. And only soccer videos were used.
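Degree centrality is simple enough to compute directly from an edge list, as in the sketch below. The function name and the undirected edge-list representation are assumptions for the example, and the edge weights of the data network are ignored.

```python
def degree_centrality(edges):
    """Number of links incident upon each node of an undirected graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
print(degree_centrality(edges))
```

In this small network node "a" has the most ties, so it would be considered the most important node by this measure.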
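Betweenness centrality can be computed with Brandes' algorithm, sketched here for an undirected, unweighted graph. Treating the network as unweighted is a simplification made for this example. The function name and the adjacency-list representation are also assumptions, and every neighbor is expected to appear as a key of the dictionary.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm on {node: list of neighbors}.

    For each node, sums over all other pairs of nodes the fraction of
    their shortest paths that pass through it.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness_centrality(chain))
```

In the chain a, b, c, the only route between "a" and "c" runs through "b", so deleting "b" would disconnect them. A deletion policy could compare this value against a threshold before warning the user; the threshold itself would have to be chosen per application.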
So, what is the future direction of this research?
This research was started with the vision of laying down the foundation of a full-fledged multimedia database management framework. Thus, it is far from complete. The basic and perhaps the most important part, an index structure, is proposed in this research, and it can be extended in the following useful directions:……