Processing Large Complex Data

Social Data and Multimedia Analytics for News and Events Applications

Dr. Yiannis Kompatsiaris, ikom@iti.gr
Head, Multimedia, Knowledge and Social Media Analytics Lab
CERTH-ITI

2015 IEEE SPS Italy Chapter Summer School on Signal Processing (S3P)
S3P 2015, Garda Lake, Italy

Overview

•  Introduction
–  Motivation
–  Challenges
•  Example Use Cases
•  Research Approaches
–  Large-Scale visual search
–  Graphs - Community Detection - Clustering
–  Social Event Detection
–  Verification
•  Demos – Applications
–  MM News Demo
–  ClustTour
–  Thessfest
•  Evaluation - Benchmarking
•  Conclusions

Introduction

Motivation
Example Applications
Conceptual Architecture
Challenges

Pope Francis, Pope Benedict

2007: iPhone release
2008: Android release
2010: iPad release

http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/

http://www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr 2012)

rise of the networks

Social Networks as Graphs

social web as a graph
nodes = twitter users
edges = retweets on #jan25 hashtag
announcement of Mubarak’s resignation
http://gephi.org/2011/the-egyptian-revolution-on-twitter/

Social Networks as Graphs

“Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts”

•  Emotions, health and sexual relationships do not depend just on our connections (e.g. the number of them) but on our position and structure in the social graph
–  Central – Hub
–  Outlier
–  Transitivity (connections between friends)

Social Networks as Real-Life Sensors

•  Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users’ interests)
•  The huge penetration of smartphones and mobile devices provides real-time and location-based user feedback
•  Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections
•  Present them in an efficient way for a variety of applications (news, marketing, science, health, entertainment)

Social Media aspects

Caption, Time, User Profile, Favs, Comms, Tags

Examples - Science

Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast. International Conference on Multimedia (MM '10). ACM.

“…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”

Many twitter examples at: What can Twitter tell us about the real world? Twitter and the Real World, CIKM'13 Tutorial, https://sites.google.com/site/twitterandtherealworld/home

Examples - Science

Be careful of correlation diagrams

Example – News (Boston bombing)

“Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.”

“Authorities have recognized that one [of] the first places people go in events like this is to social media, to see what the crowd is saying about what to do next”

"I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”

Example – Crisis – Humanitarian (Syria)

Syria Tracker offers a crisis mapping system that uses crowdsourced text, photo and video reports and data mining techniques, forming a live map of the Syrian conflict since March 2011

…stream of content-filtered media from news, social media (Twitter and Facebook) and official sources

Events - Festivals

http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg

Many other examples: smellymaps

Smell-related words in geo-located social media
http://researchswinger.org/smellymaps/

Conceptual Architecture

[Diagram] Pipeline stages: CRAWLING, INDEXING, MINING, RANKING, VISUALIZATION
Component labels: API Wrapper, Website Wrapper, Scheduler, Media Fetcher, Crawling Specs, Sources; Visual Indexing, Near-duplicates, Text Indexing; SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts; Relevance, Diversity, Popularity, Veracity; Interaction, Responsiveness, Aggregation, Aesthetics

Challenges – Content (Mining)

•  Multi-modality: e.g. image + tags, video, audio
•  Rich social context: spatio-temporal, social connections, relations and social graph
•  Specific messages: short, conversations, errors, no context
•  Inconsistent quality: noise, spam, fake, propaganda
•  Huge volume: massively produced and disseminated
•  Multi-source: may be generated by different applications and user communities
•  Dynamic: fast updates, real-time

Policy – Licensing – Legal challenges

•  Fragmented access to data
–  Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
–  Different data collection/crawling policies
•  Limitations imposed by API providers (“Walled Gardens”)
–  Full access to data is impossible or extremely expensive (e.g. see data licensing plans for GNIP and DataSift)
–  Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter)
•  Constant change of model and ToS of social APIs
–  No backwards compatibility, additional development costs
•  Ephemeral nature of content
–  Social search results often lead to removed content → inconsistent and unreliable referencing
•  User Privacy & Purpose of use
–  Fuzzy regulatory framework regarding mining user-contributed data

Example Use Cases

Events and News

SocialSensor Project Objective

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

[Diagram] Labels: Massive social media and unstructured web; Social media mining; Aggregation & indexing; DySCO (behaviour, location, time, content, usage, social context); News - Infotainment; Personalised access; Ad-hoc P2P networks

“It has changed the way we do news” (MSN)
“Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)
“Social media is transforming the way we do journalism” (New York Times)
Source: picture alliance / dpa

“It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)
“Things that aren’t relevant crowd out the content you are looking for” (MSN)
“The filters aren’t configurable enough” (CNN)
Source: Getty Images

Verification was simpler in the past...

Source: Frank Grätz

News Use Case Requirements

Quickly surface trusted and relevant material from social media – with context.

•  “quickly”: in real time
•  “surfaces”: automatically discovers, clusters and searches
•  “trusted”: automatic support in verification process
•  “relevant”: to the specific event
•  “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
•  “social media”: across all relevant social media platforms
•  “with context”: location, time, sentiment, influence

Infotainment

•  Events with large numbers of visitors
•  Thessaloniki International Film Festival
–  80,000 viewers / 100,000 visitors in 10 days
–  150 films, 350 screenings
•  Discovery and presentation of relevant aggregated social media
–  Trending Topics
–  Sentiment
–  Tweet – film matching
–  Visualization (Social Walls)

Conceptual Architecture and Main components

[Diagram] Labels: USER MODELLING & PRESENTATION; SEARCH & RECOMMENDATION; SEMANTIC MIDDLEWARE; INDEXING; MINING; STORAGE; DATA COLLECTION / CRAWLING; Public Data

•  Real-time dynamic topic and event clustering
•  Trend, popularity and sentiment analysis
•  Calculate trust/influence scores around people
•  Personalized search, access & presentation based on social network interactions
•  Semantic enrichment and discovery of services

Research Approaches

Large-Scale Visual Search
Graphs – Clustering/Community Detection
Visual Event Summarization
Social Media Verification

Scalable visual feature aggregation & indexing

•  Problem: Example-based image search
–  Find images that represent the same or a similar object or scene as a given query image
–  Possibly viewed from different viewpoints, with occlusions and clutter
•  Challenge: Large-scale
–  Searching databases with tens of millions of images
–  Objectives to be fulfilled:
•  Sufficient discriminative power
•  Fast response times
•  Efficient memory usage

Large-scale visual search

[Diagram] image collection from social media/Web → image local feature extraction → feature aggregation → feature indexing → kNN visual similarity search
Applications: concept-based image annotation, image clustering, image (geo)tagging, concept-based search/filtering, duplicate detection

Framework

•  Implementation and evaluation of the effectiveness of VLAD in combination with SURF
•  Scalable image indexing

E. Spyromitros-Xioufis, S. Papadopoulos, Y. Kompatsiaris, G. Tsoumakas, I. Vlahavas, "A Comprehensive Study over VLAD and Product Quantization in Large-scale Image Retrieval", IEEE Transactions on Multimedia 16(6), pp. 1713-1728, October 2014.

Pipeline: image → local descriptor extraction (SIFT / SURF) → set of local descriptors → descriptor aggregation (BOW / VLAD) → fixed size vector → dimensionality reduction (PCA) → low dimensional vector → encoding & indexing (PQ + ADC/IVFADC)
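The descriptor aggregation step can be illustrated with a minimal VLAD sketch. It assumes a tiny hand-made codebook and plain Euclidean nearest-centroid assignment; the function name `vlad` and the toy data are illustrative and not taken from the cited study.

```python
import math

def vlad(descriptors, codebook):
    """Aggregate local descriptors into one L2-normalized vector of size k * d."""
    d = len(codebook[0])
    acc = [[0.0] * d for _ in codebook]
    for x in descriptors:
        # Assign the descriptor to its nearest visual word (centroid).
        nearest = min(range(len(codebook)), key=lambda c: math.dist(x, codebook[c]))
        # Accumulate the residual between descriptor and centroid.
        for i in range(d):
            acc[nearest][i] += x[i] - codebook[nearest][i]
    flat = [value for row in acc for value in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm else flat

# Toy example: k = 2 visual words, d = 2 dimensions, so the output has 4 values.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
vector = vlad(descriptors, codebook)
print(len(vector), [round(v, 3) for v in vector])
```

The output length is always (number of visual words) × (descriptor dimension), which is why a fixed size vector results regardless of how many local descriptors an image has.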

Scalable indexing of features

•  ADC 16x8 requires 16 bytes per image
–  ~67M images per GB
•  IVFADC requires 4 additional bytes per image
–  ~53.6M images per GB
•  The current implementation achieves only half of the above numbers because it uses short int[] instead of byte[]; this can be improved.
•  Ideally, 1 billion images could be indexed on a server with 20GB of RAM (projection).
•  Query time (for 1M vectors):
–  Exhaustive search of VLAD vectors (d’=128): 0.50 sec
–  Product Quantization with ADC 16x8: 0.10 sec (x5 faster)
–  Product Quantization with IVFADC 16x8: 0.02 sec (x25 faster)
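The memory figures above follow from simple arithmetic. A minimal sketch, assuming 1 GB means 2^30 bytes and that "16x8" denotes 16 sub-quantizers with 8-bit codes (the function name is illustrative):

```python
GIB = 2 ** 30  # bytes per GB, assuming the binary definition

def images_per_gb(subquantizers, bits_per_code, extra_bytes=0):
    """How many images fit in 1 GB given the size of one compressed code."""
    bytes_per_image = subquantizers * bits_per_code // 8 + extra_bytes
    return GIB / bytes_per_image

adc = images_per_gb(16, 8)                    # 16 bytes per image
ivfadc = images_per_gb(16, 8, extra_bytes=4)  # 20 bytes per image
print(f"ADC 16x8:    {adc / 1e6:.1f}M images per GB")
print(f"IVFADC 16x8: {ivfadc / 1e6:.1f}M images per GB")
```

At 20 bytes per image, one billion images need about 20 × 10^9 bytes, which is consistent with the 20GB projection on the slide.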

VLAD+SIFT vs. VLAD+SURF

Accuracy vs. dimensionality

•  VLAD+SURF improves on VLAD+SIFT and FV+SIFT across all dimensions in both the Holidays and Oxford datasets

Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries. SIFT corresponds to PCA-reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011.

Clustering – Community Detection

graph

G = (V, E), where V is the set of nodes and E is the set of edges

An abstract data type representing relationships or connections

Some Examples

Webpage www.x.com
href=“www.y.com”
href=“www.z.com”

Webpage www.y.com
href=“www.x.com”
href=“www.a.com”
href=“www.b.com”

Webpage www.z.com
href=“www.a.com”

Resulting graph nodes: x, y, z, a, b
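The web page example above can be written down as a directed graph. A minimal sketch using a plain adjacency dictionary (the variable names are illustrative):

```python
# Each page maps to the pages it links to via href.
links = {
    "x": ["y", "z"],
    "y": ["x", "a", "b"],
    "z": ["a"],
    "a": [],
    "b": [],
}

nodes = set(links)
edges = {(src, dst) for src, targets in links.items() for dst in targets}

# In-degree: how many pages link to each node.
in_degree = {n: sum(1 for _, dst in edges if dst == n) for n in nodes}
print(sorted(edges))
print(in_degree)
```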

Biology example

Nodes – Proteins
Edges – Interactions

Visualization plays an important role

blogosphere as a graph

nodes = blogs
edges = hyperlinks

Clusters shown: technical - gadgets, society - politics

http://datamining.typepad.com/gallery/blog-map-gallery.html

social web as a graph

nodes = twitter users
edges = retweets on #jan25 hashtag
announcement of Mubarak’s resignation

http://gephi.org/2011/the-egyptian-revolution-on-twitter/

emerging structures

•  graphs on the web present certain structural characteristics
•  groups of nodes interacting with each other → dense inter-connections → functional/topical associations
•  what can we gain by studying them?
–  topic analysis
–  photo clustering
–  improved recommendation methods
–  detect influencers

Community and graphs

Communities correspond to groups of nodes on a graph that share common properties or have a common role in the organization/operation of the system.

S. Fortunato, C. Castellano. Community structure in graphs. arXiv:0712.2716v1, Dec 2007.

Community types

•  Pairs of nodes are more likely to be connected if they are both members of the same community, and less likely to be connected if they do not share communities.
•  explicit
–  the result of conscious human decision
•  implicit
–  emerging from the interactions & activities of users
–  need special methods to be discovered
–  Community detection, partition, clustering

communities and graphs

•  Often communities are defined with respect to a graph G = (V,E) representing a set of objects (V) and their relations (E).
•  Even if such a graph is not explicit in the raw data, it is usually possible to construct one, e.g. feature vectors → distances → thresholding → graph
•  Given a graph, a community is defined as a set of nodes that are more densely connected to each other than to the rest of the network nodes.
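The chain "feature vectors → distances → thresholding → graph" can be sketched in a few lines. This assumes Euclidean distance and an illustrative threshold; the function name `threshold_graph` is hypothetical.

```python
import math
from itertools import combinations

def threshold_graph(vectors, max_distance):
    """Connect two items when their Euclidean distance is at most max_distance."""
    edges = set()
    for (i, u), (j, v) in combinations(enumerate(vectors), 2):
        if math.dist(u, v) <= max_distance:
            edges.add((i, j))
    return edges

# Two well-separated groups of 2-D feature vectors.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
edges = threshold_graph(points, max_distance=0.5)
print(sorted(edges))
```

With this threshold the first three points form one densely connected group and the last two form another, with no edges between the groups.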

communities and graphs - example

inter-community edge
intra-community edge

community attributes

overlap, weighted participation, roles, hierarchy, evolution

graph cuts

•  Given nodes u and v of graph G = (V,E), a cut is a set of edges C ⊂ E such that the two nodes are unconnected on the graph G΄ = (V, E−C).
•  Using s to denote a “source” node and t to denote a “terminal” node, a cut (S,T) of G = (V,E) is a partition of V into sets S and T = V−S such that s ∈ S and t ∈ T.
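The (S,T) cut definition above can be made concrete with a short sketch that lists the edges crossing a given partition. The graph and node names are illustrative.

```python
def cut_edges(edges, S):
    """Edges with exactly one endpoint in S, i.e. the edges crossing the cut (S, V - S)."""
    S = set(S)
    return {(u, v) for u, v in edges if (u in S) != (v in S)}

edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "c"), ("c", "t")]
crossing = cut_edges(edges, S={"s", "a", "b"})
print(sorted(crossing))
```

Removing the crossing edges leaves s and t in different connected parts of the graph, which is exactly what the cut definition requires.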

Modularity maximization

•  A graph can be split into communities in numerous ways, i.e. for each graph there are many possible community structures. In the simple case, a community structure is defined as a graph partition into a set of node sets C = {Ci}.
•  To provide a measure of the quality of a community structure, we make use of modularity.
•  The modularity maximization method detects communities by searching over possible divisions of a network for one or more that have particularly high modularity.
•  Modularity quantifies the extent to which a given graph partition into communities shows a systematic tendency to have more intra-community links than the same community structure would have if the links were rewired under the ER (Erdős-Rényi) graph model.

graph degrees

node degree
deg(vi) = ki = number of neighbors
In directed graphs, we differentiate between in- and out-degree.

adjacency matrix
Αij = link between nodes i and j
0 → no link
1 → link
α → link with weight equal to α

Degrees & Adjacency

Example graph with nodes v1, v2, v3, v4, v5

Adjacency matrix on an undirected graph: A(i,j), i,j <= n

Degree of a vertex v (number of edges incident upon it):
\[ k_v = \sum_{w} A(v, w) \]
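Computing degrees from an adjacency matrix is a row sum. A minimal sketch on a hypothetical 5-node undirected graph (the edges below are chosen for illustration and are not the ones drawn on the slide):

```python
# Symmetric 0/1 adjacency matrix of an undirected graph with nodes v1..v5.
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

degrees = [sum(row) for row in A]  # k_v = sum over w of A(v, w)
m = sum(degrees) // 2              # every edge is counted once per endpoint
print(degrees, m)
```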

modularity computation

•  Modularity is computed as follows:

\[ Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) \]

–  Αij: adjacency matrix
–  ki: degree of node i
–  ci: community of node i
–  δ(ci,cj) = 1 if i, j belong to the same community, 0 otherwise
–  m: number of edges on the graph

The A_ij term gives the observed number of intra-community edges. The k_i k_j / 2m term is the expected number of edges between i and j, if edges are placed randomly.
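The modularity formula can be evaluated directly from an adjacency matrix and a community assignment. A minimal sketch, using a hypothetical graph of two triangles joined by a single edge:

```python
def modularity(A, communities):
    """Q = 1/(2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)  # sum of degrees equals 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Nodes 0-2 and 3-5 form triangles; the edge 2-3 bridges them (m = 7).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
print(round(q_split, 3), round(q_single, 3))
```

Placing each triangle in its own community gives Q = 5/14, while putting all nodes in a single community gives Q = 0, since observed and expected edge counts then cancel out.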

modularity - example

•  In a random graph (ER model), we expect that any possible partition would lead to Q = 0.
•  Typically, in non-random graphs modularity takes values between 0.3 and 0.7.

Q = 0.60: clear community structure
Q = 0.37: fuzzy communities

Modularity maximization approaches

•  Exhaustive search over all possible divisions is usually intractable
•  Algorithms based on approximate optimization
–  greedy algorithms
–  simulated annealing
–  spectral optimization
–  local-based optimization
•  Each approach strikes a balance between speed and accuracy

other means to define communities

•  other community-ness measures:
–  conductance
–  density
•  definitions to satisfy
–  each member should be connected to more nodes within the community than to nodes outside it
–  each member should be connected to all other members (k-clique)
•  result of a process
–  if I start removing edges in a certain order, the graph will break into pieces → communities
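The two measures named above can be sketched as follows. The slide does not define them, so this assumes the common textbook definitions: conductance as the number of cut edges divided by the smaller of the two side volumes (sum of degrees), and density as the fraction of possible internal edges that exist. The graph is the illustrative "two triangles and a bridge".

```python
def conductance(adj, community):
    """Cut edges divided by the smaller volume (sum of degrees) of the two sides."""
    S = set(community)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_rest)

def density(adj, community):
    """Fraction of possible internal edges that are actually present."""
    S = set(community)
    internal = sum(1 for u in S for v in adj[u] if v in S) // 2
    possible = len(S) * (len(S) - 1) // 2
    return internal / possible

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}), density(adj, {0, 1, 2}))
```

Low conductance and high density both indicate a well-separated, tightly knit community.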

graph partition

•  Given a graph G=(V,E), find a partition of V into k disjoint subsets such that the number of edges in Ε whose endpoints belong to different subsets is minimized.
•  Various solutions: Kernighan-Lin algorithm [Kernighan70], spectral bisection [Pothen90].
•  Multi-level partition (metis) [Karypis99]: Repeated application of bisection until the graph is partitioned into k parts, under constraints on the sizes of the subsets.
•  Not a satisfactory solution, since the number of communities needs to be provided as input to the algorithm. Sometimes even the community sizes need to be provided as inputs.

[Kernighan70] B. W. Kernighan, S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. Bell System Technical Journal, Vol. 49, No. 2, pp. 291-307, February 1970.
[Pothen90] A. Pothen, H.D. Simon and K.-P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 11: 430-452, 1990.
[Karypis99] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20 (1): 359–392, 1999.

taxonomy

S. Papadopoulos, Y. Kompatsiaris, A. Vakali, P. Spyridonos. “Community detection in Social Media”. In Data Mining and Knowledge Discovery, Springer, 2011

subgraph discovery (structure) 1

•  k-clique
•  N-clique
•  k-core

Figure labels: k-clique for k=3 (triangle), k=4, k=5; N-clique for N=2 (star); nested 0-core, 1-core, 2-core, 3-core, 4-core
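Of the structures listed above, the k-core is the easiest to compute: repeatedly remove nodes with fewer than k remaining neighbors. A minimal sketch on a hypothetical graph (a 4-clique with one pendant node and one isolated node):

```python
def k_core(adj, k):
    """Nodes of the k-core: the maximal subgraph in which every node has degree >= k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Count only neighbors that have not been peeled away yet.
            if sum(1 for nb in adj[node] if nb in alive) < k:
                alive.remove(node)
                changed = True
    return alive

adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
    "f": set(),
}
for k in range(5):
    print(k, sorted(k_core(adj, k)))
```

The cores are nested: every node belongs to the 0-core, the pendant node drops out at k = 2, and nothing survives beyond k = 3 because the largest clique has four nodes.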

subgraph discovery 2

•  (μ,ε)-core:
–  based on the concept of structural similarity

Figure labels: (μ,ε)-core with μ = 5, ε = 0.72; (μ,ε)-core with μ = 6, ε = 0.675; hub; outlier; percentage of common neighbors for each edge
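Structural similarity and the (μ,ε)-core test can be sketched as below. The slide gives no formula, so this assumes the definition commonly used with the SCAN algorithm: neighborhoods include the node itself, similarity is the shared neighborhood size divided by the geometric mean of the two neighborhood sizes, and a node is a core if at least μ nodes of its closed neighborhood have similarity of at least ε. The graph is illustrative.

```python
import math

def structural_similarity(adj, u, v):
    """Shared closed-neighborhood size, normalized by the geometric mean of both sizes."""
    gu = adj[u] | {u}
    gv = adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def is_core(adj, v, mu, eps):
    """True if at least mu nodes in v's closed neighborhood are eps-similar to v."""
    closed = adj[v] | {v}
    similar = [w for w in closed if structural_similarity(adj, v, w) >= eps]
    return len(similar) >= mu

# Triangle a-b-c with an extra node d hanging off c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(structural_similarity(adj, "a", "b"))
print(structural_similarity(adj, "c", "d"))
print(is_core(adj, "a", mu=3, eps=0.8), is_core(adj, "d", mu=3, eps=0.8))
```

Nodes inside the triangle share most of their neighbors and qualify as cores, while the loosely attached node d does not.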

centrality

•  Betweenness centrality
–  Being in many shortest paths
•  Closeness
–  Being close to many nodes
•  Eigenvector centrality
–  End of many paths
•  Degree centrality
–  High degree

https://commons.wikimedia.org/wiki/File:6_centrality_measures.png#/media/File:6_centrality_measures.png
Carlos Castillo, Social Media Mining and Retrieval, http://www.slideshare.net/ChaToX/social-media-mining-and-retrieval

divisive - use of edge centrality

•  Find edges that stand between communities.
•  Progressively remove more “central” edges until the graph breaks into separate communities.
•  As the graph splitting progresses, new communities emerge that are assigned to a hierarchical structure.
•  Edge centrality is defined similarly to node centrality:

\[ bc(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}} \]

–  σ_{s,t}(v): number of shortest paths from node s to t that include node v
–  σ_{s,t}: total number of shortest paths from s to t

Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

Depiction of node centrality: red (min) → blue (max)
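The betweenness formula can be evaluated literally on a small unweighted graph by counting shortest paths with breadth-first search. This is a direct, unoptimized sketch (function names are illustrative); it sums over ordered pairs (s, t), so in an undirected graph every unordered pair contributes twice.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distance and number of shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for v in adj:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc

# Path graph a - b - c - d: the inner nodes carry all the traffic.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
bc = betweenness(adj)
print(bc)
```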

Girvan - Newman algorithm

•  The GN algorithm is one of the most important algorithms, stimulating a whole wave of community detection methods.
•  Basic principle:
–  Compute betweenness centrality for each edge.
–  Remove edge with highest score.
–  Re-compute all scores.
–  Repeat 2nd step.
•  Complexity: Ο(n^3) (for sparse graphs)
•  Many variations have been presented to improve precision by use of different betweenness measures, or to reduce complexity, e.g. by sampling or local computations.

Girvan, M., Newman, M.E.J. “Community structure in social and biological networks”. In Proceedings of the National Academy of Sciences, U.S.A. 99(12), 7821–7826, 2002
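The basic principle above can be sketched for small unweighted graphs. The edge betweenness below uses a Brandes-style accumulation, which is one common way to compute it and is an implementation choice of this sketch; the function names and the example graph are illustrative. Only the first split is performed; applying the same step recursively to each part would yield the hierarchy.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (unweighted, undirected graph)."""
    score = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[u] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # farthest nodes first
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = tuple(sorted((u, w)))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return score

def components(adj):
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in part:
                part.add(u)
                stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)  # re-computed after every removal
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].remove(v)
        adj[v].remove(u)
    return components(adj)

# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = girvan_newman_split(adj)
print(sorted(sorted(p) for p in parts))
```

The bridge carries every shortest path between the two triangles, so it has the highest score and is removed first.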

Girvan - Newman (example)

Social network in Zachary's karate club
Hierarchical community structure detected by the algorithm.

Visual Event Summarization on Social Media using Topic Modelling and Graph-based Ranking Algorithms

Large-scale real world events (1)

•  Long-running events → consist of several sub-events, e.g. the 10 days of the Sundance Film Festival include opening and awards ceremonies, screenings etc.
•  Many of the people involved use social media → huge amount of event-related micro-blogging messages
•  A growing number of these messages carry multimedia content
•  An image in a micro-post can convey a much better impression of a specific moment of the ongoing event

Large-scale real world events (2)

#nbafinals → 2.6M tweets in one month
#BaltimoreRiots, 29 April - 2 May 2015 → 1.3M tweets in 5 days
E3 conference 2015, 16-18 June: >5M tweets before the conference, 2M tweets during the conference; new game releases → multimedia content

Large-scale real world events (3)

But…
•  the huge number of messages makes it very challenging for interested users to monitor the evolution of the event
•  many messages can be considered spam or non-informative
•  In the case of multimedia: internet memes, screenshots, images of low quality…
•  Redundancy due to near-duplicate messages and images

Large-scale real world events (4)

#nbafinals

Figure labels: Irrelevant; Duplicates with no explicit association; Non-informative

Visual Event Summarization

Event-related collection is available

Visual Event Summarization is the problem of selecting a concise set of images that are highly relevant to the event and visually capture its key aspects.

List of all event images → Event-based Visual Summarizer → Set of Selected Representative and Diverse Images

Existing Approaches: Text-based

Radev et al. (2004)
•  summary consists of messages that are closest to their tf·idf centroid

Erkan et al. (2004), LexRank & Mihalcea et al. (2004), TextRank
•  finding salient sentences by using the centrality of each sentence in a similarity graph
•  adapted for multi-document summarization using each message as a sentence.
•  outperforms naïve centroid-based approach.

Shen et al. (2013)
•  mixture model to detect sub-events at participant level
•  tf·idf centroid to find a summary of each sub-event

Chakrabarti and Punera (2011)
•  Hidden Markov Model to obtain a time-based segmentation of tweets
•  tf·idf centroid to find a summary of each time segment

Existing Approaches: Multimedia

Bian et al. (2013)
•  multimodal extension of LDA
•  textual and visual features

Lin et al. (2012)
•  multi-graph of objects capturing visual, textual and temporal proximity
•  time-ordered sequence of important objects via graph optimization

McParlane et al. (2014) – state-of-the-art baseline
•  visual features + SVM to discard irrelevant images
•  clustering in subtopics and selection of images for each subtopic based on popularity and specificity

MGraph: Framework Overview

1.  create message multi-graph using textual, visual and temporal proximity
2.  find underlying topics using SCAN algorithm
3.  calculate prior scores of images based on topics and popularity (relevance)
4.  diversify using DivRank

Pre-processing / Filtering

Text-based filtering
•  heuristic rules for spam filtering → discard very short messages & messages with many mentions, URLs or hashtags.
•  filtering of unstructured messages using POS tagging
Accept → (determiner? adjective* noun+ verb)+

Visual-based filtering
•  discard small images
•  detect and discard memes, screenshots and images containing heavy text
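The text-based filters can be sketched as two small predicates. This is an illustration under stated assumptions: POS tags are supplied by an external tagger (not shown), the accept pattern must match the whole tag sequence, and the thresholds `min_words` and `max_marks` are hypothetical values, not the ones used by the authors.

```python
import re

# One letter per POS tag so the accept pattern can be a plain regex.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "VERB": "V"}
ACCEPT = re.compile(r"(D?A*N+V)+")  # (determiner? adjective* noun+ verb)+

def is_structured(pos_tags):
    """True if the tag sequence matches the accept pattern from start to end."""
    codes = "".join(TAG_CODES.get(tag, "x") for tag in pos_tags)
    return ACCEPT.fullmatch(codes) is not None

def looks_like_spam(text, min_words=4, max_marks=3):
    """Heuristic: very short messages or too many mentions, hashtags or URLs."""
    words = text.split()
    marks = sum(1 for w in words if w.startswith(("@", "#", "http://", "https://")))
    return len(words) < min_words or marks > max_marks

print(is_structured(["DET", "ADJ", "NOUN", "VERB"]))
print(is_structured(["VERB", "NOUN"]))
print(looks_like_spam("win #a #b #c #d now"))
print(looks_like_spam("The final starts tonight in Miami"))
```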

Pre-processing / Filtering

Figure panels: Text-based filtering (tweet length, POS tagging filtering); Visual-based filtering

Multi-graph Generation (1)

Given a set of (original) messages M = {m1, m2, ..., mn} we construct a multi-graph GM = {V, Etextual, Evisual, Esocial, Etime}
•  vertex vi ∈ V corresponds to message mi
•  Etextual → undirected edges expressing the textual similarity (cosine similarity) between nodes (tf·idf vector vm)
•  Evisual → undirected edges that represent the visual similarity (L2 distance) between nodes with images (VLAD+SURF vectors)

Thresholding: add an edge in Etextual or Evisual only if the textual or visual similarity between the corresponding nodes is higher than thtextual or thvisual respectively
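Building the textual edge set can be sketched as tf·idf weighting, cosine similarity and thresholding. The tokenization (lower-cased whitespace split), the idf variant log(n/df) and the threshold used in the example are assumptions of this sketch; the messages are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(messages):
    """Sparse tf·idf vectors with idf = log(n / document frequency)."""
    docs = [Counter(m.lower().split()) for m in messages]
    df = Counter(term for doc in docs for term in doc)  # one count per document
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in doc.items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def textual_edges(messages, threshold):
    """Undirected edges between messages whose cosine similarity exceeds the threshold."""
    vecs = tfidf_vectors(messages)
    return {(i, j) for (i, u), (j, v) in combinations(enumerate(vecs), 2)
            if cosine(u, v) > threshold}

messages = [
    "lebron james scores again",
    "lebron james scores tonight",
    "traffic jam downtown",
]
edges = textual_edges(messages, threshold=0.2)
print(edges)
```

The visual edge set could be built the same way by swapping the tf·idf vectors for image descriptors and the cosine similarity for a distance-based similarity.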

Multi-graph Generation (2)

Example multi-modal sub-graph

Visual deduplication

•  Visual duplicates for which there is no explicit connection → apply Clique Percolation Method (CPM) on sub-graph Gvisual = {V, Evisual}
•  Represent detected cliques as single messages:
–  VLAD aggregation on SURF descriptors of all images in the clique
–  mean value of publication time
–  aggregated value of reposts of each message.
–  merged tf·idf vector
•  Replace clustered messages in GM with cliques and re-calculate the corresponding edges

Visual deduplication

Figure: GM and Gvisual

Topic Detection

•  Apply Structural Clustering Algorithm for Networks (SCAN) → identify dense sub-graphs of messages in GM
•  Sub-graphs represent the topics that exist in the stream of messages
•  Each topic i contains messages {Mi} and is represented as a merged tf·idf vector Vi
•  A substantial number of messages is left outside the detected clusters
–  Hubs & outliers are most probably non-informative
–  They may still include valuable information → also considered in the summarization process as single-item clusters

Message Selection Score

Formula terms shown: reposts, relevance × cluster size × specificity

Specificity

High specificity: rare across all topics of the event
Low specificity: common across topics

Image Ranking & Diversification

variant of PageRank aiming at diversity

Dataset and Event Description

•  dataset of McMinn et al., with more than 500 events from different domains
•  we used the 50 largest events in terms of tweets
•  sports events (e.g., the Sochi Winter Olympics), political events (Ukraine crisis, Venezuelan protests), disasters, etc.
•  364,005 tweets, on average 4,730 tweets/event
•  296,160 remaining tweets, due to suspended accounts and deleted messages
•  about 3.51% of these, i.e. 12,772 tweets, contain an embedded image

Relevance Judgments

Each image is shown to 3 participants (20 img-20 part) without ranking information

Task Description: You are presented with an image and an event title describing a trending topic in Twitter. For each image and event title, you are asked to answer the following question:

Is this image relevant to the event?
1.  The image is clearly not relevant to the event.
2.  The image is probably not relevant to the event, but I am not entirely sure.
3.  The image is somewhat relevant to the event, but I have my doubts on whether I would like to see it in a photo coverage of the event.
4.  The image is clearly relevant to the event, and I would like to see it in a photo coverage of the event.

Experimental Setting

•  VLAD+SURF extraction
–  64-dimensional SURF descriptors
–  four codebooks of 128 visual words (in total 512) to quantize each descriptor
–  aggregate SURF descriptors into a single vector of 64*512 = 32,768 dimensions using the VLAD scheme
–  PCA to create a 1024-dimensional L2-normalized reduced vector that represents the visual content of the image
•  Multi-graph generation
–  k = 500 nearest neighbors
–  visual and textual similarity thresholds were set to 0.5 and 0.6
–  σ² of the temporal kernel was empirically set to 24 hours
•  SCAN parameters were set to μ=2 and ε=0.65
•  DivRank’s damping factor was set to d=0.75

Evaluation metrics (1)

Precision-oriented metrics
•  Precision (P@N): The percentage of images among the top N that are relevant (answers 3 & 4) to the corresponding event, averaged over all events. We calculate precision for N equal to 1, 5, and 10.
•  Success (S@N): Percentage of events for which at least one relevant image exists among the top N returned, for N=10.
•  Mean Reciprocal Rank (MRR): Computed as 1/r, where r is the rank of the first relevant image returned, averaged over all events.

S3P
2015,
Garda
Lake,
Italy

Processing
Large
Complex
Data
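
The three precision-oriented metrics can be computed from per-event ranked lists of relevance flags, roughly as below. This is a sketch under assumptions: each event is a list of booleans (True when the judgment was 3 or 4), and an event with no relevant image contributes a reciprocal rank of 0. The function names and the sample judgments are hypothetical.

```python
def precision_at_n(relevant, n):
    """Fraction of the top-n images that are relevant."""
    return sum(relevant[:n]) / n

def success_at_n(relevant, n):
    """1.0 if at least one of the top-n images is relevant, else 0.0."""
    return 1.0 if any(relevant[:n]) else 0.0

def reciprocal_rank(relevant):
    """1/r for the rank r of the first relevant image (0.0 if none)."""
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0

def mean_over_events(metric, events, *args):
    """Average a per-event metric over all events."""
    return sum(metric(e, *args) for e in events) / len(events)

# Hypothetical judgments: one ranked list of relevance flags per event.
events = [
    [False, True, True, False, True],
    [True, False, False, False, False],
]
p5 = mean_over_events(precision_at_n, events, 5)
s5 = mean_over_events(success_at_n, events, 5)
mrr = mean_over_events(reciprocal_rank, events)
```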

Evaluation metrics (2)

Diversity-oriented metrics

•  α-normalized Discounted Cumulative Gain: α-nDCG@N measures the usefulness, or gain, of the returned images based on their position in the summary (N=10).
•  Average Visual Similarity: AVS@N measures the average visual similarity among all pairs of images in the top N selected images, averaged over all events. Lower AVS values are preferable since they imply higher diversity in terms of visual content.
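
AVS@N for a single event can be sketched as the mean pairwise similarity of the top-N image vectors. The slides do not say which similarity function is used; cosine similarity over the image feature vectors is assumed here, and the function names and toy vectors are hypothetical. N must be at least 2.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def avs_at_n(vectors, n):
    """Average similarity over all pairs among the top-n images (n >= 2)."""
    pairs = list(combinations(vectors[:n], 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy summary (assumption): two identical images and one orthogonal one.
summary = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
avs = avs_at_n(summary, 3)
```

In this toy case only one of the three pairs is similar, so the score is one third; a summary of near-duplicates would approach 1.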

Baselines

•  Random: randomly selects N images from the filtered set of images as the summary set
•  MostPopular: picks the N most popular images in terms of reposts
•  LexRank: uses the items graph GM, ranks the nodes using LexRank and selects the top N nodes that contain images
•  TopicBased: selects the N most relevant messages from the most significant topics (S_cov) (relevance, no specificity & diversity)
•  P-TWR: ranks images in descending order using the weighting scheme described in McParlane et al. (popularity)
•  S-TWR: groups the tweets of each event into sub-clusters and selects the highest ranked item of each cluster using the previous weighting scheme (specificity)

Results (1) – Precision-oriented metrics

•  MGraph outperforms all of the competing methods
•  Popularity-based approach performs well for P@1 but drops significantly for N=5,10
•  LexRank and TopicBased approaches achieve lower but more steady results

First relevant in positions 1 - 2

Results: Canada Team in #Sochi

Figure panels: Popularity-based, S-TWR, MGraph

Results (2) – Diversity-oriented metrics

•  MGraph achieves the best score for α-nDCG@10
•  Best values of AVS achieved by S-TWR
•  The worst results in terms of AVS are obtained using LexRank

Results (3)

Performance of MGraph across different categories

•  Best P@10 measure is obtained for events about Science & Technology
•  The second best P@10 is obtained for events about Arts & Entertainment
•  Difficult to diversify
•  The best value of AVS is achieved for events about disasters & accidents, e.g., earthquakes

Results (4)

Impact of the damping factor d on P@10, S@5, MRR and α-nDCG@10

•  The worst results for all metrics are obtained for d=0 (no re-ranking)
•  The best results are achieved for 0.7<d<0.8
•  slight decrease for d>0.8
•  more diverse → less relevant

Conclusions

•  Graph-based approach for visual summaries for real-world events
•  Maximizes relevance and diversity
•  Multimodal approach taking into account
–  Textual content
–  Visual content
–  Social: interactions (replies), popularity
–  Time
•  Introduction of user related features (e.g. influence)

Monitoring and intelligence system for Web multimedia verification

Can multimedia on the Web be trusted?

Real photo captured April 2011 by WSJ but heavily tweeted during Hurricane Sandy (29 Oct 2012)

Tweeted by multiple sources & retweeted multiple times

Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/

The Problem

•  Everyone can easily publish content on the Web
•  Content can be easily repurposed and manipulated
•  News outlets are competing for views and clicks → Pressure for airing stories very quickly leaves very little room for verification. → Very often, even well-reputed news providers fall for fake news content.
•  Multiple tools and services available for individual tasks → complex verification process

Very hard and time consuming to check the veracity of Web multimedia

Media REVEALr

•  Developed within the REVEAL project: http://revealproject.eu/
•  Framework for collecting, indexing and browsing multimedia content from the Web and social media
•  Support for verification:
–  Near-duplicate detection against an indexed collection
–  Clustering of social media posts by visual similarity → comparative view of the same incident
–  Aggregation and visualization of Named Entities around an incident

Related Work

•  Majority of works have focused on the problem of topic detection and summarization:
–  TwitInfo (Marcus et al., 2011)
–  TwitterMonitor (Mathioudakis & Koudas, 2010)
–  Meme detection & prediction (Weng et al., 2014)
•  Visual memes and clustering
–  Visual meme tracking (Xie et al., 2011)
–  Supervised multimodal clustering (Petkos et al., 2012)
•  Image manipulation tracking
–  Internet image archaeology (Kennedy & Chang, 2008)

Overview of Media REVEALr

Media collection
Media pre-processing & feature extraction
Media analysis, mining & indexing
Persistence (storage, indexing)
Access (API)
Visualization, front-end

TEXT / VISUAL

Named Entity Detection

•  Brevity and noisy nature of text in social media pose a serious challenge
•  Employed solution:
–  Pre-processing: tokenization, user mention resolution, text cleaning
–  Stanford NER + user mention resolution
–  Regular expressions to remove special characters and symbols (e.g., #, @, URLs, etc.)
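
The pre-processing steps named on the "Named Entity Detection" slide (mention resolution and regular-expression cleaning) might look roughly like this before the text is handed to a NER tagger. The patterns, the `known_users` lookup table and the function name are illustrative assumptions, not the rules used in Media REVEALr.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
# Anything that is not a word character, whitespace or basic punctuation.
SYMBOL_RE = re.compile(r"[^\w\s.,'-]")

def clean_tweet(text, known_users=None):
    """Strip URLs and symbols; resolve @mentions to display names if known."""
    known_users = known_users or {}
    text = URL_RE.sub(" ", text)
    # Replace a mention by the user's full name when available,
    # otherwise keep the bare username without the '@'.
    text = MENTION_RE.sub(
        lambda m: known_users.get(m.group(1).lower(), m.group(1)), text
    )
    text = SYMBOL_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())

cleaned = clean_tweet(
    "RT @BarackObama: Four more years! #election2012 http://t.co/abc",
    known_users={"barackobama": "Barack Obama"},
)
```

Resolving the mention before tagging gives the NER model a proper name ("Barack Obama") instead of an opaque handle, which is presumably why the slide pairs Stanford NER with mention resolution.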

Visual Indexing

•  Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem
•  Based on local descriptors (SURF), aggregation (VLAD), dimensionality reduction (PCA), quantization (PQ) and indexing (IVFADC)
•  State-of-the-art visual similarity search
–  High precision/recall
–  Very efficient and scalable implementation (search many millions of images in a few msec, maintain full index in memory using ~1GB/10M images)

Improving NDS Resilience (NDS+)

•  Often, NDS performance suffers from overlay graphics and fonts
•  To address this issue, we integrate a descriptor-level classifier that tries to remove the font/graphic descriptors from the VLAD vector

Example: Filtering Out Font Descriptors

•  Assuming that in most cases the classifier is correct, the resulting VLAD vector is of much higher quality compared to the one without filtering

Classifier Details

•  Random Forest used as base classifier
•  Cost Sensitive meta-classifier to penalize misclassification of True Positives
•  Challenge due to Class Imbalance (overlay descriptors << useful image content descriptors)
–  Cost Sensitive meta-classifier performs over-sampling of minority class to balance the training set
•  Training set created by collecting images with overlays (e.g., memes) from the Web and manually annotating them (selecting areas w. fonts/overlays)
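
Balancing a training set by over-sampling the minority class can be illustrated with plain random duplication, as sketched below. This is only one possible mechanism and is an assumption: the slide states that the cost-sensitive meta-classifier over-samples the minority class but does not describe how. The function name, class labels and toy data are hypothetical.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

# Toy descriptors (assumption): four "content" samples, one "overlay" sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["content", "content", "content", "content", "overlay"]
Xb, yb = oversample_minority(X, y)
```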

Mining: Clustering and Aggregation

•  Visual aggregation
–  DBSCAN on the visual feature representation (PCA-reduced VLAD vectors)
–  Element (tweet) selected based on the largest amount of keywords (expected to result in more information)
•  Entity aggregation
–  NER on individual items
–  Entity categorization (→ Persons, Locations, Organizations)
–  Entity ranking based on frequency of occurrence
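
The entity aggregation steps (categorize, then rank by frequency of occurrence) can be sketched with a counter over (name, type) pairs. The input format, one list of already-extracted entities per item, and the function name are assumptions; ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter

def rank_entities(items):
    """Group entities by type and rank them by how often they occur."""
    counts = Counter()
    for entities in items:
        counts.update(entities)
    ranked = {}
    # Sort by descending frequency, then by (name, type) for stable ties.
    for (name, etype), freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        ranked.setdefault(etype, []).append((name, freq))
    return ranked

# Hypothetical NER output: (entity, type) pairs per social media item.
items = [
    [("Boston", "LOCATION"), ("FBI", "ORGANIZATION")],
    [("Boston", "LOCATION")],
    [("Boston", "LOCATION"), ("Barack Obama", "PERSON"), ("FBI", "ORGANIZATION")],
]
ranked = rank_entities(items)
```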

User Interface: Collections View

User Interface: Items View & Search

User Interface: Clusters View

User Interface: Entities View

Evaluation: NER

•  Manual annotation of 400 tweets from the SNOW Data Challenge dataset (Papadopoulos et al., 2014)
•  Measure: Accuracy → instance is considered correct when both entity and type are correctly identified
•  Three competing solutions:
–  Base Stanford NER (S-NER)
–  S-NER + Extensions/Post-processing (S-NER+)
–  Ellogon library (http://www.ellogon.org)

Evaluation: NDS

•  Benchmark Datasets
–  Holidays: 1,491 images, 500 queries (Jegou et al., 2008)
–  Oxford: 5,063 images, 55 queries (Philbin et al., 2008)
–  Paris: 6,412 images, 55 queries (Philbin et al., 2008)
•  Accuracy: mean Average Precision (mAP)

Figure panels: CLEAN DATASET, NOISY DATASET

Evaluation: NDS

•  Execution Time (msec)
•  Example

INDEXED IMAGE / QUERY IMAGE
NDS: #27
NDS+: #1

Use Cases: Real-world Datasets

sandy, boston, malaysia, ferry

NDS Use Case (boston)

Clustering Use Case (boston)

•  Visual clustering enables comparative view and analysis over time (in this case showing increasing confidence on picture).
•  When journalists see many similar photos of the same scene, they have more confidence that it is real and not fabricated.

Entity Aggregation Use Case (snow)

LOCATIONS, PERSONS, ORGANIZATIONS

Conclusion

•  Key contributions
–  Framework and web application offering valuable verification support for Web multimedia
–  High-quality individual components for NER, NDS, clustering and aggregation
•  Future Work
–  Incremental image clustering
–  Temporal views to explore evolution of a story
–  Multimedia forensics toolbox (splice, copy-move detection)

Computational Verification in Social Media

•  Create a computational verification framework to classify tweets with unreliable media content.
•  Events used for experimentation

Fake images posted during Hurricane Sandy natural disaster
Fake images posted during Boston Marathon bombings

Methodology

Tweet Extraction
• Use Topsy machine to collect tweets with certain keywords

Image Indexing
• Create a predefined set of verified fake and real images
• Keep the tweets with identical or near-duplicate images

Feature Extraction
• Extract Content and User features for each tweet collected and their combination

Dataset
• Annotate each tweet as fake or real based on the image
• Keep only tweets written in English, Spanish or German

Classification
• Test using cross-validation approach
• Test using the two distinct datasets
• Test using different training and testing dataset

Content features
• Length of the tweet
• Number of words
• Contains exclamation mark and their number
• Contains quotation mark and their number
• If the text contains emoticon (happy or sad)
• Number of uppercase characters
• Number of hashtags
• Number of mentions
• Number of pronouns
• Number of urls
• Number of sentiment words
• Number of retweets

User features
• Username
• Number of friends
• Number of followers
• Number of followers/number of friends ratio
• Number of times the user was listed
• If the status of the user contains url
• If the user is verified or not
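
Several of the content features listed above can be computed directly from the tweet text, for example as in the following sketch. Only the purely textual features are covered; pronoun, sentiment-word, emoticon and retweet counts would need lexicons or tweet metadata and are left out. The feature names, regular expressions and the sample tweet are illustrative assumptions.

```python
import re

def content_features(text):
    """Compute a subset of the text-based content features of a tweet."""
    words = text.split()
    return {
        "length": len(text),
        "num_words": len(words),
        "num_exclamation": text.count("!"),
        "num_quotation": text.count('"'),
        "num_uppercase": sum(1 for ch in text if ch.isupper()),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }

# Hypothetical tweet text.
f = content_features("OMG! Shark in NJ street #Sandy @cnn http://t.co/x")
```

The resulting dictionary can be turned into a fixed-order feature vector and passed to any of the classifiers compared in the results (J48, KStar, Random Forest).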

Results

•  Tweet Statistics

                           Hurricane Sandy   Boston Marathon
Tweets with URLs                    343939            112449
Tweets with fake images              10758               281
Tweets with real images               3540               460

•  Approaches

Detection accuracy using cross-validation approach (classified correctly, %)

Hurricane Sandy
Classifier       Content features   User features   Total features
J48 tree                    81.41           67.72            80.68
KStar                       81.28           71.16            81.38
Random Forest               80.59           70.15            80.94

Boston Marathon
Classifier       Content features   User features   Total features
J48 tree                    76.45           70.81            81.25
KStar                       81.28           74.12            75.78
Random Forest               78.59           76.15            79.10

Results (2)

Detection accuracy using different training and testing set in Hurricane Sandy (classified correctly, %)

Classifier       Content features   User features   Total features
J48 tree                    73.79           51.06            65.06
KStar                       75.30           62.29            53.31
Random Forest               74.02           63.10            65.96

Detection accuracy using Hurricane Sandy for training and Boston Marathon for testing (classified correctly, %)

Classifier       Content features   User features   Total features
J48 tree                    55.05           50.12            54.10
KStar                       50.01           50.10            50.97
Random Forest               58.75           51.03            58.78

Other approaches

•  Graph-based multimodal clustering for social event detection in large collections of images
–  automatic organization of a multimedia collection into groups of items, each (group) of which corresponds to a distinct event.
•  Unsupervised concept learning detection using social media as training data
•  Text analysis for entities matching and sentiment analysis
•  Placing images based on content-features
•  Retrieving diverse images for same entity

Demos - Applications

MM News Demo
ClustTour
ThessFest

Multimedia Demo

Multimedia Demo Architecture

StreamManager
Twitter, Facebook, Flickr, YouTube, RSS, Instagram
160.xx.xx.207
MongoDBWrapper
160.xx.xx.207
TextIndexer (Solr)
160.xx.xx.207
160.xx.xx.207
MediaFetcher, FeatureExtractor (HDFS)
160.xx.xx.58, 160.xx.xx.107
Social Focused Crawler (HDFS)
160.xx.xx.187
Nutch
Nutch, VLAD
FeatureIndexer (HDFS)
160.xx.xx.207
IVFADC
Data Mining
160.xx.xx.191
Visual Clust., Geo Clust., Statistics
Web server
160.xx.xx.116
API (1), API (2), API (3), API (4)

MongoDB

Document-oriented database → support of JSON
Current stable version: 3.0.6 https://www.mongodb.org/
Flexible Data Model → schemaless, useful for social media data that change over time
Horizontal scaling via shards and replica sets
Storage of social media items as JSON objects → millions of documents can be handled
Number of different index types → single field, compound, multikey indexes.
Example: Store Facebook posts and index them by publication time and number of likes
Query: get most recent posts sorted by popularity (#likes)
Native support of map-reduce jobs → get most shared images in a collection of tweets
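
The "most shared images" map-reduce job mentioned on the MongoDB slide follows the usual map/group/reduce pattern. The sketch below emulates that pattern in plain Python over in-memory dictionaries; it does not use the MongoDB API, and the `image_urls` field name and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_tweet(tweet):
    """Map step: emit (image URL, 1) for every image shared in a tweet."""
    for url in tweet.get("image_urls", []):
        yield url, 1

def reduce_counts(key, values):
    """Reduce step: total number of shares for one image URL."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Group the mapper output by key and reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for k, v in mapper(doc):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Hypothetical tweet documents, shaped like JSON objects.
tweets = [
    {"id": 1, "image_urls": ["http://example.org/a.jpg"]},
    {"id": 2, "image_urls": ["http://example.org/a.jpg", "http://example.org/b.jpg"]},
    {"id": 3, "image_urls": []},
    {"id": 4},
]
counts = map_reduce(tweets, map_tweet, reduce_counts)
# Most shared first; ties broken by URL.
most_shared = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because documents are schemaless, the mapper has to tolerate tweets that lack the image field altogether, which is why it uses `get` with a default.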

Apache Solr

Full-text search platform built on top of Apache Lucene
Current version: 5.3.0 http://lucene.apache.org/solr/
Indexing of social media items e.g. Tweets, FB posts, metadata of YouTube videos etc.

Additional features
•  Faceted Search and Filtering → get top N per field e.g. users
•  Spatial index & Search → very useful in geo-tagged documents e.g. Tweets.
•  Plugin-based architecture → language detection, NLP etc. as steps of indexing pipeline

Get tweets containing the name “Barack Obama” OR the phrase “us elections” having geo-location around New York

SolrCloud → Cluster of Solr instances
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration

Storm

Distributed real-time computation system https://storm.apache.org
Topologies → processing logic
Stream: unbounded sequence of tuples e.g. tweets or URLs
Spouts: source of streams
Bolts: processing, filtering, etc.

Processing of URLs shared in social media → Storm pipeline
•  Expand short URLs
•  Fetch new URLs
•  Extract content e.g. articles and images
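
The spout/bolt structure of the URL-processing pipeline can be mimicked with chained Python generators, as in the sketch below. This is a single-process analogy for the dataflow only, not Storm code: it has no distribution, acking or fault tolerance. The function names, the short-URL lookup table and the `fetch` stub are hypothetical.

```python
def url_spout(urls):
    """Spout: source of the stream (here, a finite list of shared URLs)."""
    for u in urls:
        yield u

def expand_bolt(stream, short_to_long):
    """Bolt: expand short URLs using a lookup table (stand-in for HTTP redirects)."""
    for u in stream:
        yield short_to_long.get(u, u)

def dedup_bolt(stream, seen):
    """Bolt: let only URLs through that have not been processed before."""
    for u in stream:
        if u not in seen:
            seen.add(u)
            yield u

def extract_bolt(stream, fetch):
    """Bolt: fetch each new URL and emit its extracted content."""
    for u in stream:
        yield {"url": u, "content": fetch(u)}

shared = ["http://t.co/a", "http://example.org/story", "http://t.co/a"]
short_to_long = {"http://t.co/a": "http://example.org/story"}
fetch = lambda u: "article text for " + u  # stub instead of a real download

results = list(
    extract_bolt(dedup_bolt(expand_bolt(url_spout(shared), short_to_long), set()), fetch)
)
```

Expanding before de-duplicating matters: the three shared links above all resolve to the same article, so it is fetched only once.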

Redis

Key-Value cache and store
Current stable version: 3.0 http://redis.io/
Partitioning → distribution of data among multiple Redis instances
Keys can contain strings, hashes, lists, sets, sorted sets, etc.
Atomic operations: set, increment, push etc.
Store crawling status of URLs, sharing information of URLs and images

Additional Feature
•  Implementation of Publisher/Subscriber pattern
•  Communication of different components in a system for social media analytics

City profile creation (ClustTour)

PHOTOS & METADATA
tags: sagrada familia, cathedral, barcelona
taken: 12 May 2009
lat: 41.4036, lon: 2.1743

SPATIAL CLUSTERING + TEMPORAL ANALYSIS
COMMUNITY DETECTION (VISUAL, TAG, HYBRID)
CLASSIFICATION TO LANDMARKS/EVENTS (#users / #photos, duration)
[2 years, 50 users / 120 photos]
[1 day, 2 users / 10 photos]

S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011

City profile creation (ClustTour)

Community detection on image similarity graphs
Nodes: photos
Edges: visual and tag similarity

ThessFest

•  Thessaloniki International Film Festival
•  Support Twitter/comment usage within the app
•  Ratings and comments per film
•  Feedback aggregation
•  Votes
•  Tweets
•  Real-time feedback to the organisation and visitors

Fête de la Musique Berlin app

•  FETEberlin in App Store and Google Play
•  More than 100K visitors
•  About 5K musicians
•  More than 5K app downloads, 25K sessions

App features
•  Browse and filter detailed program
•  Interactive maps and routing
•  Social Sharing
•  Artists’ and Stages Details
•  Social Monitoring

Main benefits for attendants
•  Visitors can browse through maps and don’t get lost as stages are numerous
•  Event schedule is always available, including per stage
–  Very useful when the server was down and there was no access to the online schedule

Topic analysis

•  Top-10 topics
•  Manual inspection of clusters:
–  53.8% of topic titles considered informative
–  98.5% of clusters were found to be “clean”
•  Topics in time

Other Application Areas

•  Science
–  Sociology, machine learning (machine as a teacher), computer vision (annotation)
•  Tourism – Leisure – Culture
–  Off-the-beaten path POI extraction
•  Marketing
–  Brand monitoring, personalised ads
•  Prediction
–  Politics: election results
•  News
–  Topics, trends, event detection
•  Others
–  Environment, emergency response, energy saving, etc.

Reusable results

•  Starting point: http://www.socialsensor.eu/results
–  Deliverables
–  Publications
–  Datasets
–  Software
–  e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3
•  Open-source projects (Apache License v2): https://github.com/socialsensor
–  Data collection (stream-manager, storm-focused-crawler)
–  Indexing (framework-client, multimedia-indexing)
–  Mining (topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection)

Benchmarking - Datasets

dataset: SNOW 2014 Data Challenge

•  A set of ~1M tweets collected using a list of 5000 UK-focused “news hounds” and the keywords “Syria”, “terror”, “Ukraine”, and “bitcoin” for a period of 24 hours starting from Feb 25, 18:00.
•  Average rate: ~720 tweets/minute
•  Number of unique Twitter accounts: ~556K
•  Number of retweets: ~648K
•  Number of replies: ~135K
•  Ground truth topics: http://figshare.com/articles/SNOW_2014_Data_Challenge/1003755

Overview of Challenge

•  Goal: Detection of newsworthy topics in a large and noisy set of tweets
•  Topic: a news story represented by a headline + tags + representative tweets + representative images (optional)
•  Newsworthy: A topic that ends up being covered by at least some major online news sources
•  Topics are detected per timeslot (small equally-sized time intervals)
•  We want a maximum number of topics per timeslot

Challenge Activity Log

•  Challenge definition (Dec 2013)
•  Challenge toolkit and registration (Jan 20, 2014)
•  Development dataset collection (Feb 3, 2014)
•  Rehearsal dataset collection (Feb 17, 2014)
•  Test dataset collection (Feb 25, 2014)
•  Results submission (Mar 4, 2014)
•  Paper submission (Mar 9, 2014)
•  Results evaluation (Mar 5-18, 2014)
•  Workshop (Apr 7, 2014)

Some statistics

•  Registered participants: 25
–  India: 4, Belgium: 3, Germany: 3, UK: 3, Greece: 3, Ireland: 2, USA: 2, France: 2, Italy: 1, Spain: 1, Russia: 1
•  Participants that signed the Challenge agreement: 19
•  Participants that submitted results: 11
•  Participants that submitted papers: 9

Evaluation Protocol

•  Defined several evaluation criteria:
–  Newsworthiness → Precision/Recall, F-score
–  Readability → scale [1-5]
–  Coherence → scale [1-5]
–  Diversity → scale [1-5]
•  List of reference topics
•  Set up precise evaluation guidelines
•  Blind evaluation (i.e. evaluator not aware of which method a topic comes from) based on Web UI
•  Participants submitted topics for 96 timeslots, but manual evaluation happened for 5 sample timeslots.
•  Result validation and analysis
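
For the newsworthiness criterion, precision, recall and F-score against a list of reference topics can be computed as sketched below. The sketch assumes that detected topics have already been matched to reference topic identifiers, which in the challenge was a manual judgment by evaluators; the identifiers and the function name are hypothetical.

```python
def precision_recall_f1(detected, reference):
    """Precision, recall and F-score of detected topics against reference topics."""
    detected, reference = set(detected), set(reference)
    matched = detected & reference
    p = len(matched) / len(detected) if detected else 0.0
    r = len(matched) / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical timeslot: four detected topics, three reference topics, two in common.
p, r, f = precision_recall_f1({"t1", "t2", "t3", "t4"}, {"t1", "t2", "t5"})
```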

social event detection

a bit of background...

•  MediaEval
–  well-known benchmarking activity since 2010 (started as VideoCLEF in 2008)
–  consists of several tasks dedicated to specific challenges
•  social event detection (SED)
–  first run in 2011 (7 participants)

task definition & dataset

•  2011
collection: 73,645 Flickr photos from five cities, May 2009
find events related to two target categories
> soccer matches in Barcelona and Rome
> concerts in venues Paradiso and Parc del Forum
•  2012
collection: 167,332 Flickr photos from five cities, 2009-2011
find events related to three target categories
> technical events (e.g. exhibitions, fairs) in Germany
> soccer events in Hamburg and Madrid
> Indignados movement in Madrid
•  2013
collection 1: 437,370 Flickr photos + 1,327 YouTube videos
collection 2: 57,165 Instagram photos
cluster collection 1 into events (attach YouTube videos to them)
categorize collection 2 images into eight event types or non-event

variant 1 / variant 4 / variant 4

sed2012: evaluation setup

•  ground truth: photos clustered around 149 events (18 technical, 79 soccer, 52 Indignados)
•  assess the following aspects:
–  accuracy of same-event classification
–  compare clustering quality between item-to-cluster and the two versions of item-to-item (batch & incremental)
–  measure contributions of different features
–  study generalization abilities of same event model

evaluation: main caveat

•  creation strategy of benchmark dataset can dramatically affect how hard (or easy) the problem is
–  if events are very sparsely distributed over time, then a simple time-based clustering could be sufficient
–  if events correspond to users one-to-one, then a simple user-based look-up could yield very high accuracy
–  using the same source for training/testing makes it easy
•  need to explore new challenging settings
–  multiple sources of multimedia
–  huge amounts of non-event content
–  very dense coverage of feature space by test events
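
The first caveat, that sparse events can be recovered by time alone, can be made concrete with a trivial gap-based clustering of timestamps. This is a sketch of such a naive baseline, not a method from the slides; the `max_gap` threshold, the function name and the toy timestamps (in seconds) are assumptions.

```python
def time_gap_clustering(timestamps, max_gap):
    """Start a new cluster whenever consecutive items are more than max_gap apart."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    clusters, current = [], []
    for i in order:
        if current and timestamps[i] - timestamps[current[-1]] > max_gap:
            clusters.append(current)
            current = []
        current.append(i)
    if current:
        clusters.append(current)
    return clusters

# Toy photo timestamps (assumption): three well-separated bursts.
times = [0, 5, 7, 1000, 1003, 5000]
clusters = time_gap_clustering(times, max_gap=60)
```

On a benchmark where events rarely overlap in time, such a baseline would already score well, which is the slide's argument for denser and noisier test settings.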

Conclusions
