- The document describes a system built by Ontotext to provide semantic-based recommendations on the Financial Times website to increase user engagement.
- It analyzes article content and user behavior to build a user profile and identify similar articles using both contextual and behavioral similarity.
- The final architecture includes components for content fetching, annotation, indexing, user profiling, and recommendations using a combination of collaborative filtering and content-based ranking.
Scaling Semantic Technology to Increase User Engagement
1. Scaling Semantic Technology to Increase User Engagement - FT.com
September 16th, 2015
Ontotext, Scaling Semantic Technology #1, Sept 2015
2. Outline
• Introducing Ontotext
• Related Reads – a FT.com use case
• What we managed to achieve
• Hands on FT.com live
• Positive signs across the news and media domain
• Hands on NOW – News on the Web demo service
3. Company Brief
Why? Enable better search, analytics and content delivery
What? Data and content management technology: graph database engine + text-mining solutions
How? Semantic analysis of text, linking text to data; NoSQL database with inference
Best for: dealing with heterogeneous dynamic data
Clients: BBC, FT, Bloomberg, DK, AstraZeneca, Wiley, etc.
Facts: 70 staff; HQ in Sofia; sales in London & New York
USP: the best semantic graph database engine; text-mining platform integrated with graph database
8. Ontotext and Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it
Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months time between inception and go live
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user behavior based recommendation of relevant content and data across the entire media
9. FT Primary Objective
Serve relevant articles to increase user engagement and improve usability
13. Contextual and Behavioural in Combination
[Diagram labels: Behavioural and Contextual Similarity; Reads; User Profile]
14. Average News Article Metadata
[Diagram: Article with fields promoted (popular) (Y/N), updated, created, image, summary, title, ID, URL, reads, views, votes, comments]
15. FT Article Metadata
[Diagram labels: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags]
16. Metadata Used
[Diagram labels: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags, concepts, keyphrases]
17. User Actions
Limited to: User reads Article
[Diagram label: reads]
18. User Actions: Another Perspective
[Diagram labels: perform; comments, votes, posts, preview, read; contains; leads to; Article; Search; Action; Result; Date; FTS Q.; Tag; Cat; Tag set; results; cat; taxonomy; Search Log]
19. “Content”-based Recommendation
• Relies on the previous choices of an individual user (a user's profile)
• Results on the basis of the similarity of items, defined in terms of their content
• The recommended content is rather homogeneous
20. Content-based Ranking Mechanisms
Two-fold scoring approach
• Similarity to recently viewed articles (context)
• Relevance to a long-term user profile
– Weights reflecting the relative importance of the individual terms (static component)
– Transition likelihoods among any pair of terms (dynamic component)
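The two-fold scoring idea on this slide can be illustrated with a small sketch. This is not the Ontotext implementation: the function names, the use of cosine similarity, the additive combination of the static and dynamic parts, and the `alpha` mixing weight are all assumptions made for illustration. Articles are represented as hypothetical term-to-weight dictionaries.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_score(candidate, recent_articles):
    """Mean similarity of the candidate to recently viewed articles (context)."""
    if not recent_articles:
        return 0.0
    return sum(cosine(candidate, r) for r in recent_articles) / len(recent_articles)


def profile_score(candidate, term_weights, transitions, last_terms):
    """Relevance to a long-term profile (assumed form).

    Static part: profile weight of each term in the candidate.
    Dynamic part: likelihood of moving from a recently read term
    to a term of the candidate.
    """
    static = sum(term_weights.get(t, 0.0) * w for t, w in candidate.items())
    dynamic = sum(transitions.get((p, t), 0.0)
                  for p in last_terms for t in candidate)
    return static + dynamic


def two_fold_score(candidate, recent_articles, term_weights, transitions,
                   last_terms, alpha=0.5):
    """Blend context and profile scores; alpha is a hypothetical mixing weight."""
    ctx = context_score(candidate, recent_articles)
    prof = profile_score(candidate, term_weights, transitions, last_terms)
    return alpha * ctx + (1 - alpha) * prof
```

For example, `two_fold_score({"ecb": 1.0}, [{"ecb": 1.0}], {"ecb": 0.5}, {("banking", "ecb"): 0.2}, ["banking"])` blends a context score of 1.0 with a profile score of 0.7.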
21. Collaborative Filtering
• Rely on statistics that reflect the past choices of all users
• Results based on user ratings, and the similarity of users or items
• Content-agnostic
• Aware of the quality of content
22. Collaborative Ranking Mechanisms
• User to Content Similarity Score
• User to User Similarity Score
• Content to Content Similarity Score
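A minimal sketch of how the three scores named on this slide could be derived from a read log alone. The slide does not say how they were actually computed; the `reads` data, the function names, and the choice of cosine similarity over binary read sets are assumptions.

```python
from math import sqrt

# Hypothetical read log: user -> set of article IDs the user has read.
reads = {
    "u1": {"a1", "a2"},
    "u2": {"a1", "a2", "a3"},
    "u3": {"a4"},
}


def set_cosine(x, y):
    """Cosine similarity of two sets treated as binary vectors."""
    if not x or not y:
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))


def user_user(u, v):
    """User to user: overlap in what the two users have read."""
    return set_cosine(reads[u], reads[v])


def readers_of(article):
    """All users who have read the given article."""
    return {u for u, items in reads.items() if article in items}


def content_content(a, b):
    """Content to content: overlap in who has read the two articles."""
    return set_cosine(readers_of(a), readers_of(b))


def user_content(u, article):
    """User to content: similarity-weighted votes of other readers of the article."""
    return sum(user_user(u, v) for v in readers_of(article) if v != u)
```

With this toy log, `user_content("u1", "a3")` is positive because `u2`, who read `a3`, shares reads with `u1`, while `user_content("u1", "a4")` is zero. None of the three scores looks at article text, which matches the "content-agnostic" property of collaborative filtering.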
23. Hybrid Approach
• Combines both approaches to improve the quality of prediction
• Implemented via statistical models
• Takes a wide array of features into consideration
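One way to picture a hybrid of this kind is a logistic model over content-based and collaborative features. The slide does not name the statistical model that was used, so the logistic form, the feature names, and the weights below are hypothetical; in a real system the weights would be fitted on logged user actions rather than set by hand.

```python
from math import exp

# Hypothetical feature weights and bias, set by hand for illustration only.
WEIGHTS = {"content_sim": 1.5, "collab_sim": 2.0, "recency": 0.5}
BIAS = -2.0


def hybrid_score(features):
    """Logistic combination of feature values; missing features count as 0."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def rank(candidates):
    """candidates: article ID -> feature dict. Returns IDs, best score first."""
    return sorted(candidates, key=lambda a: hybrid_score(candidates[a]),
                  reverse=True)
```

An article with both content-based and collaborative evidence outranks one with content-based evidence alone, for example `rank({"a": {"content_sim": 0.9}, "b": {"content_sim": 0.9, "collab_sim": 0.8}})` puts `"b"` first.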
28. Wrap up - Concept Extraction Highlights
• 8m named entities and metadata about them
• 20m labels of People and Organisations
• CES cluster which can be scaled horizontally to handle peak loads
• Live dictionary updates coming from GraphDB through the EUF (Entity Update Feed) plugin
• Max throughput - 10 docs/sec on a single c3.2xlarge AWS node; multiply by N to get the throughput of an N-node cluster
• Reliability has been 100%, but the solution hasn't been stressed as much as we've designed it for
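The throughput rule of thumb above is simple arithmetic. The sketch below takes the slide's linear-scaling claim at face value (10 docs/sec per node, multiplied by the node count); the function names are illustrative, and real clusters may fall short of linear scaling.

```python
from math import ceil

DOCS_PER_SEC_PER_NODE = 10  # figure quoted for a single c3.2xlarge node


def cluster_throughput(nodes):
    """Estimated docs/sec for an N-node cluster, assuming linear scaling."""
    return DOCS_PER_SEC_PER_NODE * nodes


def nodes_needed(target_docs_per_sec):
    """Smallest node count whose estimated throughput meets the target."""
    return ceil(target_docs_per_sec / DOCS_PER_SEC_PER_NODE)
```

Under that assumption, a 4-node cluster would handle about 40 docs/sec, and a peak of 25 docs/sec would call for 3 nodes.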
29. Wrap up - Recommendation Highlights
• 100% reliability in production for a full year (Ontotext also manages the deployment)
• API handling 1.5m requests a day on average, up to 3m requests a day (1/3 recommendations, 1/3 logging user action, 1/3 checking whether a user has enough history to ask for behavioural recommendations)
• Roughly 200m recommendations served and 200m user actions tracked to date since go live
• 450,873 documents indexed
• No caching, since everything is effectively a personalized search request
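To put the daily request figures in per-second terms, a quick conversion helps. The slide gives only daily totals; the per-second values are averages derived here and say nothing about how traffic was distributed over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(requests_per_day):
    """Average requests per second for a given daily total."""
    return requests_per_day / SECONDS_PER_DAY


average_rps = per_second(1_500_000)  # average day quoted on the slide
peak_day_rps = per_second(3_000_000)  # busiest day quoted on the slide
recommendation_rps = average_rps / 3  # one third of calls are recommendations
```

An average day therefore works out to roughly 17 requests per second and the busiest day to roughly 35, each answered without caching.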
30. Wrap up – GraphDB Highlights
• GraphDB had to comply with a set of tests designed by FT and OT: Network lag, Disk Space, Disk Load, Less Memory, CPU Load, etc.
• Comprehensive support for OWL and SPARQL
• Efficient inference through the entire life-cycle of the data
• High-availability cluster architecture – proven and mature for more than 5 years now
– GraphDB's first HA implementation has been running at the BBC since 2010
– Unmatched HA Tests and Transaction load benchmarks
• FTS and NoSQL Connectors for seamless integration
31. Positive Signs from the News Industry
• Washington Post tests new ‘Knowledge Map’ feature: “Our ultimate goal is to mine big data to surface highly personalized and contextual data for both journalistic and native content.”
• New York Times RnD Lab announced an experimental project “Editor”: 1) recognize a term that can be categorized, 2) link that entity to existing databases or microservices, 3) make this enriched information accessible to journalists
• BBC Structured Journalist Manifesto. Structured journalism: 1) on the reporter side, automation helps improve a journalist’s reporting and make it less cumbersome, 2) on the audience side, semtech helps scale things that can improve the reader’s experience
33. Thanks!
We will be delighted to have a word with you after the session or later today or tomorrow!
• Dr. Georgi Georgiev – Head of Ontotext Text Analysis Unit - georgi.georgiev@ontotext.com
• Ilian Uzunov – Sales Director CEMEAA - ilian.uzunov@ontotext.com
• Nikolay Krustev – GraphDB Sales Engineer - nikolay.krustev@ontotext.com