Scaling Semantic Technology to Increase User Engagement

Scaling Semantic Technology to Increase User Engagement - FT.com

September 16th, 2015

Ontotext, Scaling Semantic Technology, Sept 2015

Outline

•  Introducing Ontotext
•  Related Reads: a FT.com use case
•  What we managed to achieve
•  Hands on FT.com live
•  Positive signs across the news and media domain
•  Hands on NOW (News on the Web) demo service


Company Brief

Why? Enable better search, analytics and content delivery
What? Data and content management technology: graph database engine + text-mining solutions
How? Semantic analysis of text, linking text to data; NoSQL database with inference
Best for: dealing with heterogeneous dynamic data
Clients: BBC, FT, Bloomberg, DK, AstraZeneca, Wiley, etc.
Facts: 70 staff; HQ in Sofia; sales in London & New York
USP: the best semantic graph database engine; text-mining platform integrated with graph database


Sample RDF Graph: Data and Schema

[Figure: RDF graph. Instances: myData:Maria, myData:Ivan. Classes: ptop:Agent, ptop:Person, ptop:Woman. Properties: ptop:childOf, ptop:parentOf, owl:relativeOf. Schema terms on the edges: rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf, owl:SymmetricProperty. Some edges are marked as inferred.]
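The kind of inference the sample graph illustrates can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the asserted triples below (who is whose child, the rdfs:subClassOf chain, and the ptop: prefix on relativeOf) are guesses at the figure's content, and the rule set is a small hand-written subset, not GraphDB's actual rule engine.

```python
# Minimal forward-chaining sketch over (subject, predicate, object) triples.
# The asserted triples are illustrative assumptions, not the slide's exact graph.
ASSERTED = {
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
    ("myData:Maria", "rdf:type", "ptop:Woman"),
    ("ptop:Woman", "rdfs:subClassOf", "ptop:Person"),
    ("ptop:Person", "rdfs:subClassOf", "ptop:Agent"),
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    ("ptop:childOf", "rdfs:subPropertyOf", "ptop:relativeOf"),
    ("ptop:relativeOf", "rdf:type", "owl:SymmetricProperty"),
}


def infer(triples):
    """Apply four simple entailment rules until no new triple appears."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # owl:inverseOf: (s p o), (p inverseOf q) => (o q s)
                if p2 == "owl:inverseOf" and s2 == p:
                    new.add((o, o2, s))
                # rdfs:subPropertyOf: (s p o), (p subPropertyOf q) => (s q o)
                if p2 == "rdfs:subPropertyOf" and s2 == p:
                    new.add((s, o2, o))
                # rdfs:subClassOf: (s type C), (C subClassOf D) => (s type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
                # owl:SymmetricProperty: (s p o), (p type Symmetric) => (o p s)
                if (s2, p2, o2) == (p, "rdf:type", "owl:SymmetricProperty"):
                    new.add((o, p, s))
        if new <= closure:
            return closure
        closure |= new
```

Calling `infer(ASSERTED)` adds, among others, the triples (myData:Maria, ptop:parentOf, myData:Ivan) and (myData:Maria, rdf:type, ptop:Agent), neither of which was stated explicitly.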

Interlinking Text and Data


Semantic Annotation

[Figure: semantic annotation example. Document pmid:17714090 (Ian A Yang, Clinical and experimental pharmacology …) linked to the concepts COPD, Chronic Obstructive Airway Diseases, Bronchial Diseases, Respiration Disorders and Asthma (umls:C000496). Other identifiers shown: umls:C0035204, umls:C0006261.]

Technology Portfolio


Ontotext and Financial Times

Profile
•  Top 3 business media
•  Focused both on B2C publishing and B2B services

Goals
•  Create a horizontal platform for both data and content based on semantics and serve all functionality through it

Challenges
•  Critical part of the entire workflow
•  Multiple development projects in parallel with up to 2 months time between inception and go live
•  Horizontal platform with focus on organizations, people, GPEs and relations between them
•  Automatic extraction of all these concepts and relationships
•  Separate stream of work for a user behavior based recommendation of relevant content and data across the entire media

FT Primary Objective

Serve relevant articles to increase user engagement and improve usability


Subject, Object, Action

Subject: User
Object: Article, Media Asset, Data, …
Action: Read, Preview, Comment, …

Contextual Recommendation

[Figure: contextual similarity]

Behavioural Recommendation

[Figure: behavioural similarity; user profile]

Contextual and Behavioural in Combination

[Figure: behavioural and contextual similarity; reads; user profile]

Average News Article Metadata

[Figure: Article with the fields ID, URL, title, summary, image, created, updated, promoted (popular, Y/N), reads, views, votes, comments]

FT Article Metadata

[Figure: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags]

Metadata Used

[Figure: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags, concepts, keyphrases]

User Actions

Limited to User reads Article

User Actions: Another Perspective

[Figure: graph of user actions. Node and edge labels: perform, comments, votes, posts, preview, read, contains, leads to, Article, Search Action, Result, Date, FTS Q., Tag, Cat, Tag set, results, cat taxonomy, Search Log]

“Content”-based Recommendation

•  Relies on the previous choices of an individual user (a user's profile)
•  Results on the basis of the similarity of items, defined in terms of their content
•  The recommended content is rather homogeneous


Content-based Ranking Mechanisms

Two-fold scoring approach
•  Similarity to recently viewed articles (context)
•  Relevance to a long-term user profile
–  Weights reflecting the relative importance of the individual terms (static component)
–  Transition likelihoods among any pair of terms (dynamic component)
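A minimal sketch of such a two-fold score, assuming articles are reduced to sets of terms. The Jaccard measure, the additive combination of static and dynamic parts, and the `alpha` mixing weight are all hypothetical choices made for illustration; the slides do not say which formulas were used.

```python
def jaccard(a, b):
    """Overlap of two term sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_score(candidate, recent_articles):
    """Mean similarity of the candidate to recently viewed articles."""
    if not recent_articles:
        return 0.0
    total = sum(jaccard(candidate, r) for r in recent_articles)
    return total / len(recent_articles)


def profile_score(candidate, static_weights, transitions, last_terms):
    """Static term weights plus transition likelihoods from the last-read terms."""
    static = sum(static_weights.get(t, 0.0) for t in candidate)
    dynamic = sum(transitions.get((src, dst), 0.0)
                  for src in last_terms for dst in candidate)
    return static + dynamic


def content_score(candidate, recent_articles, static_weights, transitions,
                  alpha=0.5):
    """Blend both parts; alpha is a hypothetical mixing weight."""
    last_terms = recent_articles[-1] if recent_articles else set()
    return (alpha * context_score(candidate, recent_articles)
            + (1 - alpha) * profile_score(candidate, static_weights,
                                          transitions, last_terms))
```

Here `static_weights` maps a term to its long-term importance for the user, and `transitions` maps a (previous term, next term) pair to how likely the user is to move from one to the other.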


Collaborative Filtering

•  Rely on statistics that reflect the past choices of all users
•  Results based on user ratings, and the similarity of users or items
•  Content-agnostic
•  Aware of the quality of content


Collaborative Ranking Mechanisms

[Figure: three scores: User to Content Similarity Score, User to User Similarity Score, Content to Content Similarity Score]
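Two of these scores can be sketched from read histories alone. The example below is an assumption-laden illustration, not the production logic: user-to-user similarity is taken as Jaccard overlap of read sets, and content-to-content similarity as raw co-visitation counts (how many users read both articles).

```python
from collections import Counter
from itertools import combinations


def co_visitation(histories):
    """Count how often two articles were read by the same user."""
    counts = Counter()
    for articles in histories.values():
        for a, b in combinations(sorted(set(articles)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def user_similarity(histories, u, v):
    """Jaccard overlap of two users' read sets (user to user)."""
    a, b = set(histories[u]), set(histories[v])
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(histories, user, top_n=3):
    """Rank unread articles by co-visitation with the user's reads."""
    counts = co_visitation(histories)
    seen = set(histories[user])
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [article for article, _ in scores.most_common(top_n)]
```

Neither function looks at article text, which is what makes the approach content-agnostic: only the pattern of who read what matters.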

Hybrid Approach

•  Combines both approaches to improve the quality of prediction
•  Implemented via statistical models
•  Takes a wide array of features into consideration
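The slides do not name the statistical model used. As one possible reading, a logistic model over per-scheme feature scores could look like the sketch below; the feature names and weights are hypothetical and would normally be fitted on logged user actions rather than set by hand.

```python
import math


def hybrid_score(features, weights, bias=0.0):
    """Logistic model over ranking features; returns a value in (0, 1)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature values for one (user, article) pair.
example = hybrid_score(
    {"content": 0.8, "collaborative": 0.4, "popularity": 0.1},
    {"content": 1.5, "collaborative": 2.0, "popularity": 0.5},
)
```

The output can be read as an estimated probability that the user reads the article, which makes scores from different schemes comparable on one scale.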


Initial Architecture


Final Architecture

[Figure: deployment on AWS instances behind an AWS Elastic LB. Components: FT API, Fetch & Annotation, OWLIM, Worker, Recommendation API, Varnish Cache, SOLR 1-3, CS Node 1-3 (Replication Group I), RR links. Read Article flow: 1. get related, 2. ask, 3. on cache miss, 4. query. Content flow: 1. pull content, 2. annotate, 3. index. Other edge labels: annotate content, store user profiles, update popularity, click stream, update user.]

Main Actions

1. Pull content, annotate/enrich, index
2. Accumulate/update user profile
3. Recommend


Implementation Overview

[Figure: components: Profile Update Request (User ID, Item ID), Profile Update, Profile Storage (Cassandra), Recommendation Request (User ID), Query Generation, Items Index (Solr)]

User:
- context
- static component
- dynamic component
Article:
- co-visitation matrix
- popularity

Boosted sub-queries for all involved ranking schemes: content-based, collaborative, popularity, recency
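The query generation step might assemble one boosted clause per ranking scheme, roughly as sketched below. Only the Lucene/Solr query syntax (`^` boosts, `OR`, range queries with date math) is standard; the field names `concepts`, `id`, `popular` and `created`, the seven-day window and the boost values are assumptions made for illustration.

```python
def boosted_clause(field, terms, boost):
    """Build one sub-query such as (concepts:("a" OR "b"))^2.0."""
    if not terms:
        return None
    quoted = " OR ".join('"{}"'.format(t.replace('"', '\\"')) for t in terms)
    return "({}:({}))^{}".format(field, quoted, boost)


def build_query(profile_terms, co_visited_ids, boosts):
    """Join the sub-queries of all ranking schemes with OR."""
    clauses = [
        # content-based: terms from the user's profile (hypothetical field)
        boosted_clause("concepts", profile_terms, boosts["content"]),
        # collaborative: articles co-visited with the user's reads
        boosted_clause("id", co_visited_ids, boosts["collaborative"]),
        # popularity and recency (hypothetical fields)
        "(popular:true)^{}".format(boosts["popularity"]),
        "(created:[NOW-7DAYS TO NOW])^{}".format(boosts["recency"]),
    ]
    return " OR ".join(c for c in clauses if c)
```

Expressing every scheme as a sub-query lets the search index do the final ranking in a single request, which fits the description of each recommendation as a personalized search.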

Wrap up - Concept Extraction Highlights

•  8m named entities and metadata about them
•  20m labels of People and Organisations
•  CES cluster which can be scaled horizontally to handle peak loads
•  Live dictionary updates coming from GraphDB through the EUF (Entity Update Feed) plugin
•  Max throughput: 10 docs/sec on a single c3.2xlarge AWS node; multiply by N to get the throughput of an N-node cluster
•  Reliability has been 100%, but the solution hasn't been stressed as much as we've designed it for
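The throughput figure lends itself to a quick sizing calculation. The sketch below simply takes the slide's claim of linear scaling at face value; real clusters may lose some throughput to coordination overhead.

```python
import math

DOCS_PER_SEC_PER_NODE = 10  # figure quoted for one c3.2xlarge node


def cluster_throughput(nodes):
    """Docs/sec for an N-node cluster, assuming linear scaling."""
    return nodes * DOCS_PER_SEC_PER_NODE


def nodes_needed(target_docs_per_sec):
    """Smallest cluster that sustains the target rate."""
    return max(1, math.ceil(target_docs_per_sec / DOCS_PER_SEC_PER_NODE))
```

For example, a peak of 35 docs/sec would call for `nodes_needed(35)`, that is, four nodes.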


Wrap up - Recommendation Highlights

•  100% reliability in production for a full year (Ontotext also manages the deployment)
•  API handling 1.5m requests a day on average, up to 3m requests a day (1/3 recommendations, 1/3 logging user action, 1/3 checking whether a user has enough history to ask for behavioural recommendations)
•  Roughly 200m recommendations served and 200m user actions tracked to date since go live
•  450,873 documents indexed
•  No caching, since everything is effectively a personalized search request
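To put the daily request volumes in per-second terms, a small conversion helps. These are averages spread evenly over a day; the slides give no intra-day peak figures, so real bursts will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(requests_per_day):
    """Average request rate, assuming an even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY


average = per_second(1_500_000)   # about 17.4 requests/s
busiest = per_second(3_000_000)   # about 34.7 requests/s
per_type = 1_500_000 / 3          # 500,000 requests/day per call type
```

Since nothing is cached, each of those recommendation calls reaches the search index directly.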


Wrap up - GraphDB Highlights

•  GraphDB had to comply with a set of tests designed by FT and OT: Network lag, Disk Space, Disk Load, Less Memory, CPU Load, etc.
•  Comprehensive support for OWL and SPARQL
•  Efficient inference through the entire life-cycle of the data
•  High-availability cluster architecture, proven and mature for more than 5 years now
–  The first GraphDB HA implementation has been running at the BBC since 2010
–  Unmatched HA Tests and Transaction load benchmarks
•  FTS and NoSQL Connectors for seamless integration


Positive Signs from the News Industry

•  Washington Post tests new ‘Knowledge Map’ feature
“Our ultimate goal is to mine big data to surface highly personalized and contextual data for both journalistic and native content.”
•  New York Times RnD Lab announced an experimental project “Editor”
1) recognize a term that can be categorized, 2) link that entity to existing databases or microservices, 3) make this enriched information accessible to journalists
•  BBC Structured Journalist Manifesto
Structured journalism: 1) on the reporter side, automation helps improve a journalist’s reporting and make it less cumbersome, 2) on the audience side, semtech helps scale things that can improve the reader’s experience


Selection of Ontotext Customers


Thanks!

We will be delighted to have a word with you after the session or later today or tomorrow!

•  Dr. Georgi Georgiev, Head of Ontotext Text Analysis Unit - georgi.georgiev@ontotext.com
•  Ilian Uzunov, Sales Director CEMEAA - ilian.uzunov@ontotext.com
•  Nikolay Krustev, GraphDB Sales Engineer - nikolay.krustev@ontotext.com
