Technical Challenges in Resource Discovery

Paul Walk
paul@paulwalk.net
@paulwalk
http://www.paulwalk.net

Contents

1. a general consideration:
  • open or closed

2. a particular challenge:
  • synchronisation in an open world

3. the ‘nothing new’, but doing it better
  • APIs that work and can be trusted

a healthy(?) state of tension between open and closed

open and closed worlds

• I’m not talking about licensing or access to data

• open
  • unbounded - like the Web

• closed
  • bounded - like most collections management systems, aggregations etc.

• formally, much of what we do is underpinned by ‘open/closed worlds’ assumptions:

  • open world assumption: any statement not known to be true is unknown
  • closed world assumption: any statement not known to be true is false
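
The difference between the two assumptions can be shown with a minimal sketch. Everything here is hypothetical and only for illustration: the `KNOWN_FACTS` set, the triple layout and the function names are assumptions, not part of any real system.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical knowledge base: the only statements known to be true.
KNOWN_FACTS = {
    ("item-42", "held_by", "British Library"),
}


def closed_world(statement, facts=KNOWN_FACTS):
    """Closed world: any statement not known to be true is false."""
    return Truth.TRUE if statement in facts else Truth.FALSE


def open_world(statement, facts=KNOWN_FACTS):
    """Open world: any statement not known to be true is unknown."""
    return Truth.TRUE if statement in facts else Truth.UNKNOWN


# A statement the knowledge base says nothing about.
query = ("item-42", "held_by", "Bodleian Library")
print(closed_world(query))
print(open_world(query))
```

A bounded system such as a collections management system can reasonably answer "false" here; a Web-scale system can only answer "unknown".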

characteristics of an open world

characteristics of a closed/bounded world

judging where to apply each

• we need our infrastructure (especially integration technology between systems) to be open and relatively unbounded

• the Web is still the best available foundation for this

• however, we still need to manage our resources, maintain quality and honour complex rights management commitments

• we probably need to recognise that users’ experience is often enhanced through the application of a more focussed, targeted and context-aware approach

synchronisation

• how is the state of the resource maintained across an infrastructure of ‘federated’ repositories?

• if a resource is changed or deleted, how does the right-hand side aggregation know?

• note - this is based on our existing ‘harvesting’ or ‘pull’ approach

[Diagram: Resource, Collection, Aggregation; multiple harvest routes, multiple copies]

ResourceSync

• a joint project of NISO and OAI, led by Herbert Van de Sompel of Los Alamos

• a light-weight mechanism to allow the state of web resources to be communicated between web systems

• developing a spec which builds on the sitemap specification, allowing content providers to publish changesets

• draft: http://bit.ly/WYhTz2

• Jisc have funded UK participation in this
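
As a rough illustration of the changeset idea, the sketch below applies a sitemap-style change list to an aggregator's local copy. It is only a sketch under stated assumptions: the `rs` namespace URI, the `rs:md` element and its `change` attribute are loosely modelled on the sitemap format and may not match the draft linked above, and the `example.org` URLs and the `apply_changes` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace and attribute names may differ from the draft spec.
RS_NS = "http://www.openarchives.org/rs/terms/"

# Hypothetical change list published by a content provider.
CHANGE_LIST = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url><loc>http://example.org/res/1</loc><rs:md change="updated"/></url>
  <url><loc>http://example.org/res/2</loc><rs:md change="deleted"/></url>
  <url><loc>http://example.org/res/3</loc><rs:md change="created"/></url>
</urlset>
"""


def apply_changes(local_copy, change_list_xml):
    """Update an aggregator's view of remote resources from a change list."""
    root = ET.fromstring(change_list_xml.strip())
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        md = url.find(f"{{{RS_NS}}}md")
        change = md.get("change") if md is not None else None
        if change == "deleted":
            # Drop the local copy so the aggregation stops serving it.
            local_copy.pop(loc, None)
        elif change in ("created", "updated"):
            # Mark for (re-)fetching rather than fetching here.
            local_copy[loc] = "needs-fetch"
    return local_copy


aggregation = {
    "http://example.org/res/1": "fresh",
    "http://example.org/res/2": "fresh",
}
print(apply_changes(aggregation, CHANGE_LIST))
```

The point of the design is that the aggregator learns about changes and deletions from the provider's published list, instead of re-harvesting everything to discover them.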

The sun shone, having no alternative, on the nothing new.
Murphy, Samuel Beckett

A distributed system is one
in which the failure of a
computer you didn't even
know existed can render
your own computer unusable
Leslie Lamport

a common ‘anti-pattern’

• as a developer, I have no reason to trust that these APIs are any good.

• after all, the service provider doesn’t seem to trust them for their own application....

[Diagram: end-user, UI, API, future 3rd-party dev, future UI, future end-user; some aggregated data of broad interest and potential usefulness. Legend: certainty, belief, speculation]

a better pattern

• As a developer, I’m more likely to trust this pattern.

• the content provider is using their own API to deliver their own application.

• they have a vested interest!

[Diagram: end-user, UI, focussed app, 3rd-party app, API; some aggregated data of broad interest and potential usefulness. Legend: certainty, belief, speculation]

APIs are not best thought of as machine-to-machine interfaces

APIs are interfaces for developers

messages from developers to content-providers

• These are from yesterday’s developer day held here at the BL in support of this summit:

• please don’t build elaborate APIs which do not allow us to see all of the data, or its extent. It’s not that we simply want to download all the data - but we do need to see what we’re dealing with

• if you give us access to incomplete data (perhaps because you’re worried about revealing poor data quality), then we will tend to either abandon our attempts to use it or we will ‘fill in the gaps’ with data from elsewhere. So offering an API which delivers incomplete data is usually self-defeating

• the implicit bargain, made explicit:
  • give us access to the data as soon as possible and we will do some of the work to process it so it is fit for some new purpose - and we will happily share this code with you
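
One way to read the first request above is that every response should state the extent of the data and offer a way to walk through all of it. The sketch below is illustrative only: `RECORDS`, `list_records`, `harvest_all` and the response fields (`total`, `next_offset`) are hypothetical names, not taken from any particular API.

```python
# Hypothetical dataset standing in for a content provider's records.
RECORDS = [{"id": i, "title": f"Record {i}"} for i in range(1, 8)]


def list_records(offset=0, limit=3):
    """Return one page of records plus the full extent of the dataset."""
    page = RECORDS[offset:offset + limit]
    has_more = offset + limit < len(RECORDS)
    return {
        "total": len(RECORDS),  # the extent is visible on every response
        "offset": offset,
        "items": page,
        "next_offset": offset + limit if has_more else None,
    }


def harvest_all(fetch_page):
    """Developer-side loop: follow next_offset until the data runs out."""
    items, offset = [], 0
    while offset is not None:
        response = fetch_page(offset=offset)
        items.extend(response["items"])
        offset = response["next_offset"]
    return items


first_page = list_records()
everything = harvest_all(list_records)
print(first_page["total"], len(everything))
```

Even a developer who never downloads everything can see from the first response how much data exists, which is the point being made.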

Questions for the parallel sessions

1. Which emerging technologies do we need to focus on in 2013?

2. Do we still need to aggregate?

3. What does data quality stop us doing?

Which emerging technologies do we need to focus on in 2013?

• Graphs: context, rather than content, is king

• both Facebook and Google are betting heavily on graph technologies

• closer to home - so are content providers like the BBC

• linking these is an interesting challenge

• databases based on a graph model give the potential for a richer understanding about entities (users!)

• instrumentation in personal devices makes more context available (e.g. geo-location).

Do we still need to aggregate?


yes.
• to address systems/network latency - provide a cache

• to showcase!

• for ‘Web Scale concentration’

• network effects if user-facing services are also developed

• to create middleman business opportunities

• as infrastructure to support locally developed services

• as an approach to preservation

What does data quality stop us doing?

• interpreted as: “what does a concern for data quality stop us doing?”
  • it stops us from releasing data early

• interpreted as: “what does poor/uncertain data quality stop us doing?”
  • it erodes trust, which reduces the likelihood of someone doing something worthwhile with our data

• reconciling these concerns is a major challenge for us.

thank you!

Paul Walk
paul@paulwalk.net
@paulwalk
http://www.paulwalk.net
