Semantic Web for Libraries & Publishers

Charleston Conference
111103




Monday, November 21, 11

So, what’s the problem?
The Problem Set

Silos

More silos

Lots of different silos

Blue silos

Old Silos
We in the library and publishing trades force readers, some of whom are authors as
well, to search iteratively for information they want or need or think might exist, in
many different silos, using many different search engines, forms, and vocabularies. We
do not make it easy for them to discover what is locally available, what is more or less
easy to get, or everything that might be available.
No wonder the young and foolish depend upon and believe in Google’s searches.
Google is quick...and in terms of relevance, very, very dirty.

We give them better interfaces, ones that permit refinement of results, to our holdings at
the title level, BUT...

Simultaneously, we show them many other tools, each excellent in some ways, to
continue their exploration of the literature. No single tool is comprehensive. We do not
refer our clients to the Web, at least not on our own web sites! Our OPACs refer to our
holdings. Indices and abstracts refer our readers to articles in journals that we may
have licensed. SFX and similar services provide readers with links to the titles thus
revealed to which we have subscribed. Neither our OPACs nor the secondary databases refer
directly to more than a tiny percentage of the vast collection of pages that is the World
Wide Web. The Web, of course, refers in fragmentary fashion to information resources we
might, I emphasize, MIGHT have on hand for our readers.

And the results of using other, often very good, discovery tools differ in relevance
ranking, format, and options from those we provide for our OPACs, thus adding
confusion.

Some of us provide our readers with lots of databases to search. Too many, really, since
all but a few of our readers are not forensic-level scholars.

Selecting a licensed database is an art in itself!
Once again, notice that we rarely offer a web search engine as an option, and for good
reasons. Nevertheless, the discoverable relevant information resources on the web
apparently are not part of our repertory.
!!!

We have not conspired to make the search for relevant information objects difficult. We
just have not yet had the tools, the methods, the vision, and yes, the gumption to try
something new.
ATLAS at LHC -- 150*10^6 sensors
Ntl Cntr for Biotech Info
NSF CyberInfrastructure
quake engineering simulation

Here’s a teensy slice of the information and communication environment in which our
faculty and students find themselves. And it gets more complex every day. Alas, the
larger the number of websites indexed by Bing or Google or whatever the search engine du
jour, the more likely it is that the returns will be less relevant and less precisely
matched to what the searcher hoped to find.

Too many silos.
Here’s the biggest of the lot...
One size fits all???

Does one size fit all?

Not quite. Even Google has silos and uses, as do others, clever interfaces to hide the
fact of the silos.

Given all these silos and search engines, our users, our authors, and readers, and
teachers, and students, people on the street, our nations...need us to find a better way.
Facts about the information objects we have acquired or leased, facts about books,
articles, films, and so forth that we have published need to be found in the wild, on the
web. Ideally, we, librarians and publishers, will make the facts about what we have and
what we are making public, for fun or profit, discoverable on the Web.
Discovery & Access ... the problems
Let’s dwell on the problems briefly...
1. Too many stovepipe systems
2. Too little precision with inadequate recall
3. Too far removed from W3 (World Wide Web)
1. Too many stovepipe systems

     The landscape of discovery & access
     services is a shambles

     It can’t be mapped in any logical way
           • not by us (the supposed information pros)
           • not by the faculty & students who must navigate the chaos


     This state of affairs shouldn’t be a surprise



2. Too little precision
        with inadequate recall

     Some of the problem ... too many systems
     • dumbing down effects of federation often hinder explicit searches
     • each interface has its own search-refinement tricks
     • numerous, overlapping discovery paths hamper full recall


      Most of the problem ...
         limitations in the design & execution of infrastructure
         that supports discovery & access
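
For readers who want the two terms pinned down, here is a minimal sketch of how precision and recall are usually computed for a single search. The record IDs and the helper name `precision_recall` are invented for illustration and do not come from any catalog or discovery product.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of what came back that was actually wanted.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of what was wanted that actually came back.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record IDs: one silo returns four records, two of them relevant,
# while five relevant records exist across all the silos.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r1", "r2", "r7", "r8", "r9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Searching one stovepipe at a time tends to cap recall, because the relevant records held in the other silos are never retrieved at all.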


the 1st limiting factor ... ambiguity
   Most of our metadata uses a string of bytes
   to label a semantic entity [person, place, thing, event, ...]
     • discovery based on matching text labels
     • not on the gist of semantic entities
     For libraries, the fix is authorities
     • authoritative forms of strings (names, organizations, titles,
       places, events, topics, etc.) work to improve precision and recall


      hold on
        ... what about cases where no one-to-one relationship exists
       between a string-of-text label & the underlying semantic entity?
     Take for example the text string: jaguar
                                          byte string: 6a 61 67 75 61 72
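
A small sketch of the point, under stated assumptions: the `example.org` identifiers and the `ENTITIES` table are hypothetical stand-ins, not a real authority file. It shows that a label is only bytes, that one label can match several entities, and that an identifier names exactly one.

```python
# A text label is just bytes to the machine. Lowercase "jaguar" starts with 6a;
# capitalized "Jaguar" starts with 4a.
label = "jaguar"
print(label.encode("utf-8").hex(" "))
print("Jaguar".encode("utf-8").hex(" "))

# Hypothetical identifiers (example.org is a placeholder) mapped to display labels.
ENTITIES = {
    "http://example.org/entity/jaguar-cars": "Jaguar (car maker)",
    "http://example.org/entity/atari-jaguar": "Jaguar (Atari video game console)",
    "http://example.org/entity/fender-jaguar": "Jaguar (Fender electric guitar)",
    "http://example.org/entity/jacksonville-jaguars": "Jaguars (pro football team)",
}


def match_by_label(query):
    """Keyword matching: return every entity whose label contains the query."""
    return sorted(uri for uri, text in ENTITIES.items() if query.lower() in text.lower())


def lookup_by_uri(uri):
    """Identifier lookup: returns at most one entity."""
    return ENTITIES.get(uri)


print(len(match_by_label("jaguar")), "entities share the label")
print(lookup_by_uri("http://example.org/entity/fender-jaguar"))
```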

Imagine this keyword search and realize the ambiguity of the term “jaguar”.

Inspired by John Giannandrea, CTO, Metaweb ... from his presentation at PARC in
April 2008
Prrrrr
     ... a rose is a rose is a rose

     company
           • Ltd.
     cars
           • XK series, in production since 1996
           • E-Type (UK) or XK-E (US), mftg 1961 to 1974
           • etc.
     hardware & software
           • Atari video game console
           • Macintosh OS X 10.2
     music
           • heavy metal band formed in Bristol, England, Dec 1979
           • Fender electric guitar, introduced in 1962
           • Philadelphia-based singer/songwriter Jaguar Wright
     military
           • type 140 Jaguar class fast attack craft [torpedo], Germany WWII
           • Anglo-French ground attack aircraft
           • XF10F prototype swing-wing fighter, early 1950s, Grumman
     heroes
           • The Jaguar is a superhero published by Archie Comics
           • DC Comics' Impact series, ... loosely based on Archie Comics' character
     pro football
           • Jacksonville

                                                     John Giannandrea, CTO, Metaweb
the 2nd limiting factor
                  ... instance-based metadata
    Most of our metadata focuses
     on publication artifacts
          • identify responsibility for its creation
          • list topical headings

    For simple cases ... few worries
    • as with ambiguity, one-to-one relationships pose few problems
    • things work for authors with a few books in several editions


    But, as complexity increases,
                    precision & recall suffer
Prolific authors ...
     search: Shakespeare’s Hamlet (811 entries)

     Wading thru search results for authors
     like Shakespeare shows clearly the
     effects that instance-based metadata
     has on precision & recall
          Unflagging patience marks the task of
          flipping back & forth between hundreds
          of brief and full records to sort thru
          the varied instances of a single entity, e.g.
           • critical editions based on primary sources
           • 18th & 19th century collections of the plays
           • social, historical and literary essays
           • histories & critiques of such writings
           • video and audio recordings of performances
           • reviews and indices of the same
           • treatments of stagecraft, costumes, music
           • life & works of notables associated with the
             plays (e.g., performers, directors)
           • other art forms inspired by the plays

A Socrates (Stanford Libraries OPAC) keyword search for the terms shakespeare and
hamlet
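
To make the contrast with instance-based records concrete, here is a rough sketch of what grouping by the underlying entity could look like. The records, the field layout, and `group_by_work` are all hypothetical; this is not how Socrates or any particular OPAC is implemented.

```python
from collections import defaultdict

# Hypothetical catalog records: (record id, work the item relates to, kind of item).
RECORDS = [
    ("rec-001", "Hamlet", "critical edition"),
    ("rec-002", "Hamlet", "video recording of a performance"),
    ("rec-003", "Hamlet", "literary essay"),
    ("rec-004", "Hamlet", "critical edition"),
    ("rec-005", "Macbeth", "critical edition"),
]


def group_by_work(records):
    """Cluster instance-level records under the work, then by kind of item."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record_id, work, kind in records:
        grouped[work][kind].append(record_id)
    return grouped


grouped = group_by_work(RECORDS)
for kind, ids in sorted(grouped["Hamlet"].items()):
    print(f"{kind}: {len(ids)} record(s)")
```

With records clustered this way, a reader picks a kind of item first instead of flipping through every brief and full record in turn.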
3. Too far removed from W3 (World Wide Web)

    Together, our metadata & collections
    make up a big chunk of the “dark web”
          [ info resources that search-engine spiders can’t see ]

    It’s clear that visibility on the web promotes
    dramatic increases in discovery and access
    • Library of Congress & Smithsonian images (FLICKR)
    • SULAIR’s Highwire Press ( > 2x increase via Google)

                          The state of affairs is well known ...
Our Working Environment
[Diagram labels: academy, publisher, library, "produce", "provide", Scholars & students]

Here is a schematic to suggest how our ecosystem works. It is more complex, of
course, but the basics are embodied here.
Once upon a time…the Internet
        internet
And here is the way the e-discovery and e-communication environment is developing.
First there was the Internet. Prophets such as Vannevar Bush, Ted Nelson, and Doug
Engelbart showed us the way.
Then…the World Wide Web
        web of pages
        internet
Thanks to another prophet, Tim Berners-Lee, the Internet, a network of communicating
computers, became a web of pages of information. Scholarly journal publishers and some
librarians realized early on that there were functional advantages to scholarship and to
publishing in the web of pages. Yahoo, Google, and others realized that mining the web of
pages by words on those pages could make the rapidly growing web of pages reveal more
through indexing and cataloging the web. Indexing won out over cataloging, as we now know.

The next thing is the subject of this talk. It is the web of data. It is the web of
relationships constructed and expressed so that both computers and humans can identify and
understand relationships in that web. The web of data lives with the web of pages and is
carried on the Internet, the global carrier.
Under construction
        web of data
        web of pages
        internet
This web of data is the next big thing in discovering relevant information objects and the
next big thing in empowering individuals, communities, and industries in making better use
of information that they or others create. What distinguishes this web of data, this linked
data environment, is the principle of identifying entities, virtual & real, by statements of
relationships and descriptions in machine readable form. More about this as we go along.
Under construction
        web of data, aka Linked Data
        web of pages
        internet

We are calling this next phase the Linked Data phase, because it is entirely dependent upon
statements of relationships and descriptions in machine readable form, but this phase may
be only a precursor to another, more complex and more difficult web world to engineer. The
next phase is the Semantic Web, which in theory allows the machine readable relationships
and descriptions to interoperate to satisfy a person’s requirements, albeit without
constant interaction. In short, in the Semantic Web, the machines will understand meaning
and presumably act on it. Scary, eh?
Construction Tools

How do we work to alleviate our problems as information professionals, librarians and
publishers?
Recipe for creating the web of data

                          • identify people, places, things, events,
                           and other entities embedded in the
                           knowledge resources that a research
                           university consumes and produces
                          • tie those facts together with
                           named connections
                          • publish the relationships as
                           crawl-able links on the web
                          • build/use apps supporting discovery
                           via the web of data
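
The four steps of the recipe can be sketched in a few lines. This is only an illustration under stated assumptions: every `example.org` URI is a hypothetical placeholder, the relation names are invented, and `publish` and `neighbours` are toy helpers, not any library's API.

```python
# Step 1: identify entities. Every example.org URI below is a hypothetical placeholder.
EX = "http://example.org/"
YO_YO_MA = EX + "person/yo-yo-ma"
CELLO = EX + "instrument/cello"
SUITES = EX + "work/bach-cello-suites"

# Step 2: tie the entities together with named connections (also given URIs).
PLAYS = EX + "relation/plays"
PERFORMS = EX + "relation/performs"
WRITTEN_FOR = EX + "relation/writtenFor"

links = {
    (YO_YO_MA, PLAYS, CELLO),
    (YO_YO_MA, PERFORMS, SUITES),
    (SUITES, WRITTEN_FOR, CELLO),
}


# Step 3: publish the relationships in a crawlable text form,
# one N-Triples-style line per link.
def publish(links):
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(links))


# Step 4: a tiny discovery "app" that follows links in both directions.
def neighbours(entity, links):
    found = set()
    for s, _p, o in links:
        if s == entity:
            found.add(o)
        elif o == entity:
            found.add(s)
    return found


print(publish(links))
print(sorted(neighbours(CELLO, links)))
```

Starting from the cello, the app reaches both the performer and the work without matching a single text label, which is the kind of discovery the recipe is after.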



Here is a pile of words representing all the words on the web that most search engines
index constantly. Good search engines today can do a lot with this pile. BUT, the search
engines create the perception of relationships, not based on meaning, but on other factors,
such as number of links to a site containing the words of interest OR the traffic to a site.
From this pile of words, structure!

The Linked Data approach attempts to structure the pile in anticipation of the need for
discovery. That structure is based on meaning, on relationships. I will make this clearer
in the next slides.
67

Here’s a graph of a very few relationships to Yo-Yo Ma, the great cellist.

Linked Data Web

Here’s a graph of relationships to Haggis, just a fun one I could not resist throwing in. Meaning is provided by understanding relationships.

RDF triples & URIs
• RDF triples = subject – predicate – object
  – A way to describe objects or even ideas on the web
  – An object or idea might have many RDF triples describing it
  – Objects or ideas need not exist on the web!
• URIs = Uniform Resource Identifiers
  – Allow machine interaction among Web objects
  – Various syntactical schemes & protocols are used to construct URIs
  – At least 3 are needed to support an RDF triple (subject – predicate – object)




Geek ingredients for the construction of the Linked Data Web. RDF means Resource Description Framework. An RDF statement is always expressed as a simple sentence, though multiple such statements might attach to a single entity. In fact, we need multiple RDF statements in this scheme.
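
To make the "simple sentence" idea concrete, here is a small sketch of triples as a data structure, with several statements attached to one entity. The `Triple` type, the `describe` helper, and every `example.org` URI are assumptions made for illustration; they do not come from any particular RDF library.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of the related thing (or idea)


EX = "http://example.org/"  # hypothetical namespace

statements = [
    Triple(EX + "haggis", EX + "originatesFrom", EX + "scotland"),
    Triple(EX + "haggis", EX + "ingredient", EX + "oatmeal"),
    Triple(EX + "scotland", EX + "partOf", EX + "united-kingdom"),
]


def describe(subject_uri, triples):
    """Collect every predicate and its objects attached to one subject."""
    description = defaultdict(list)
    for t in triples:
        if t.subject == subject_uri:
            description[t.predicate].append(t.obj)
    return dict(description)


haggis = describe(EX + "haggis", statements)
print(haggis)
```

Each `Triple` reads as one sentence ("haggis originates from Scotland"), and `describe` gathers all the sentences about a single subject.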

A graph of RDF statements and URIs

The Linked Data Principles
 1. Use URIs as names of things (people, places, times, objects,
 ideas...anything really)
 2. Use HTTP URIs so that people can look up
 those names
 3. When someone looks up a URI, provide useful
 RDF information
 4. Include RDF statements that link to other
 URIs so that they can discover related things
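
Principles 2 and 3 amount to an ordinary HTTP lookup that asks for RDF rather than a web page. The sketch below only prepares such a request and does not send it. The `build_lookup` helper and the `example.org` URI are hypothetical, and whether a given server honors the `Accept: text/turtle` header is an assumption about that server.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for RDF about a URI."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_lookup("http://example.org/person/yo-yo-ma")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup against a real Linked Data server.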


The really great aspect of RDF statements is that they can refer to ideas, not just to physical or virtual entities. Any kind of idea could be treated.

Library Metadata
• Library metadata standards closed
• “Passive” metadata, searchable, but…
• In silos
• Readable, but not actionable
• Search results refinable, but final

These are some of the edges of the problem of library metadata.

Library Metadata                          Semantic Web Metadata
• Library metadata standards closed       • Open
• “Passive” metadata, searchable, but…    • Dynamic, contextualized
• In silos                                • In the wild
• Readable, but not actionable            • Interactive, responsive
• Search results refinable, but final     • Leading to other queries & views

And here is the comparison between the library metadata scene now and the one we advocate for the Linked Data/Semantic Web. Library metadata in the Linked Data Web should be freely available, constantly updated, and often reconciled with RDF triple statements from non-library sources. Library Linked Data should be entirely open on the web.
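
One way to picture reconciliation with non-library sources is a merge keyed on an equivalence link. In this sketch the two namespaces, the sample statements, and the `reconcile` helper are hypothetical; only the `owl:sameAs` URI is a published identifier. Real reconciliation is considerably messier than a dictionary lookup.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LIB = "http://library.example.org/"  # hypothetical library namespace
WEB = "http://web.example.org/"      # hypothetical non-library namespace

library_triples = {
    (LIB + "auth/ma-yo-yo", LIB + "vocab/creatorOf", LIB + "bib/12345"),
    (LIB + "auth/ma-yo-yo", OWL_SAME_AS, WEB + "artist/yo-yo-ma"),
}
web_triples = {
    (WEB + "artist/yo-yo-ma", WEB + "vocab/instrument", WEB + "thing/cello"),
}


def reconcile(local, external):
    """Rewrite external URIs to local ones wherever a sameAs link exists."""
    alias = {o: s for s, p, o in local if p == OWL_SAME_AS}
    merged = set(local)
    for s, p, o in external:
        merged.add((alias.get(s, s), p, alias.get(o, o)))
    return merged


merged = reconcile(library_triples, web_triples)
for triple in sorted(merged):
    print(triple)
```

After the merge, the library's authority URI carries the instrument statement that originated outside the library.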

Make library bibliographic facts into RDFs & URIs;
release them into the wild.
Make Library Linked Data OPEN.

I should add that accounting for physical objects in our collections, locating them, making our collections auditable, and managing our collections seem to be possible using Linked Data too, at least in principle.

What about Publishers?

Publishers & Societies making use of Linked Data
• Aggregate content in their own realms & beyond
• Aggregate information about
  – Conferences
  – Career building & employment opportunities
  – Communities in collaboration
  – Commercial & other services supporting research with specimens, source material, processing, trials
  – Productive relationships with others
• Provide actionable, constantly updated links in support of scholars, teachers, and learners
• Provide compelling services tying users to them

Libraries too can use Linked Data to reveal and advertise compelling services offered to their clients.

Semantic Web adopters

Here are some of the big players in the Linked Data / Semantic Web world. The British Library has released RDFs/URIs for the entire British National Bibliography. The Library of Congress has released the same for LCSH & Name Authority Files. LCSH includes links to AGROVOC, RAMEAU, DNB, GLIN Subject Thesaurus, and the National Agricultural Library's Subject Index. Every Personal and Corporate entry in LC/NAF links to VIAF, the Virtual International Authority File based at OCLC. The New York Times 18 months ago made all 500,000 (and growing) of its index terms available in the wild as RDFs and URIs.

For publishers and libraries...though we should not neglect services.

...if users can find it in their own context

Context
[Diagram labels: Users, Content]
Users = readers, authors, teachers, students

Context
[Diagram labels: Users, Content]
Publishers must make content VISIBLE

I am using the imperative here, because invisible published content means invisible benefit to the author and/or the publisher.

Here is a recent PLoS article from PLoS Neglected Tropical Diseases.

And here is the semantically enhanced version of this article, with enhancements provided by David Shotton et al. in the form of links to further information, interactive figures, a re-orderable reference list, citations in context, and tag trees. These enhancements took 10 man-weeks in 2009! However, with the growing ecology of linked data, much of this could be accomplished by auto-tagging and algorithmic construction of the basic RDFs & URIs for the unique article. Microdata submitted by some publishers and their supporting services to schema.org lead to these exciting possibilities.
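
As a rough idea of what auto-tagging could look like, the sketch below matches words in a text against a small controlled vocabulary and emits one statement per match. Everything here is an assumption for illustration: the vocabulary, the `mentions` predicate, the article URI, and the sample sentence are invented, and production taggers do far more than exact word matching.

```python
import re

# Hypothetical controlled vocabulary mapping terms to URIs.
VOCABULARY = {
    "dengue": "http://example.org/disease/dengue",
    "brazil": "http://example.org/place/brazil",
    "malaria": "http://example.org/disease/malaria",
}
MENTIONS = "http://example.org/vocab/mentions"  # hypothetical predicate


def auto_tag(article_uri, text):
    """Return sorted 'mentions' triples for vocabulary terms found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        (article_uri, MENTIONS, uri)
        for term, uri in VOCABULARY.items()
        if term in words
    )


# Invented sample sentence, not taken from any real article.
tags = auto_tag(
    "http://example.org/article/1",
    "An outbreak of dengue was studied in Brazil.",
)
for tag in tags:
    print(tag)
```

Only terms present in both the text and the vocabulary produce statements, so "malaria" yields nothing for this sentence.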

aggregation

Aggregation counts, but think how much more we would get if we could aggregate from libraries, publishers, and the wild and weird variety of sources on the web?

Disambiguation

RDFs and URIs can operate in many languages, and relationships can be expressed across languages, a potentially big benefit to research and collaboration in research.
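
A small sketch may help show why URIs support both disambiguation and multilingual use: the URI identifies the thing, while labels vary by language. The `LABELS` table, the URIs, and both helper functions are hypothetical.

```python
# Hypothetical URIs; labels are keyed by language tag.
LABELS = {
    "http://example.org/place/paris-france": {"en": "Paris", "fr": "Paris", "de": "Paris"},
    "http://example.org/place/paris-texas": {"en": "Paris"},
    "http://example.org/topic/library": {
        "en": "library",
        "fr": "bibliothèque",
        "de": "Bibliothek",
    },
}


def lookup(label, lang):
    """Return every URI whose label in the given language matches, ignoring case."""
    wanted = label.casefold()
    return sorted(
        uri for uri, names in LABELS.items()
        if names.get(lang, "").casefold() == wanted
    )


def label_for(uri, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    names = LABELS[uri]
    return names.get(lang, names[fallback])


print(lookup("paris", "en"))        # two distinct things share one label
print(lookup("Bibliothek", "de"))   # one thing, reachable from any language
```

The ambiguous label "Paris" resolves to two URIs that a user or program can tell apart, while "Bibliothek" and "bibliothèque" both lead to the same URI.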

Web of Data Progress

2007

FOAF = Friend of a Friend. Hundreds of millions of RDFs/URIs. Fortunately they do not take much space in memory!
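
For readers who have not seen FOAF, a person description is only a handful of short statements. In the sketch below the FOAF namespace and the `rdf:type` URI are the published ones; the people, the `example.org` URIs, and the `foaf_person` helper are hypothetical, and the helper assumes names contain no quotes or backslashes.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # published FOAF namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/people/"    # hypothetical people


def foaf_person(slug, name, knows=()):
    """Build N-Triples lines describing one foaf:Person.

    Assumes the name contains no quotes or backslashes (no escaping is done).
    """
    me = f"<{EX}{slug}>"
    lines = [
        f"{me} <{RDF_TYPE}> <{FOAF}Person> .",
        f'{me} <{FOAF}name> "{name}" .',
    ]
    lines += [f"{me} <{FOAF}knows> <{EX}{friend}> ." for friend in knows]
    return lines


alice = foaf_person("alice", "Alice", knows=["bob"])
print("\n".join(alice))
```

Three lines of text describe a person and one relationship, which is why very large numbers of such statements remain manageable.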

This is the 2011 graph of entities supplying RDFs and URIs. Now the population is in the hundreds of billions, heading to trillions.

2011
http://inkdroid.org/lod-graph/

Encouragement
Examples

Linked Open Data Value Proposition
• Linked open data (LOD) puts information where people are looking for it – on the Web;
• LOD can expand discoverability of our content;
• LOD opens opportunities for creative innovation in digital scholarship and participation;
• LOD allows for open continuous improvement of data;
• LOD creates a store of machine-actionable data on which improved services can be built;
• Library linked open data might facilitate the breakdown of the tyranny of domain silos;
• LOD can provide direct access to data in ways that are not currently possible;
• LOD provides unanticipated benefits that will emerge later as the stores of LOD expand exponentially.

A product of the Stanford/CLIR Linked Data Workshop, June 2011.

25 participants from the British Library, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, the Royal Library of Denmark, Aalto University in Finland, the Library of Congress, the Bibliotheca Alexandrina, the National Institute of Informatics of Japan, Google, Seme4, Emory, University of Virginia, University of Michigan, California Digital Library, Knowledge Motifs, CLIR, and Stanford.

Google using Stanford bib facts + web resources

This is a movie of a live interaction with Freebase using bibliographic facts from Stanford and linked information resources from the web. It shows in a limited way the potential for discovery and retrieval in the Linked Data Web.

BnF using data only from its catalogs & Gallica

This is another movie of the Linked Data prototype based entirely on bibliographic facts from the BnF catalogs and digital texts in Gallica. There are no other web resources drawn into this prototype...yet.

A Bibliographic Framework for the Digital Age (October 31, 2011)
  • “The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model. The protocols and ideas behind Linked Data are natural exchange mechanisms for the Web that have found substantial resonance even beyond the cultural heritage sector. Likewise, it is expected that the use of RDF and other W3C (World Wide Web Consortium) developments will enable the integration of library data and other cultural heritage data on the Web for more expansive user access to information.”
  Deanna Marcum, Associate Librarian of Congress, introducing a transition from MARC.

Value Proposition for LAMs
    We in the cultural heritage and knowledge management institutions are discovering better ways of publishing, sharing, and using information by linking data and helping others do the same. Through this work, we have come to value and to promote the following practices:
       1. Publishing data on the web for discovery and use, rather than preserving it in dark, more or less unreachable archives that are often proprietary and profit driven;
       2. Continuously improving data and Linked Data, rather than waiting to publish “perfect” data;
       3. Structuring data semantically, rather than preparing flat, unstructured data;
       4. Collaborating, rather than working alone;
       5. Adopting Web standards, rather than domain specific ones;
       6. Using open, commonly understood licenses, rather than closed and/or local licenses.

       from the Stanford/CLIR Workshop on Linked Data, June 2011

In each couplet, we emphasize the first half, before “rather than”, admitting that sometimes the second half of the couplet has to be operative.

DARPA Internet

This is where we started 2.5 decades ago.

World Wide Web

Thanks to Tim Berners-Lee and many others, we advanced in this environment from the early 1990s until today.

SOCIAL WEB

We cannot ignore the social web that exists in the current WWW, but think how much more, some of it scary, could be done in the Linked Data Web with the behaviors of the Social Web.

Linked Data Web

Just that funny reminder of the fundamental nature of the Linked Data Web: expressing machine-actionable relationships.

Semantic Web

And in the next web, the Semantic Web, who knows what may be possible.

Ubiquitous computing

To the progression of network types, we need to add a couple of enormously important environmental factors. Ubiquitous computing is a very important one. Having lots of computers on the net makes the possibility of an open global linked data web very strong.

Mobility

And our ability to communicate by voice (how about that Siri?) and by bits/bytes from everywhere is, perhaps, just another aspect of ubiquitous computing.

[Diagram labels: Ubiquitous Computing; Mobile; Internet; Web; Social Web; Linked Web]

The black box in the upper right corner is the Semantic Web, a level of sophistication yet to be achieved. The linked data web is at hand, though.

Will Librarians and Publishers join the development of the Linked Open Data web? I certainly think we should.

NO MORE SILOS ARE NEEDED or wanted.

W3C Library Linked Data Incubator Group
http://www.w3.org/2005/Incubator/lld/

A Bibliographic Framework Initiative General Plan for the Digital Age (October 31, 2011)
http://www.loc.gov/marc/transition/news/framework-103111.html

Linked Data Survey & Workshop June 2011
http://www.clir.org/pubs/archives/linked-data-survey/


The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

Vertex AI Gemini Prompt Engineering Tips
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University

  • 1. Semantic Web for Libraries & Publishers. Charleston Conference, 111103. Monday, November 21, 11. So, what’s the problem?
  • 2. The Problem Set
  • 4. More silos
  • 5. Lots of different silos
  • 6. Blue silos
  • 7. Old Silos. We in the library and publishing trades force readers, some of whom are authors as well, to search iteratively for information they want or need or think might exist, in many different silos, using many different search engines, forms, and vocabularies. We do not make it easy for them to discover what is locally available, what is more or less easy to get, or everything that might be available. No wonder the young and foolish depend upon and believe in Google’s searches. Google is quick...and in terms of relevance, very, very dirty.
  • 8. We give them better interfaces, ones that permit refinement of results, to our holdings at the title level, BUT...
  • 9. Simultaneously, we show them many other tools, each excellent in some ways, to continue their exploration of the literature. No single tool is comprehensive. We do not refer our clients to the Web, at least not on our own web sites! Our OPACs refer to our holdings. Indices and abstracts refer our readers to articles in journals that we may have licensed. SFX and similar services provide readers with links to the revealed titles to which we have subscribed. Neither our OPACs nor the secondary databases refer directly to more than a tiny percentage of the vast collection of pages that is the World Wide Web. The Web, of course, refers in fragmentary fashion to information resources we might, I emphasize, MIGHT have on hand for our readers.
  • 10. And the results of using other, often very good, discovery tools differ in relevance ranking, format, and options from those we provide for our OPACs, thus adding confusion.
  • 11. Some of us provide our readers with lots of databases to search. Too many, really, for all but a few of our readers are not forensic-level scholars.
  • 12. Selecting a licensed database is an art in itself! Once again notice that we rarely offer a web search engine as an option, and for good reasons. Nevertheless, the discoverable relevant information resources on the web apparently are not part of our repertory.
  • 13. !!! We have not conspired to make the search for relevant information objects difficult. We just have not yet had the tools, the methods, the vision, and yes, the gumption to try something new.
  • 14. ATLAS at LHC -- 150 × 10^6 sensors; Ntl Cntr for Biotech Info; NSF CyberInfrastructure; quake engineering simulation. Here’s a teensy slice of the information and communication environment in which our faculty and students find themselves. And it gets more complex every day. Alas, the larger the number of websites indexed by Bing or Google or whatever search engine du jour, the more likely it is that the returns will be less pointed and less precisely matched to what the searcher hoped to find.
  • 15. Too many silos. Here’s the biggest of the lot...
  • 17. One size fits all??? Does one size fit all?
  • 18. Not quite. Even Google has silos and uses, as do others, clever interfaces to hide the fact of the silos.
  • 19. Given all these silos and search engines, our users, our authors, and readers, and teachers, and students, people on the street, our nations...need us to find a better way. Facts about the information objects we have acquired or leased, facts about books, articles, films, and so forth that we have published need to be found in the wild, on the web. Ideally, we, librarians and publishers, will get the facts about what we have and what we are making public, for fun or profit, discoverable on the Web.
  • 20. Discovery & Access ... the problems. Let’s dwell on the problems briefly...
  • 21. 1. Too many stovepipe systems 2. Too little precision with inadequate recall 3. Too far removed from the World Wide Web
  • 26. 1. Too many stovepipe systems. The landscape of discovery & access services is a shambles. It can’t be mapped in any logical way • not by us (the supposed information pros) • not by the faculty & students who must navigate the chaos. This state of affairs shouldn’t be a surprise.
  • 30. 2. Too little precision with inadequate recall. Some of the problem ... too many stovepipe systems • dumbing-down effects of federation often hinder explicit searches • each interface has its own search-refinement tricks • numerous, overlapping discovery paths hamper full recall. Most of the problem ... limitations in the design & execution of infrastructure that supports discovery & access.
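For readers who want the two terms pinned down, here is a minimal sketch of how precision and recall are usually computed for a single search. The record IDs and the function name `precision_recall` are made up for illustration; nothing here comes from an actual catalog.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of retrieved records that are relevant
    recall    = share of relevant records that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four records are truly relevant, and one silo
# returns six records, only two of which are among the relevant ones.
relevant_records = {"r1", "r2", "r3", "r4"}
silo_results = ["r1", "r2", "x1", "x2", "x3", "x4"]

precision, recall = precision_recall(silo_results, relevant_records)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case a reader wades through four irrelevant hits (low precision) and still misses half of what exists (inadequate recall), which is the situation the slide describes.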
  • 36. the 1st limiting factor ... ambiguity. Most of our metadata uses a string of bytes to label a semantic entity [person, place, thing, event, ...] • discovery based on matching text labels • not on the gist of semantic entities. For libraries, the fix is authorities • authoritative forms of strings (names, organizations, titles, places, events, topics, etc.) work to improve precision and recall. Hold on ... what about cases where no one-to-one relationship exists between a string-of-text label & the underlying semantic entity? Take for example the text string: Jaguar; byte string: 4a 61 67 75 61 72
  • 37. ... a rose is a rose is a rose. company: Ltd. cars: XK series, in production since 1996; E-Type (UK) or XK-E (US), mftg 1961 to 1974; etc. hardware & software: Atari video game console; Macintosh OS X 10.2. Imagine this keyword search and realize the ambiguity of the term “jaguar”. Inspired by John Giannandrea, CTO, Metaweb ... from his presentation at PARC in April, 2008.
  • 40. Prrrrr ... a rose is a rose is a rose. company: Ltd. cars: XK series, in production since 1996; E-Type (UK) or XK-E (US), mftg 1961 to 1974; etc. hardware & software: Atari video game console; Macintosh OS X 10.2. music: heavy metal band formed in Bristol, England, Dec 1979; Fender electric guitar, introduced in 1962; Philadelphia-based singer/songwriter Jaguar Wright. military: type 140 Jaguar class fast attack craft [torpedo], Germany WWII; Anglo-French ground attack aircraft; XF10F prototype swing-wing fighter, early 1950s, Grumman. heros: The Jaguar is a superhero published by Archie Comics; DC Comics' Impact series, ... loosely based on Archie Comics' character. pro footbal: Jacksonville. Inspired by John Giannandrea, CTO, Metaweb ... from his presentation at PARC in April, 2008.
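The jaguar problem can be sketched in a few lines. This is only an illustration: the `ex:` identifiers and the tiny `ENTITIES` table are hypothetical, chosen from the categories on the slide, and a real authority file is far richer.

```python
# One text label, several distinct semantic entities.
# The "ex:" identifiers are invented for this sketch.
ENTITIES = {
    "ex:jaguar-cat": {"label": "Jaguar", "kind": "animal"},
    "ex:jaguar-cars": {"label": "Jaguar", "kind": "company"},
    "ex:fender-jaguar": {"label": "Jaguar", "kind": "electric guitar"},
    "ex:atari-jaguar": {"label": "Jaguar", "kind": "video game console"},
}


def match_label(text):
    """String matching: every entity that shares the label comes back."""
    wanted = text.lower()
    return sorted(eid for eid, e in ENTITIES.items() if e["label"].lower() == wanted)


def match_entity(entity_id):
    """Identifier matching: exactly one entity, no ambiguity."""
    return ENTITIES[entity_id]["kind"]


# The label is nothing more than a run of bytes (bytes.hex with a
# separator needs Python 3.8 or later).
label_bytes = "Jaguar".encode("ascii").hex(" ")
print(label_bytes)
print(match_label("jaguar"))
print(match_entity("ex:fender-jaguar"))
```

Searching on the string returns all four candidates; searching on an identifier returns the one thing meant. That difference is the gap between matching text labels and matching the gist of an entity.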
  • 44. the 2nd limiting factor ... instance-based metadata. Most of our metadata focuses on publication artifacts • identify responsibility for its creation • list topical headings. For simple cases ... few worries • as with ambiguity, one-to-one relationships pose few problems • things work for authors with a few books in several editions. But, as complexity increases, precision & recall suffer.
  • 45. Prolific authors ... search: Shakespeare’s Hamlet (811 entries). Wading thru search results for authors like Shakespeare shows clearly the effects that instance-based metadata has on precision & recall. A Socrates (Stanford Libraries OPAC) keyword search for the terms shakespeare and hamlet.
  • 47. Prolific authors ... search: Shakespeare’s Hamlet (811 entries). Wading thru search results for authors like Shakespeare shows clearly the effects that instance-based metadata has on precision & recall. Unflagging patience marks the task of flipping back & forth between hundreds of brief and full records to sort thru the varied instances of a single entity, e.g. • critical editions based on primary sources • 18th & 19th century collections of the plays • social, historical and literary essays • histories & critiques of such writings • video and audio recordings of performances • reviews and indices of the same • treatments of stagecraft, costumes, music • life & works of notables associated with the plays (e.g., performers, directors) • other art forms inspired by the plays
  • 53. 3. Too far removed from the World Wide Web. Together, our metadata & collections make up a big chunk of the “dark web” [ info resources that search-engine spiders can’t see ]. It’s clear that visibility on the web promotes dramatic increases in discovery and access • Library of Congress & Smithsonian images (FLICKR) • SULAIR’s Highwire Press ( > 2x increase via Google). The state of affairs is well known ...
  • 54. Our Working Environment
  • 55. [Diagram: academy, publisher, library; scholars & students; produce, provide] Here is a schematic to suggest how our ecosystem works. It is more complex, of course, but the basics are embodied here.
  • 56. Once upon a time…the Internet. And here is the way the e-discovery and e-communication environment is developing. First there was the Internet. Prophets such as Vannevar Bush, Ted Nelson, and Doug Engelbart showed us the way.
  • 57. Then…the World Wide Web [layers: web of pages; internet]. Thanks to another prophet, Tim Berners-Lee, the Internet, a network of communicating computers, became a web of pages of information. Scholarly journal publishers and some librarians realized early on that there were functional advantages to scholarship and to publishing in the web of pages. Yahoo, Google, and others realized that mining the web of pages by the words on those pages could make the rapidly growing web of pages reveal more through indexing and cataloging the web. Indexing won out over cataloging, as we now know. The next thing is the subject of this talk. It is the web of data. It is the web of relationships constructed and expressed so that both computers and humans can identify and understand relationships in that web. The web of data lives with the web of pages and is carried on the Internet, the global carrier.
  • 58. Under construction [layers: web of data; web of pages; internet]. This web of data is the next big thing in discovering relevant information objects and the next big thing in empowering individuals, communities, and industries in making better use of information that they or others create. What distinguishes this web of data, this linked data environment, is the principle of identifying entities, virtual & real, by statements of relations and descriptions in machine-readable form. More about this as we go along.
  • 59. Under construction [layers: web of data, aka Linked Data; web of pages; internet]. We are calling this next phase the Linked Data phase, because it is entirely dependent upon statements of relationships and descriptions in machine-readable form, but this phase may be only a precursor to another, more complex and more difficult web world to engineer. The next phase is the Semantic Web, which in theory allows the machine-readable relationships and descriptions to interoperate to satisfy a person’s requirements, albeit without constant interaction. In short, in the Semantic Web, the machines will understand meaning and presumably act on it. Scary, eh?
  • 60. Construction Tools. How do we work to alleviate our problems as information professionals, librarians and publishers?
  • 64. Recipe for creating the web of data • identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces • tie those facts together with named connections • publish the relationships as crawl-able links on the web • build/use apps supporting discovery via the web of data
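The first three steps of the recipe can be sketched roughly as follows. Everything named here is an assumption for illustration: the `http://example.org/` namespace, the entity names, and the connection names `writtenBy` and `instanceOf` are invented, and the output merely follows the one-statement-per-line shape of the N-Triples format.

```python
BASE = "http://example.org/"  # hypothetical namespace, not a real service


def uri(local_name):
    """Wrap a local name as an absolute URI in angle brackets."""
    return f"<{BASE}{local_name}>"


# Step 1: identify the entities (a person, a work, a held copy).
entities = ["shakespeare", "hamlet", "stanford-copy-1"]

# Step 2: tie them together with named connections.
connections = [
    ("hamlet", "writtenBy", "shakespeare"),
    ("stanford-copy-1", "instanceOf", "hamlet"),
]


# Step 3: publish each relationship as a line that a crawler could fetch.
def to_ntriples(statements):
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in statements)


published = to_ntriples(connections)
print(published)
```

The fourth step, building apps on top, is whatever reads those published lines back and follows the links.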
  • 65. Here is a pile of words representing all the words on the web that most search engines index constantly. Good search engines today can do a lot with this pile. BUT, the search engines create the perception of relationships, not based on meaning, but on other factors, such as number of links to a site containing the words of interest OR the traffic to a site.
  • 66. From this pile of words, structure! The Linked Data approach attempts to structure the pile in anticipation of the need for discovery. That structure is based on meaning, on relationships. I will make this clearer in the next slides.
  • 67. Here’s a graph of a very few relationships to Yo-Yo Ma, the great ‘cellist.
  • 68. Linked Data Web. Here’s a graph of relationships to Haggis, just a fun one I could not resist throwing in. Meaning is provided by understanding relationships.
  • 69. RDF triples & URIs • RDF triples = subject – predicate – object – A way to describe objects or even ideas on the web – An object or idea might have many RDF triples describing it – Objects or ideas need not exist on the web! • URIs = Uniform Resource Identifiers – Allows machine interaction among Web objects – Various syntactical schemes & protocols used to construct URIs – At least 3 needed to support an RDF triple (subject – predicate – object). Geek ingredients to the construction of the Linked Data Web. RDF means Resource Description Framework; each statement is always expressed as a simple sentence, though multiple such statements might attach to a single entity. In fact, we need multiple RDFs in this scheme.
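As a rough illustration of the "simple sentence" idea, triples can be held as plain three-part tuples and queried by pattern. The `ex:` names below are shorthand invented for this sketch rather than real URIs, and the helper `match` is hypothetical, not part of any RDF library.

```python
# Each statement is (subject, predicate, object).
# "ex:" is a made-up prefix standing in for full URIs.
triples = {
    ("ex:YoYoMa", "ex:plays", "ex:Cello"),
    ("ex:YoYoMa", "ex:bornIn", "ex:Paris"),
    ("ex:Cello", "ex:isA", "ex:StringInstrument"),
}


def match(s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Several statements attach to the single entity ex:YoYoMa.
for statement in match(s="ex:YoYoMa"):
    print(statement)
```

Because every statement has the same three-part shape, statements from different sources can be pooled into one set and queried together.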
  • 70. A graph of RDF statements and URIs
  • 71. The Linked Data Principles 1. Use URIs as names of things (people, places, times, objects, ideas...anything really) 2. Use HTTP URIs so that people can look up those names 3. When someone looks up a URI, provide useful RDF information 4. Include RDF statements that link to other URIs so that they can discover related things. The really great aspect of RDFs is that they can refer to ideas, not just to physical or virtual entities. Any kind of idea could be treated.
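The four principles can be mimicked without a network using a small in-memory stand-in for the web. This is a sketch under loud assumptions: the `WEB` dictionary replaces real HTTP lookups, and every `http://example.org/...` URI and vocabulary term is hypothetical.

```python
from collections import deque

# Principles 1 and 2: things are named with HTTP URIs (all invented here).
HAGGIS = "http://example.org/id/haggis"
SCOTLAND = "http://example.org/id/scotland"
UK = "http://example.org/id/united-kingdom"

# Principle 3: looking up a URI yields RDF-style statements about it.
# Principle 4: those statements point at other URIs.
WEB = {
    HAGGIS: [(HAGGIS, "http://example.org/vocab/originatesIn", SCOTLAND)],
    SCOTLAND: [(SCOTLAND, "http://example.org/vocab/partOf", UK)],
}


def look_up(uri):
    """Stand-in for an HTTP GET on the URI; returns its statements."""
    return WEB.get(uri, [])


def discover(start):
    """Follow links outward from one URI, breadth first."""
    seen, queue = [start], deque([start])
    while queue:
        current = queue.popleft()
        for _subject, _predicate, obj in look_up(current):
            if obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


print(discover(HAGGIS))
```

Starting from one name, a program reaches related things only because the published statements link to other names; nothing in `discover` knows about haggis or Scotland in advance.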
  • 72. Library Metadata • Library metadata standards closed • “Passive” metadata, searchable, but… • In silos • Readable, but not actionable • Search results refinable, but final. These are some of the edges of the problem of library metadata.
  • 73. Library Metadata vs. Semantic Web Metadata • Library metadata standards closed / Open • “Passive” metadata, searchable, but… / Dynamic, contextualized • In silos / In the wild • Readable, but not actionable / Interactive, responsive • Search results refinable, but final / Leading to other queries & views. And here is the comparison between the library metadata scene now and the one we advocate for the Linked Data/Semantic Web. Library metadata in the Linked Data Web should be freely available, constantly updated, often reconciled with RDF triple statements from non-library sources. Library Linked Data should be entirely open on the web.
  • 74. Make Library bibliographic facts into RDFs & URIs; Release them into the wild. Make Library Linked Data OPEN. I should add that accounting for physical objects in our collections, locating them, making our collections auditable, and managing our collections seems to be possible using Linked Data too, at least in principle.
  • 75. What about Publishers?
  • 76. Publishers & Societies making use of Linked Data • Aggregate content in their own realms & beyond • Aggregate information about – Conferences – Career building & employment opportunities – Communities in collaboration – Commercial & other services supporting research with specimens, source material, processing, trials – Productive relationships with others • Provide actionable, constantly updated links in support of scholars, teachers, and learners • Provide compelling services tying users to them. Libraries too can use Linked Data to reveal and advertise compelling services offered to their clients.
  • 77. Semantic Web adopters. Here are some of the big players in the Linked Data / Semantic Web world. The British Library has released RDFs/URIs for the entire British National Bibliography. The Library of Congress has released the same for LCSH & Name Authority Files. LCSH includes links to AGROVOC, RAMEAU, DNB, GLIN Subject Thesaurus, and the National Agriculture Library's Subject Index. Every Personal and Corporate entry in LC/NAF links to VIAF, the Virtual International Authority File based at OCLC. The N Y Times 18 months ago made all 500,000 (and growing) of its index terms available in the wild as RDFs and URIs.
  • 78. For publishers and libraries...though we should not neglect services.
  • 79. ...if users can find it in their own context
  • 80. Context, Users, Content. Users = readers, authors, teachers, students
  • 81. Context, Users, Content. Publishers must make content VISIBLE. I am using the imperative here, because invisible published content means invisible benefit to the author and/or the publisher.
  • 82. Here is a recent PLoS article from PLoS Neglected Tropical Diseases.
  • 83. And here is the semantically enhanced version of this article, enhancements provided by David Shotton et al. in the form of links to further information, interactive figures, re-orderable reference list, citations in context and tag trees. These enhancements took 10 man weeks in 2009! However, with the growing ecology of linked data, much of this could be accomplished by auto-tagging and algorithmic construction of the basic RDFs & URIs for the unique article. Microdata submitted by some publishers and their supporting services to schema.org lead to these exciting possibilities.
  • 84. Aggregation. Aggregation counts, but think how much more we would get if we could aggregate from libraries, publishers, and the wild and weird variety of sources on the web.
  • 86. Disambiguation. RDFs and URIs can operate in many languages and relationships can be expressed across languages, a potential big benefit to research and collaboration in research.
  • 87. Web  of  Data  Progress 87 Monday, November 21, 11
  • 88. 2007
    FOAF = Friend of a Friend. Hundreds of millions of RDFs/URIs. Fortunately they do not take much space in memory!
  • 89. This is the 2011 graph of entities supplying RDFs and URIs. Now the population is in the hundreds of billions, heading to trillions.
  • 90. 2011
    http://inkdroid.org/lod-graph/
  • 91. Encouragement Examples
  • 92. Linked Open Data Value Proposition
      • Linked open data (LOD) puts information where people are looking for it – on the Web;
      • LOD can expand discoverability of our content;
      • LOD opens opportunities for creative innovation in digital scholarship and participation;
      • LOD allows for open continuous improvement of data;
      • LOD creates a store of machine-actionable data on which improved services can be built;
      • Library linked open data might facilitate the breakdown of the tyranny of domain silos;
      • LOD can provide direct access to data in ways that are not currently possible;
      • LOD provides unanticipated benefits that will emerge later as the stores of LOD expand exponentially.
    A product of the Stanford/CLIR Linked Data Workshop, June 2011.
    25 participants from the British Library, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, the Royal Library of Denmark, Aalto University in Finland, the Library of Congress, the Bibliotheca Alexandrina, the National Institute of Informatics of Japan, Google, Seme4, Emory, University of Virginia, University of Michigan, California Digital Library, Knowledge Motifs, CLIR, and Stanford.
  • 93. Google using Stanford bib facts + web resources
    This is a movie of a live interaction with Freebase using bibliographic facts from Stanford, and linked information resources from the web. It shows in a limited way the potential for discovery and retrieval in the Linked Data Web.
  • 94. BnF using data only from its catalogs & Gallica
    This is another movie of the Linked Data prototype based entirely on bibliographic facts from the BnF catalogs and digital texts in Gallica. There are no other web resources drawn into this prototype... yet.
  • 96. A Bibliographic Framework for the Digital Age (October 31, 2011)
      • “The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model. The protocols and ideas behind Linked Data are natural exchange mechanisms for the Web that have found substantial resonance even beyond the cultural heritage sector. Likewise, it is expected that the use of RDF and other W3C (World Wide Web Consortium) developments will enable the integration of library data and other cultural heritage data on the Web for more expansive user access to information.”
    Deanna Marcum, Associate Librarian of Congress, introducing a transition from MARC.
  • 97. Value Proposition for LAMs
    We in the cultural heritage and knowledge management institutions are discovering better ways of publishing, sharing, and using information by linking data and helping others do the same. Through this work, we have come to value and to promote the following practices:
      1. Publishing data on the web for discovery and use, rather than preserving it in dark, more or less unreachable archives that are often proprietary and profit driven;
      2. Continuously improving data and Linked Data, rather than waiting to publish “perfect” data;
      3. Structuring data semantically, rather than preparing flat, unstructured data;
      4. Collaborating, rather than working alone;
      5. Adopting Web standards, rather than domain-specific ones;
      6. Using open, commonly understood licenses, rather than closed and/or local licenses.
    From the Stanford/CLIR Workshop on Linked Data, June 2011.
    In each couplet, we emphasize the first half, before “rather than”, admitting that sometimes the second half of the couplet has to be operative.
  • 98. DARPA Internet
    This is where we started two and a half decades ago.
  • 99. World Wide Web
    Thanks to Tim Berners-Lee and many others, we advanced in this environment from the early 1990s until today.
  • 100. SOCIAL WEB
    We cannot ignore the social web that exists in the current WWW, but think how much more, some of it scary, could be done in the Linked Data Web with the behaviors of the Social Web.
  • 101. Linked Data Web
    Just a funny reminder of the fundamental nature of the Linked Data Web: expressing machine-actionable relationships.
  • 102. Semantic Web
    And in the next web, the Semantic Web, who knows what may be possible?
  • 103. Ubiquitous computing
    To the progression of network types, we need to add a couple of enormously important environmental factors. Ubiquitous computing is a very important one. Having lots of computers on the net makes an open, global linked data web a strong possibility.
  • 104. Mobility
    And our ability to communicate by voice (how about that Siri?) and by bits/bytes from everywhere is, perhaps, just another aspect of ubiquitous computing.
  • 105. Diagram labels: Ubiquitous Computing, Linked Web, Mobile, Web, Social Web, Internet
    The black box in the upper right corner is the Semantic Web, a level of sophistication yet to be achieved. The linked data web is at hand, though.
    Will Librarians and Publishers join the development of the Linked Open Data web? I certainly think we should.
  • 106. NO MORE SILOS ARE NEEDED or wanted.
  • 107. W3C Library Linked Data Incubator Group
    http://www.w3.org/2005/Incubator/lld/
    A Bibliographic Framework Initiative General Plan for the Digital Age (October 31, 2011)
    http://www.loc.gov/marc/transition/news/framework-103111.html
    Linked Data Survey & Workshop June 2011
    http://www.clir.org/pubs/archives/linked-data-survey/
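For readers who want to see what "expressing machine-actionable relationships" (slide 101) and relationships "expressed across languages" (slide 86) might look like in practice, here is a minimal sketch in plain Python. It is an illustration only, not the tooling used by any of the projects named in this talk. Everything under `http://example.org/` is a hypothetical identifier, and the `Triple` type and both helper functions are invented for this example; a real project would more likely use an RDF library and published vocabularies.

```python
from typing import NamedTuple, Optional

EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary


class Triple(NamedTuple):
    """One statement: subject, predicate, object (plus a language tag for labels)."""
    subject: str
    predicate: str
    obj: str
    lang: Optional[str] = None  # only used when the object is a literal label


SUBJECT = EX + "vocab/subject"  # hypothetical "is about" relationship
LABEL = EX + "vocab/label"      # hypothetical "is called" relationship

TRIPLES = [
    # Two holdings in different silos point at the same concept URI.
    Triple(EX + "book/1", SUBJECT, EX + "concept/mercury-element"),
    Triple(EX + "article/9", SUBJECT, EX + "concept/mercury-element"),
    Triple(EX + "book/2", SUBJECT, EX + "concept/mercury-planet"),
    # The same English string names two concepts; the URIs keep them apart.
    Triple(EX + "concept/mercury-element", LABEL, "Mercury", "en"),
    Triple(EX + "concept/mercury-element", LABEL, "Quecksilber", "de"),
    Triple(EX + "concept/mercury-planet", LABEL, "Mercury", "en"),
    Triple(EX + "concept/mercury-planet", LABEL, "Merkur", "de"),
]


def concepts_for_label(store, text, lang):
    """Return the concept URIs whose label in `lang` matches `text` (case-insensitive)."""
    return sorted(
        t.subject
        for t in store
        if t.predicate == LABEL and t.lang == lang and t.obj.lower() == text.lower()
    )


def resources_about(store, concept):
    """Return the resources linked to `concept` by the subject relationship."""
    return sorted(
        t.subject for t in store if t.predicate == SUBJECT and t.obj == concept
    )


if __name__ == "__main__":
    # An ambiguous English search term resolves to two distinct URIs.
    print(concepts_for_label(TRIPLES, "Mercury", "en"))
    # A German term lands on one URI, which then leads to every linked resource.
    for uri in concepts_for_label(TRIPLES, "Quecksilber", "de"):
        print(resources_about(TRIPLES, uri))
```

The design point is that the strings readers type are ambiguous and language-bound, while the URIs are not: once a term is resolved to a URI, a machine can follow the relationships to every resource that shares it, whichever silo published it.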