Open Library at Make Books Apparent

Hello.

http://flic.kr/p/4Pg28f

My name is George Oates, and I’m leading the Open Library project.

http://flic.kr/p/6iCLgP

I joined about 6 months ago and have been redesigning everything, so I thought I’d tell you a little bit about that.

First Steps

• Listen to people: answer help emails

• Meet team in person: met in San Francisco in June

• Streamline deploys: 1 button!

• Redraw sitemap: refocus on core

• Dream a little: ask silly questions, assess competition

Get acclimatized

Understand relationships

http://flic.kr/p/6xCJQS

So, what have we got, and how does it all inter-relate?

Any relationship can be made into a hyperlink.

Reach into the network

twitter.com/openlibrary
- we’ve also arranged a little Flickr integration, so if people take photos of books, they can link them to Open Library records. We’re not using them yet.
- as you may have noticed last night, we also added a link from Internet Archive book pages into Open Library. We reckon that’s almost doubled our modest traffic (about 250k unique IPs per day).

Challenges
• Dense library metadata
• Designed for classic institutional search/retrieve practice

• Data is “dry”, sometimes poor quality
• No insight into the community
• Distributed team: US, India, UK

- so one thing I began doing was reading and answering the enquiries that come to info@openlibrary (this is a good thing to have new people do for a while)
- I found that some questions repeated themselves, and that there was a key mismatch in understanding what Open Library was about: e.g. people would write in asking us to correct errors, not knowing they could fix them themselves.

There are 4 Agatha Christies in this list, 2 of which appear to the eye to be identical.
Computers have trouble recognising that these authors are the same woman. It’s easy for
humans to do. How could we build a UI to help people help us to merge these duplicates?
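The duplicate-author problem can be sketched in a few lines of Python. This is a crude, hypothetical heuristic, not Open Library’s actual merge logic: it turns each name into a comparison key (accents, life dates and punctuation stripped, “Surname, Forename” un-inverted) and groups names that share a key. It also shows why humans are still needed: a variant such as “Christie, A.” slips straight through.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce an author name to a crude comparison key (hypothetical heuristic)."""
    # Strip accents: "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop life dates such as "1890-1976" or an open-ended "1890-".
    plain = re.sub(r"\d{4}\s*-\s*(?:\d{4})?", " ", plain)
    # Un-invert "Christie, Agatha" into "Agatha Christie".
    if "," in plain:
        surname, _, rest = plain.partition(",")
        plain = f"{rest} {surname}"
    # Lowercase, keep only letters, collapse whitespace.
    return " ".join(re.findall(r"[a-z]+", plain.lower()))


def candidate_duplicates(names):
    """Group names sharing a key; only groups with two or more members are returned."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical author strings, in the spirit of the Agatha Christie example.
authors = ["Agatha Christie", "Christie, Agatha, 1890-1976", "AGATHA CHRISTIE.", "Christie, A."]
print(candidate_duplicates(authors))
# prints [['Agatha Christie', 'Christie, Agatha, 1890-1976', 'AGATHA CHRISTIE.']]
```

A UI built on something like this would only propose merges; a person still confirms them, and catches the cases the key misses.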

What have we got?

• Loads of data: 23 million records

• Small user base: < 20,000

• Small team: 6 people

• Small architecture: 12 servers

• Good framework: infogami, web.py

Certainly there are challenges in trying to make use of a large but shallow dataset, but Open Library has advantages too: a small team and a small system can both change rapidly. This flexibility will hopefully help us.

We began experimenting with the data we have, to try to see the catalog “landscape”. What do we already have that we’re not showing people yet? Look at all these subjects! These timeframes! How can we make use of them?

Look at all these new links! ISBN -> Publisher names -> Show me all the books this publisher
has published... Show me all the subjects related to cheese... Add links and hey presto!
You’re bouncing around the catalog.

What if?

• Adjacent books
• Not efficiency, but effectiveness (conversation broker, records improve over time) - Shirky

• Not a purchasing engine, but a library

As an exercise, it’s fun to ask what might happen if there were no search box on Open
Library? Could you still use it?

Changing the look of the logo will hopefully encourage people to come inside and look around. We want to break the conventional “library look” and warm it up a little... We are literally open, both at the software level and because all of Open Library’s records are editable by anyone.

Add a Book?
So, let’s take a look at one of the key UIs on Open Library: how to add a new record. This is the current form, basically just a web UI to a pretty dense, librarian-centric form. A lot of the fields are difficult for non-librarians to complete, which is a definite barrier to entry for both adding new records and editing existing ones.

The idea is to break it into two steps. This is step 1.

The most important thing to do is to make it feel easy to add a record. This first step also gathers enough info to allow us to do a decent search for any existing records. If we find a match, we can direct people towards the Edit view of that record. If there’s no match, we move on...

Step 2 is a massive form. There’s basically no way to hide that. All the fields are potentially useful. What we can do is organize the info a little, so related things (the physical object, pagination) are grouped together. We’re also going to try adding a tabbed view to soften the blow a little. Hopefully, adopting a conversational tone in the form labels will also tell people a little more about the sort of data we want.

It would be awesome if we could start to collect excerpts from books: a personal touch from people about particular bits they’ve enjoyed, and why. These excerpts could also be indexed to help boost books in our search.

Links, links, links... This “networked catalog” is all about how many things we can connect books to. This is the principle of metadata giving records a sort of “surface tension” that keeps them from sinking into the depths.

Those first 3 tabs (About, Excerpts, Links) are about the Work level of our records. In this first version we’re going to try not exposing this slightly weird, metadata-y thing called a Work to visitors, while still attempting to collect data at the Work level. There’s a separate tab just for Editions too, which contains fields mainly about publishing info and the physical (or virtual!) object itself.

Another experiment we’re looking forward to trying is about identifiers. We’re not particularly
concerned about canonical identifiers. Perhaps it’s a waste of time to wait for one, so instead,
we’re going to try and attach as many ID types to our records as we can. (This list is just a
braindump - not active yet.) The idea is that people could add a URL or actual identifier and
Open Library would just do the right thing. A suggestion (after this presentation was
delivered) was that people could ping Open Library with an identifier, not even knowing what
TYPE of ID it is. Perhaps Open Library could help “triangulate” this query towards a book
record. “Record laundering.”

Key Features
• History
  • Activity, life, cause, effect
  • Notifications thereof
• List(s)
  • More small, ad hoc collections
  • Public / private
  • Exportable (ad hoc catalogs)
- Planning two features that play off the strengths of the underlying Wiki: History & Lists
- AD HOC: BookServer feeds should be expected to be ad hoc too. There’s no point trying to agree on a hierarchy for feeds; it’s a waste of time.

We’re excited about how we might improve the display of our records’ history, and the links out of it. History is another source of connections into and around the catalog, so we should “activate” it where we can, connecting to people, subjects, publishers, even dates: “See everything that happened on Open Library on May the 4th, 2009.” Version 1 probably won’t be quite this robust :)
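Slicing history by day, as in the “May the 4th” example, mostly needs every change to carry a timestamp. A minimal sketch with a hypothetical `Change` record: the field names and record keys are made up for illustration, and timestamps are assumed to be stored in UTC so that a “day” means the same thing for everyone.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Change:
    record_key: str      # hypothetical record identifier
    editor: str
    timestamp: datetime  # assumed UTC
    comment: str = ""


def changes_on(history: list[Change], day: date) -> list[Change]:
    """Every change made on one calendar day, oldest first."""
    return sorted((c for c in history if c.timestamp.date() == day), key=lambda c: c.timestamp)


def changes_by(history: list[Change], editor: str) -> list[Change]:
    """Every change by one person: another way to "activate" history as links."""
    return [c for c in history if c.editor == editor]


history = [
    Change("/books/example-2", "alice", datetime(2009, 5, 4, 16, 30), "Added a cover"),
    Change("/books/example-1", "bob", datetime(2009, 5, 3, 9, 0), "Fixed title"),
    Change("/books/example-1", "carol", datetime(2009, 5, 4, 8, 15), "Added subject: Bordeaux"),
]
for change in changes_on(history, date(2009, 5, 4)):
    print(change.timestamp, change.editor, change.record_key, change.comment)
```

The same filtering applies to any other facet a change carries (subject, publisher), which is how a history page could link out to them.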

http://flic.kr/p/6zyU3U

Tension?

http://loc.gov

- I’m not sure how much we’re going to be able to assist the Library of Congress

Small Collections

http://flic.kr/p/34WGhL

• Catalogues to & from book lovers who may or may not be professional librarians
• Effective & Personal; Inefficient & Charming, Detailed
• Looking to integrate cool cataloging services like Koha, Delicious Monster - Anyone??
• It was only last night I met a woman who is cataloguing a business’s library of some 1,100 books. She said she had been looking on Open Library for a way to upload a CSV file to us. We should do that, and note it on each edition’s history. (*Note: Design that CSV and get it online!)
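Since the CSV format is still to be designed, here is one possible starting point. A minimal sketch, assuming a hypothetical schema with required `title` and `author` columns and an optional `isbn` column; each parsed row carries a provenance note that could be written to the edition’s history.

```python
import csv
import io

# Hypothetical minimal schema for a bulk upload; "isbn" is optional.
REQUIRED_COLUMNS = {"title", "author"}


def read_catalog_csv(text: str, uploader: str) -> list[dict]:
    """Parse an uploaded CSV and attach a provenance note for each edition's history."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    records = []
    # start=2 because line 1 is the header (assumes no multi-line cells).
    for line_no, row in enumerate(reader, start=2):
        title = (row.get("title") or "").strip()
        if not title:
            continue  # skip rows without a title rather than guessing
        records.append({
            "title": title,
            "author": (row.get("author") or "").strip(),
            "isbn": (row.get("isbn") or "").strip(),
            "history_note": f"Imported from CSV by {uploader} (line {line_no})",
        })
    return records


upload = "title,author,isbn\nThe Secret Adversary,Agatha Christie,\n,Unknown,\n"
for record in read_catalog_csv(upload, "office-librarian"):
    print(record)
```

Rejecting a file with missing columns up front, instead of importing half a catalog, keeps the history notes meaningful.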

History
http://flic.kr/p/6NHecm

- there was some talk about timestamps yesterday. Being able to slice things by time will only
increase in importance as the web gets older, so, I’d suggest putting timestamps on anything
you can think of.

Substrate:
any surface on which a plant or animal lives or on
which a material sticks

http://flic.kr/p/4itJcB

What if we position library records like that?

http://flic.kr/p/4itJcB

“Build it so anyone can contribute any amount.”
Clay Shirky

http://flic.kr/p/v5uNz

The act of adding a book to a library catalog is a bit like playing Tetris.

http://flic.kr/p/6pmtQL

But librarians are (very clever) humans too. And everyone who’s responsible for putting books into a traditional catalogue must work within patterns: patterns that have grown semantically remarkable and deeply complex.

"But here’s a question for you, let’s say you
have an 856 URL to full text for a serial. And
you know what date ranges it covers. What
sub-field would you put that in? $3 or $z? I
see it in both."
Jonathan Rochkind, Bibliographic Wilderness

http://flic.kr/p/6pmtQL

I’m glad I don’t have to either ask or answer this question.

“Library metadata is diabolically rational.”
Karen Coyle, kcoyle.net

Hic sunt dracones.

http://www.lib.cam.ac.uk/exhibitions/Fantasy_to_Federation/Blaeu.jpg

A detail from a map of the East Indies showing, outlined in pink, the first European
discoveries along the Cape York Peninsula. Early in 1606, towards the northern tip of the
peninsula, Willem Jansz made here what was almost certainly the first landing by Europeans
in Australia. This map first appeared in 1635 and was reprinted unchanged until 1664.

Here be dragons.

http://www.lib.cam.ac.uk/exhibitions/Fantasy_to_Federation/Bellin1753.jpg

This is one of the few maps in the eighteenth century devoted entirely to Australia. Jacques Bellin was hydrographer to the French King Louis XV. He has added a hypothetical coastline joining Australia, New Guinea and Tasmania; a note says that this is included without proof. It is further suggested that New Zealand might be part of the great southern continent.

I wonder if librarians are trying to make catalogs look like this... Highly “accurate”; deeply
organized; the perfect information system...

http://flic.kr/p/38TZ

What if a catalog looked like this? What if it were crystalline?

From the artist of this image, Jared Tarbell: “Lines like crystals form at perpendicular angles
to existing lines. A complex form emerges.
1000 classic computational substrate, color palette stolen from Jackson Pollock: A simple
perpendicular growth rule creates intricate city-like structures. The simple rule, the complex
results, the enormous potential for modification; this has got to be one of my all time favorite
self-discovered algorithms. Lines like crystals grow on a computational substrate.”

Deconstruction

http://flickr.com/photos/tupwanders/3356077817/

I’ve learned a wee bit about the history of library metadata... and museum metadata, for that matter. It seems like the 1960s were a bit of a blight for human understanding: that’s when we got all excited about computers and their processing power, and seemingly overwrote a lot of the crafty, poetic description and allusion used to describe cultural works, in favour of the Tetris approach.

What happens if you blow it up?

600
13 $a Marie Antoinette $c Queen, Consort of Louis XVI, King of France $d 1755-1793

650
2 $a Queens $z France $v Biography
1 $a Queens $z France $x Biography

651
2 $a France $x History $y Louis XVI, 1774-1793
1 $a France $x History $y Revolution, 1789-1799
1 $a France $x Queens $x Biography

- I don’t want Open Library to jettison librarianship, or neglect to acknowledge the brilliant
hard work of librarians over the years...
- You could argue that this sort of computer-y librarianship (or any type of “educated classification”) was (perhaps unintentionally) designed to obscure the personal... the practical... the human

- How might we adapt or extend (or revert?) this librarianship to appeal to a broader audience?
- Let’s see what happens when you explode Library of Congress Subject Headings. This data isn’t even in Open Library; we borrowed it from loc.gov, then pulled out the dynamite...

600 (people)
13 $a Marie Antoinette $c Queen, Consort of Louis XVI, King of France $d 1755-1793

650 (subjects)
2 $a Queens $z France $v Biography
1 $a Queens $z France $x Biography

651 (places)
2 $a France $x History $y Louis XVI, 1774-1793
1 $a France $x History $y Revolution, 1789-1799
1 $a France $x Queens $x Biography

These numbers are field tags in a MARC (MAchine-Readable Cataloging) record.
Since librarianship is “diabolically rational”, of course, everything is in its place, whether it’s a reference to a person, a place, a thing, an author or whatever...

(people)
Marie Antoinette, Louis XVI

(subjects)
Queens, France, Biography

(places)
France, History, Louis XVI, 1774-1793, Revolution, 1789-1799, Queens, Biography

So, if we get rid of all that machine-readable gumpf, we start to have things that humans can parse as well...

Marie Antoinette, Louis XVI, Queens, France, Biography, History, 1774-1793, Revolution, 1789-1799

Then, make them into links, but retain their interconnection.
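The explode-then-link step can be sketched in Python. This is a toy, not a MARC parser: it assumes headings arrive as display strings like those on the slides, ignores the indicator digits before the first “$”, throws away the subfield codes, splits values on commas, and then counts which terms appeared in the same heading so each link can keep its “related subjects”. The 600 field is left out because turning “$c Queen, Consort of Louis XVI, King of France” into “Louis XVI” took human judgement. A real system would read records with a proper MARC library.

```python
import re
from collections import Counter
from itertools import combinations


def subfields(heading: str) -> list[tuple[str, str]]:
    """'2 $a Queens $z France' -> [('a', 'Queens'), ('z', 'France')]; indicators are ignored."""
    return [(code, value.strip()) for code, value in re.findall(r"\$([a-z0-9])([^$]*)", heading)]


def explode(heading: str) -> list[str]:
    """Drop the subfield codes and split values on commas, leaving human-readable terms."""
    terms = []
    for _code, value in subfields(heading):
        for piece in value.split(","):
            piece = piece.strip(" .")
            if piece and piece not in terms:
                terms.append(piece)
    return terms


def all_terms(headings: list[str]) -> list[str]:
    """Flatten many headings into one de-duplicated list, keeping first-seen order."""
    seen: list[str] = []
    for heading in headings:
        for term in explode(heading):
            if term not in seen:
                seen.append(term)
    return seen


def related_pairs(headings: list[str]) -> Counter:
    """Count how often two terms share a heading, so the links keep their interconnection."""
    pairs: Counter = Counter()
    for heading in headings:
        for pair in combinations(sorted(explode(heading)), 2):
            pairs[pair] += 1
    return pairs


# The 650 and 651 strings from the Marie Antoinette example.
headings = [
    "2 $a Queens $z France $v Biography",
    "1 $a Queens $z France $x Biography",
    "2 $a France $x History $y Louis XVI, 1774-1793",
    "1 $a France $x History $y Revolution, 1789-1799",
    "1 $a France $x Queens $x Biography",
]
print(all_terms(headings))
print(related_pairs(headings).most_common(3))
```

Each term then becomes a link, and the pair counts give a simple “related subjects” list for its page.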

Subject
• Related subjects
• Books about...
• “Collections”
• Related authors
• Publishing over time
• Information from the network
• If it’s a place, show a map!

openlibrary.org/subjects/places/bordeaux

Give it a URL
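Giving every subject, place, person and time its own URL is mostly a matter of a stable naming rule. A minimal sketch, assuming a slug rule of our own choosing (lowercase ASCII, underscores between words) modelled on the openlibrary.org/subjects/places/bordeaux example; the “people” and “times” prefixes are hypothetical.

```python
import re
import unicodedata

# Only "places" appears in the example URL; the other prefixes are hypothetical.
PREFIXES = {"place": "places", "person": "people", "time": "times"}


def slugify(name: str) -> str:
    """'Île-de-France' -> 'ile_de_france': accents stripped, other characters become '_'."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def subject_url(name: str, kind: str = "subject", base: str = "https://openlibrary.org") -> str:
    """Build a stable, human-readable URL for a subject, place, person or time."""
    parts = ["subjects"]
    if kind in PREFIXES:
        parts.append(PREFIXES[kind])
    parts.append(slugify(name))
    return base + "/" + "/".join(parts)


print(subject_url("Bordeaux", "place"))           # https://openlibrary.org/subjects/places/bordeaux
print(subject_url("Marie Antoinette", "person"))  # https://openlibrary.org/subjects/people/marie_antoinette
```

Deriving the URL from the name alone means any page, feed or partner site can link to a subject without first looking up an ID.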

I used to use this image to represent contact networks on Flickr, but I think it’s equally applicable as a visual for what a networked library catalog might look like. How many things can we connect book records to? Not only identifiers, but blog posts, reviews, subjects, publishers, booksellers and so on.

Release

http://flickr.com/photos/swamibu/3191787234/

- launch with what we’ve got

- the records are still the same... just easier to skip around
- allow people to collect books around them, and then share or export that collection

Connect

- exploring partnerships, connections
- reach into existing networks
- LibraryThing, Goodreads, open source systems, etc.
- open data, improve API

Observe

http://flickr.com/photos/odreiuqzide/3195647925/

- see what people do
- provide tools to let people see what everyone else is doing
- monitor activity, like popular records, top editors, sign ups per day etc
- and ABOVE ALL, participate!!!
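Several of these “observe” items are simple counts over event logs. A minimal sketch, assuming hypothetical in-memory logs of page views, edits and sign-ups; a live site would more likely run the same counts against its database or log store.

```python
from collections import Counter
from datetime import datetime


def activity_summary(views, edits, signups, top_n=3):
    """Dashboard numbers from raw event logs (all input formats are hypothetical).

    views:   record keys, one per page view
    edits:   editor usernames, one per saved edit
    signups: account-creation timestamps
    """
    per_day = Counter(ts.date() for ts in signups)
    return {
        "popular_records": Counter(views).most_common(top_n),
        "top_editors": Counter(edits).most_common(top_n),
        "signups_per_day": dict(sorted(per_day.items())),
    }


summary = activity_summary(
    views=["/works/a", "/works/b", "/works/a"],
    edits=["ann", "bob", "ann", "ann"],
    signups=[datetime(2009, 10, 1, 9), datetime(2009, 10, 1, 17), datetime(2009, 10, 2, 8)],
)
print(summary)
```

Publishing the same numbers back to visitors is one way to “let people see what everyone else is doing”.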

Enhance: Tidbits
• Navigation
• Key Processes
• Branding

Gather: Small Collections
• Recognition
• Contribution
• Curated content
• Original clusters

Inhale: Web Services
• Content, content, content
• BookServer
• APIs in & out

Streamline: Library Catalogs
• Workflow
• Updates
• Expansion
• Respect

To summarise, here are the 4 levels of stuff we’re trying to focus on in the coming months...

Next Steps
• We’re hiring! Solr, sysadmin, web dev

• Find money! Want to join forces?

• Release the redesign, and watch what happens...

Short term... Want to come and work on an awesome project playing with the very nature of a
library catalog? Let me know!

Thank You!
glo@archive.org
http://flickr.com/photos/roadsidepictures/244926428/
