Manvsmachinewithnotes

Man
vs
Machine

Main theme: Web 2.0 is as much about machine-consumable data as human-consumable data.

Web 1.0                      Web 2.0
DoubleClick                  Google AdSense
Ofoto                        Flickr
Akamai                       BitTorrent
mp3.com                      Napster
Britannica Online            Wikipedia
personal websites            blogging
evite                        upcoming.org and EVDB
domain name speculation      search engine optimization
page views                   cost per click
screen scraping              web services
publishing                   participation
CMS                          wikis
directories (taxonomy)       tagging (folksonomy)
stickiness                   syndication

The meme of Web 2.0 was influenced by comparing pre-dot-com-bubble companies with
post-dot-com-bubble companies.

What is the difference between the list on the left and the list on the right?

Let’s take the example of Britannica vs Wikipedia.

The information in Britannica is centrally controlled. It has a relatively small number of contributors.
The workload per contributor is high.

Wikipedia is open to anyone to contribute. A collaboration of thousands can lead to a work of equal
quality to one produced by a more centrally controlled method.

Britannica’s revenues decreased from 650M to 50M over a 10 year period!

The new sites make it easy to add information and use that information to
answer or solve problems for people.

[Diagram: two axes. Vertical: contributing (easy at the top). Horizontal: mining (hard to easy).]

Two key parts to Web 2.0 are easy addition of information into
the system (user generated content), followed by ways of mining
that information.

One of the theses that we are following by working in this context
is that, by understanding the nature of the flow of information
and the availability of ways of mining that information,
we can create useful solutions to real problems.

Companies that find ways to do this should succeed.

[Diagram, built up over several slides on the same two axes: contributing (vertical, easy at the top) and mining (horizontal, hard to easy). Items added in turn: semantic web; plain text, emails; hyperlinks, views, tags, citations?; academic papers; microformats.]



The kind of information that we can capture with Connotea is typical of many sites.
For Connotea we have:
- citation information
- usage patterns (when did an item get added to our DB, how many times has it been added)
- user-generated metadata such as tags
- potentially, social network information: how many of my friends have added this item?
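
To make the four kinds of information concrete, here is a minimal sketch of one bookmark record. The class name, field names and example values are all assumptions for illustration; this is not Connotea's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Bookmark:
    """One bookmarked item (hypothetical layout, not Connotea's schema)."""
    title: str
    doi: Optional[str] = None                            # citation information
    added_on: List[date] = field(default_factory=list)   # usage: one date per addition
    tags: Set[str] = field(default_factory=set)          # user-generated metadata
    added_by: Set[str] = field(default_factory=set)      # users who added the item

    def times_added(self) -> int:
        """How many times the item has been added to the database."""
        return len(self.added_on)

    def friends_who_added(self, friends: Set[str]) -> int:
        """How many of the given friends have added this item."""
        return len(self.added_by & friends)


# Made-up example values.
item = Bookmark(title="An example paper", doi="10.1000/example")
item.added_on += [date(2008, 1, 5), date(2008, 2, 1)]
item.tags |= {"scifoo", "citation"}
item.added_by |= {"alice", "bob"}
```

Calling `item.friends_who_added({"bob", "carol"})` answers the "how many of my friends" question for one item.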

Gathering   Trusting   Integrating   Analyzing   Triangles

del.icio.us

Many Web 2.0 sites have created islands of data.
Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth.
- RFID and Fire Eagle point the way to merging these islands with the real world

What’s the process?

• Gathering the data
• Trusting the data
• Integrating / disambiguating the data
• Understanding and analyzing the data

DOI

Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth.
In the publishing world, DOIs are a key technology.
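
A rough sketch of why a DOI helps bridge islands: if two sites both record the DOI, their records can be merged on it even when each site writes it differently. The function names, record layout and the DOI `10.1000/abc` are made up for illustration. The only fact relied on is that DOI names are case-insensitive.

```python
def normalize_doi(raw):
    """Reduce a few common ways of writing a DOI to one lowercase form."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def merge_by_doi(*sources):
    """Merge records from several sites, keyed on the normalized DOI."""
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_doi(record["doi"])
            entry = merged.setdefault(key, {"tags": set(), "seen_in": set()})
            entry["tags"] |= set(record.get("tags", []))
            entry["seen_in"].add(record["site"])
    return merged


# Two hypothetical islands that describe the same paper differently.
site_a = [{"site": "site_a", "doi": "doi:10.1000/ABC", "tags": ["physics"]}]
site_b = [{"site": "site_b", "doi": "http://dx.doi.org/10.1000/abc", "tags": ["optics"]}]
merged = merge_by_doi(site_a, site_b)
```

Here `merged` holds a single entry under `10.1000/abc`, carrying the tags from both sites.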

[Diagram labels: Internet Site, cf, Site or Application, Internet Site]

OpenID cf OAuth

OpenID allows a single person to interact with multiple web sites using one log-in mechanism.
OAuth allows both desktop and web applications to share data using one authentication mechanism.

Rated 5/5    Rated 1/5

[Figure: two tag clouds, one for items rated 5/5 and one for items rated 1/5. Tags shown: Redemption, Based-on-Play, Android, Love, Refugee, Spacecraft, Time-Travel, Soldier, Famous-Score, Hope, Alien, Blockbuster, Broken-Heart, Space, War, Futuristic, Based-on-Novel, Racism, Artificial-Intelligence, Hero, Melodrama.]

Once you merge the data, you have to understand it.

The tags that a person uses across different services can give you a more holistic picture of their interests.

However, tags can be ambiguous.

Some of the technologies that are addressing this are semantic web technologies; look at projects such as:
Tagora http://www.tagora-project.eu/
DBpedia http://dbpedia.org/
SIOC http://sioc-project.org/
FOAF http://www.foaf-project.org/
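
The idea of combining one person's tags across services can be sketched as below. The service names and tags are invented, and the functions are illustrative only. Note that simple counting does nothing about ambiguity: "python" is merged whether the person means the language or the snake.

```python
from collections import Counter


def tag_profile(tags_by_service):
    """Count how often each tag is used, summed over all services."""
    profile = Counter()
    for tags in tags_by_service.values():
        profile.update(tag.strip().lower() for tag in tags)
    return profile


def shared_tags(tags_by_service):
    """Tags that the person uses on more than one service."""
    seen = Counter()
    for tags in tags_by_service.values():
        seen.update({tag.strip().lower() for tag in tags})
    return {tag for tag, n in seen.items() if n > 1}


# Hypothetical tags from two services for the same person.
tags_by_service = {
    "bookmarks": ["Python", "physics", "python"],
    "photos": ["python", "travel"],
}
profile = tag_profile(tags_by_service)
overlap = shared_tags(tags_by_service)
```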

[Diagram: Open Science, Web 2.0, Semantic Web]

Though not exactly the same, Web 2.0, open science and the semantic web work well together,
and they share some common traits, namely sharing, openness and minability of information.

Growth in submissions to the arXiv demonstrates growth in scientific output,
and certainly growth in the amount of data available online in electronic format.
There is some discussion about whether there is an information overload, as the main journals
are still the important ones, but reading habits have changed.

Discussion groups and mailing lists contain a huge amount of information,
from snippets of computer code to long discussions about topics.

MarkMail, from MarkLogic, is a site that mines this information. Here we see
a comparison of a search for FORTRAN vs a search for Java.

At the moment these kinds of archives are mainly relevant in the computer science area, but
these kinds of conversations are going on all the time in every field.

http://markmail.org/

Amazon use page views and a database of user purchases to find things you might like.

Again, here they are using data that they get for free from people using their site.
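
A toy version of "people who bought this also bought that", to show how far plain counting over freely collected data can go. This is a co-occurrence sketch under made-up data, not Amazon's actual method; the same counting could be run over page views instead of purchases.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """For each item, count how often every other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(item, counts, n=3):
    """Return up to n items most often bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]


# Hypothetical purchase histories, one list per customer.
baskets = [["a", "b"], ["a", "b", "c"], ["b", "d"]]
counts = co_purchase_counts(baskets)
suggestions = also_bought("a", counts)
```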

Google PageRank is another canonical example.
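
For reference, the textbook form of the PageRank idea fits in a few lines: a page is important if important pages link to it. This is the simplified power-iteration version with the commonly quoted damping factor of 0.85, run on a made-up three-page web; it is a sketch of the published idea, not of what Google runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this example c ends up ranked highest, then a, then b, which nobody links to.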

Crystal Eye

Social/Knowledge
Networking

Examples of two types of use in science:

CrystalEye http://wwmm.ch.cam.ac.uk/crystaleye/
example bond length for a structure: http://wwmm.ch.cam.ac.uk/crystaleye/bondlengths/H-Rb.svg

Nature Network: human-human interaction

Nature Web Publishing group

OTMI

The main products that we have developed so far are

- database gateways
- OTMI (Open Text Mining Interface)
- podcasts
- Scintilla
- Nature Network
- Nature Precedings
- Connotea

There are also other tools out there that are doing the same kind of thing, but I’m partial.

Repository

Discuss how social silos can be interchange locations between repositories,
and also between repositories and applications that might be built on top
of the social silos.

[Diagram: a set of repositories, with applications labelled Citation Management, Pubmed Integration and Activity Listing]


Connotea citation parsing modules

This model was quick and easy to implement, but it relies on the URL as the unique key.
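
A small sketch of what keying on the URL means, and where it falls short. The normalization rules and example addresses are assumptions for illustration, not Connotea's actual behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def url_key(url):
    """Build a lookup key from a URL: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


# Hypothetical addresses for one and the same paper.
key_1 = url_key("HTTP://Example.org/paper/123/#abstract")
key_2 = url_key("http://example.org/paper/123")
key_3 = url_key("http://dx.doi.org/10.1000/example")
```

The first two addresses collapse to the same key, but the third, which points at the same paper through a different site, does not. That is the weakness of using the URL as the identifier.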

Amazon.pm          arXiv.pm           autodiscovery.pm
BibTeX.pm          Blackwell.pm       blog.pm
BmcPdf.pm          Dlib.pm            DOI.pm
ePrints.pm         Highwire.pm        Hubmed.pm
LivingReviews.pm   NASA.pm            NPG.pm
OUP.pm             PLoS.pm            PMC.pm
PNAS.pm            Pubmed.pm          RIS.pm
Scitation.pm       Self.pm            Simple.pm
SpamDNSBL.pm       Springer.pm        Wiley.pm

We have a bunch of citation modules.

They currently have to be written in Perl, and this is a problem:
there is nothing similar to the Scaffold infrastructure that Zotero has.
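
The general shape of a citation module system is roughly "ask each module whether it understands this URL, then let the first one that does produce the citation". The sketch below shows that shape in Python purely for illustration; the real modules are Perl, and the class names, method names and registry here are hypothetical rather than Connotea's interface.

```python
from urllib.parse import urlsplit

CITATION_SOURCES = []  # registered modules, tried in order


def citation_source(cls):
    """Class decorator that registers one citation module."""
    CITATION_SOURCES.append(cls())
    return cls


@citation_source
class ArxivSource:
    def understands(self, url):
        host = urlsplit(url).netloc.lower()
        return host == "arxiv.org" or host.endswith(".arxiv.org")

    def citation(self, url):
        identifier = urlsplit(url).path.rsplit("/", 1)[-1]
        return {"source": "arXiv", "id": identifier, "url": url}


@citation_source
class FallbackSource:
    def understands(self, url):
        return True  # last resort: accept anything

    def citation(self, url):
        return {"source": "unknown", "url": url}


def parse(url):
    """Return the citation from the first module that understands the URL."""
    for source in CITATION_SOURCES:
        if source.understands(url):
            return source.citation(url)
    return None


arxiv_hit = parse("http://arxiv.org/abs/0801.0001")
other_hit = parse("http://example.org/some/page")
```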

Title
Author Date

PMID/DOI

Getting data in, part 2

The meta-data from the paper has been captured

When you begin to add tags, suggested tags are presented based on
tags you have already used.

A paper by Huberman et al. shows that displaying all tags drives tag-onomies to a stable state (Polya-Renyi urn model).
You need to display the full community tags, which we don’t do ... yet.
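
The intuition behind an urn model can be tried out directly: each new tag is a copy of an existing one, picked in proportion to how often it has already been used, which is roughly what happens when users see and imitate the community's tags. This is a basic urn simulation with made-up starting tags, not a reproduction of the paper's analysis.

```python
import random


def simulate_tagging(initial_tags, steps, seed=0):
    """Basic urn process: draw a tag at random, put it back, and add one more copy of it."""
    rng = random.Random(seed)
    urn = list(initial_tags)
    for _ in range(steps):
        urn.append(rng.choice(urn))
    return {tag: urn.count(tag) / len(urn) for tag in set(urn)}


# Hypothetical starting tags for one bookmarked item.
start = ["physics", "biology", "chemistry"]
shares_short = simulate_tagging(start, 200)
shares_long = simulate_tagging(start, 2000)
```

Comparing `shares_short` with `shares_long` shows how little the proportions move once enough tags have been added, since each new tag can shift a share by at most one part in the current urn size.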

user home page,
toolbox, on right
user tags
related tags
related users, groups

Getting data out

Open Data, important

Export only gets out the citation data, and not extra meta data that the user
has added such as comments or tags.

Formats: txt, RDF, BibTeX, RIS, EndNote, an API??
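
As an illustration of what an export could look like if it also carried user metadata, here is a sketch that writes one record as RIS, putting the user's tags in `KW` (keyword) lines. The record layout and values are invented, the `JOUR` type assumes a journal article, and this is one possible approach rather than how the Connotea export works.

```python
def to_ris(record):
    """Render one record as RIS text, with user tags exported as keywords."""
    lines = ["TY  - JOUR", "TI  - " + record["title"]]
    lines += ["AU  - " + author for author in record.get("authors", [])]
    if "year" in record:
        lines.append("PY  - " + str(record["year"]))
    if "doi" in record:
        lines.append("DO  - " + record["doi"])
    lines += ["KW  - " + tag for tag in sorted(record.get("tags", []))]
    lines.append("ER  - ")
    return "\n".join(lines)


# Made-up record for illustration.
ris_text = to_ris({
    "title": "An example paper",
    "authors": ["Author, A.", "Writer, B."],
    "year": 2008,
    "doi": "10.1000/example",
    "tags": ["scifoo", "citation"],
})
```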

perl
mod_perl
Template Toolkit
MySQL
Open Source, GPL2.5 v 1.8.1
web1.75 application

Discuss reasons for OS, discuss web1.8.1
- We hope for community involvement.
- The code is not MVC structured, and this has led to some problems with adoption.
- We do have some people running their own instances, with some feedback,
but we would like to eventually make the code easier to work with.
- Why not port it? That’s a big can of worms, and someone needs to convince me of
the benefits.
- If for some reason we choose to no longer support Connotea, then the data and the code could be
hosted by someone else.
- Someone asked me how they know we don’t cheat and preferentially
return NPG articles in searches. Well, the code is open, so if you are that paranoid
you can go and run an instance yourself and check up on us.

http://www.connotea.org/user/IanMulvany

http://www.connotea.org/users/tag/scifoo

http://www.connotea.org/user/IanMulvany/tag/scifoo

http://www.connotea.org/user/IanMulvany/tag/science

http://www.connotea.org/user/IanMulvany/tag/science2.0+citation

Example of calls to query the data, html output

http://www.connotea.org/data/user/IanMulvany

http://www.connotea.org/data/users/tag/scifoo

http://www.connotea.org/data/user/IanMulvany/tag/scifoo

http://www.connotea.org/data/user/IanMulvany/tag/science

http://www.connotea.org/data/user/IanMulvany/tag/science2.0+citation

Example of API calls
(you don’t have to type them in green when making the call)

http://www.connotea.org/rss/user/IanMulvany

http://www.connotea.org/rss/users/tag/scifoo

http://www.connotea.org/rss/user/IanMulvany/tag/scifoo

http://www.connotea.org/rss/user/IanMulvany/tag/science

http://www.connotea.org/rss/user/IanMulvany/tag/science2.0+citation

Example of RSS calls
(you don’t have to type them in green when making the call)

We create an RSS feed of everything.
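
The three sets of example URLs follow one pattern, which can be captured in a small helper. The function name and parameters are made up, and it only builds the address strings shown on these slides (user, tag, several tags joined with "+", and the `data` or `rss` prefix); it makes no request and assumes nothing else about the API.

```python
BASE = "http://www.connotea.org"
PREFIXES = {"html": "", "data": "/data", "rss": "/rss"}


def connotea_url(user=None, tags=(), output="html"):
    """Build a URL in the pattern shown on the slides; does not contact the site."""
    if user:
        path = "/user/" + user
    elif tags:
        path = "/users"
    else:
        raise ValueError("need a user, at least one tag, or both")
    if tags:
        path += "/tag/" + "+".join(tags)
    return BASE + PREFIXES[output] + path


html_url = connotea_url(user="IanMulvany")
data_url = connotea_url(tags=["scifoo"], output="data")
rss_url = connotea_url("IanMulvany", ["science2.0", "citation"], output="rss")
```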

Growth in Connotea bookmarks

[Chart: Bookmark Growth in Connotea. Entries in All Libraries, in thousands (axis from 0 to 600), by month from Jan-05 to Mar-08.]

Mirko Gontek at the University of Cologne:
information visualization of links in Connotea

These social links can create networks of information on top of the basic
information.

This is what we want to use to start building collaborative intelligence into
these systems.
