JISC CNI Meeting, Edinburgh 2010
1. Supporting Technical Innovation in the UK: RepositoriesUK
Paul Walk
p.walk@ukoln.ac.uk
www.ukoln.ac.uk
A centre of expertise in digital information management
2. innovation support
• UKOLN is now one of two JISC-funded Innovation Support Centres
  • this role is being worked out
• UKOLN has a long-standing role supporting and helping to develop the JISC Information Environment
  • repositories
• UKOLN has an increasing role in supporting developers in UK HE
• RepositoriesUK is a JISC-funded UKOLN project
3. provenance....
• Intute IRS
  • nothing to do with taxes....
  • Intute Institutional Repository Search
  • a managed aggregation underpinning a search interface for researchers
• ePrints UK and the Resource Discovery Network
4. lessons
• the aggregation has general potential value
  • a cache on the network
  • a search service is only one realisation of that potential value
• separation of concerns was needed
  • a particular service (such as search) should not dictate the entire infrastructure
• lessons from this project complemented some thinking I was doing elsewhere....
5. familiar?
[Diagram: some aggregated data of broad interest and potential usefulness, exposed through several machine interfaces (APIs) and through a UI used by an end-user]
6. a pessimistic view....
[Diagram: as on the previous slide, but with several future 3rd-party developers building UIs on the APIs for further end-users. Legend: certainty, belief, speculation]
7. why is this?
• funding follows services & happy users (& new features?)
• funders like to see their investment showcased
• infrastructure is mostly invisible: its impact is hard to ascertain from users
• so, there is strong motivation to develop a user-facing service, and then concentrate resources on it
8. a better pattern?
[Diagram: some aggregated data of broad interest and potential usefulness, exposed through an API. Built on the API: a 3rd-party app within a pre-existing user-facing service (OPAC, VLE, Facebook, NetVibes....) and a focussed application developed for a specific requirement (might be simply for research and development), each with a UI for end-users. Legend: certainty, belief, speculation]
http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern/
9. RepUK
• RepositoriesUK
  • a managed aggregation of repository metadata from UK HE institutions
  • un-normalised records
  • well-formed XML (no check for validity)
  • focussed on academic papers
• goals:
  1. support innovation
  2. develop some business intelligence
  3. develop an infrastructure component for services
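The "well-formed XML (no check for validity)" policy can be illustrated with a minimal Python sketch. It assumes harvested records arrive as raw XML strings keyed by a record identifier; the function names and that input shape are hypothetical, not RepUK's actual code. A record is kept, un-normalised, if it parses at all; no schema or DTD validation is attempted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(payload):
    """Return True if payload parses as XML. No schema/DTD validation is done."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True


def admit_records(raw_records):
    """Split harvested records (hypothetical id -> raw XML mapping) into
    admitted records and rejected ids. Admitted records are stored exactly
    as received, i.e. un-normalised."""
    admitted, rejected = {}, []
    for record_id, payload in raw_records.items():
        if is_well_formed(payload):
            admitted[record_id] = payload  # stored as-is, no normalisation
        else:
            rejected.append(record_id)
    return admitted, rejected


if __name__ == "__main__":
    harvested = {
        "rec-1": "<record><title>A paper</title></record>",
        "rec-2": "<record><title>Broken</record>",  # mismatched tags
        "rec-3": "<record><whatever>any structure passes</whatever></record>",
    }
    admitted, rejected = admit_records(harvested)
    print(sorted(admitted), rejected)
```

Here `rec-3` is admitted even though no schema would expect its element, which is exactly the trade-off of accepting well-formed rather than valid XML. The Python documentation warns that its XML modules are not secure against maliciously constructed data, so a production harvester would need to take that into account.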
10. design principles
• tiered service model (quasi SOA)
• serving intermediaries
• negotiated supply to consumers
• built around an unnormalised cache of metadata
• well-formed is good enough
  • just as well really....
[Diagram: closely integrated core services expose an API; several loosely coupled common services and local services build on it]
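The principles of "serving intermediaries" and "negotiated supply to consumers", with core services behind an API and loosely coupled services on top, might be sketched as follows. This is a hypothetical Python illustration, not RepUK's design: `CoreCache`, `SupplyAgreement` and the record fields are invented names, and a real core service would sit behind an HTTP API rather than a method call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class SupplyAgreement:
    """Hypothetical terms negotiated between the aggregation and one consumer."""
    consumer: str
    fields: tuple[str, ...]  # which record fields this consumer receives
    repositories: Optional[frozenset] = None  # None means every repository


class CoreCache:
    """Core service: a cache of un-normalised records behind a single API."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    def put(self, record_id: str, record: dict[str, str]) -> None:
        self._records[record_id] = record

    def supply(self, agreement: SupplyAgreement) -> Iterator[tuple[str, dict[str, str]]]:
        """Yield only the slice of the cache that the agreement covers."""
        for record_id in sorted(self._records):
            record = self._records[record_id]
            if (agreement.repositories is not None
                    and record.get("repository") not in agreement.repositories):
                continue  # repository not covered by this agreement
            yield record_id, {f: record[f] for f in agreement.fields if f in record}


if __name__ == "__main__":
    cache = CoreCache()
    cache.put("r1", {"repository": "repo-a", "identifier": "oai:repo-a:1", "title": "A paper"})
    cache.put("r2", {"repository": "repo-b", "identifier": "oai:repo-b:7", "title": "Another"})
    # A consumer that only negotiated for identifiers, across all repositories
    print(list(cache.supply(SupplyAgreement("identifier-project", ("identifier",)))))
    # A consumer that negotiated for titles from one repository only
    print(list(cache.supply(SupplyAgreement("repo-a-study", ("title",), frozenset({"repo-a"})))))
```

The point of the sketch is that the core cache stays generic while each consumer receives a tailored projection; new consumers need a new agreement, not a change to the core.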
11. RepUK 2
[Diagram: RepUK architecture. Components shown: repositories (repository XML, repository database, documents); a registry (OpenDOAR); a scheduler and a harvester & admin tool backed by an operational metadata MySQL database; harvested XML files; external sources (WorldCat, Google and MIMAS, with labels Identities & Names, language & identifier, LCSH & JACS); SOLR indexes; export processes producing an RDF database, HTML & RDFa, and XML files, with XQuery over the XML; and an HTTP server]
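One of the export processes in the diagram produces RDF. As a hedged illustration only, assume the harvested records are simple Dublin Core (`oai_dc`) XML and that each record has already been given a URI; the deck specifies neither, and the function names are hypothetical. A record could then be turned into N-Triples like this:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def dc_record_to_ntriples(record_xml, subject_uri):
    """Map each non-empty Dublin Core element in the record to one triple,
    reusing the DC element URI as the predicate."""
    root = ET.fromstring(record_xml)
    prefix = "{" + DC_NS + "}"
    triples = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if elem.tag.startswith(prefix) and text:
            predicate = DC_NS + elem.tag[len(prefix):]
            triples.append(f'<{subject_uri}> <{predicate}> "{escape_literal(text)}" .')
    return triples


if __name__ == "__main__":
    sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example paper</dc:title>
      <dc:creator>Smith, J.</dc:creator>
    </oai_dc:dc>"""
    # Hypothetical record URI minted by the aggregation
    for line in dc_record_to_ntriples(sample, "http://example.org/record/1"):
        print(line)
```

Reusing the Dublin Core element URIs directly as predicates is just the simplest choice; a real export could map to other vocabularies and link to external identifiers instead.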
12. progress
• 750,000+ metadata records
• ~140 repositories
• 6 consuming projects so far....
13. ‘consumers’ to date
• RIDIR
  • identifiers
• Writeslike.us & FixRep
  • metadata & full-text
• RKBExplorer & sameas
  • metadata to inform linked data
• NaCTeM
  • full-text (text-mining)
• Talis....?
  • hosting linked data
14. developer appreciation
"We have found that the RepUK aggregated repository datasets are a very useful basis on which to build, and have used the data in a number of projects....
The ability to build on other services means that we can reuse what has been done, rather than replicating functionality, freeing more time to work on the key functionality of our own projects."
15. issues
• state management is the real challenge!
  • deletions
  • changes
  • federation is consequently non-trivial
• scale & inequality (one repository accounts for half of all the records)
• linking?
  • should the records in the aggregation ever be the target of a link? Or should such links point to the source repository?
  • if we succeed with SEO, are we undermining source repositories?
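To see why deletions and changes make state management hard, consider a minimal sketch. It assumes the aggregation keeps a content fingerprint of every record from the previous complete harvest of a repository; the function names and data shapes are hypothetical. Comparing that snapshot with a fresh, complete harvest reveals added, changed and deleted records:

```python
import hashlib


def fingerprint(payload):
    """Content hash of a raw record, used to detect changes."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_harvest(previous, current):
    """Compare the previous snapshot (id -> fingerprint) with a fresh,
    complete harvest (id -> raw record). Returns the added, changed and
    deleted ids, plus the new snapshot to store for next time."""
    snapshot = {rid: fingerprint(payload) for rid, payload in current.items()}
    added = sorted(snapshot.keys() - previous.keys())
    deleted = sorted(previous.keys() - snapshot.keys())
    changed = sorted(rid for rid in snapshot.keys() & previous.keys()
                     if snapshot[rid] != previous[rid])
    return added, changed, deleted, snapshot


if __name__ == "__main__":
    last_time = {"a": fingerprint("<r>x</r>"), "b": fingerprint("<r>y</r>")}
    this_time = {"a": "<r>x</r>", "b": "<r>y, revised</r>", "c": "<r>z</r>"}
    added, changed, deleted, snapshot = diff_harvest(last_time, this_time)
    print(added, changed, deleted)
```

The catch is the word "complete": an incremental harvest returns only records changed since the last visit, so a record that has simply disappeared is invisible unless the source explicitly reports the deletion (OAI-PMH, for example, lets a repository declare whether it keeps track of deleted records). Every downstream copy, including any federated partner, then has to replay the same add, change and delete decisions, which is one reason federation is non-trivial.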
16. new lessons
• developers need infrastructure too!
• finding the right place to intervene
• funders need to find ways to measure value which does not necessarily stem from direct end-user satisfaction
  • a leap of faith....
• doing what no one else wants to do, to paraphrase Prof. David Baker
• creating the right environmental conditions to allow innovative services to emerge
Speaker notes
the cache is valuable without having to layer on added value ourselves
SOA?
Who recognises this?
lots of standards-based APIs allowing seamless interoperability
I think this is an anti-pattern. In software engineering terms, an anti-pattern is a design approach which seems plausible and attractive but which has been shown, with practice, to be non-optimal or even counter-productive.
what this often means in reality (pessimistic but frequently observed)
orange stuff is what actually gets built and delivered
the users are yellow because they represent an expected demand, rather than an actual demand
major investment in UI is wasted. Investment in APIs is also wasted
the result is neither infrastructure nor a focussed end-user service
a slightly better version
investment in the API is immediately realised: the service is built on the API, so it is both infrastructure and service
the risk of the locally built focussed app is reduced because the API is developed anyway. This might be orange if properly understood. It might be OK for it to be yellow because it might be R&D
reality will be more than this.
we have concentrated on goals 1 and 2, with 2 being the test for the approach being taken in 1
rapid innovation projects: 6 months, small grants, so assembling the data themselves would be a waste of their time and money. Lots of interest in linked data R&D on this data set.
business intelligence: shape of UK research, gap analysis, topic maps etc.
we started to think about infrastructure. Infrastructure might not serve end users. It might serve those who provide services to end users.
opportunistic developers
white is an external system
blue is wholly controlled by the project - we might call this infrastructure
yellow is negotiated between RepUK and developer projects. This might eventually become a candidate for infrastructure
Google & SEO from HTML & RDFa