Data keynote delivered at NEIC 2013 conference in Trondheim, Norway. Argues that research infrastructure providers are all in the data business. Video of presentation online at http://www.youtube.com/watch?feature=player_embedded&v=oCSqYoaRWR0#! (from 4:00 to 34:24).
What business are we in? Data-centric research, service requirements and national responses
1. What business are we in?
Data-centric research, service requirements and national responses
Data Keynote, NEIC 2013
Dr Andrew Treloar
Australian National Data Service
2. Overview
• What business are we really in?
• Service requirements
• Infrastructure responses
• Research Data Alliance
• Conclusions
CC-BY @atreloar 2
5. What Business are you in?
Theodore Levitt, Marketing Myopia,
Harvard Business Review, July–August 1960
“The railroads did not stop growing because the need for
passenger and freight transportation declined. That grew.
The railroads are in trouble today not because that need
was filled by others (cars, trucks, airplanes, and even
telephones) but because it was not filled by the railroads
themselves. They let others take customers away from
them because they assumed themselves to be in the
railroad business rather than in the transportation
business. The reason they defined their industry
incorrectly was that they were railroad oriented instead of
transportation oriented; they were product oriented
instead of customer oriented....”
11. We are all in the Data business!
• Researchers
– with some exceptions
• Research infrastructure providers
– with no exceptions
• But what about publications?
12. LHC output from 2009-2013 = 100PB
(www.symmetrymagazine.org/article/february-2013/achievement-unlocked-100-petabytes-of-data)
Journal Literature size in context…
14. eResearch infrastructure requirements
• Create/Capture
– automated with capture of associated metadata
• Store
– with appropriate levels of preservation
• Describe
– information for discovery, determination of value, access, re-use
• Identify
– indirection operator to reduce brittleness
15. eResearch infrastructure requirements
• Register
– in institutional/national/discipline registries
• Discover
– via general or specialised search interfaces
• Access
– with appropriate levels of control, including humans
• Exploit
– by re-analysis or combination
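The "Identify" requirement describes an identifier as an indirection operator that reduces brittleness. A minimal Python sketch of that idea follows, assuming a hypothetical in-memory `Resolver` class; the identifier string and URLs are illustrative only and do not represent the DataCite or ANDS services. A citation holds the stable identifier, so the data can move without breaking the citation.

```python
class Resolver:
    """Hypothetical in-memory identifier resolver (illustration only)."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Create or update the mapping from a stable identifier to its current location.
        self._locations[identifier] = url

    def resolve(self, identifier):
        # Return the current location; raises KeyError for an unknown identifier.
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example:dataset-1", "https://old-store.example.org/data/1")

# A citation records only the identifier, never the storage URL.
citation = "example:dataset-1"

# The data moves to new storage; only the resolver entry changes.
resolver.register("example:dataset-1", "https://new-store.example.org/data/1")

current_location = resolver.resolve(citation)
```

The citation string is unchanged after the move, which is the sense in which indirection reduces brittleness.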
17. I come from a land downunder…
AU
• 6 States
• 2 Territories
• 2 islands
• 23M people
NZ
• 2 islands
• 4.5M people
18. You come from the frozen North…
Nordic Countries
• 5 Countries
• 4 Territories
• So many islands
• 26M people
19. And yet there are some similarities
• Australia+NZ – 27.5M people
• Nordic countries – 26M people
20. Australian National Data Service
An initiative of the Australian Government being conducted as part of the National Collaborative Research Infrastructure Strategy ($A24M) and the Super Science Initiative ($A48M)
A collaboration between Monash University, the Australian National University and CSIRO
30 staff, funded to mid 2015
More researchers re-using more data more often
Data as a first-class object
21. ANDS enables transformation of:
Data that are:
• Unmanaged
• Disconnected
• Invisible
• Single use
To Structured Collections that are:
• Managed
• Connected
• Findable
• Reusable
so that Australian researchers can easily publish, discover, access and use/re-use research data.
23. ANDS activities/services
• Plan
– Data management planning tools and resources (N)
• Create/Capture
– 69 Data Capture projects at 23 universities
• Store
– working closely with national Research Data Storage Infrastructure (N)
• Describe
– 25 institutional Metadata Stores projects
– National Vocabulary Services (N)
24. ANDS activities/services
• Identify (N)
– DataCite DOIs
• Register (N)
– Repository Interchange Format – Collections and Services (RIF-CS) – based on ISO 2146:2010
• Discover (N)
– Research Data Australia
25. ANDS activities/services
• Access
– enforced by underlying data stores
• Exploit
– 25 institutionally-focussed projects to demonstrate value of combining data
• Advocate (N)
– Be the voice for data
– Work with Government and Research Funders to change settings in favour of data sharing
26. Research Data Alliance
The Research Data Alliance (RDA) is a new international organization (driven now by EC, US, AU, more soon) forming to facilitate specific, short-term efforts that accelerate the sharing and exchange of research data
Unofficial motto: rough consensus and exchanged data
Working groups will run over 12-18 months to produce
• Adopted standards
• Deployed infrastructure
• Adopted policy
• Implemented best practice, etc.
Second Plenary in Washington DC, September 16-18
Slide by Fran Berman
27. Research Data Alliance
Working Groups and Interest Groups
• Data Type Registries
• Data Foundation and Terminology
• Practical Policy
• PID Information Types
• Metadata Standards WG
• Community Capability Model
• Working Group on Data Citation: Making Data Citable
• Structural Biology
• Defining Urban Data Exchange for Science
• Marine Data Harmonization
• Repository Audit and Certification
• Big Data Analytics
• Metadata Standards Directory Interest Group (MSDIG)
• The Engagement Group
• Legal Interoperability
• Preservation e-Infrastructure
• UPC Code for Data
• Publishing Data
• Data in Context
• Citation of Dynamic Data
• Agricultural Data Interoperability
Slide by Fran Berman
28. Conclusion
• We are all in the data business
• Researchers need data services from their infrastructure providers
• A number of services can best be provided at national or regional level
• Research Data Alliance is working to develop international solutions for data interoperability – join us!
Let me start with a quotation: “The railroads did not stop growing because the need for passenger and freight transportation declined. That grew. The railroads are in trouble today not because that need was filled by others (cars, trucks, airplanes, and even telephones) but because it was not filled by the railroads themselves. They let others take customers away from them because they assumed themselves to be in the railroad business <CLICK>.” [Image: Bergen Railway]
“…rather than in the transportation business [this is Dubai International Terminal]. The reason they defined their industry incorrectly was that they were railroad oriented instead of transportation oriented; they were product oriented instead of customer oriented....” [Image: Dubai International Terminal]
Talk about the importance of recognising what business you are actually in, as opposed to the business you think you are in.
If the only tool you have is a hammer, then everything looks like a nail (apparently there is no direct Norwegian equivalent, according to my native-speaker informant and hopefully future daughter-in-law).
Yes, I work for a data organisation, and so I might be biased, but let’s look hard at some e-Research infrastructure businesses.
Networks exist to move what around? Data, and data derivatives (to a first approximation)
Storage exists to store what? Data, and data derivatives
HPC exists to generate and process what? Data.
I could go on: Visualisation? Data. Calculation? Data. Etc.
Of course, it’s possible to take this too far. I look at this and see a data-collection instrument ;-)
Of course, researchers also generate publications, but they need the data in order to be able to do so.
So, if we are all in the data business, what does that mean for researchers? How do we support what they need to do as they create, publish and reuse data?
Here is one way of thinking about the functions that need to be supported (based on work by me and Dr Adrian Burton from ANDS).
NOTE: This is somewhat idealised, and some of the steps are often done poorly or not at all. Publish = Store+Describe+Register+Identify
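The relation Publish = Store+Describe+Register+Identify can be sketched in Python as a combination of flags. This is only an illustration under assumptions: the `Step` enum and the helper functions are hypothetical names introduced here, not part of any ANDS tool, and the step names simply follow the requirements slides.

```python
from enum import Flag, auto


class Step(Flag):
    # Functions from the eResearch infrastructure requirements slides.
    CREATE_CAPTURE = auto()
    STORE = auto()
    DESCRIBE = auto()
    IDENTIFY = auto()
    REGISTER = auto()
    DISCOVER = auto()
    ACCESS = auto()
    EXPLOIT = auto()


# Publish is treated as the combination of four of the steps.
PUBLISH = Step.STORE | Step.DESCRIBE | Step.REGISTER | Step.IDENTIFY


def is_published(done):
    # True only when every step that makes up Publish has been completed.
    return (done & PUBLISH) == PUBLISH


def missing_for_publish(done):
    # Names of the Publish steps that have not been completed yet.
    return [s.name for s in Step if s in PUBLISH and s not in done]


# A dataset that was captured, stored and described, but never identified or registered.
done = Step.CREATE_CAPTURE | Step.STORE | Step.DESCRIBE
published = is_published(done)
missing = missing_for_publish(done)
```

This mirrors the note that some steps are often done poorly or not at all: a dataset counts as published only once all four component steps are complete.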
And now, let me provide a more Australian flavour to the talk
Recap
Store – we don’t do storage
Describe – 25 of 40 universities
Discover – quick demo if time
Advocate – new verb
Before I close, let me talk briefly about the Research Data Alliance.
Nordic involvement in Organising Group? Nomination for Council?