This document summarizes a presentation about the future of scientific information and communication given by Antony Williams at SUNY Potsdam on April 12th 2013. The presentation discusses how the internet and online platforms are influencing scientists and the way scientific work is conducted and shared. It explores how scientists are building online profiles and becoming "quantified" based on various online metrics. The presentation envisions a future where all historical scientific data is mapped and integrated online in an open and collaborative manner to enable new discoveries.
The Future of Scientific Information & Communication Online
1. The future of scientific
information & communication
Antony Williams
SUNY Potsdam, April 12th 2013
2. How does the internet influence you?
• How many of you visit the internet/check your
email fewer than a dozen times per day?
• Where do you go for fact-checking?
• How many on Facebook? How many on Twitter?
• You know you have an online profile, right?
• Scientists…how many of you are working on
building a scientific profile online?
• How many of you online now???
19. Scientists are “Quantified”
• Stats are gathered and analyzed
• Employers can find them, tenure will depend
on them, funding is affected by them
• Scientists’ Impact Factors, H-index and many
other variants
• Science is both competitive and collaborative
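As an illustration of one of these metrics, the H-index is the largest number h such that h of a scientist's papers each have at least h citations. The sketch below is a minimal version of that calculation; the function name and the citation counts are hypothetical and not taken from the presentation.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for seven papers.
example_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_citations))
```

A single highly cited paper barely moves the number, which is one reason so many variants of the metric exist.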
20. If it was not just about me…
• Together we might:
– build an encyclopedia
– …and rate restaurants
– …share book reviews
– …and movie reviews
– …and reviews of service providers
– …organize sit-ins and social action
– …and more data might just be Open
21. If it was not just about me…
• Together we might:
– build an encyclopedia
– …and rate restaurants
– …provide book reviews to each other
– …or movie reviews
– …or reviews of service providers
– …organize sit-ins and social action
– …and more data might just be Open
– …more scientists might collaborate and share
22. It is so difficult to navigate…
• IP?
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Competitors?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
23. Let’s Change the World
• Let’s map together all historical chemistry data
and build systems to integrate new data
• Heck, let’s integrate chemistry and biology data
and add in disease data too
• Let’s model the data and see if we can extract
new relationships – quantitative and qualitative
• Let’s make it all available on the web
25. What About Something Smaller?
• We’re going to map the world
• We’re going to take photos of as many places
as we can and link them together
• We’ll let people annotate and curate the map
• Then let’s make it available free on the web
• We’ll make it available for decision making
• Put it on Mobile Devices, Give it Away
35. Whoa…
• So the world can be mapped…
• We can enter a 3D environment within the map
• We can add annotations
• We can use the data, we can reference it, we
can extract it, we can make decisions with it
• And we can do it on our lap, in our hands
• Let’s crowdsource chemistry and biology!!!
36. Science is being Crowdsourced
• Crowdsourcing science is happening…
– Contribution of data
• Our data, About us
• Our data, generated in labs
• Open Data, data validation and curation
– Contribution of software
• Open Source, Open Standards
– Contribution of funding
37. If we can map the planet…
• …then we should map the Galaxy!
48. Back to this….
• Let’s map together all historical chemistry data
and build systems to integrate new data
• Heck, let’s integrate chemistry and biology data
and add in disease data too
• Let’s model the data and see if we can extract
new relationships – quantitative and qualitative
• Let’s make it all available on the web
49. How can I contribute to chemistry?
• Publish data, share data, validate and curate data
• Publish chemicals, syntheses and data
• “Publish” – Papers, Blogs, Reports, Tweets,
Presentations, Videos
• Contribute to Wikipedia
• Participate in chemistry communities
• Contribute to the Big Data
50. About Me…as a Chemist
• I’ve performed a few dozen chemical syntheses
• I’ve run thousands of analytical spectra
• I’ve generated thousands of NMR assignments
• I’ve probably published <5% of all work
• Most of it has been lost
• But things can be different today….
59. Data & Curations to ChemSpider
• The Royal Society of Chemistry free database
• 28.5 million chemicals and growing daily
• Software interfaces to integrate to
• Amenable to community contribution
– Deposit structures, property data, spectral data
– Data annotation, validation and curation
61.
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic
web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
63. (Some) Publishers are Changing?
• Data cannot be copyrighted and we have lots
• Scientists contribute data in document form
• Most publishers are open to Open Access
• Scientific publications are built on data so what
can be done to release the data? Much data is
not published? Many scientists will not share…
64. Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
65. Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories have no specific domain
support
• Why not develop a community repository for
chemistry data – private, public, embargoed?
• Provides data to develop models/algorithms?
66. Chemical Database
Service
• National Chemical Database Service
for UK Academics
• Integrating Commercial Databases
and Services
• Chemicals, analytical data,
prediction algorithms
• Development of data repository
67. Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community
data, new publications and build predictive
algorithms for the community
– How many algorithms are reported and lost? How
much repeat work is done in the domain of
algorithmic development?
68. Pulling Data from our Archive
• Our contribution to the world of chemistry data
• DERA – digitally enabling the RSC archive
– Text mining
• Find chemicals, reactions, analytical data, properties
– Algorithmic checking
• Validate algorithmically what we can - robots
– “Web 2.0 interfaces” for curating and validating
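The text mining and algorithmic checking steps above can be sketched roughly as follows. This is only an illustrative toy, not the DERA implementation: the regular expression, the function names, and the plausibility limits are all assumptions made for the example. It pulls melting point ranges out of experimental text and flags any that look wrong so a human curator can review them.

```python
import re

# Hypothetical pattern: matches "mp 121-123 °C" or "m.p. 85 °C".
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*(\d+(?:\.\d+)?)\s*(?:[-–]\s*(\d+(?:\.\d+)?))?\s*°\s*C"
)


def extract_melting_points(text):
    """Text mining step: return (low, high) melting point pairs found in text."""
    results = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group(1))
        # A single reported value is treated as a zero-width range.
        high = float(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results


def is_plausible(low, high, min_c=-200.0, max_c=500.0):
    """Checking step: range must be ordered and within assumed limits."""
    return min_c <= low <= high <= max_c


# Hypothetical experimental text; the second range is reversed on purpose.
sample = "White solid, mp 121-123 °C. A second batch gave m.p. 130-120 °C."
for low, high in extract_melting_points(sample):
    status = "ok" if is_plausible(low, high) else "flag for curation"
    print(low, high, status)
```

Records that fail the automated check would be the natural candidates to route to the curation interfaces for human validation.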
69. What if we could capture it all?
Digitally Enhancing the RSC Archive
71. Web 2.0 Contribution
• We have been contributing
to the web for a long time
already – but how much in
chemistry?
• A few blogs, an increasing
amount of tweeting but
what about data sharing in
chemistry?
86. Chemistry is Dangerous
• Florida DJs May Face Felony for April Fools'
Water Joke Worse Than Rubio's
“… told their listeners that "dihydrogen
monoxide" was coming out of the taps
throughout the Fort Myers area.”
90. Junk vs Real
“We then established a collaboration with
professor Sum Ting Wong, a fugitive from the
North Korean University Hu Yu Hai Ding”
“..identified as the new protein Wai So Dim”
98. Remember Quantifying Scientists
• Scientists Impact Factors. Science is both
competitive and collaborative
• Can we measure ALL contributions to science?
106. Enabled by
• Persistent unique digital identifier
• Integrates to workflows such as manuscript
and grant submission
• Supports automated linkages with your
professional activities
107. Micropublishing
How much data is lost?
• How many reactions never get published?
• How much data could be shared?
• How many properties are measured and lost?
• What stands in the way of sharing?
– Is it technology?
– Permissions? “The Boss”, Licensing?
115. Rewards and Recognition
• The badgesonomy culture
of recognition is growing.
• Badges are commonplace
– FourSquare
– Klout
116. Rewards and Recognition
• Rewards and Recognition
starting with CSSP, then
expanding to other platforms
• Including paths to expose
such recognition on
AltMetrics platforms – in
discussion…
117. Impact by Data Set on
Data
IC50 Measurements for 62 substituted benzoxazoles
ChemSpider Data Repository: DOI: 10.1356/CSID784.4