Class 7…giant balancing
'If I have seen further it is by
standing on the shoulders of
giants'.
Scott Edmunds, HKU Data Curation MLIM7350
Communicating in-class
• Chat channel: http://backchannelchat.com/chat/dw131
• Feel free to ask questions, or to ask me to speed
up/slow down
Also feel free to email: scott@gigasciencejournal.com
About me:
• Scott Edmunds
• Molecular biology, sci editing & comms
• Scientific journal & (big) data publishing
• Reproducibility & open science
• Open Data Hong Kong & Citizen Science
Journal, data-platform and database for
large-scale biological data
www.gigasciencejournal.com
About me:
• Formerly Beijing Genomics Institute
• Founded in 1999 (contributed 1% of the Human Genome Project)
• China’s first citizen-managed, not-for-profit research institute, funded by commercial sequencing-as-a-service (BGI Tech)
• Now the largest genomics organization in the world
• HQ in Shenzhen; international data production at BGI HK (Tai Po)
About my employer:
Open Data Hong Kong
ExCom member
for Open Science
Open Science
Working Group
WHY CURATE DATA?
WHY SHARE DATA?
https://okfn.org/
WHAT EXACTLY IS “OPEN DATA”?
What is open data (公开数据)?
http://opendefinition.org/od/2.0/en/
OKFN: 8 types of open data
http://science.okfn.org/
Research Data ≈ Government Data
Canada's Action Plan on Open Government 2014-16
http://open.canada.ca/en/content/canadas-action-plan-open-government-2014-16
Research Data policies growing globally
http://ec.europa.eu/research/openscience/index.cfm?section=monitor&pg=researchdata#1
https://data.gov.hk
HK has “Public Sector Information”
Why Licensing is Important for:
http://dx.doi.org/10.1186/1756-0500-5-494
Placing restrictions on the reuse of scientific information,
particularly data, slows down the pace of research. Furthermore,
legal requirements for attribution ingrained in licenses such as CC-BY
can prohibit future research across large collections of content – as
commonly happens in data mining.
Therefore, to eliminate legal impediments to integration and re-use
of data, such as this stacking of attribution requirements in large
collections of data, and to help enable long-term interoperability an
appropriate license or waiver specific to data should be applied.
Panton Principles
http://pantonprinciples.org/
In short: CC0 is better than CC-BY for datasets, as it prevents “attribution stacking”
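To make “attribution stacking” concrete, here is a minimal Python sketch under one simplifying assumption: every CC-BY-family source in a combined collection imposes its own attribution notice, while a CC0 source imposes none. The `Dataset` class, its fields and the example corpus are hypothetical illustrations, not part of any licensing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record describing one source dataset."""
    name: str
    creator: str
    license: str  # SPDX-style identifier, e.g. "CC0-1.0" or "CC-BY-4.0"


def required_attributions(datasets):
    """Return one attribution notice per source whose license requires credit.

    Simplifying assumption: any "CC-BY..." license (CC-BY, CC-BY-SA, ...)
    carries a legal attribution condition; CC0 carries none.
    """
    return [
        f"{d.name} by {d.creator}, licensed {d.license}"
        for d in datasets
        if d.license.startswith("CC-BY")
    ]


# A text-mining corpus assembled from 1,000 hypothetical sources.
cc_by_corpus = [Dataset(f"set-{i}", f"lab-{i}", "CC-BY-4.0") for i in range(1000)]
cc0_corpus = [Dataset(f"set-{i}", f"lab-{i}", "CC0-1.0") for i in range(1000)]

print(len(required_attributions(cc_by_corpus)))  # 1000 notices to carry
print(len(required_attributions(cc0_corpus)))    # 0 notices
```

Under CC-BY the legal attribution burden grows with every source added to the collection; under CC0 it stays at zero, and credit becomes a matter of scholarly norms rather than license conditions.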
Levels of openness: 5★’s of open data
http://5stardata.info
★ - make your stuff available on the Web (whatever format)
under an open license
★★ - make it available as structured data (e.g., Excel instead of
image scan of a table)
★★★ - make it available in a non-proprietary open format (e.g.,
CSV as well as Excel)
★★★★ - use URIs to denote things, so that people can point at
your stuff
★★★★★ - link your data to other data to provide context
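The five levels build on each other, so a rating can be sketched as counting criteria until the first one that fails. This minimal Python sketch assumes that cumulative reading of the scheme; the function and its parameter names are hypothetical, not taken from 5stardata.info, and each criterion still has to be judged by a person.

```python
def star_rating(open_license_on_web, structured, open_format, uses_uris, linked):
    """Return 0-5 stars; each star requires all of the previous ones."""
    criteria = [open_license_on_web, structured, open_format, uses_uris, linked]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing level caps the rating, even if later ones are met
        stars += 1
    return stars


# Hypothetical example: an openly licensed CSV on a portal, with no URIs or links.
print(star_rating(True, True, True, False, False))  # 3
```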
Exercise: What star rating is this data?
Example: Hong Kong: Dengue Mosquito Breeding Habitats
http://www.fehd.gov.hk/english/safefood/dengue_fever/images/montlyOvitrap_2003-2016.pdf
http://www.fehd.gov.hk/english/safefood/dengue_fever/
Static PDFs, images, not on data.gov.hk, no licensing information = ?
Exercise: What star rating is this data?
1. HK FEHD: Distribution of the number of live pigs sold at different auction prices on the day
https://data.gov.hk/en-data/dataset/hk-fehd-fehdsh-daily-auction
2. Singapore: Dengue Mosquito Breeding Habitats
https://data.gov.sg/dataset/dengue-mosquito-breeding-habitats
3. Linked Drug-Drug Interactions (LIDDI)
https://datahub.io/dataset/linked-drug-drug-interactions-liddi
Why closed data sucks?
https://commons.wikimedia.org/wiki/File:Inner_door_in_forbidden_city.jpg
Hong Kong Edition
https://data.gov.hk
Gov't spend on open data platform = $1.2M
Gov't spend on 20 rubbish apps = $20M
https://www.hongkongfp.com/2015/09/14/public-finance-concern-group-raps-10-rubbish-govt-apps-one-has-only-10-downloads/
Why closed data sucks?
What the Gov't builds for $20M vs. what open data can build for free
http://gazetteer.hk/
Hong Kong Edition
Why closed data sucks?
Open Data as a revenue stream...
Hong Kong Edition
Why closed data sucks?
Open Data as a revenue stream means conservation data can't be shared...
Why closed data kills spoonbills?
Climate change, global hunger, pollution, cancer,
disease outbreaks…
http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966
Why closed data kills people?
Open Data as a revenue stream means cancer data can't be shared...
https://www.change.org/p/mark-c-capone-ceo-of-myriad-genetics-myriad-genetics-give-us-our-damn-brca-data
Why closed data kills women?
Open Data as a revenue (publishing) stream means nobody is sharing the ethnic Chinese
control data needed for pharmacogenomics to work in Chinese populations...
Why closed data kills Chinese populations?
THE REPRODUCIBILITY CRISIS
How research is disseminated
Consequences of 351-year-old incentive systems…
Buckheit & Donoho: Scholarly articles are
merely advertisement of scholarship. The
actual scholarly artifacts, i.e. the data and
computational methods, which support
the scholarship, remain largely
inaccessible.
The consequences: growing replication gap
1. Ioannidis et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8)
Out of 18 microarray papers, results
from 10 could not be reproduced
1. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.1001747
The challenge: reproducibility
Replication rates as low as 11%
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
https://osf.io/e81xl/wiki/home/
Growing Issue: increasing number of retractions
>15X increase in last decade
Strong correlation of “retraction index” with
higher impact factor
1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract
At the current rate of increase, by 2045 as many papers will be
retracted as published!
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Problem: growing replication gap
More retractions:
>15X increase in last decade
At the current rate, by 2045 as many papers will be retracted as published
Insufficient methods
The Cost of Scientific Retractions?
A: $400,000 per paper
https://elifesciences.org/content/3/e02956
Only policy that counts…IMPACT FACTOR
What is the journal Impact Factor (jIF)?
• Citation index concept first developed by Eugene Garfield in 1955 (Science)
• Formed the Institute for Scientific Information (ISI) in 1960
• Science Citation Index (SCI) launched in 1963
• ISI purchased by Thomson (later Thomson Reuters) in 1992
• Web version (Web of Science) launched in 1997
• Sold as part of Thomson Reuters' Intellectual Property & Science portfolio in July 2016 for $3.55B USD to private equity funds
https://commons.wikimedia.org/wiki/File:Eugene_Garfield_HD2007_Richard_J._Bolte_Sr._Award.TIF
How do you calculate the jIF?
1. Count the citations received in the IF year to items the journal
published in the two preceding years.
2. Count the total number of (citable) papers the journal published
in those two years.
3. Divide the number of citations by the number of papers.
2015 IF = (# citations in 2015 to papers published in 2013-2014) / (# of papers published in 2013-2014)
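The three steps above amount to one division over a two-year window. A minimal Python sketch follows; the function name, the dictionary layout and all citation and paper counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def impact_factor(citations_by_pub_year, papers_by_year, jif_year):
    """Two-year impact factor for `jif_year`.

    citations_by_pub_year: citations received during jif_year, keyed by the
        publication year of the cited paper.
    papers_by_year: number of (citable) papers published in each year.
    """
    window = (jif_year - 2, jif_year - 1)
    citations = sum(citations_by_pub_year.get(y, 0) for y in window)
    papers = sum(papers_by_year.get(y, 0) for y in window)
    if papers == 0:
        raise ValueError("no papers published in the two-year window")
    return citations / papers


# Hypothetical journal: citations received in 2015, by year of publication.
cites_2015 = {2013: 300, 2014: 200, 2012: 90}  # 2012 is outside the window
papers = {2013: 100, 2014: 150}
print(impact_factor(cites_2015, papers, 2015))  # 2.0
```

Note that the 90 citations to 2012 papers are ignored entirely: anything cited after its first two years does not count, which is the short-term bias discussed next.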
TWO PROBLEMS
1. Rewards/incentivizes short term citations only
Impact factor driven science = JIFBAIT Network
JIFBAIT NEWS
Arsenic Life forms, will
they take over the planet?
By Melba Ketchum, PhD
Which Overhyped, Unreproducible
Experiment Are You?
Want rapid citations for 2 years only? Take this quiz.
You got: STAP Cells
Of course dipping cells in
coffee will make them
pluripotent. Even if the
research gets discredited, it’ll
still get hundreds of citations in
two years.
2. How do you count the denominator? It is negotiated.
https://quantixed.wordpress.com/2016/01/05/the-great-curve-ii-citation-distributions-and-reverse-engineering-the-jif/
http://bjoern.brembs.net/2016/01/even-without-retractions-top-journals-publish-the-least-reliable-science/
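To see why the negotiated denominator matters, here is a minimal Python sketch. It assumes the commonly described asymmetry in which citations to every item in the journal count towards the numerator, while only items classed as "citable" count towards the denominator; the `Item` class, the item kinds and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical journal item published in the two-year window."""
    kind: str       # e.g. "article", "editorial"
    citations: int  # citations received in the JIF year


def jif(items, citable_kinds):
    """Numerator counts citations to every item; denominator only 'citable' kinds."""
    numerator = sum(item.citations for item in items)
    denominator = sum(1 for item in items if item.kind in citable_kinds)
    return numerator / denominator


# 100 research articles cited twice each, 20 editorials cited once each.
items = [Item("article", 2)] * 100 + [Item("editorial", 1)] * 20

print(round(jif(items, {"article", "editorial"}), 2))  # 1.83
print(round(jif(items, {"article"}), 2))               # 2.2
```

Reclassifying the 20 editorials as non-citable lifts the score from about 1.83 to 2.2 without a single extra citation.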
http://iai.asm.org/content/79/10/3855.full
Growing # of journals addressing this
http://dx.doi.org/10.1371/journal.pmed.1001607
QUANTIFYING REPRODUCIBILITY
                  Same data       Different data
Same code         Reproducible    Replicable
Different code    Robust          Generalisable
https://figshare.com/articles/Publishing_a_reproducible_paper/4720996
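The data/code matrix above works as a simple lookup. In this minimal Python sketch, the function name and its boolean parameters are illustrative choices, not part of the cited figshare material.

```python
def classify_study(same_data: bool, same_code: bool) -> str:
    """Label a re-run of an analysis using the data/code matrix."""
    labels = {
        (True, True): "Reproducible",     # same data, same code
        (False, True): "Replicable",      # different data, same code
        (True, False): "Robust",          # same data, different code
        (False, False): "Generalisable",  # different data, different code
    }
    return labels[(same_data, same_code)]


print(classify_study(same_data=True, same_code=False))  # Robust
```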
http://reproducibility.cs.arizona.edu/
Arizona Repeatability in
Computer Science Experiment
• 2015 study examining the extent to which computer systems
researchers share their research artifacts (code)
• NSF policies on sharing code since 2005
• Examined 613 papers from ACM conferences & journals
• Attempted to locate source code that backed up results
• If found, tried to build the code
• Manual curation/look for code that backed up results
• If missing, emailed authors
• Chased if no reply
• If found, tried to build the code
• Resolve issues
• Survey results
613 papers tested
123 successful reproductions (20%)
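The repeatability workflow on these slides can be sketched as a small Python tally. This is a simplified illustration, not the Arizona team's actual tooling: the three outcome labels and the `outcome` helper are hypothetical, and the split between "build failed" and "no code" below is invented; only the totals (123 successes out of 613 papers) come from the slide.

```python
from collections import Counter


def outcome(code_in_paper: bool, code_from_authors: bool, builds: bool) -> str:
    """Classify one paper: locate code (or request it), then try to build it."""
    if not (code_in_paper or code_from_authors):
        return "no code"
    return "built" if builds else "build failed"


def success_rate(outcomes) -> float:
    """Fraction of all examined papers whose code was built successfully."""
    counts = Counter(outcomes)
    return counts["built"] / sum(counts.values())


print(outcome(code_in_paper=False, code_from_authors=True, builds=True))  # built

# Hypothetical split that matches the headline numbers: 123 of 613 succeed.
outcomes = ["built"] * 123 + ["build failed"] * 200 + ["no code"] * 290
print(f"{success_rate(outcomes):.0%}")  # 20%
```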
Questions? | 15 minute break
The Hong Kong context
http://web.archive.org/web/20131127073400/http://openaccess.hk/about.html
Asia’s Academic City?
8 Universities, many ranked top 50 worldwide
100K students (UG/PG/FT/PT)
1 major research funder (UGC/RGC)
Grant budget = $17.5 BN HKD/yr ($2.3BN USD)
UGC Policy: “Realization of
making Hong Kong Asia's
world city is only possible if it
is based upon the platform of
a very strong education and
higher education sector.”
http://www.ugc.edu.hk/eng/ugc/policy/policy.htm
Data: WorldBank
R&D spending in HK is amongst the lowest in
the developed world
Hong Kong’s focus…
“The plot earmarked for expansion of Hong Kong Science Park might now be used to
build apartment blocks instead. Is the government backing down on its commitment to
project Hong Kong as a major technology hub?” http://bit.ly/1TxCRj3
https://osf.io/cgpzb/
Open Science (Open Access & Open
Data) survey of Hong Kong
Any comments?
Science & Technology players in HK
• Political forum: Legislative Council (LegCo)
• Policy makers: Government Advisory Committee on Innovation and Technology; Innovation and Technology Bureau (ITB); Innovation and Technology Commission (ITC)
• Financing: Government, EB, Private Sector; ITC -> ITF, Innov. & Tech. Venture Fund, RGC, UGC
• Operators: Universities, Public Technology Support Organizations, Private Sector, R&D Centres, ASTRI
• Facilitators: HKPC, HKTDC, HKSTPC, Cyberport, HKIB
• Commercialization Agents: Business Enterprises, New High Tech Ventures, Multinational Corporations
Researched policy, collected case studies, made FOI requests,
and interviewed many key players (funders, libraries, administrators…)
HK: good with some parts of open…
http://hub.hku.hk/
http://index.okfn.org/
HK: bad with the rest…
https://data.gov.hk
HK: bad with the rest…
Signatories to Berlin OA Declaration
OA Policies in Hong Kong
Hidden at the back of RGC guidelines
http://www.ugc.edu.hk/eng/doc/rgc/form/srfdp_sr2.pdf
IR (institutional repository): infrastructure is (mostly) there
http://www.julac.org/?page_id=79
http://repositories.webometrics.info/en/Asia/Hong%20Kong
No policies, Mo’ problems
Q: How much is spent on Open/Closed Access in HK?
A: Nobody has any idea!
https://lists.okfn.org/pipermail/open-access/2014-May/001888.html
In China publication + JIF = money = fraud
Attempts to “game the peer-review system on an industrial
scale”
1. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
2. http://www.grassley.senate.gov/sites/default/files/about/upload/Senator-Grassley-Report.pdf
Companies offering authorship of papers made to order by “paper
mills”¹. Ghostwriting of medical papers by pharma is common².
Guaranteed publication in a JIF journal, often using fake referees, identity
theft, etc.
1. http://dx.doi.org/10.1087/20110203
2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed
3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
What is the cost of the jIF?
JIF 2 = $10,000 USD
JIF 5 = $20,000 USD
Buy Sell
C/N/S = $30,000 USD
JIF 10 = $1,500 USD
1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research
Created by skewed incentive systems in China…
“While we are rightly proud of Hong Kong’s highly regarded and ranked
universities system, we are not immune to the same pressures. While
funders in Europe have moved away from using citation based metrics such
as JIF in their research assessments, the Hong Kong University Grants
Committee states in their Research Assessment Exercise guidelines that they
may informally use it.”
And this is now happening in Hong Kong too!
Buy:
JIF 2 = $8,000 USD
JIF 5 = $15,000 USD
How to fight back: Sign DORA.
http://www.ascb.org/dora/
Who needs to provide leadership?
RGC/UGC & new ITB
What new infrastructure do we need?
New “HK Data Service”, stewardship & platforms
Science & Technology players in HK
• Political forum: Legislative Council (LegCo)
• Policy makers: Government Advisory Committee on Innovation and Technology; Innovation and Technology Bureau (ITB); Innovation and Technology Commission (ITC)
• Financing: Government, EB, Private Sector; ITC -> ITF, Innov. & Tech. Venture Fund, RGC, UGC
• Operators: Universities, Public Technology Support Organizations, Private Sector, R&D Centres, ASTRI
• Data Curators & Stewards: Libraries, OGCIO, Data Studio@SP
• Facilitators: HKPC, HKTDC, HKSTPC, Cyberport, HKIB
• Data Disseminators: HARNET, data.gov.hk, "HK Data Service"
• Commercialization Agents: Business Enterprises, New High Tech Ventures, Multinational Corporations
• Downstream Users: Researchers, Innovators, Citizens
• Academic/commercial cloud
If Government doesn’t act,
universities need to lead the way
http://hub.hku.hk/advanced-search?location=crisdataset
http://www.rss.hku.hk/integrity/research-data-records-management
First CRIS (Current Research Information System) in HK, built upon the HKU Scholars Hub
http://hub.hku.hk/advanced-search?location=crisdataset
http://lib.hku.hk/researchdata/rpg.htm
“Beginning with the September 2017 intake, all HKU
research postgraduate (rpg) students have responsibility
for 1) using a data management plan (DMP), where
applicable, to describe the use of data in preparation for,
or in the generation of their theses, and 2) depositing,
where applicable, a dataset in the HKU Scholars Hub.”
CC BY-NC by default
Licensing T&Cs
HK CRIS: Further reading/resources
https://youtu.be/focv1z3lpPI
RPg Students -- Instructions for Data:
http://lib.hku.hk/researchdata/rpg.htm
Depositor's User Guide:
http://lib.hku.hk/researchdata/deposit_page.htm
Seminar slides from HKU Library
http://www.rss.hku.hk/integrity/rcr/rcr-info/seminars
See also ReShare video guide:
The cost to Hong Kong of not doing this?
• Estimated loss of citation impact from not being OA = 50% ($8.75B?)²
• How much is the HK taxpayer losing through missing out on potential
collaborations, wider engagement & unrepeatable work?
HK UGC grant budget = $17.5 Billion HKD/yr (4% of Gov spending)
Taking the lowest reported reproducibility rate (11%): >$15 billion wasted¹
1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
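The slide's back-of-envelope arithmetic can be checked with a few lines of Python. The sketch assumes, as the slide does, that spending behind non-reproducible work is wholly wasted and that the 50% OA citation-impact shortfall maps one-to-one onto the grant budget; both are simplifying assumptions, and the function names are illustrative.

```python
GRANT_BUDGET_HKD_BN = 17.5  # UGC/RGC grant budget per year, from the slide


def wasted_if_irreproducible(budget: float, reproducibility_rate: float) -> float:
    """Treat the share of spend behind non-reproducible work as lost."""
    return budget * (1 - reproducibility_rate)


def citation_impact_loss(budget: float, impact_shortfall: float) -> float:
    """Value of the citation impact forgone, as a share of the budget."""
    return budget * impact_shortfall


print(f"{wasted_if_irreproducible(GRANT_BUDGET_HKD_BN, 0.11):.1f}")  # 15.6
print(f"{citation_impact_loss(GRANT_BUDGET_HKD_BN, 0.5):.2f}")       # 8.75
```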
https://osf.io/cgpzb/
Open Science (Open Access & Open
Data) survey of Hong Kong
Reading/Reflection for
next class
Thoughts and ideas on why Hong Kong is
lagging behind the US/EU?
Any ideas on what we need to do to move
forward?
Any feedback on the survey?
QUANTIFYING REPRODUCIBILITY IN HK
HKU Repeatability in HK
Research Experiment
• HKU policy on data sharing from 2015
• PLOS policy mandating sharing of supporting data since March 1,
2014
• HKU has published 267 PLOS ONE papers from 2014 to date
• Can we quantify reproducibility in a sample of these?
• Easy exercise in literature curation
• 2016 HKU PLOS publications = 49 papers
http://hub.hku.hk/simple-search?query=&location=publication&sort_by=bi_sort_2_sort&order=asc&rpp=25&filter_field_1=journal&filter_type_1=equals&filter_value_1=plos+one&filter_field_2=dateIssued&filter_type_2=equals&filter_value_2=[2014+TO+2017]&filter_field_3=dctype&filter_type_3=equals&filter_value_3=article&etal=0&filtername=dateIssued&filterquery=2016&filtertype=equals
• Everyone is assigned five 2016 HKU PLOS papers
• Quickly scan each paper looking for supporting data
• If there is no data, ignore the paper
• If it uses data, is all of it associated with the paper?
• If it uses external data, is that available from a URL or accession?
• If “data available on request”, are the authors contactable?
• Don’t spend more than 5 mins per article
• Add your data to the shared Google spreadsheet, and we’ll go through
the results & feedback next class
Homework/Case study: literature curation exercise
Example 1.
https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
Is there data presented in the paper? – Yes
Is there external data, and if so what is the
link/accession? – No
Is all the data in the paper available? – No
Comments: Has questionnaire, but not data, as it
says “minimal anonymized dataset will be made
available upon request”
Enter data here:
https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
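If you prefer to draft entries locally before adding them to the shared spreadsheet, a minimal Python sketch like this one records each paper against the checklist questions, using Example 1 as the sample row. The `CurationRecord` class and its column names are hypothetical; match them to the actual spreadsheet headers.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CurationRecord:
    """One row of the literature-curation spreadsheet (column names hypothetical)."""
    paper: str               # DOI or short title of the assigned paper
    data_in_paper: str       # "Yes" / "No"
    external_data: str       # link or accession, or "No"
    all_data_available: str  # "Yes" / "No"
    comments: str = ""


def to_csv(records) -> str:
    """Serialise records as CSV text, ready to import into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CurationRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


example_1 = CurationRecord(
    paper="Example 1",  # placeholder, not the real paper identifier
    data_in_paper="Yes",
    external_data="No",
    all_data_available="No",
    comments="Has questionnaire but not data; minimal anonymized dataset "
             "available upon request",
)
print(to_csv([example_1]))
```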
OPTIONAL: If data is missing, do the authors respond if contacted?
Final Project
• For the final project for this course, you can
choose from 3 assignment options.
• The assignment is due on the 15th May and it
is worth 40% of your grade.
• Time will be set aside for presenting a
provisional draft of this during the final class
on the 24th April.
Final Project: Option 1
Write an Annotated Bibliography about data curation practices in an
academic discipline of your choosing.
• Choose a discipline (sciences, social sciences, & humanities) OR choose the topic of
“open data.”
• Summarize data practices in your chosen discipline or topic. (5-7 sentences)
• Find 7-10 sources that relate that discipline or topic to data creation, management,
and/or curation.
• Provide a citation for the source in APA style.
• Write a short annotation that summarizes the content of the source. You may
include quotes from the source sparingly, but the annotations should be mostly, if
not entirely, in your own words. (3-5 sentences)
• Explain the relevance of the source with relation to the data practices of your
chosen discipline or topic. (1-2 sentences)
• Find a few example public datasets to demonstrate the above points. Cite the data
in the relevant places in the Bibliography according to the Data Citation Principles.
• Refer to this guide for more information about annotated bibliographies:
http://sites.umuc.edu/library/libhow/bibliography_tutorial.cfm. Your annotation
should be in the “Descriptive” style.
Final Project: Option 2
Using a relevant dataset (this can either be from the literature
curation exercise, a BYO dataset, or one given to you), write a report
that includes a description of the dataset, a Data Management Plan,
and a guidelines document for the researcher(s).
• Describe the dataset, explaining the form of the data and the academic discipline in which it
was created. This paragraph should provide context for the Data Management Plan. (3-5 sentences)
• 1-2 page Data Management Plan following the guidelines from HKU or a granting body such as NSF.
• 1 page guidelines document that could be presented to the researcher(s) that provides
guidelines for their data (extant and forthcoming):
– Preservation
– Appraisal
– Documentation
• For the DMP and the guidelines document, you can extrapolate from your dataset to
imagine additional details about the research practices that created the dataset and will create
more data in the future.
• Look for suitable data repositories that can host this data (institutional, general purpose, or
subject specific), and if there is one relevant then publish the data if you have permission, and
correctly cite the data in the relevant places in your report.
Final Project: Option 3
Prepare a 30 minute data curation workshop that you could teach to
researchers that would provide them the necessary details to
understand why data curation is relevant to them and best practices
they should follow.
• Slide deck that introduces data curation for a researcher audience. (No
more than 40 slides.)
• Presenter outline that describes the important points for each slide.
• Topics that might be addressed in your workshop: the value of data
management, writing a data management plan, data repository options.
You can assume your audience is researchers at HKU.
• Make sure all of the content is copyright free, and share the final material
openly (e.g. figshare, HKU Scholars Hub, OER Commons, etc.), and with sufficient
metadata to make it discoverable.
Looking ahead…
• Next class on Monday 27th March we’ll go
from open to FAIR data
• We’ll also go through the reflection & curation
case studies
– Bring ideas & feedback, and we’ll look at the data
• Final project due 10th May
– Need to present preliminary version on 26th April
to get feedback before completion

Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgScott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgGigaScience, BGI Hong Kong
 
OSFair2017 | Barriers to Open Science for junior researchers
OSFair2017 | Barriers to Open Science for junior researchersOSFair2017 | Barriers to Open Science for junior researchers
OSFair2017 | Barriers to Open Science for junior researchersOpen Science Fair
 
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...Scott Edmunds
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak openLilian Juma
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global EcosystemPhilip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangePhilip Bourne
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015Fiona Nielsen
 
RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015William Gunn
 
Open Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill PandemicOpen Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill PandemicDorothy Bishop
 
Biomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital EnterpriseBiomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital EnterprisePhilip Bourne
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AlonePhilip Bourne
 
Reward, reproducibility and recognition in research - the case for going Open
Reward, reproducibility and recognition in research - the case for going OpenReward, reproducibility and recognition in research - the case for going Open
Reward, reproducibility and recognition in research - the case for going OpenDanny Kingsley
 
Nicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchNicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchGigaScience, BGI Hong Kong
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...Fiona Nielsen
 

Similar to HKU Data Curation MLIM7350 Class 7 (20)

HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open Data
 
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sgScott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
Scott Edmunds: Publishing in the Open Data Era, talk at Hackerspace.sg
 
OSFair2017 | Barriers to Open Science for junior researchers
OSFair2017 | Barriers to Open Science for junior researchersOSFair2017 | Barriers to Open Science for junior researchers
OSFair2017 | Barriers to Open Science for junior researchers
 
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...
Scott Edmunds talk at ODHK.meet.26: Open Science Data = Open Data (a rant in ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
Learn to speak open
Learn to speak openLearn to speak open
Learn to speak open
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
Open Data in a Global Ecosystem
Open Data in a Global EcosystemOpen Data in a Global Ecosystem
Open Data in a Global Ecosystem
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015
 
RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015
 
Open Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill PandemicOpen Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill Pandemic
 
Biomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital EnterpriseBiomedical Research as Part of the Digital Enterprise
Biomedical Research as Part of the Digital Enterprise
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
Nicole Nogoy at the Auckland BMC RoadShow
Nicole Nogoy at the Auckland BMC RoadShowNicole Nogoy at the Auckland BMC RoadShow
Nicole Nogoy at the Auckland BMC RoadShow
 
Reward, reproducibility and recognition in research - the case for going Open
Reward, reproducibility and recognition in research - the case for going OpenReward, reproducibility and recognition in research - the case for going Open
Reward, reproducibility and recognition in research - the case for going Open
 
Nicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do researchNicole Nogoy: GigaScience...how licensing can change the way we do research
Nicole Nogoy: GigaScience...how licensing can change the way we do research
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...
 

More from Scott Edmunds

Free the Data! Pitch to Hong Kong Open Data Day 2019
Free the Data! Pitch to Hong Kong Open Data Day 2019Free the Data! Pitch to Hong Kong Open Data Day 2019
Free the Data! Pitch to Hong Kong Open Data Day 2019Scott Edmunds
 
Scott Edmunds: Access to Information Consultation Recomendations
Scott Edmunds: Access to Information Consultation RecomendationsScott Edmunds: Access to Information Consultation Recomendations
Scott Edmunds: Access to Information Consultation RecomendationsScott Edmunds
 
Open Data Hong Kong Update: CCCHK@10
Open Data Hong Kong Update: CCCHK@10Open Data Hong Kong Update: CCCHK@10
Open Data Hong Kong Update: CCCHK@10Scott Edmunds
 
Scott Edmunds Lightning talk: Experiences of NGO
Scott Edmunds Lightning talk: Experiences of NGOScott Edmunds Lightning talk: Experiences of NGO
Scott Edmunds Lightning talk: Experiences of NGOScott Edmunds
 
HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10Scott Edmunds
 
Emblematic education to know thy DNA? TEDxEduHK
Emblematic education to know thy DNA? TEDxEduHKEmblematic education to know thy DNA? TEDxEduHK
Emblematic education to know thy DNA? TEDxEduHKScott Edmunds
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 Scott Edmunds
 
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HK
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HKHong Kong 2017 Open Data Day hackathon results: RacismWatch:HK
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HKScott Edmunds
 
Bauhinia Genome talk at the Galaxy Australasia Meeting
Bauhinia Genome talk at the Galaxy Australasia MeetingBauhinia Genome talk at the Galaxy Australasia Meeting
Bauhinia Genome talk at the Galaxy Australasia MeetingScott Edmunds
 
David Palmer: China Open Access week
David Palmer: China Open Access weekDavid Palmer: China Open Access week
David Palmer: China Open Access weekScott Edmunds
 
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...Scott Edmunds
 
ODHK.Meet.37 Intro to Research Data Policies and Platforms
ODHK.Meet.37 Intro to Research Data Policies and PlatformsODHK.Meet.37 Intro to Research Data Policies and Platforms
ODHK.Meet.37 Intro to Research Data Policies and PlatformsScott Edmunds
 
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetup
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetupScott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetup
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetupScott Edmunds
 
Scott Edmunds talking Bauhina Genome at DIYBIOHK
Scott Edmunds talking Bauhina Genome at DIYBIOHKScott Edmunds talking Bauhina Genome at DIYBIOHK
Scott Edmunds talking Bauhina Genome at DIYBIOHKScott Edmunds
 
Introductory slides for the MakerBay/ODHK #ZikaHackathon
Introductory slides for the MakerBay/ODHK #ZikaHackathonIntroductory slides for the MakerBay/ODHK #ZikaHackathon
Introductory slides for the MakerBay/ODHK #ZikaHackathonScott Edmunds
 
Bauhina Genome slides for school visit
Bauhina Genome slides for school visitBauhina Genome slides for school visit
Bauhina Genome slides for school visitScott Edmunds
 
Intro for ODHK.meet.32 on Hacking the "Human Genome"
Intro for ODHK.meet.32 on Hacking the "Human Genome"Intro for ODHK.meet.32 on Hacking the "Human Genome"
Intro for ODHK.meet.32 on Hacking the "Human Genome"Scott Edmunds
 
BauhinaGenome preview at #ICG10
BauhinaGenome preview at #ICG10BauhinaGenome preview at #ICG10
BauhinaGenome preview at #ICG10Scott Edmunds
 
Amanda Meng at ODHK meet.29: Open Government Data & Social Impact
Amanda Meng at ODHK meet.29: Open Government Data & Social ImpactAmanda Meng at ODHK meet.29: Open Government Data & Social Impact
Amanda Meng at ODHK meet.29: Open Government Data & Social ImpactScott Edmunds
 
#ODHK: Open Data Pub Quiz
#ODHK: Open Data Pub Quiz#ODHK: Open Data Pub Quiz
#ODHK: Open Data Pub QuizScott Edmunds
 

More from Scott Edmunds (20)

Free the Data! Pitch to Hong Kong Open Data Day 2019
Free the Data! Pitch to Hong Kong Open Data Day 2019Free the Data! Pitch to Hong Kong Open Data Day 2019
Free the Data! Pitch to Hong Kong Open Data Day 2019
 
Scott Edmunds: Access to Information Consultation Recomendations
Scott Edmunds: Access to Information Consultation RecomendationsScott Edmunds: Access to Information Consultation Recomendations
Scott Edmunds: Access to Information Consultation Recomendations
 
Open Data Hong Kong Update: CCCHK@10
Open Data Hong Kong Update: CCCHK@10Open Data Hong Kong Update: CCCHK@10
Open Data Hong Kong Update: CCCHK@10
 
Scott Edmunds Lightning talk: Experiences of NGO
Scott Edmunds Lightning talk: Experiences of NGOScott Edmunds Lightning talk: Experiences of NGO
Scott Edmunds Lightning talk: Experiences of NGO
 
HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10
 
Emblematic education to know thy DNA? TEDxEduHK
Emblematic education to know thy DNA? TEDxEduHKEmblematic education to know thy DNA? TEDxEduHK
Emblematic education to know thy DNA? TEDxEduHK
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HK
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HKHong Kong 2017 Open Data Day hackathon results: RacismWatch:HK
Hong Kong 2017 Open Data Day hackathon results: RacismWatch:HK
 
Bauhinia Genome talk at the Galaxy Australasia Meeting
Bauhinia Genome talk at the Galaxy Australasia MeetingBauhinia Genome talk at the Galaxy Australasia Meeting
Bauhinia Genome talk at the Galaxy Australasia Meeting
 
David Palmer: China Open Access week
David Palmer: China Open Access weekDavid Palmer: China Open Access week
David Palmer: China Open Access week
 
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...
Bauhina Genome talk: Grass Roots Genomics: Using Hong Kong's Emblem to Crack ...
 
ODHK.Meet.37 Intro to Research Data Policies and Platforms
ODHK.Meet.37 Intro to Research Data Policies and PlatformsODHK.Meet.37 Intro to Research Data Policies and Platforms
ODHK.Meet.37 Intro to Research Data Policies and Platforms
 
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetup
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetupScott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetup
Scott Edmunds pitch Mosquito Alert at the Earthwatch HK Citizen Science meetup
 
Scott Edmunds talking Bauhina Genome at DIYBIOHK
Scott Edmunds talking Bauhina Genome at DIYBIOHKScott Edmunds talking Bauhina Genome at DIYBIOHK
Scott Edmunds talking Bauhina Genome at DIYBIOHK
 
Introductory slides for the MakerBay/ODHK #ZikaHackathon
Introductory slides for the MakerBay/ODHK #ZikaHackathonIntroductory slides for the MakerBay/ODHK #ZikaHackathon
Introductory slides for the MakerBay/ODHK #ZikaHackathon
 
Bauhina Genome slides for school visit
Bauhina Genome slides for school visitBauhina Genome slides for school visit
Bauhina Genome slides for school visit
 
Intro for ODHK.meet.32 on Hacking the "Human Genome"
Intro for ODHK.meet.32 on Hacking the "Human Genome"Intro for ODHK.meet.32 on Hacking the "Human Genome"
Intro for ODHK.meet.32 on Hacking the "Human Genome"
 
BauhinaGenome preview at #ICG10
BauhinaGenome preview at #ICG10BauhinaGenome preview at #ICG10
BauhinaGenome preview at #ICG10
 
Amanda Meng at ODHK meet.29: Open Government Data & Social Impact
Amanda Meng at ODHK meet.29: Open Government Data & Social ImpactAmanda Meng at ODHK meet.29: Open Government Data & Social Impact
Amanda Meng at ODHK meet.29: Open Government Data & Social Impact
 
#ODHK: Open Data Pub Quiz
#ODHK: Open Data Pub Quiz#ODHK: Open Data Pub Quiz
#ODHK: Open Data Pub Quiz
 

Recently uploaded

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

HKU Data Curation MLIM7350 Class 7

  • 1. Class 7…giant balancing 'if I have seen further it is by standing on the shoulders of giants'. Scott Edmunds, HKU Data Curation MLIM7350
  • 2. Communicating in-class • Chat channel: • http://backchannelchat.com/chat/dw131 • Feel free to ask questions or to request that I speed up/slow down. Also feel free to email: scott@gigasciencejournal.com
  • 3. About me: • Scott Edmunds • Molecular biology, sci editing & comms • Scientific journal & (big) data publishing • Reproducibility & open science • Open Data Hong Kong & Citizen Science Journal, data-platform and database for large-scale biological data www.gigasciencejournal.com
  • 5. • Formerly Beijing Genomics Institute • Founded in 1999 (1% of HGP) • China’s 1st citizen-managed not-for-profit research institute funded by commercial sequencing-as-a-service (BGI Tech) • Now the largest genomic organization in the world • HQ in Shenzhen, international data production in BGI HK (Tai Po) About my employer:
  • 6. Open Data Hong Kong ExCom member for Open Science Open Science Working Group
  • 11. WHAT EXACTLY IS “OPEN DATA"?
  • 12. What is open data (公开数据)? http://opendefinition.org/od/2.0/en/
  • 13. OKFN: 8 types of open data http://science.okfn.org/
  • 15. Research Data ≈ Government Data Canada's Action Plan on Open Government 2014-16 http://open.canada.ca/en/content/canadas-action-plan-open-government-2014-16
  • 16. Research Data policies growing globally http://ec.europa.eu/research/openscience/index.cfm?section=monitor&pg=researchdata#1
  • 18. Why Licensing is Important for: http://dx.doi.org/10.1186/1756-0500-5-494 Placing restrictions on the reuse of scientific information, particularly data, slows down the pace of research. Furthermore, legal requirements for attribution ingrained in licenses such as CC-BY can prohibit future research across large collections of content – as commonly happens in data mining. Therefore, to eliminate legal impediments to integration and re-use of data, such as this stacking of attribution requirements in large collections of data, and to help enable long-term interoperability an appropriate license or waiver specific to data should be applied.
  • 19. Panton Principles http://pantonprinciples.org/ = CC0 better than CC-BY for datasets to prevent “attribution stacking”
  • 20. Levels of openness: 5★’s of open data http://5stardata.info
  • 21. Levels of openness: 5★’s of open data http://5stardata.info ★ - make your stuff available on the Web (whatever format) under an open license ★★ - make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ - make it available in a non-proprietary open format (e.g., CSV as well as Excel) ★★★★ - use URIs to denote things, so that people can point at your stuff ★★★★★ - link your data to other data to provide context
  • 22. Levels of openness: 5★’s of open data Exercise: What star rating is this data? Example: Hong Kong: Dengue Mosquito Breeding Habitats http://www.fehd.gov.hk/english/safefood/dengue_fever/images/montlyOvitrap_2003-2016.pdf http://www.fehd.gov.hk/english/safefood/dengue_fever/ Static PDFs, images, not on data.gov.hk, no licensing information = ?
  • 23. Levels of openness: 5★’s of open data http://5stardata.info Exercise: What star rating is this data? 1. HK FEHD: Distribution of the number of live pigs sold at different auction prices on the day https://data.gov.hk/en-data/dataset/hk-fehd-fehdsh-daily-auction 2. Singapore: Dengue Mosquito Breeding Habitats https://data.gov.sg/dataset/dengue-mosquito-breeding-habitats 3. Linked Drug-Drug Interactions (LIDDI) https://datahub.io/dataset/linked-drug-drug-interactions-liddi
  • 24. Why closed data sucks? https://commons.wikimedia.org/wiki/File:Inner_door_in_forbidden_city.jpg
  • 25. Hong Kong Edition https://data.gov.hk Gov't spend on open data platform = $1.2M Gov't spend on 20 rubbish apps = $20M https://www.hongkongfp.com/2015/09/14/public-finance-concern-group-raps-10-rubbish-govt-apps-one-has-only-10-downloads/ Why closed data sucks?
  • 26. What the Gov't builds for $20M What open data can build for free http://gazetteer.hk/ Hong Kong Edition Why closed data sucks?
  • 27. Open Data as a revenue stream... Hong Kong Edition Why closed data sucks?
  • 28. Open Data as a revenue stream means conservation data can't be shared... Why closed data kills spoonbills?
  • 29. Climate change, global hunger, pollution, cancer, disease outbreaks… http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966 Why closed data kills people?
  • 30. Open Data as a revenue stream means can't share cancer data... https://www.change.org/p/mark-c-capone-ceo-of-myriad-genetics-myriad-genetics-give-us-our-damn-brca-data Why closed data kills women?
  • 31. Open Data as a revenue (publishing) stream means nobody is sharing ethnic Chinese control data to enable pharmacogenomics to work on Chinese populations... Why closed data kills Chinese populations?
  • 33. How research is disseminated (timeline: 1665, 1812, 1869)
  • 34. Consequences of 351-year-old incentive systems… Buckheit & Donoho: Scholarly articles are merely advertisements of scholarship. The actual scholarly artifacts, i.e. the data and computational methods that support the scholarship, remain largely inaccessible.
  • 35. The consequences: a growing replication gap. Out of 18 microarray papers, results from 10 could not be reproduced. 1. Ioannidis et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Med 2(8)
  • 37. Replication rates as low as 11% http://www.nature.com/nature/journal/v483/n7391/full/483531a.html https://osf.io/e81xl/wiki/home/
  • 38. Growing issue: increasing number of retractions. >15X increase in the last decade. Strong correlation of “retraction index” with higher impact factor. 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
  • 39. Growing issue: increasing number of retractions. >15X increase in the last decade. Strong correlation of “retraction index” with higher impact factor. At the current rate of increase, by 2045 as many papers will be retracted as published! 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  • 40. Problem: a growing replication gap. Insufficient methods. More retractions: >15X increase in the last decade. At the current rate, by 2045 as many papers will be retracted as published. 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  • 41. The Cost of Scientific Retractions? A: $400,000 per paper https://elifesciences.org/content/3/e02956
  • 42. Only policy that counts…IMPACT FACTOR
  • 43. What is the journal Impact Factor (jIF)? • Citation index concept first developed by Eugene Garfield in 1955 (Science) • Garfield formed the Institute for Scientific Information (ISI) in 1960 • Science Citation Index (SCI) launched in 1963 • Web version (Web of Science) launched in 1997 • ISI purchased by Thomson (later Thomson Reuters) in 1992 • Sold as part of Thomson Reuters' Intellectual Property & Science portfolio in July 2016 for $3.55B USD to private equity funds. Image: https://commons.wikimedia.org/wiki/File:Eugene_Garfield_HD2007_Richard_J._Bolte_Sr._Award.TIF
  • 44. How do you calculate the jIF? 1. Count the citations received in the IF year (e.g. 2015) by papers the journal published in the two preceding years (2013-2014). 2. Count the total number of papers published in those two years. 3. Divide the number of citations by the number of papers: 2015 IF = (# citations in 2015 to papers from 2013-2014) / (# papers published in 2013-2014)
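The three steps translate into a short function. This is a minimal sketch using made-up numbers for a hypothetical journal; it mirrors only the arithmetic on the slide and does not reproduce how the jIF's publisher decides which items count in the denominator.

```python
def impact_factor(if_year: int,
                  citations_by_pub_year: dict[int, int],
                  papers_by_pub_year: dict[int, int]) -> float:
    """Two-year journal Impact Factor for `if_year`.

    citations_by_pub_year: citations received during `if_year`, grouped by
        the publication year of the cited paper.
    papers_by_pub_year: number of papers counted in the denominator,
        grouped by publication year.
    """
    window = (if_year - 2, if_year - 1)  # e.g. 2013 and 2014 for the 2015 IF
    citations = sum(citations_by_pub_year.get(y, 0) for y in window)
    papers = sum(papers_by_pub_year.get(y, 0) for y in window)
    if papers == 0:
        raise ValueError("no papers in the two-year window")
    return citations / papers


# Hypothetical journal: citations to 2012 papers fall outside the 2015 window.
cites = {2012: 400, 2013: 500, 2014: 500}
papers = {2012: 100, 2013: 120, 2014: 130}
print(impact_factor(2015, cites, papers))  # 4.0

# Same citations, but only 200 papers counted in the denominator.
print(impact_factor(2015, cites, {2013: 100, 2014: 100}))  # 5.0
```

Shrinking the denominator from 250 to 200 papers raises the same journal's IF from 4.0 to 5.0 without a single extra citation, which is why how the denominator is counted matters so much.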
  • 45. The jIF calculation has TWO PROBLEMS
  • 46. TWO PROBLEMS with the jIF: 1. It rewards/incentivizes short-term citations only
  • 47. Problem 1: it rewards/incentivizes short-term citations only. Impact-factor-driven science looks like this:
  • 48. JIFBAIT Network (spoof clickbait news site): “Arsenic life forms: will they take over the planet?” by Melba Ketchum, PhD. Quiz: “Which overhyped, unreproducible experiment are you? Want rapid citations for 2 years only? Take this quiz.” You got: STAP cells. “Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get hundreds of citations in two years.”
  • 49. TWO PROBLEMS with the jIF: 2. How do you count the denominator? It is negotiated.
  • 53. Growing # of journals addressing this http://dx.doi.org/10.1371/journal.pmed.1001607
  • 55. Same vs. different data and code https://figshare.com/articles/Publishing_a_reproducible_paper/4720996
                      Same data       Different data
      Same code       Reproducible    Replicable
      Different code  Robust          Generalisable
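The same 2×2 matrix can be written as a lookup table, which makes explicit that each term answers two independent yes/no questions: was the data the same, and was the code the same? A minimal illustrative sketch; `TERMS` and `classify` are hypothetical names.

```python
# Keys are (same_data, same_code); values follow the data/code matrix above.
TERMS = {
    (True, True): "Reproducible",
    (False, True): "Replicable",
    (True, False): "Robust",
    (False, False): "Generalisable",
}


def classify(same_data: bool, same_code: bool) -> str:
    """Name the kind of check a re-analysis performs for this data/code combination."""
    return TERMS[(same_data, same_code)]


print(classify(same_data=False, same_code=True))  # Replicable
```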
  • 56. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • 2015 study examining the extent to which computer systems researchers share their research artifacts (code) • NSF policies on sharing code since 2005 • Examined 613 papers from ACM conferences & journals • Attempted to locate source code that backed up results • If found, tried to build the code.
  • 57. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment • Manual curation: look for code that backed up results • If missing, email authors • Chase if no reply • If found, try to build the code • Resolve issues • Survey results
  • 58. http://reproducibility.cs.arizona.edu/ Arizona Repeatability in Computer Science Experiment: 613 papers tested; 123 successful reproductions (20%)
  • 59. Questions? | 15 minute break
  • 60. The Hong Kong context http://web.archive.org/web/20131127073400/http://openaccess.hk/about.html
  • 61. Asia’s Academic City? 8 universities, many ranked in the top 50 worldwide; 100K students (UG/PG/FT/PT); 1 major research funder (UGC/RGC); grant budget = $17.5 BN HKD/yr ($2.3BN USD). UGC Policy: “Realization of making Hong Kong Asia's world city is only possible if it is based upon the platform of a very strong education and higher education sector.” http://www.ugc.edu.hk/eng/ugc/policy/policy.htm
  • 63. Hong Kong's R&D spending is amongst the lowest in the developed world (data: World Bank)
  • 64. Hong Kong’s focus… “The plot earmarked for expansion of Hong Kong Science Park might now be used to build apartment blocks instead. Is the government backing down on its commitment to project Hong Kong as a major technology hub?” http://bit.ly/1TxCRj3
  • 66. https://osf.io/cgpzb/ Open Science (Open Access & Open Data) survey of Hong Kong Any comments?
  • 67. Science & Technology players in HK. Political forum: Legislative Council (LegCo). Policy makers: Government, Advisory Committee on Innovation and Technology, Innovation and Technology Bureau (ITB), Innovation and Technology Commission (ITC). Financing: Government, EB, Private Sector, ITC -> ITF, Innov. & Tech. Venture Fund, RGC, UGC. Operators: Universities, Public Technology Support Organizations, Private Sector R&D Centres, ASTRI. Facilitators: HKPC, HKTDC, HKSTPC, Cyberport, HKIB. Commercialization Agents: Business Enterprises, New High Tech Ventures, Multinational Corporations. Researched policy, collected case studies, filed FOI requests, interviewed many key players (funders, libraries, administrators…)
  • 68. HK: good with some parts of open… http://hub.hku.hk/
  • 71. Signatories to Berlin OA Declaration
  • 72. OA Policies in Hong Kong
  • 73. Hidden at the back of RGC guidelines http://www.ugc.edu.hk/eng/doc/rgc/form/srfdp_sr2.pdf
  • 74. IR: infrastructure is (mostly) there http://www.julac.org/?page_id=79
  • 75. IR: infrastructure is (mostly) there http://repositories.webometrics.info/en/Asia/Hong%20Kong
  • 77. No policies, Mo’ problems
  • 78. Q: How much is spent on Open/Closed Access in HK? A: Nobody has any idea! https://lists.okfn.org/pipermail/open-access/2014-May/001888.html
  • 79. In China, publication + JIF = money = fraud. Attempts to “game the peer-review system on an industrial scale”: companies offering authorship of papers made to order by “paper mills”(1); ghostwriting of medical papers by pharma is common(2); guaranteed publication in a JIF journal, often using fake referees, ID theft, etc. 1. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/ 2. http://www.grassley.senate.gov/sites/default/files/about/upload/Senator-Grassley-Report.pdf
  • 80. 1. http://dx.doi.org/10.1087/20110203 2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed 3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/ What is the cost of the jIF? JIF 2 = $10,000 USD JIF 5 = $20,000 USD Buy Sell C/N/S = $30,000 USD JIF 10 = $1,500 USD
  • 81. Created by skewed incentive systems in China… “While we are rightly proud of Hong Kong’s highly regarded and ranked universities system, we are not immune to the same pressures. While funders in Europe have moved away from using citation based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it.” 1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentives-curb-research
  • 83. How to fight back: Sign DORA. http://www.ascb.org/dora/
  • 84. Science & Technology players in HK (same map as slide 67): Who needs to provide leadership? What new infrastructure do we need?
  • 85. Who needs to provide leadership? RGC/UGC & the new ITB. What new infrastructure do we need? A new “HK Data Service”, stewardship & platforms. Science & Technology players in HK: Political forum: Legislative Council (LegCo). Policy makers: Government, Advisory Committee on Innovation and Technology, Innovation and Technology Bureau (ITB), Innovation and Technology Commission (ITC). Financing: Government, EB, Private Sector, ITC -> ITF, Innov. & Tech. Venture Fund, RGC, UGC. Operators: Universities, Public Technology Support Organizations, Private Sector R&D Centres, ASTRI. Data Curators & Stewards: Libraries, OGCIO, Data Studio@SP. Facilitators: HKPC, HKTDC, HKSTPC, Cyberport, HKIB. Data Disseminators: HARNET, data.gov.hk, “HK Data Service”. Commercialization Agents: Business Enterprises, New High Tech Ventures, Multinational Corporations. Downstream Users: Researchers, Innovators, Citizens. Academic/commercial cloud.
  • 86. If Government doesn’t act, universities need to lead the way http://hub.hku.hk/advanced-search?location=crisdataset
  • 87. If Government doesn’t act, universities need to lead the way http://www.rss.hku.hk/integrity/research-data-records-management
  • 88. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset
  • 89. First CRIS in HK, built upon ScholarsHub http://lib.hku.hk/researchdata/rpg.htm “Beginning with the September 2017 intake, all HKU research postgraduate (rpg) students have responsibility for 1) using a data management plan (DMP), where applicable, to describe the use of data in preparation for, or in the generation of their theses, and 2) depositing, where applicable, a dataset in the HKU Scholars Hub.”
  • 92. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset CC BY-NC by default
  • 93. First CRIS in HK, built upon ScholarsHub http://hub.hku.hk/advanced-search?location=crisdataset Licensing T&Cs
  • 94. HK CRIS: Further reading/resources https://youtu.be/focv1z3lpPI RPg Students -- Instructions for Data: http://lib.hku.hk/researchdata/rpg.htm Depositor's User Guide: http://lib.hku.hk/researchdata/deposit_page.htm Seminar slides from HKU Library http://www.rss.hku.hk/integrity/rcr/rcr-info/seminars See also ReShare video guide:
  • 95. The cost to Hong Kong of not doing this? HK UGC grant budget = $17.5 billion HKD/yr (4% of Gov spending) • Estimated citation impact lost by not being OA = 50% ($8.75B?)(2) • Taking the lowest reported reproducibility rate (11%) = >$15 billion wasted(1) • How much is the HK taxpayer losing through missed potential collaborations, wider engagement & unrepeatable work? 1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html 2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
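The two headline figures are simple shares of the annual budget. The back-of-envelope arithmetic can be sketched as below, assuming (as the slide does) that the whole budget is exposed to each effect and that a percentage of lost impact or irreproducible work maps directly onto money. These are deliberately pessimistic upper bounds, and the 11% figure comes from a single study of preclinical cancer research (ref 1). `share_of_budget` is a hypothetical helper name.

```python
# Figures taken from the slide (HKD billions per year).
UGC_BUDGET_HKD_BN = 17.5
OA_CITATION_LOSS = 0.50      # estimated share of citation impact lost by not being OA
REPRODUCIBILITY_RATE = 0.11  # lowest reported reproducibility rate


def share_of_budget(budget: float, fraction: float) -> float:
    """Portion of the budget attributed to an effect, assuming a 1:1 mapping."""
    return budget * fraction


oa_loss = share_of_budget(UGC_BUDGET_HKD_BN, OA_CITATION_LOSS)
# Everything that is not reproducible counts as "wasted" under this assumption.
irreproducible = share_of_budget(UGC_BUDGET_HKD_BN, 1 - REPRODUCIBILITY_RATE)

print(f"Lost to closed access: ~${oa_loss:.2f}B HKD")
print(f"Spent on irreproducible work: ~${irreproducible:.1f}B HKD")
```

The second figure, 17.5 × 0.89 ≈ 15.6, is where the slide's ">$15 billion" comes from.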
  • 96. https://osf.io/cgpzb/ Open Science (Open Access & Open Data) survey of Hong Kong. Reading/Reflection for next class: Why is Hong Kong lagging behind the US/EU? Any ideas what we need to do to move forward? Any feedback on the survey?
  • 98. HKU Repeatability in HK Research Experiment • HKU policy on data sharing from 2015 • PLOS policy mandating sharing of supporting data from March 1, 2014 • HKU has published 267 PLOS ONE papers 2014-date • Can we quantify reproducibility in a sample of these? • Easy exercise in literature curation • 2016 HKU PLOS publications = 49 papers http://hub.hku.hk/simple-search?query=&location=publication&sort_by=bi_sort_2_sort&order=asc&rpp=25&filter_field_1=journal&filter_type_1=equals&filter_value_1=plos+one&filter_field_2=dateIssued&filter_type_2=equals&filter_value_2=[2014+TO+2017]&filter_field_3=dctype&filter_type_3=equals&filter_value_3=article&etal=0&filtername=dateIssued&filterquery=2016&filtertype=equals
  • 99. Homework/Case study: literature curation exercise. HKU Repeatability in HK Research Experiment • Everyone is assigned 5 of the 2016 HKU PLOS papers • Quickly scan each paper looking for supporting data • If no data, ignore it • If it uses data, is it all associated with the paper? • If external data, is it available from a URL or accession? • If “data available on request”, are the authors contactable? • Don’t spend more than 5 minutes per article • Add your results to the Google Doc, and we’ll go through results & feedback next class
  • 100. HKU Repeatability in HK Research Experiment Example 1. https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
  • 101. HKU Repeatability in HK Research Experiment Example 1. Is there data presented in the paper? Yes. Is there external data, and if so what is the link/accession? No. Is all the data in the paper available? No. Comments: has questionnaire, but not data, as it says “minimal anonymized dataset will be made available upon request”. Enter data here: https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
  • 102. HKU Repeatability in HK Research Experiment Example 1. Optional: if data are missing, do the authors respond when contacted? Enter data here: https://docs.google.com/spreadsheets/d/15BszEhUodygyu4eGckR2b5p153nyeYmB3Uh4U23HX-o/edit?usp=sharing
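Recording every paper's answers in the same structure makes the pooled results easier to compare next class. This is a minimal Python sketch, assuming you paste or import CSV into the shared Google Sheet. The `PaperCheck` class, its column names, and the `"example-1"` ID are hypothetical and should be matched to the sheet's actual columns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class PaperCheck:
    """One row of the literature-curation sheet (column names are hypothetical)."""
    paper_id: str
    has_data: bool                            # Is there data presented in the paper?
    external_data: Optional[str]              # Link/accession for external data, if any
    all_data_available: bool                  # Is all the data in the paper available?
    authors_responded: Optional[bool] = None  # Optional: did authors reply when contacted?
    comments: str = ""


def to_csv(rows: List[PaperCheck]) -> str:
    """Serialise the checks as CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(PaperCheck)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Example 1 from the slides, with a placeholder paper ID.
example = PaperCheck(
    paper_id="example-1",
    has_data=True,
    external_data=None,
    all_data_available=False,
    comments="Questionnaire only; data available on request",
)
print(to_csv([example]))
```

Leaving `authors_responded` as `None` by default keeps the optional follow-up question distinct from an explicit "no reply".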
  • 103. Final Project • For the final project for this course, you can choose from 3 assignment options. • The assignment is due on the 15th May and it is worth 40% of your grade. • Time will be set aside for presenting a provisional draft of this during the final class on the 24th April.
  • 104. Final Project: Option 1 Write an Annotated Bibliography about data curation practices in an academic discipline of your choosing. • Choose a discipline (sciences, social sciences, & humanities) OR choose the topic of “open data.” • Summarize data practices in your chosen discipline or topic. (5-7 sentences) • Find 7-10 sources that relate that discipline or topic to data creation, management, and/or curation. • Provide a citation for the source in APA style. • Write a short annotation that summarizes the content of the source. You may include quotes from the source sparingly, but the annotations should be mostly, if not entirely, in your own words. (3-5 sentences) • Explain the relevance of the source with relation to the data practices of your chosen discipline or topic. (1-2 sentences) • Find a few example public datasets to demonstrate the above points. Cite the data in the relevant places in the Bibliography according to the Data Citation Principles. • Refer to this guide for more information about annotated bibliographies: http://sites.umuc.edu/library/libhow/bibliography_tutorial.cfm. Your annotation should be in the “Descriptive” style.
  • 105. Final Project: Option 2 Using a relevant dataset (this can be from the literature curation exercise, a BYO dataset, or one given to you), write a report that includes a description of the dataset, a Data Management Plan, and a guidelines document for the researcher(s). • Describe the dataset, explaining the form of the data and the academic discipline in which it was created. This paragraph (3-5 sentences) should provide context for the rest of the report. • A 1-2 page Data Management Plan following the guidelines from HKU or a granting body such as NSF. • A 1-page guidelines document that could be presented to the researcher(s), covering their data (extant and forthcoming): – Preservation – Appraisal – Documentation • For the DMP and the guidelines document, you can extrapolate from your dataset to imagine additional details about the research practices that created it and will create more data in the future. • Look for suitable data repositories that can host this data (institutional, general purpose, or subject specific); if a relevant one exists and you have permission, publish the data there, and cite it correctly in the relevant places in your report.
  • 106. Final Project: Option 3 Prepare a 30-minute data curation workshop that you could teach to researchers, giving them the details they need to understand why data curation is relevant to them and which best practices they should follow. • Slide deck that introduces data curation for a researcher audience (no more than 40 slides). • Presenter outline that describes the important points for each slide. • Topics your workshop might address: the value of data management, writing a data management plan, data repository options. You can assume your audience is researchers at HKU. • Make sure all of the content is copyright-free, and share the final material openly (e.g. figshare, Scholars Hub, OER Commons, etc.) with sufficient metadata to make it discoverable.
  • 107. Looking ahead… • Next class on Monday 27th March we’ll go from open to FAIR data • We’ll also go through the reflection & curation case studies – Bring ideas & feedback, and we’ll look at the data • Final project due 10th May – Need to present preliminary version on 26th April to get feedback before completion