iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of
Biodiversity Collections Program. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views of
the National Science Foundation.
GUODA: A Unified Platform for
Large-Scale Computational
Research on Open-Access
Biodiversity Data
Matthew Collins, Alexander Thompson, Jorrit
Poelen, Jennifer Hammock
2
What is GUODA?
Global Unified Open Data Access
An informal collaboration between technologists from
organizations like EOL, ePANDDA, and iDigBio as well as
independent biodiversity informaticists. We share data use
cases, best practices, infrastructure, code, and ideas
around the science that can be done by analyzing large
open-access biodiversity datasets.
http://guoda.bio
3
What our members are interested in
Computation with biodiversity data
• Research at scale
• Lowering barriers to accessing computation
• Reproducibility
Matthew Collins, Technical Operations Manager - iDigBio
Jorrit Poelen, Independent
Alexander Thompson, Software Products Lead - iDigBio
Jennifer Hammock, Marine Theme Coordinator - EOL
Nathan Bird, Software Developer - iDigBio
4
An example use of GUODA
Does anyone use catalog numbers in
remarks fields to document relationships
between specimen records in iDigBio?
(We’re at TDWG so we’ve got to do
something with identifiers, right?)
5
A term-document index of iDigBio
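# sql is assumed to be pyspark.sql.functions
(idb_df
.select(idb_df["uuid"],
idb_df["note"])
.where(sql.column("note") != "")
.withColumn("tokens",
udf_tokenize(sql.column("note")))
# keep uuid and name the exploded column so groupBy can find both
.select(sql.column("uuid"),
sql.explode(sql.column("tokens")).alias("token"))
.groupBy(sql.column("uuid"),
sql.column("token"))
.count()
)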
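The index above maps each (record, token) pair to a count. As a rough illustration of what that computes, here is a plain-Python sketch on toy data. It is not the Spark code from the talk: the `tokenize` function is a hypothetical stand-in for `udf_tokenize` (whose definition is not shown on the slide), and the record IDs are made up.

```python
import re
from collections import Counter


def tokenize(note):
    """Hypothetical tokenizer: lowercase, split on whitespace and common punctuation."""
    return [t for t in re.split(r"[\s,;:.()]+", note.lower()) if t]


def term_document_index(records):
    """Count how often each token appears in each record's note.

    records: iterable of (uuid, note) pairs.
    Returns a Counter keyed by (uuid, token).
    """
    counts = Counter()
    for uuid, note in records:
        if not note:  # same role as the filter on empty notes
            continue
        for token in tokenize(note):
            counts[(uuid, token)] += 1
    return counts


# Toy records; the IDs are invented for illustration.
records = [
    ("rec-1", "Part of Collection at FH: barcode-00374180."),
    ("rec-2", ""),
    ("rec-3", "duplicate of duplicate sheet"),
]
index = term_document_index(records)
```

Hyphens are deliberately left inside tokens here so that an identifier such as a barcode survives as a single term; a different tokenizer would change which catalog numbers can be matched later.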
6
What terms match catalognumber?
joined = (idb_df_ids
.join(idb_tf_df,
on=idb_df_ids["idb_catalognumber"]
== idb_tf_df["token"])
.join(idb_df_notes,
on=sql.column("uuid") ==
idb_df_notes["note_uuid"])
.withColumn("catalognumber_len",
sql.length(sql.column("idb_catalognumber")))
)
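The join above pairs every token from a remark with every record whose catalog number equals that token. A minimal plain-Python sketch of the same matching step follows, again on invented data and not taken from the idb-spark repository. The function name, the record IDs, and the lowercase normalization are assumptions made for the example.

```python
from collections import Counter, defaultdict


def match_catalog_numbers(catalog_numbers, term_index):
    """Find note tokens that equal some record's catalog number.

    catalog_numbers: dict mapping record uuid -> catalog number.
    term_index: iterable of (note_uuid, token) pairs, tokens already lowercased.
    Returns a list of (note_uuid, matched_uuid, token, token_length) tuples.
    """
    by_number = defaultdict(list)
    for uuid, number in catalog_numbers.items():
        by_number[number.lower()].append(uuid)

    matches = []
    for note_uuid, token in term_index:
        for matched_uuid in by_number.get(token, []):
            matches.append((note_uuid, matched_uuid, token, len(token)))
    return matches


# Invented data: one long, specific catalog number and one very short one.
catalog_numbers = {"rec-9": "barcode-00374180", "rec-7": "12"}
term_index = [
    ("rec-1", "barcode-00374180"),
    ("rec-1", "part"),
    ("rec-4", "12"),
    ("rec-5", "12"),
]
matches = match_catalog_numbers(catalog_numbers, term_index)

# Histogram of matches by catalog number length.
length_histogram = Counter(length for _, _, _, length in matches)
```

In this toy data the short catalog number "12" matches two unrelated remarks while the long barcode matches once, which suggests why tracking the length of the matching catalog number can help separate likely real references from coincidental ones.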
7
What do we find?
A few things, like record bd347847…
which has the remark
"Part of Collection at FH: barcode-00374180."
which matches record 826da57a...
[Figure: histogram of matching catalognumber length]
8
How long did that take to write?
< 200 lines of code (including whitespace
and comments)
1 intermittent day of coding
https://github.com/iDigBio/idb-spark
9
How long did that take to run?
73.5 million records in iDigBio
to 151 million document:term:counts
40 minutes
Joined back to iDigBio resulting in 2.9 billion
terms found in the catalognumber field
3 hours 40 minutes
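For a sense of scale, the figures above can be turned into rough throughput numbers. This is back-of-the-envelope arithmetic only, assuming the 40 minutes applies to building the index and the 3 hours 40 minutes to the join, as the slide lists them. The variable names are chosen for the example.

```python
# Figures quoted on the slide.
records = 73.5e6                      # records in iDigBio
index_rows = 151e6                    # document:term:count rows
index_seconds = 40 * 60               # 40 minutes
join_rows = 2.9e9                     # terms found in the catalognumber field
join_seconds = (3 * 60 + 40) * 60     # 3 hours 40 minutes

records_per_second = records / index_seconds      # about 30,600 records/s
rows_per_record = index_rows / records            # about 2.05 index rows per record
join_rows_per_second = join_rows / join_seconds   # about 220,000 rows/s
```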
10
Good tools in the hands of people
with good ideas:
IDEAS, WORK, RESULTS
11
Servers!
Mesos
HDFS
Spark
Marathon
Docker
Cassandra
Infrastructure
Advanced Computing and Information Systems Lab
http://acis.ufl.edu
12
Data is half the tool
Copies of whole datasets
• Stored locally
• Refreshed automatically
Re-represent datasets in a useful structure for
high performance computing - parquet on
HDFS:
https://github.com/bio-guoda/guoda-datasets
13
Interfaces to GUODA
• Jupyter Notebooks for end-users
• Containers for API and web services
• Persistent storage for application state
• Hangouts calls every 2-4 weeks
14
The front door to GUODA
Notebooks
“Literate Programming”
Comments, code, and outputs all
together in a readable document
that describes what is being done
15
Here’s what it looks like
16
GUODA Jupyter notebook interface
17
What would you do with it?
Have a GitHub account and want to write
code? This is an alpha-quality system.
http://jupyter.idigbio.org
Or talk to us if you want to host an
application on our systems
mcollins@acis.ufl.edu godfoder@acis.ufl.edu
idigbio.org/wiki
facebook.com/iDigBio
twitter.com/iDigBio
vimeo.com/iDigBio
idigbio.org/rss-feed.xml
idigbio.org/events-calendar/export.ics
Get involved!
