Save the Cows!




Cyberinfrastructure for the
         rest of us
                              Dorothea Salo
                 Digital Repository Librarian
                    University of Wisconsin
                             11 March 2009
Cyberinfrastructure
    Petabytes
    Exabytes
    Terabytes
    Data mining
    Grid Computing
    Collaboration
    Identity
    E-Research or E-Science
    Metadata
    Data Curation
    Standards
    AAAARGH!
 IT?
    Faculty?
                     Libraries?
It’s simpler than that.

    (thank goodness!)
Scholars use




in their research
This produces


DATA.
In addition to


DATA.
So now we have
to support that.
    Data generation
   Data management
      Data storage
   Data certification
Data discovery and reuse
That’s all this is
 about. Really.
What I will not
 talk about today
• Collaboration technology
• Identity-management, authentication,
  authorization, etc.
• Grid computing
• Instrument science
• Open Notebook Science
  Of course these are important.
 I’m just not competent to opine.
 Fortunately, you have Melissa!
What I’m on about



DATA.
Data?
Charts and graphs
    are DEAD data

      Killed!    Cut in pieces!

  Ground up!     Unrecognizable!

Not revivable!   Not reusable!
Okay, what’s
    data, then?




We have to save the cows!
In case you’re
       wondering...
“Converting PDF to XML is a bit like converting hamburgers into cows.”
        —Michael Kay <http://lists.xml.org/archives/xml-dev/200607/msg00509.html>
Do we have to
   keep data?

SOMETIMES.
 (but it’s often a good idea even if
         you don’t have to)
Funders may
 require it.
Journals may
 require it.
Here’s the catch




Some of these places have built barns for the cows.
Many haven’t.
Guess who’s on   if they don’t?
What can be
done with data?
• Experimental validation
• Meta-analysis, data-mining, mashups
• Interdisciplinary investigation
• Historical investigation
• Modeling and model validation
• ... the possibilities are endless—IF we
  have the cows the data.
Is all data from
“BIG SCIENCE”?
Absolutely not.

(they don’t even need our help)
“Small Science”
Less money


 Less know-how


    In aggregate? MORE COWS.
Arts & Humanities
Here’s the catch.
Nobody knows
how to do all this.
       (yet)
But we do know
 a few things...
Cows are dumb.




They will not save
   themselves.
It takes a village




to save the cows.
Researchers
 Can you tell a Holstein from an Angus?




              Me neither.

But researchers know their cows.
Information
Technologists
Librarians
“But what I see happening is ... this beautiful combination of understanding the structure of information, and understanding the code that goes behind it, and how to make it usable to the people who want to access it. I think that we used to talk about blended, or hybrid librarians—that’s now the librarian.”
        —“Librarian 15,” in Palmer et al., “Identifying Factors of Success...”
Grant
   administrators




Cows don’t corral themselves.
   Neither do researchers.
The big gray area

Informaticists?

 Researchers who code?

    IT pros who grok metadata?

       Librarians who model data?
Great. So now what?
Find use cases
Plan for
infrastructure
Build alliances
Start conversations
Ten Questions
1.  What is the story of your data?
2.  What form and format are the data in?
3.  What is the expected lifecycle of your data?
4.  How could your data be used, reused, and repurposed?
5.  How large is your dataset, and what is its rate of
    growth?
6. Who are the potential audiences for your data?
7. Who owns the data?
8. Does the dataset include any sensitive information?
9. What publications or discoveries have resulted from the
    data?
10. How should the data be made accessible?
                           —Michael Witt and Jake Carlson, Purdue University
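
One way to put the ten questions to work is as a checklist that shows what a researcher has not answered yet. This is only a rough sketch: the short field names and the sample answers are made up for illustration and are not part of Witt and Carlson's list.

```python
# Hypothetical short keys, one per question in the list above.
QUESTIONS = {
    "story": "What is the story of your data?",
    "form_and_format": "What form and format are the data in?",
    "lifecycle": "What is the expected lifecycle of your data?",
    "reuse": "How could your data be used, reused, and repurposed?",
    "size_and_growth": "How large is your dataset, and what is its rate of growth?",
    "audiences": "Who are the potential audiences for your data?",
    "owner": "Who owns the data?",
    "sensitive_info": "Does the dataset include any sensitive information?",
    "publications": "What publications or discoveries have resulted from the data?",
    "access": "How should the data be made accessible?",
}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions that have no non-blank answer yet."""
    return [q for key, q in QUESTIONS.items() if not answers.get(key, "").strip()]


# Made-up interview notes: one real answer, one blank one.
example = {"story": "Soil samples from a field study", "owner": "  "}
remaining = unanswered(example)

for question in remaining:
    print("Still to ask:", question)
```

A blank or whitespace-only answer counts as unanswered, so a half-filled interview form still flags the gaps.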
Keep an eye out
If this seems like
 common sense...




... good! It mostly is!
Thank you!




(and save a cow today!)
Credits
•   Title slide: http://www.flickr.com/photos/flikr/131673772/
•   Server rack: http://www.flickr.com/photos/dumbledad/3276756770/
•   Command centre: http://www.flickr.com/photos/soundman1024/2054512893/
•   Laptop: http://www.flickr.com/photos/arbron/56216464/
•   Dual-monitor setup: http://www.flickr.com/photos/blakespot/2372432028/
•   Photo-data: http://www.flickr.com/photos/51114580@N00/1597765466/
•   Word cloud: http://www.flickr.com/photos/55772089@N00/3291287830/
•   Internet map: http://www.flickr.com/photos/jurvetson/63009926/
•   Dhaka image: http://www.flickr.com/photos/ahaqueusa/1268467179/
•   Plant cross-section: http://www.flickr.com/photos/tonios-pics/387510805/
•   Journals: http://www.flickr.com/photos/emdot/56157732/
•   Books: http://www.flickr.com/photos/guwashi999/2635608241/
•   Manuscript: http://www.flickr.com/photos/86624586@N00/10187684/
•   Hamburger: http://www.flickr.com/photos/nadya/1019816514/
•   Row of cows: http://www.flickr.com/photos/flikr/230379411/
•   Beware of cow: http://www.flickr.com/photos/tm-tm/2339539399/
•   Cowboys: http://www.flickr.com/photos/bistrosavage/30710414/
•   Hands: http://www.flickr.com/photos/iandesign/1204632335/
•   Money: http://www.flickr.com/photos/emraya/2867188734/

Save the Data

  • 1. Save the Cows! Cyberinfrastructure for the rest of us Dorothea Salo Digital Repository Librarian University of Wisconsin 11 March 2009
  • 2. Cyberinfrastructure Petabytes Y TE S Data mining X A B Grid Computing E tio n Terabytes Id aE-Research entit or E-Science ab ta oll da y C a G H !Stan et Data Curation s dard M A A R IT? A A Faculty? Libraries?
  • 3. It’s simpler than that. (thank goodness!)
  • 7. So now we have to support that. Data generation Data management Data storage Data certification Data discovery and reuse
  • 8. That’s all this is about. Really.
  • 9. What I will not talk about today • Collaboration technology • Identity-management, authentication, authorization, etc. • Grid computing • Instrument science • Open Notebook Science Of course these are important. I’m just not competent to opine. Fortunately, you have Melissa!
  • 10. What I’m on about DATA.
  • 11. Data?
  • 12. Charts and graphs are DEAD data Killed! Cut in pieces! Ground up! Unrecognizable! Not revivable! Not reusable!
  • 13. Okay, what’s data, then? We have to save the cows!
  • 14. In case you’re wondering... “Converting PDF to XML is a bit like converting hamburgers into cows.” —Michael Kay, <http://lists.xml.org/archives/xml-dev/200607/msg00509.html>
  • 15. Do we have to keep data? SOMETIMES. (but it’s often a good idea even if you don’t have to)
  • 18. Here’s the catch Some of these places have built barns for the cows. Many haven’t.
  • 19. Guess who’s on the hook if they don’t?
  • 20. What can be done with data? • Experimental validation • Meta-analysis, data-mining, mashups • Interdisciplinary investigation • Historical investigation • Modeling and model validation • ... the possibilities are endless—IF we have the cows the data.
  • 21. Is all data from “BIG SCIENCE”?
  • 22. Absolutely not. (they don’t even need our help)
  • 23. “Small Science” Less money Less know-how In aggregate? MORE COWS.
  • 26. Nobody knows how to do all this. (yet)
  • 27. But we do know a few things...
  • 28. Cows are dumb. They will not save themselves.
  • 29. It takes a village to save the cows.
  • 30. Researchers Can you tell a Holstein from an Angus? Me neither. But researchers know their cows.
  • 32. Librarians [Quotation from a librarian on the "hybrid librarian," cited from Palmer et al., “Identifying Factors of Success ...”; the slide text did not survive extraction.]
  • 33. Grant administrators Cows don’t corral themselves. Neither do researchers.
  • 34. The big gray area Informaticists? Researchers who code? IT pros who grok metadata? Librarians who model data?
  • 35. Great. So now what?
  • 40. Ten Questions 1. What is the story of your data? 2. What form and format are the data in? 3. What is the expected lifecycle of your data? 4. How could your data be used, reused, and repurposed? 5. How large is your dataset, and what is its rate of growth? 6. Who are the potential audiences for your data? 7. Who owns the data? 8. Does the dataset include any sensitive information? 9. What publications or discoveries have resulted from the data? 10. How should the data be made accessible? —Michael Witt and Jake Carlson, Purdue University
  • 41. Keep an eye out
  • 42. If this seems like common sense... ... good! It mostly is!
  • 43. Thank you! (and save a cow today!)
  • 44. Credits • Title slide: http://www.flickr.com/photos/flikr/131673772/ • Server rack: http://www.flickr.com/photos/dumbledad/3276756770/ • Command centre: http://www.flickr.com/photos/soundman1024/2054512893/ • Laptop: http://www.flickr.com/photos/arbron/56216464/ • Dual-monitor setup: http://www.flickr.com/photos/blakespot/2372432028/ • Photo-data: http://www.flickr.com/photos/51114580@N00/1597765466/ • Word cloud: http://www.flickr.com/photos/55772089@N00/3291287830/ • Internet map: http://www.flickr.com/photos/jurvetson/63009926/ • Dhaka image: http://www.flickr.com/photos/ahaqueusa/1268467179/ • Plant cross-section: http://www.flickr.com/photos/tonios-pics/387510805/ • Journals: http://www.flickr.com/photos/emdot/56157732/ • Books: http://www.flickr.com/photos/guwashi999/2635608241/ • Manuscript: http://www.flickr.com/photos/86624586@N00/10187684/ • Hamburger: http://www.flickr.com/photos/nadya/1019816514/ • Row of cows: http://www.flickr.com/photos/flikr/230379411/ • Beware of cow: http://www.flickr.com/photos/tm-tm/2339539399/ • Cowboys: http://www.flickr.com/photos/bistrosavage/30710414/ • Hands: http://www.flickr.com/photos/iandesign/1204632335/ • Money: http://www.flickr.com/photos/emraya/2867188734/
  • 45. Thank you! (and save a cow today!)

Editor's Notes

  1. Good morning, and thank you for coming. My name is Dorothea Salo, and I work for the University of Wisconsin System as an odd sort of digital archivist. I do have strong interests in the area of cyberinfrastructure, as I hope to prove to you today, and so Melissa asked me to come here and talk to you a little bit about my angle on the whole cyberinfrastructure thing. And I promise you will understand the title by the time I’m done talking. Cross my heart.
  2. So, when we say the word cyberinfrastructure, some of the first things that come to mind are grid computing, in which we throw a whole lot of little computers working together at huge, massive computational problems, and data mining, in which we throw those computing resources at huge amounts of data on a scale we could never have considered before. (CLICK) Of course, these processes create new data. Terabytes and petabytes of it. And now all the librarians listening to me are wincing, because our shock-and-awe sensors tripped as soon as you could fit the Library of Alexandria on a USB thumb drive, you know what I’m saying? (CLICK) And then the grid computing people start tossing around exabytes, and look, my brain just shuts down. (CLICK) In the UK, what we call cyberinfrastructure is often called “e-science.” This, of course, betrays an assumption. (CLICK) So we don’t use “e-science” here, because it’s not just the physicists and the astronomers and the climatologists; (CLICK) we say “e-research” instead, because it’s certainly true that the social sciences, the arts, and the humanities are joining the party too. And with that, we add concerns over collaboration, especially across institutions and across disciplines -- and doing cross-disciplinary collaboration creates sticky issues around identity and authorization and it all gets very evil and nasty and complicated very quickly. (CLICK) And while we’re at it, let’s not forget the data I mentioned. An emerging professional specialty, though exactly *where* it’s emerging is a really good question, is that of data curation. This brings up questions of metadata, a thing dear to librarian hearts that just made the IT professionals here cringe, and data standards. We have a few of those, in a few disciplines, but not nearly enough, and unstandardized, not-uniform data is something that I think we can all agree makes us ALL cringe! (CLICK) And then there’s the question of who’s going to do data curation. Is it an IT function? Are faculty responsible? After all, it’s their data! And what about those libraries? (CLICK) And by this time much screaming has ensued and much hair is being torn out. Not least because wow, that is one ugly, ugly slide.
  3. Scholars are using computers, in a number of different form factors, including big old server racks like this one, in their research. This, I am sure, is not news to anyone!
  4. All this computation produces data, sometimes as the point of the exercise, sometimes as a sort of side effect. Data takes all kinds of forms; it’s not just numbers. Word-clouds, scanned manuscripts, maps, images on wildly different scales -- it’s all bits-and-bytes; it’s all reusable and recomputable -- it’s all data!
  5. This is in addition to the books and journals that librarians are familiar with and already care for. Interestingly, as these materials move digital themselves (CLICK), they too can be treated as data, as grist for the computational mill. This doesn’t happen as much as it should, honestly, and the reason for that is that even when these materials are digital, they’re locked up behind pay-access firewalls to protect the current scholarly-publishing business model, so the computers can’t get in to crunch on them. This is a major argument for open access to the literature -- and for those of you who know me and what I do, I hereby reassure you that it’s the only open-access argument I’m going to make in this presentation. So to recap a bit, we have our researchers, and they’re using computers, and they’re generating data.
  6. And that support, librarians, has to happen throughout the entire data lifecycle. And that support, IT professionals, is absolutely not limited to providing computational horsepower and storage. And that support, scholars and researchers, has to include verification and documentation of data-gathering methods, so that everyone knows that everything’s on the level, and it’s got to include ways to refer back to other people’s data that you’ve used; that’s what I mean by ‘certification’ here.
  7. So that’s the cyberinfrastructure puzzle as I see it. There are large swathes of it that I’m not going to talk about today...
  8. Now here we are. This is data, right? Nice bar graphs and charts, with a nice key in the corner; you can imagine this on a web page or equally well on a print journal page. (CLICK) NO. No. Not data. This is not data in the sense I mean it.
  9. For optimum reusability, we need to save data before it’s distilled into charts and graphs and tables. We need to save the cows before they become hamburger!
  10. So in tight budget times, a very good question to ask is whether it’s actually necessary to solve this problem. Even if it is, do we have to solve it now? Do we have to keep all these data? (CLICK) The answer is a resounding -- sometimes. But I do want to add that even when it’s not absolutely required, it’s often a really good idea. On the Madison campus, we have collected a number of stories of researchers who wish they’d done a better job keeping their data, because a new use turned up for it, often years or decades later! So in what cases is it mandatory?
  11. (mention NIH, distinguish articles from data)
  12. Most of the funders requiring open data are in Europe at the moment, but that’s not true of journals. I can’t give you a laundry list, because it’s very discipline-dependent and also very volatile, but we are seeing more and more science journals instituting data-retention policies. Now, the ones I’ve seen have usually been time-limited; five or ten years is common. My question is this: if you’re going to do it for five or ten years, why not plan for longer? Sure, it makes sense to assess every now and again, because some datasets do become obsolete. But don’t let your thinking be governed by journal requirements; most of the work of keeping a dataset happens before the bits hit storage, so keeping them longer is often a very low-margin business.
  13. There’s nothing stopping a journal or a funder from creating an unfunded mandate to keep and preserve data. A few have. And we, collectively, researchers and librarians and IT professionals, are left dangling on the hook figuring out how to comply. Okay. So that’s the stick. Now for the carrot. We’re keeping all these data. Why? What’s the use?
  14. I’ve answered this already, for those who were listening at the beginning, but for anybody who came late, and just to reiterate, there’s an image of cyberinfrastructure that assumes it’s all about the Higgs bosons of this world. Physics, astronomy, and biomedicine. That’s who’s got all the data, just like they’ve got all the money.
  15. A broader concern is so-called “small science,” which is science without the big bucks, which is frankly most scientists, not that that surprises anyone. The big guns have mostly worked out their data issues, as I’ve said. The small-science folks -- a lot of them hardly seem to know where to begin. (CLICK) And the sting in the tail here is that there are a lot MORE small-science researchers than big science. This means that if you pile up all their data, there’s probably a lot more of it! Each individual data-herd is pretty small by comparison with the Large Hadron Collider, granted. But add all those herds together, and we are talking a LOT of cows.
  16. And my dearest loves, the arts and humanities, are hardly devoid of data. A digitized image is data. A digitized book is data, and can be computed upon. The performing arts are pushing out huge amounts of audio and video -- and while we’re talking storage capacity, digital video is an unbelievable headache because of file sizes. I like to think about folklorists and ethnographers while I consider digital data in the arts and humanities. Anything you can imagine is grist for their analysis mill, and yes, they are both analyzing digital data and recording their conclusions digitally. So we’ve all got data, one way or another.
  17. And here’s the other thing... We don’t have a service-provision model for this. Not in libraries. Not in IT. Not in most regular research practice. Nobody’s sure how it’s going to get done yet. This is part of why I’m here today. UW Milwaukee is busily trying to sort out how to do all this, in addition to all the other cyberinfrastructure-related things I told you at the beginning I wasn’t going to talk about.
  18. We know that apathy is not a solution. And here we often hear someone grumbling that if this was just all paper, it’d be fine; it’s this stupid digital stuff that’s the problem. Leaving aside that data on paper are completely useless as data, we shouldn’t ignore the incredibly complex safety net that libraries have built around paper. Paper doesn’t preserve itself either; librarians preserve it! Digital data are no different. We have to take intentional action to keep data viable.
  19. Right, so who’s we? Okay. Show of hands. Librarians? IT pros? Faculty and researchers? Research support, grant administrators and the like? Right. If you raised your hand at any point, part of this is probably your problem. Which part, I don’t know, and anybody who tells you they know is lying and probably trying to sell you something.
  20. So, can you tell a Holstein from an Angus? (I’m just going to die if there’s a dairy researcher in the room.) (CLICK) No, I can’t either. I can tell you that the Anguses are on the left, because I dug up the photos, but I swear that’s the only reason I know. The point of this little parable is that we know absolutely that data curation can’t happen without researchers helping and being cooperative with other people in the village. This is because data without context and interpretation are meaningless, like a spreadsheet with the header row chopped off -- and researchers are the people with the context and with the ability to interpret. Librarians and IT pros don’t automatically understand how a given dataset fits together, how it was created, how other people will expect to search for it or use it, what different parts of it even MEAN. Researchers will have to learn to express these things, if they don’t already know how!
  21. IT pros, you’re going to be running the big iron. No surprises there. But there are surprises for you in this, such as time horizons you’re not used to, mass file format migrations, metadata internal and external and relational that we can hardly imagine yet... and so on. Don’t panic, we’re all in this together, and we have examples to work from, especially on the larger scales -- but by the same token, don’t make the mistake of thinking you can just sail in and solve this one. It’s complicated.
  22. Librarians, this is your call to arms. Step up and sit at the table, or the table is going to forget that we exist. This isn’t good for the table, and it’s not good for us, either. Sure, we’re used to dealing with the published literature, and we’re fond of its authority and finality. (CLICK) But we’re going to have to look earlier in the lifecycle for our greatest impact.
  23. And then there’s the big gray area. When I said I didn’t know who would do all this? This is what I meant. Some researchers say that the solution is to teach themselves -- or up-and-coming newcomers -- information-management skills so that they become informaticists. Some researchers say that the answer is for researchers to learn to code. All of this will probably happen, in some fields and at some levels. I don’t know how it will all shake out, in the long run. But cross-functional training, no matter what end of the research enterprise you’re on, is probably the wave of the future.
  24. Infrastructure is more than computers. It’s also a policy and procedures infrastructure, without which none of this can happen. And finally, as I dearly hope I’ve made clear, infrastructure is people. Fancy supercomputers aren’t worth a penny without people to use them, care for them, and take care of what they compute.
  25. Everyone in this room can do this, and I hope you will. But, you may ask, what do you say?
  26. mention Educause
  27. I used so many Creative Commons-licensed photos that I have to actually roll the credits here... while that’s happening, let me ask if there are any questions?