Data Management: A Scientist’s Perspective

Carly Strasser
California Digital Library, University of California Curation Center
University of Florida Libraries, August 2012

Image: From Calisphere via California State University Libraries, ark:/13030/c818356g
  
Photos: C. Strasser; Courtesy of WHOI; North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652
  
Roadmap
1.  A brief history of data collection
2.  The world of data
3.  The Fallout
4.  Barriers
5.  Landscape
  
A Brief History of Data Collection
Or… how scientists came to be so bad at data management
Image: From Calisphere via Santa Clara University, ark:/13030/kt696nc7j2
  
The lab/field notebook
Curie, Newton, Darwin, Da Vinci
Image: classicalschool.blogspot.com
  
The lab/field notebook
Image: From Calisphere via Fullerton College, ark:/13030/kt5c60273t
  
The lab/field notebook…?
Images: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command
  
Digital data
Images: From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command
  
Digital data + Complex workflows
  
[Workflow diagram. Labels: Data; Models; Maximum Likelihood estimation; Matrix Models; Images; Tables; Paper]
  
The Wide World of Data
Image: From Flickr by stevecadman
  
Data Types

Dimensions of Data
    Datum
    Data file
    Metadata
    Dataset
    Data collection
    Data repository
  
Data Diversity
[Word cloud: Temporal structure; Units; File size; File organization; File type; Documentation; Spatial extent; structure; Metadata; Collection practices; Codes; Project intent; Analysis]
  
Big Data
OSTP March 2012: Big Data Effort Launched
  
The Little Guys?
Image: From Flickr by jason tinder
  
The Long Tail
[Figure. Axis labels: Size of dataset / grant ($); # datasets / # researchers / # grants]
  
The Long Tail
[Histogram: Number of Awards (0 to 300) by Award Amount (millions of dollars; 0.1 to >2.5). NSF DEB 2005-2010, n = 1234]
Hampton et al., In press, Frontiers in Ecology and Evolution
  
The Fallout
Image: From Flickr by Old Shoe Woman
  
UGLY TRUTH
Many (most?) researchers…
    are not taught data management
    don’t know what metadata are
    can’t name data centers or repositories
    don’t share data publicly or store it in an archive
    aren’t convinced they should share data
Image: 5shortessays.blogspot.com
  

                                                                           	
  
Information Entropy
Fig. 1 of Michener et al. 1997
  
How bad can it be?
Image: From Calisphere via San Jose Public Library
  
Slide callouts: 2 tables; Random notes

C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab     Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore           Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26          -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26            0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010), ESA Workshop on Best Practices
  
Wash Cres Lake Dec 15 Dont_Use.xls
[Same spreadsheet as on the previous slide]
From Stephanie Hampton (2010), ESA Workshop on Best Practices
  
Random stats output
[Same spreadsheet as on the previous slides, with the following regression output pasted beside the data]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.283158
R Square             0.080178
Adjusted R Square   -0.022024
Standard Error       1.906378
Observations         11

ANOVA
              df    SS          MS          F           Significance F
Regression     1    2.851116    2.851116    0.784507    0.398813
Residual       9    32.7085     3.634278
Total         10    35.55962

              Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     -4.297428      4.671099         -0.920003   0.381568   -14.8642    6.269341    -14.8642      6.269341
X Variable 1  -0.158022      0.17841          -0.885724   0.398813   -0.561612   0.245569    -0.561612     0.245569

From Stephanie Hampton
  
[Same spreadsheet and regression output as on the previous slide, overlaid with a transposed copy of rows A7 to B5. Legible rows of the transposed block:]

SampleID       ALG03    ALG05    ALG07    ALG06    ALG04    ALG02    ALG01    ALG03    ALG07
Weight (mg)    2.91     2.91     3.04     2.95     3.01     3        2.99     2.92     2.9
%C             6.85     35.56    33.49    41.17    43.74    4.51     1.59     4.37     33.58
delta 13C      -21.11   -28.05   -29.56   -27.32   -27.50   -22.68   -24.58   -21.06   -29.44
         C1          ALG04                    2.98              37.90         delta 13C_ca
                                                                           -27.42       -26.96         -20.65
                                                                                                       1.36                  1.21  -27.590.56       25394 -29.10
                                                                                                                                                             c         -26.86    -27.04
                                                                                                                                                                                    df              SS         -22.22
                                                                                                                                                                                                                  MS  F           -24.12
                                                                                                                                                                                                                               Significance F        -20.60             -28.98
         C2          ALG05                    3.04              31.74      -27.93       -27.47         2.40                  0.73        0.08       25396                  Regression          1 2.851116 2.851116 0.784507 0.398813
         C3                            ref    0.99              38.46      -25.09       -24.63         2.40                  4.37        3.72       25398                  Residual            9 32.7085 3.634278
                                                                23.78             %N                    0.48
                                                                                                       1.17                          2.30                 1.68          1.97
                                                                                                                                                                           Total          1.3610 35.55962 0.34                0.15                     0.34                  1.74
                                                                              delta 15N                  -0.97                       0.59                 0.79          2.71              0.99                 4.31                -1.69              -1.52                  0.62
                                                                                                                                                                                         Coefficients
                                                                                                                                                                                                   Standard Error t Stat  P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                              Upper 95.0%
                                                                             delta 15N_ca                -1.62                      -0.06                 0.14          2.06
                                                                                                                                                                           Intercept       -4.297428 4.671099 3.66
                                                                                                                                                                                            0.34                                    -2.34              -2.17
                                                                                                                                                                                                                -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341      -0.03
                                                                                                                                                                               X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569




                                                                                                                                                                                                                                                   4.00



                                                                                                                                                                                                                                                   3.00



                                                                                                                                                                                                                                                   2.00



                                                                                                                                                                                                                                                   1.00

                                                                                                                                                                                                                                                                      Series1

                                                                                                                                                                                                                                                   0.00
                                                                              -35.00                  -30.00                       -25.00                -20.00                 -15.00                  -10.00                  -5.00                  0.00

                                                                                                                                                                                                                                                  -1.00



                                                                                                                                                                                                                                                  -2.00



                                                                                                                                                                                                                                                  -3.00


                                                                                                                                                                                                                                         From	
  Stephanie	
  Hampton	
  
The Fallout: Where data end up
From Flickr by diylibrarian




                                                                                                  www




                         blog.order2disorder.com	
  




From Flickr by csessums
Data
Metadata
Recreated from Klump et al. 2006
The Fallout
Data Reuse
Data Sharing
Data Management
100 NSF DEB awards, 2005-2009; one paper from each award

Is data produced or reused?      Is the data produced shared?      Where are data shared?
Produced: 57% (37)               Shared all:  28% (17)             GenBank or TreeBase: 81% (21)
Reused:   8% (5)                 Shared some: 15% (9)              Elsewhere:           19% (5)
Both:     35% (23)               Shared none: 57% (34)

Hampton et al., In press, Frontiers in Ecology and Evolution
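The percentages on this slide can be recomputed from the counts shown in parentheses. The following is a minimal sketch, assuming the counts listed under each question are the complete tally for that question; the slide does not state the denominators, so the totals are inferred by summing the counts. The dictionary and function names are illustrative only.

```python
# Counts transcribed from the slide; the grouping and names are illustrative.
counts = {
    "Is data produced or reused?": {"Produced": 37, "Reused": 5, "Both": 23},
    "Is the data produced shared?": {"Shared all": 17, "Shared some": 9, "Shared none": 34},
    "Where are data shared?": {"GenBank or TreeBase": 21, "Elsewhere": 5},
}


def percentages(tally):
    """Return each category's share of the column total, rounded to a whole percent."""
    total = sum(tally.values())  # assumed denominator: the sum of the listed counts
    return {name: round(100 * n / total) for name, n in tally.items()}


for question, tally in counts.items():
    print(question, percentages(tally))
```

Under that assumption the rounded shares agree with the slide: 57/8/35, 28/15/57, and 81/19.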
  
Why? Barriers to Data Stewardship
From Flickr by iowa_spirit_walker
From Flickr by indigoprime
Barriers: Cost
From Flickr by kobiz7
C. Strasser
Barriers: Sociocultural
Not the norm
Lack of / too many standards
Disparate data
Lack of training
From Flickr by freefotouk
From Flickr by toucanradio
From Flickr by Chris Campbell
From Flickr by uniinnsbruck
Barriers: Sociocultural
Missed opportunities
Loss of rights or benefits
Conflict
Misuse
From Flickr by Christina Ann VanMeter
From Flickr by pnh
From Flickr by tymesynk
Barriers: Sociocultural
Lack of incentives
Time consuming & expensive
No requirements
Reward structure
From Flickr by bthomso
From Flickr by Marquette University




But what about the next generation?
Are Undergrads Learning About Data Management?
•    Metadata generation
•    Software choice
•    File naming
•    QAQC
•    Backing up
•    Workflows
•    Data sharing
•    Data re-use
•    Meta-analysis
•    Reproducibility
•    Notebook protocols
•    Databases
[Figure: plot of these topics, x-axis "Assessed" (0 to 40), y-axis "Important" (0 to 40)]
If it's important, why isn't it taught?
Are Undergrads Learning About Data Management?
Barriers:
Too advanced
Not a priority
Not appropriate level
Students don't know software
Time
No Lab
No training
Covered in Lab
Too big
C. Strasser




The Current Landscape
Who cares?
From Flickr by Redden-McAllister
From Flickr by AJC1
www.rba.gov.au
Where data end up
From Flickr by diylibrarian
www
Data
Metadata
From Flickr by torkildr
Recreated from Klump et al. 2006
Trends in Data Archiving
Journal publishers
    Joint Data Archiving Agreement
Data Papers etc.
    Ecological Archives, Beyond the PDF
Funders
    Data management requirements
What is a data management plan?
A document that describes what you will do with your data during your research and after you complete your research
Why should a scientist prepare a DMP?
Saves time
Increases efficiency
Easier to use data
Others can understand & use data
Credit for data products
Funders require it
From Flickr by einalem
The Fallout
  
NSF DMP Requirements
From Grant Proposal Guidelines:
DMP supplement may include:
1.  the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2.  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3.  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4.  policies and provisions for re-use, re-distribution, and the production of derivatives
5.  plans for archiving data, samples, and other research products, and for preservation of access to them
NSF's Vision*
DMPs and their evaluation will grow & change over time (similar to broader impacts)
Peer review will determine next steps
Community-driven guidelines
    –  Different disciplines have different definitions of acceptable data sharing
    –  Flexibility at the directorate and division levels
    –  Tailor implementation of DMP requirement
Evaluation will vary with directorate, division, & program officer
*Unofficially
Help from Jennifer Schopf, NSF
dmp.cdlib.org	
  




                    dmponline.dcc.ac.uk	
  
Individual Challenges
[Figure: data life cycle: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze]
Will I get credit for my work?
What is a data management plan?
What tools do I use?
What is metadata?
Are there standards?
How much will it cost?
Who can help me?
Where do I preserve my data?
How do I preserve my data?
NSF funded DataNet Project
Office of Cyberinfrastructure
Cyberinfrastructure
Community Engagement & Outreach
Courtesy of DataONE
What role can libraries play in data education?
What barriers to sharing can we eliminate?
Why don't people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
dataup.cdlib.org	
  
@DataUpCDL	
  
facebook.com/DataUpCDL	
  

                                  carlystrasser.net	
  
                        carlystrasser@gmail.com	
  
                                   @carlystrasser	
  

Mais conteúdo relacionado

Mais procurados

UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsCarly Strasser
 
Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Carly Strasser
 
Open Data & Open Access - DLF 2012
Open Data & Open Access - DLF 2012Open Data & Open Access - DLF 2012
Open Data & Open Access - DLF 2012Carly Strasser
 
Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Carly Strasser
 
Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Carly Strasser
 
Data Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA WeekData Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA WeekCarly Strasser
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekCarly Strasser
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsCarly Strasser
 
Data Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation WorkshopData Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation WorkshopCarly Strasser
 
Data Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFData Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFCarly Strasser
 
Data Management Planning for ESA 2013
Data Management Planning for ESA 2013Data Management Planning for ESA 2013
Data Management Planning for ESA 2013Carly Strasser
 
Data Management Solutions from Libraries at NSF Large Facilities Workshop
Data Management Solutions from Libraries at NSF Large Facilities WorkshopData Management Solutions from Libraries at NSF Large Facilities Workshop
Data Management Solutions from Libraries at NSF Large Facilities WorkshopCarly Strasser
 
DMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for SuccessDMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for SuccessCarly Strasser
 
Needs for Data Management & Citation Throughout the Information Lifecycle
Needs for Data Management & Citation Throughout  the Information LifecycleNeeds for Data Management & Citation Throughout  the Information Lifecycle
Needs for Data Management & Citation Throughout the Information LifecycleMicah Altman
 
Supporting UC Research Data Management
Supporting UC Research Data ManagementSupporting UC Research Data Management
Supporting UC Research Data Managementslabrams
 
A Cabinet Of Web2.0 Scientific Curiosities
A Cabinet Of Web2.0 Scientific CuriositiesA Cabinet Of Web2.0 Scientific Curiosities
A Cabinet Of Web2.0 Scientific CuriositiesIan Mulvany
 






Data Management from a Scientist's Perspective

  • 1. Data Management: A Scientist's Perspective. Carly Strasser, California Digital Library, University of California Curation Center. University of Florida Libraries, August 2012. Image: from Calisphere via California State University Libraries, ark:/13030/c818356g
  • 2. Photos: C. Strasser; courtesy of WHOI
  • 3. Photos: C. Strasser; North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652
  • 4. Roadmap: 1. A brief history of data collection; 2. The world of data; 3. The Fallout; 4. Barriers; 5. Landscape. Photo: C. Strasser
  • 5. A Brief History of Data Collection. Or… how scientists came to be so bad at data management. Image: from Calisphere via Santa Clara University, ark:/13030/kt696nc7j2
  • 6. The lab/field notebook: Curie, Newton, Darwin, Da Vinci. Image: classicalschool.blogspot.com
  • 7. The lab/field notebook. Image: from Calisphere via Fullerton College, ark:/13030/kt5c60273t
  • 8. The lab/field notebook…? Images: from Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; courtesy of WHOI
  • 9. Digital data. Images: from Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; courtesy of WHOI
  • 10. Digital data + Complex workflows
  • 11. Workflow diagram labels: Data; Models; Maximum Likelihood estimation; Matrix Models; Images; Tables; Paper
  • 12. The Wide World of Data. Image: from Flickr by stevecadman
  • 14. Dimensions of Data: Datum; Data file; Metadata; Dataset; Data collection; Data repository
  • 15. Data Diversity. Diagram labels (layout lost in extraction): Temporal, File size, Units, File structure, organization, File type, Documentation, Spatial extent, structure, Metadata, Collection, Codes, Project intent, practices, Analysis
  • 16. Big Data. OSTP, March 2012: Big Data Effort Launched
  • 17. The Little Guys? Image: from Flickr by jason tinder
  • 18. The Long Tail. Axis labels: Size of dataset, grant ($); # datasets, # researchers, # grants
  • 19. The Long Tail. NSF DEB 2005-2010, n = 1234. [Histogram: number of awards by award amount (millions of dollars).] Hampton et al., In press, Frontiers in Ecology and Evolution
  • 20. The Fallout. Image: from Flickr by Old Shoe Woman
  • 21. UGLY TRUTH. Many (most?) researchers… are not taught data management; don't know what metadata are; can't name data centers or repositories; don't share data publicly or store it in an archive; aren't convinced they should share data. Image: 5shortessays.blogspot.com
  • 22. Information Entropy. Fig. 1 of Michener et al. 1997
  • 23. How bad can it be? Image: from Calisphere via San Jose Public Library
  • 24. 2 tables; random notes. [Spreadsheet screenshot: "Wash Cres Lake Dec 15 Dont_Use.xls", Sheet1, a Stable Isotope Data Sheet. Header notes: Sampling Site / Identifier: Wash Cresc Lake; Peter's lab; Don't use - old data; Sample Type: Algal Washed Rocks; Date: Dec. 16; Tray ID and Sequence: Tray 004; Reference statistics: SD for delta C = 0.07, SD for delta N = 0.15. Columns: Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No. Rows A1 to C3, with stray averages and flags in the margins.] From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 25. Wash Cres Lake Dec 15 Dont_Use.xls. [Same spreadsheet screenshot.] From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 26. Random stats output. [Same spreadsheet screenshot, with a regression SUMMARY OUTPUT block (Regression Statistics, ANOVA, Coefficients) pasted among the data rows.] From Stephanie Hampton
  • 27. [Same spreadsheet screenshot, now also with a transposed copy of the sample data and a chart pasted into the sheet.] From Stephanie Hampton
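One way to read the spreadsheet slides is that measurements, sheet-level notes, and analysis output are all mixed in a single worksheet. The sketch below is only an illustration of keeping them apart, not something shown in the talk: it stores three rows transcribed from the slide as a plain CSV table and puts the sheet-level notes in a separate key/value table. The normalized column names, the `status` field, and the `to_csv` helper are assumptions introduced for this example.

```python
import csv
import io

# Sheet-level notes from the slide, kept apart from the measurements.
# The field names here are illustrative assumptions.
metadata = {
    "sampling_site": "Wash Cresc Lake",
    "sample_type": "Algal Washed Rocks",
    "date": "Dec. 16",
    "tray_id": "Tray 004",
    "status": "Don't use - old data",
}

# One header row with machine-friendly names (assumed normalizations of
# the slide's column labels).
columns = [
    "position", "sample_id", "weight_mg", "pct_c", "delta_13c",
    "delta_13c_ca", "pct_n", "delta_15n", "delta_15n_ca", "spec_no",
]

# Three rows transcribed from the slide; one observation per row.
rows = [
    ("A1", "ref", 0.98, 38.27, -25.05, -24.59, 1.96, 4.12, 3.47, 25354),
    ("A5", "ALG01", 3.05, 1.88, -24.34, -23.88, 0.17, -1.65, -2.30, 25362),
    ("A6", "Lk Outlet Alg", 3.06, 31.55, -30.17, -29.71, 0.92, 0.87, 0.22, 25364),
]


def to_csv(header, records):
    """Serialize a header and records as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


data_csv = to_csv(columns, rows)
metadata_csv = to_csv(["field", "value"], sorted(metadata.items()))

print(data_csv)
print(metadata_csv)
```

Keeping the notes in their own table means the data file can be read by any tool without first skipping free-text header lines.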
  • 28. The Fallout: Where data end up. [Diagram labels: www, Data, Metadata.] Recreated from Klump et al. 2006. Images: from Flickr by diylibrarian and csessums; blog.order2disorder.com
  • 29. The Fallout: Data Reuse; Data Sharing; Data Management
  • 30. 100 NSF DEB awards, 2005-2009; one paper from each award. Is data produced or reused? Produced: 57% (37); Reused: 8% (5); Both: 35% (23). Is the data produced shared? Shared all: 28% (17); Shared some: 15% (9); Shared none: 57% (34). Where are data shared? GenBank or TreeBase: 81% (21); Elsewhere: 19% (5). Hampton et al., In press, Frontiers in Ecology and Evolution
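The percentages on the slide about produced, reused, and shared data can be recomputed from the counts given in parentheses. The short sketch below does that arithmetic; the dictionary layout and the `percentages` helper are assumptions made for this example. The counts suggest, although the slide does not say so, that each question uses a different denominator: 65 papers for produced or reused, 60 for shared, and 26 for where shared.

```python
# Counts in parentheses on the slide, grouped by the question they answer.
counts = {
    "Is data produced or reused?": {"Produced": 37, "Reused": 5, "Both": 23},
    "Is the data produced shared?": {
        "Shared all": 17, "Shared some": 9, "Shared none": 34,
    },
    "Where are data shared?": {"GenBank or TreeBase": 21, "Elsewhere": 5},
}


def percentages(group):
    """Return each count as a whole-number percentage of the group total."""
    total = sum(group.values())
    return {label: round(100 * n / total) for label, n in group.items()}


for question, group in counts.items():
    print(question, sum(group.values()), percentages(group))
```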
  • 31. Why? Barriers to Data Stewardship. Image: from Flickr by iowa_spirit_walker
  • 32. Barriers: Cost. Images: from Flickr by indigoprime and kobiz7; C. Strasser
  • 33. Barriers: Sociocultural. Not the norm. Image: from Flickr by freefotouk
  • 34. Barriers: Sociocultural. Not the norm; lack of / too many standards
  • 35. Barriers: Sociocultural. Not the norm; lack of / too many standards; disparate data. Images: from Flickr by toucanradio and Chris Campbell
  • 36. Barriers: Sociocultural. Not the norm; lack of / too many standards; disparate data; lack of training. Image: from Flickr by uniinnsbruck
  • 37. Barriers: Sociocultural. Missed opportunities; loss of rights or benefits; conflict; misuse. Images: from Flickr by Christina Ann VanMeter, pnh, and tymesynk
  • 38. Barriers: Sociocultural. Lack of incentives: time consuming & expensive; no requirements; reward structure. Image: from Flickr by bthomso
  • 39. But what about the next generation? Image: from Flickr by Marquette University
  • 40. Are Undergrads Learning About Data Management? Topics: Metadata generation; Software choice; File naming; QAQC; Backing up; Workflows; Data sharing; Data re-use; Meta-analysis; Reproducibility; Notebook protocols; Databases. [Scatter plot: Important versus Assessed, both axes 0 to 40.] If it's important, why isn't it taught?
  • 41. Are Undergrads Learning About Data Management? Barriers: Too advanced; Not a priority; Not appropriate level; Students don't know software; Time; No Lab; No training; Covered in Lab; Too big
  • 42. The Current Landscape. Photo: C. Strasser
  • 43. Who cares? Images: from Flickr by Redden-McAllister and AJC1; www.rba.gov.au
  • 44. Where data end up. [Diagram labels: www, Data, www, Metadata.] Recreated from Klump et al. 2006. Images: from Flickr by diylibrarian and torkildr
  • 45. Trends in Data Archiving. Journal publishers: Joint Data Archiving Agreement. Data Papers etc.: Ecological Archives, Beyond the PDF. Funders: data management requirements
  • 46. What is a data management plan? A document that describes what you will do with your data during your research and after you complete your research
  • 47. Why should a scientist prepare a DMP? Saves time; increases efficiency; easier to use data; others can understand & use data; credit for data products; funders require it
  • 48. The Fallout. Image: from Flickr by einalem
  • 49. NSF DMP Requirements. From Grant Proposal Guidelines: DMP supplement may include: 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; 4. policies and provisions for re-use, re-distribution, and the production of derivatives; 5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • 50. NSF's Vision*. DMPs and their evaluation will grow & change over time (similar to broader impacts). Peer review will determine next steps. Community-driven guidelines: different disciplines have different definitions of acceptable data sharing; flexibility at the directorate and division levels; tailor implementation of DMP requirement. Evaluation will vary with directorate, division, & program officer. *Unofficially. Help from Jennifer Schopf, NSF
  • 51. dmp.cdlib.org; dmponline.dcc.ac.uk
  • 52. Individual Challenges. Questions: What is a data management plan? Will I get credit for my work? What is metadata? What tools do I use? Are there standards? How much will it cost? Who can help me? Where do I preserve my data? How do I preserve my data? Data life cycle labels: Collect; Assure; Describe; Deposit; Preserve; Discover; Integrate; Analyze
  • 53. NSF funded DataNet Project, Office of Cyberinfrastructure. Community Engagement & Outreach; Cyberinfrastructure. Courtesy of DataONE
  • 54. What role can libraries play in data education? What barriers to sharing can we eliminate? Why don't people share data? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories?
  • 55.
  • 56. dataup.cdlib.org; @DataUpCDL; facebook.com/DataUpCDL; carlystrasser.net; carlystrasser@gmail.com; @carlystrasser