Mblwhoil2010 Heidorn
1. Biodiversity Informatics: Mining Untapped Resources. February 8, 2010. Marine Biological Laboratory and Woods Hole Oceanographic Institution Library. P. Bryan Heidorn, Director, University of Arizona School of Information Resources and Library Science.
5. Naive View of Science Data: GenBank, PDB. $f(x) = a x^{k} + o(x^{k})$. Power Law of Science Data: $\left. f(x) = a x^{k} + o(x^{k}) \right|_{x < 0.20}$. [Figure axes: Data Volume; Science Projects and Initiatives.]
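To make the power-law picture concrete, here is a minimal sketch that evaluates the leading term $f(x) = a x^{k}$ over ranked projects and reports how much of the total volume sits in the largest 20%. It rests on assumptions not stated on the slide: x is read as project rank, the exponent is set to k = -1 and a = 1 purely for illustration, and the function names are hypothetical.

```python
def power_law_volumes(n_projects, a=1.0, k=-1.0):
    """Data volume for projects ranked 1..n under f(x) = a * x**k.

    a and k are illustrative values, not figures from the talk.
    """
    return [a * rank ** k for rank in range(1, n_projects + 1)]


def head_share(volumes, head_fraction=0.20):
    """Fraction of total volume held by the largest `head_fraction` of projects."""
    ordered = sorted(volumes, reverse=True)
    head_count = max(1, round(len(ordered) * head_fraction))
    return sum(ordered[:head_count]) / sum(ordered)


volumes = power_law_volumes(100)
print(f"Top 20% of projects hold {head_share(volumes):.1%} of the data volume")
```

Under these assumed parameters the head holds most of the volume, but the remaining 80% of projects still account for a substantial share, which is the point the next slides make with grant dollars.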
6. Does NSF’s Data Follow the Power Law? I do not know, but if $1 = X bytes…
7. 20-80 Rule: The small are big!

                  Total            20%                      80%
Number of Grants  9347             1869                     7478
Total Dollars     $2,137,636,716   $1,199,088,125           $938,548,595
Range                              $6,892,810 to $350,000   $350,000 to $831
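The arithmetic behind this slide can be sketched as follows. The dollar totals and grant counts are taken from the slide; the function name `split_20_80` is hypothetical, and it is meant for a list of individual award amounts, which the slide does not provide.

```python
def split_20_80(amounts, head_fraction=0.20):
    """Split award amounts into the largest `head_fraction` of grants and the rest."""
    ordered = sorted(amounts, reverse=True)
    head_count = round(len(ordered) * head_fraction)
    head, tail = ordered[:head_count], ordered[head_count:]
    total = sum(ordered)
    return {
        "head_count": len(head),
        "tail_count": len(tail),
        "head_share": sum(head) / total,
        "tail_share": sum(tail) / total,
    }


# Figures copied from the slide.
TOTAL_GRANTS = 9347
TOTAL_DOLLARS = 2_137_636_716
TOP_20_DOLLARS = 1_199_088_125
BOTTOM_80_DOLLARS = 938_548_595

top_count = round(TOTAL_GRANTS * 0.20)        # number of grants in the top 20%
top_share = TOP_20_DOLLARS / TOTAL_DOLLARS    # dollar share of the top 20%
bottom_share = BOTTOM_80_DOLLARS / TOTAL_DOLLARS

print(f"Top 20%: {top_count} grants, {top_share:.1%} of dollars")
print(f"Bottom 80%: {TOTAL_GRANTS - top_count} grants, {bottom_share:.1%} of dollars")
```

On the slide's numbers, the smaller 80% of grants carry roughly 44% of the dollars, which is what "the small are big" refers to.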
16. Automatic Metadata Extraction (Darwin Core) From Museum Specimen Labels … <co> Curtis, </co><hdlc> North American Pl </hdlc><cnl> No.</cnl><cn> 503*</cn> <gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa><val> var.</val> <hb> Coral soil,</hb><lc> Cudjoe Key, South Florida. </lc><col> Legit</col><co> A. H. Curtiss.</co><dt>February</dt>… With Qin Wei, Univ of Illinois
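As an illustration of how tagged output like the fragment above could be consumed downstream, the following sketch pulls each element into a (tag, text) pair and groups values by tag. It is not the extraction system from the talk: the regular expression and the function names are assumptions, the tag names are kept exactly as they appear on the slide, and no mapping to Darwin Core terms is attempted.

```python
import re

# The tagged fragment from the slide, with the leading and trailing ellipses dropped.
TAGGED_LABEL = (
    "<co> Curtis, </co><hdlc> North American Pl </hdlc><cnl> No.</cnl><cn> 503*</cn> "
    "<gn> Polygala</gn><sp> ambigua,</sp><sa> Nutt.,</sa><val> var.</val> "
    "<hb> Coral soil,</hb><lc> Cudjoe Key, South Florida. </lc><col> Legit</col>"
    "<co> A. H. Curtiss.</co><dt>February</dt>"
)

# One element: an opening tag, its text, and the matching closing tag.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_tagged_label(text):
    """Return (tag, text) pairs in document order, with surrounding spaces trimmed."""
    return [(tag, value.strip()) for tag, value in ELEMENT.findall(text)]


def group_by_tag(pairs):
    """Collect values per tag; a tag such as <co> can occur more than once."""
    grouped = {}
    for tag, value in pairs:
        grouped.setdefault(tag, []).append(value)
    return grouped


fields = group_by_tag(parse_tagged_label(TAGGED_LABEL))
for tag, values in fields.items():
    print(tag, values)
```

Keeping a list per tag matters here because the same tag can appear twice on one label, as `<co>` does in this fragment.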
28. Learning with pre-categorization. [Diagram: Gold Labels pass through Categorization into Class 1 Labels, Class 2 Labels, … Class n Labels; a Machine Learner trains on each class to produce Model 1, Model 2, … Model n. Unclassified Labels pass through the same Categorization into Class 1 … Class n Labels; each class goes to Machine Classification with its model, yielding Classified Labels.]
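The pre-categorization workflow on this slide can be sketched as one model per label class plus routing at classification time. Everything concrete below is an assumption for illustration: the "learner" is a trivial most-frequent-tag lookup standing in for whatever machine learner the project used, the fallback to a general model is an added convenience, and the gold data is invented (only the tag names gn, sp, lc and hdlc are borrowed from slide 16).

```python
from collections import Counter, defaultdict


def train(gold_labels):
    """Learn the most frequent tag per token (stand-in for the real learner)."""
    counts = defaultdict(Counter)
    for tokens in gold_labels:
        for token, tag in tokens:
            counts[token.lower()][tag] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def train_specialists(gold_by_class):
    """Train one model per pre-assigned label class (Model 1 .. Model n)."""
    return {cls: train(labels) for cls, labels in gold_by_class.items()}


def classify(tokens, label_class, specialists, general):
    """Route a label to its class's model; fall back to the general model."""
    model = specialists.get(label_class, general)
    return [
        (tok, model.get(tok.lower(), general.get(tok.lower(), "unknown")))
        for tok in tokens
    ]


# Hypothetical gold data: the same token is tagged differently in two classes.
gold_by_class = {
    "curtiss": [[("Polygala", "gn"), ("ambigua", "sp"), ("Florida", "lc")]],
    "other": [
        [("Florida", "hdlc"), ("Plants", "hdlc")],
        [("Florida", "hdlc"), ("Quercus", "gn")],
    ],
}
specialists = train_specialists(gold_by_class)
general = train([label for labels in gold_by_class.values() for label in labels])

print(classify(["Polygala", "Florida"], "curtiss", specialists, general))
print(classify(["Polygala", "Florida"], "unseen", specialists, general))
```

The design point is that a specialist model only sees labels from its own class, so conventions that differ between classes do not get averaged away as they do in the general model.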
29. FIG. 5. Improved Performance of Specialist Model: Specialist 100 Curtiss vs. 100 General. [Plot: axis label Iterations; tick ranges 0 to 200 and 0 to 100; series Specialist and Random.]
30. Machine Learning in BioGeomancer’s Locality Specification. P. Bryan Heidorn (1), Hong Zhang (1), Eugene Chung (2) and BGWG. (1) Graduate School of Library and Information Science, (2) Linguistics, University of Illinois. SPNHC & NSCA 2006.
33. Example Locality Types

Record #  Specification of Location                              Locality Type
43        dario 7 mi wnw of; RIO VIEJO                           FOH; F
86        near Aleutian Islands; S of Amukta Pass                NF; FH
100       INDIAN CREEK, 11 MI. W HWY 160                         P; POH
109       TIESMA RD, 1.5 MI NW EDGEWATER; OFF LAKE MICHIGAN R    P; FOH; NP
160       WALTMAN, 9 MI N, 2.5 MI W OF                           FOO
181       0.4 mi N Collinston on LA 138                          FPOH
204       Seward Peninsula; vic. Bluff, S coast                  F; NF; FS
37. Information Extraction From FNA
Diagram elements: Original documents; Knowledge bases; Templates for useful information; Extraction Rules; Structured information; User log analysis.
Templates (from user log analysis): Leaf_Shape, Leaf_Margin, Leaf_Apex, Leaf_Base, Blade_Dimension, …
Original document: … Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, base obtuse to broadly cuneate, margins flat, coarsely and often irregularly doubly serrate to nearly dentate, …
Knowledge base: PartBlade: Leaf blade, Blades, blade, …
Extraction rules:
  Pattern:: * <PartBlade> ' ' <leafShape> * ( <leafShape> ) ',' *
  Output:: leaf {leafShape $1}
  Pattern:: * <PartBlade> * ', ' ( <Range> ' ' * <LengUnit> ) * <PartBase>
  Output:: leaf {bladeDimension $1}
Structured information:
  Leaf_Shape obovate
  Leaf_Shape orbiculate
  Blade_Dimension 3--9 x 3--8 cm
  …
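The slide's rules are written in the project's own pattern language. As a rough analogue only, the sketch below expresses the same two extractions with Python regular expressions. The term lists are minimal assumptions drawn from the slide (two leaf shapes, one length unit), the function and constant names are hypothetical, and this is not the system's actual rule engine.

```python
import re

# Tiny stand-ins for the knowledge bases; only terms visible on the slide.
PART_BLADE = ["Leaf blade", "Blades", "blade"]
LEAF_SHAPES = ["obovate", "orbiculate"]
LENGTH_UNITS = ["cm"]


def _alternation(terms):
    """Regex alternation of literal terms, longest first so 'Leaf blade' beats 'blade'."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# A numeric range such as "3--9" or "3-9".
RANGE = r"\d+(?:\.\d+)?\s*-{1,2}\s*\d+(?:\.\d+)?"
# Two ranges joined by a multiplication sign, followed by a length unit.
DIMENSION = re.compile(
    rf"({RANGE}\s*[×x]\s*{RANGE}\s*(?:{_alternation(LENGTH_UNITS)}))\b"
)
# The clause that follows a blade term, up to the first comma.
BLADE_CLAUSE = re.compile(rf"(?:{_alternation(PART_BLADE)})\s+([^,]*),", re.IGNORECASE)
SHAPE = re.compile(rf"\b({_alternation(LEAF_SHAPES)})\b", re.IGNORECASE)


def extract_leaf_facts(description):
    """Return (template, value) pairs for leaf shape and blade dimension."""
    facts = []
    clause = BLADE_CLAUSE.search(description)
    if clause:
        facts += [("Leaf_Shape", s.lower()) for s in SHAPE.findall(clause.group(1))]
        dimension = DIMENSION.search(description, clause.end())
        if dimension:
            facts.append(("Blade_Dimension", dimension.group(1)))
    return facts


TEXT = (
    "Leaf blade obovate to nearly orbiculate, 3--9 × 3--8 cm, leathery, "
    "base obtuse to broadly cuneate, margins flat, coarsely and often "
    "irregularly doubly serrate to nearly dentate"
)
for template, value in extract_leaf_facts(TEXT):
    print(template, value)
```

Anchoring the dimension search after the blade clause mirrors the second rule, where the range is only accepted once a `<PartBlade>` term has been seen.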
38. Results – System Performance

Group        NT     NTH    TSR    SSR    NSST   TST    NDVST
SEARFA       6.75   8.078  0.860  0.210  4.779  338.8  11.16
SEARF        4.50   3.598  0.568  0.053  9.584  435.2  14.75
Sig.(ANOVA)  0.005  0.005  0.000  0.011  0.000  0.72   0.162

NT: number of tasks accomplished in total
NTH: number of tasks accomplished per hour
TSR: task success rate
SSR: search success rate
NSST: number of searches to accomplish a task
TST: time spent to accomplish a task
NDVST: number of documents viewed to accomplish a task
Editor's Notes
Figure 1. Bar graphs depicting phylogenetically corrected mean differences between species groups for two climate change response traits: the correlation coefficient between first flowering day and annual spring temperature for the time period of 1888–1902 (A; i.e., flowering time tracking), and the shift in mean first flowering day during the period exhibiting the most dramatic increase in mean annual temperature, from 1900–2006 (B; i.e., flowering time shift).