Databasing the World: Biodiversity and the 2000s. Written by Bowker, G. C. Presented by Chen Zhang (Mike).
Four Key Aspects. Database Infrastructure: standards (flexible, stable), technology (stable), communication. Data Sharing: ownership, disarticulation, data collection.
Four Key Aspects (continued). Distributed Collective Practice: collaborative work, new knowledge economy. Accounting for Life: development of classification, cladistics, the future.
Database Infrastructure
Standards: why do we need standards? Example from the air-conditioner industry: the diameter of the screw must match the hole on the panel. Reasons for databases: various media need a 'handshake', for example the MIME (Multipurpose Internet Mail Extensions) protocol; each layer of infrastructure requires its own set of standards; standardized categories are needed.
Standards: the best standard will not always win. Some best-known standards: the QWERTY keyboard, the VHS (Video Home System) standard, and the DOS computing system.
Standards: why does the best standard not always win? The best standard may not have the best market. Standards setting is a key site of political work. An inferior standard may be backed by political agencies such as standards-setting bodies.
Standards: interoperability. There is a continuum of strategies for standards setting, from 'one standard fits all' to 'let a thousand standards bloom'.
Standards: interoperability, some related standards. 1. ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection. It makes it easier to use large information databases by standardizing the procedures and features for searching and retrieving information. It allows a single enquiry over multiple databases and is widely adopted in the library world.
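The idea of a single enquiry over multiple databases can be sketched in a few lines of Python. This is only a toy illustration of the pattern, not the Z39.50 protocol itself: the two in-memory catalogues, `make_searcher`, and `federated_search` are hypothetical names introduced here, and real targets would be remote servers speaking a shared protocol.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical in-memory catalogues standing in for separate remote databases.
CATALOGUE_A: List[Record] = [
    {"title": "On the Origin of Species", "creator": "Darwin, C."},
]
CATALOGUE_B: List[Record] = [
    {"title": "The Origin of Continents and Oceans", "creator": "Wegener, A."},
    {"title": "Sorting Things Out", "creator": "Bowker, G. C.; Star, S. L."},
]


def make_searcher(records: List[Record]) -> Callable[[str], List[Record]]:
    """Wrap a catalogue in the one search interface every target must offer."""
    def search(term: str) -> List[Record]:
        needle = term.lower()
        return [r for r in records if needle in r["title"].lower()]
    return search


def federated_search(term: str,
                     targets: Dict[str, Callable[[str], List[Record]]]) -> List[Record]:
    """Send one query to every target and merge the hits, tagging their source."""
    hits: List[Record] = []
    for name, search in targets.items():
        for record in search(term):
            hits.append({"source": name, **record})
    return hits


targets = {"A": make_searcher(CATALOGUE_A), "B": make_searcher(CATALOGUE_B)}
results = federated_search("origin", targets)
for hit in results:
    print(hit["source"], "|", hit["title"])
```

The point of the sketch is that the caller writes the query once; agreement on the search interface is what makes that possible.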
Standards: interoperability, some related standards. 2. XML: Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. Two extremes of interoperability: (a) the colonial model; (b) the democratic model (won out), built around people's established computing environments.
Technology: technology must be stable. Nothing guarantees the stability of vast data sets. Examples: the failure of Paul Otlet's well-catalogued microfiches; the development of computer memory, where information becomes hard to retrieve.
Technology: technology must be stable so that data stays accessible and usable. Infrastructure will require a continued maintenance effort, for two reasons: (a) data is passed from one medium to another; (b) data is passed from one generation of database technology to the next.
Issues of Communication: the problem of reliable metadata. Metadata is data about data. (In the slide's example figure, the blue lines are the metadata.)
Issues of Communication: the problem of reliable metadata. Metadata gives a standard name to certain kinds of data, which makes the data easy to search across multiple databases. Issue: how detailed should the name of the data be? With too little detail, the information about the data is useless; with too much detail, naming takes longer and means more work.
Issues of Communication: Dublin Core. The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and cataloged. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.
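As a rough illustration of how small the Simple Dublin Core vocabulary is, the sketch below stores the 15 element names and checks a record against them. The `validate` helper and the example values for `type` and `language` are assumptions made for this example; the title and creator are taken from this presentation's title slide.

```python
# The 15 elements of the Simple Dublin Core Metadata Element Set (DCMES).
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def validate(record: dict) -> list:
    """Return the keys of `record` that are not Simple Dublin Core elements."""
    return sorted(set(record) - set(DCMES_ELEMENTS))


# Illustrative record; every element is optional in Simple Dublin Core.
record = {
    "title": "Databasing the World: Biodiversity and the 2000s",
    "creator": "Bowker, G. C.",
    "type": "Text",      # assumed value, for illustration only
    "language": "en",    # assumed value, for illustration only
}

print(validate(record))                              # no unknown keys
print(validate({"title": "x", "colour": "blue"}))    # 'colour' is not an element
```

A short fixed vocabulary like this is one answer to the "how much detail" question raised above: it trades descriptive depth for ease of cataloguing and cross-database search.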
Data Sharing
Ownership: control of knowledge. Mid-nineteenth century: knowledge came only from professionally trained scientists and doctors. New information economy: knowledge comes from many people. Example: patient groups.
Ownership: privacy. Keeping data private is difficult. Example: data is compiled by a third-party company to generate a new, marketable form of knowledge. New patterns of ownership: science has frequently been analyzed as a "public good", but with the increasing privatization of knowledge it is unclear to what extent the vaunted openness of the scientific community will last.
Disarticulation: the ideal database. According to most practitioners it should be theory-neutral, yet serve as a common basis on which a number of scientific disciplines can progress. Example: the genome databank as a new kind of science; constructing arguments about genetic causation is not the same as the process of mapping the genome.
The data in a database should be easily manipulated by other scientists.
Data Collection: dealing with old data. Difficulties: scientific papers do not in general offer enough information to allow an experiment or procedure to be repeated; the distributed database is becoming a new model form of scientific publication in its own right. Issues of updating: there is no automatic update from one field to a cognate one; scientists are not able to share information across disciplinary divides.
Data Collection: international technoscience. Purpose: narrow the gaps between countries. Issues: people do not have equal knowledge; access is never really equal; governments doubt the usefulness of opening databases onto the Internet.
Distributed Collective Practice
Collaborative Work: management structures in universities and industry still tend to support the heroic myth of the individual researcher. An open question is what kind of value the large publishing houses add to journal production. Great attention must be paid to the social and organizational setting of technoscientific work.
New Knowledge Economy: three central issues. (1) The development of flexible, stable data standards. (2) The generation of protocols for data sharing. (3) The restructuring of scientific careers.
Accounting For Life
Development of Classification: introduction. The PANDORA taxonomic database.
Development of Classification: importance of classification. 18th-19th centuries: a botanist must know all genera and commit their names to memory, but cannot be expected to remember all specific names (A. J. Cain, 1958). Later part of the 19th century: new information technologies were developed that permitted the easy storage and coding of larger amounts of data than could previously be manipulated (Chandler, 1977; Yates, 1989).
Development of Classification: example of classification. Paper-based archival practice. Issue: material is hard to reclassify. Type specimens had to be relocated physically, and so did series of articles or books.
Development of Classification: example of classification. Multifaceted classification system. Improvement: classifications can be ordered in multiple ways rather than in a single one. Example: a collection of books might be classified using an author facet, a subject facet, and a date facet.
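The book example can be sketched as follows. The three placeholder books and the `group_by_facet` helper are made up for illustration; the point is only that the same records can be reordered by any facet without being physically moved.

```python
from collections import defaultdict

# Placeholder records; each book carries an author, a subject, and a date facet.
books = [
    {"title": "Book 1", "author": "Author A", "subject": "botany", "date": 1958},
    {"title": "Book 2", "author": "Author B", "subject": "botany", "date": 1977},
    {"title": "Book 3", "author": "Author A", "subject": "zoology", "date": 1977},
]


def group_by_facet(items, facet):
    """Group item titles under each distinct value of the chosen facet."""
    groups = defaultdict(list)
    for item in items:
        groups[item[facet]].append(item["title"])
    return dict(groups)


# The same collection, ordered three different ways.
for facet in ("author", "subject", "date"):
    print(facet, group_by_facet(books, facet))
```

Contrast this with the paper-based practice above, where choosing a new ordering meant relocating the items themselves.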
Development of Classification: example of classification. Hierarchical classification (for reading the past). In the early 1970s, E. F. Codd separated the physical storage of data in the computer from the representation of that data. Disadvantage of a hierarchical scheme: it becomes awkward to introduce other levels of taxonomic category as an afterthought. Improved method: one record for every name, regardless of its taxonomic level.
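A minimal sketch of the "one record for every name" idea, assuming each record simply stores its rank and a pointer to its parent name. The `NameRecord` class and `lineage` function are hypothetical and are not taken from PANDORA; the taxa are ordinary botanical names used for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameRecord:
    """One record per name; the rank is just a field, not a fixed table level."""
    name: str
    rank: str
    parent: Optional[str] = None


records: Dict[str, NameRecord] = {
    r.name: r
    for r in [
        NameRecord("Plantae", "kingdom"),
        NameRecord("Rosaceae", "family", parent="Plantae"),
        NameRecord("Rosa", "genus", parent="Rosaceae"),
        NameRecord("Rosa canina", "species", parent="Rosa"),
    ]
}


def lineage(name: str, table: Dict[str, NameRecord]) -> List[str]:
    """Walk parent pointers from a name up to the top of the hierarchy."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = table[current].parent
    return chain


# Introducing a new rank as an afterthought: add one record, re-point one parent.
records["Rosoideae"] = NameRecord("Rosoideae", "subfamily", parent="Rosaceae")
records["Rosa"].parent = "Rosoideae"

print(lineage("Rosa canina", records))
```

Because no rank is hard-wired into the structure, the subfamily slots in without redesigning anything, which is the advantage the slide attributes to this method.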
Cladistics: definition. Cladistics is a method of classifying species of organisms into groups called clades, which consist of (1) all the descendants of an ancestral organism and (2) the ancestor itself. Features: it gives a more regular algorithm for determining phylogeny; it focuses attention on shared, derived characteristics of a set of organisms; it uses 'outgroup' comparisons to develop the classification system.
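The definition of a clade translates directly into a tree traversal. The sketch below uses a made-up four-taxon tree (`children`, with internal nodes `root`, `n1`, `n2`); `clade` and `is_clade` are illustrative helpers, not part of any cladistics package.

```python
# Hypothetical tree: each internal node maps to its direct descendants.
children = {
    "root": ["A", "n1"],
    "n1": ["B", "n2"],
    "n2": ["C", "D"],
}


def clade(ancestor, tree):
    """Return the ancestor itself plus all of its descendants."""
    members = [ancestor]
    for child in tree.get(ancestor, []):
        members.extend(clade(child, tree))
    return members


def is_clade(group, ancestor, tree):
    """A group is the clade of `ancestor` only if it omits no descendant."""
    return set(group) == set(clade(ancestor, tree))


print(clade("n1", children))                            # n1 and everything below it
print(is_clade(["n1", "B", "n2", "C"], "n1", children))  # False: D is left out
```

The second call shows why a group that drops one descendant does not count as a clade under this definition.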
Cladistics: tree of life. Cladists use cladograms, diagrams that show ancestral relations between taxa, to represent the evolutionary tree of life. Charles Darwin (1809–1882) was the first to produce an evolutionary tree of life.
Cladistics: computer programs in cladistics. Analyses were undertaken using Swofford's (1985) package PAUP, version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC. David Swofford's PAUP is a software package for the inference of evolutionary trees. Purpose: follow a given algorithm for generating and testing cladograms.
Cladistics: computer programs in cladistics. Issues: the packages produce variable results and cannot possibly look at all the possibilities, since the underlying search is an NP-complete problem. There are also algorithm issues.
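To give a feel for why a package cannot look at all the possibilities, the sketch below counts the distinct unrooted, fully bifurcating trees for n labelled taxa using the standard formula (2n - 5)!!. The formula is a well-known combinatorial result brought in here for illustration; it does not come from the slides, and the function names are made up.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n labelled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
```

Ten taxa already give about two million candidate trees, and the count grows faster than exponentially from there, so programs rely on heuristic searches, which is one plausible source of the variable results mentioned above.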
The Future: storing life. Life is itself described as a program, with DNA as the code. If everything is information, then life can equally well be "stored".
THANK YOU !
