SURVEY OF COMMONALITY WITH OTHER DISCIPLINES
WORKSHOP 2 – JULY 25, 2013
INDIANAPOLIS, INDIANA
MICAH ALTMAN
DIRECTOR OF RESEARCH, MIT LIBRARIES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ESCIENCE@MIT.EDU
PRIMARY RESEARCH OR PRACTICE AREA(S)
• INFORMATION SCIENCE
• SOCIAL SCIENCE
PREVIOUS EXPERIENCE
• DIGITAL LIBRARIES
• DIGITAL PRESERVATION
• STATISTICAL COMPUTING
RELATED WORK
• PUBLICMAPPING.ORG
• INFORMATICS.MIT.EDU
CONTACT INFORMATION
E25-131, 77 MASSACHUSETTS AVE, MIT, CAMBRIDGE, MA, 02139
Prepared for
DASPOS Workshop
JCDL 2013
Characterizing Data and Software for
Social Science Research
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
Non-Resident Senior Fellow, Brookings Institution
DISCLAIMER
These opinions are my own; they are not the opinions
of MIT, Brookings, any of the project funders, or (with
the exception of co-authored previously published
work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the
future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill,
Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi,
Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle,
George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White,
etc.
Data and Software in Social Science Research
Collaborators & Co-Conspirators
• Jonathan Crabtree, Nancy McGovern
• National Digital Stewardship Coordination Committee & Working Group Chairs
• Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.)
http://privacytools.seas.harvard.edu/people
• Research Support
– Supported in part by NSF grant CNS-1237235
– Thanks to the Library of Congress & the Massachusetts Institute of Technology.
Related Work
• CoData Task Group on Data Citations, 2013 (Forthcoming). Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data, CoData Journal (Special Volume).
• Altman & Jackman, 2012, 19 Ways of Looking at Statistical Software, Journal of Statistical Software
• National Digital Stewardship Alliance, 2013, 2014 National Agenda for Digital Stewardship.
• Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. 201.. Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press
• Altman, M., Rogerson, K., & U, D. (2008). Open Research Questions on Information and Technology in Global and Domestic Politics – Beyond “E-”, 41(4), 1-8. Retrieved from http://www.journals.cambridge.org/abstract_S104909650824093X
• Altman, Gill & McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist
Most reprints available from:
This Talk
• Landscape
(dimensions & attributes)
• Landmarks
(sample use cases)
Landscape: Characteristics of Social Science Research Data
Some Characteristics of Research Data
Attribute Type: Examples
Data: Structure
- Single relation (table)
- Fully relational
- Network
- Geospatial
- Semi-structured (e.g. text)
Data: Attribute Types
- Continuous/Discrete
- Scale: ratio/interval/ordinal/nominal
Data: Performance Characteristics
- Number of observations
- Frequency of updates
- Dimensionality
- Sparsity
- Collection heterogeneity
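Three of the performance characteristics listed above (number of observations, dimensionality, sparsity) can be measured directly on a single-relation table. The following is a minimal Python sketch, not part of the original talk. The names `TableProfile` and `profile_table` are hypothetical, and it assumes that a cell counts as missing when the attribute is absent from a row or its value is `None`.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class TableProfile:
    n_observations: int  # rows in the relation
    dimensionality: int  # distinct attributes seen across all rows
    sparsity: float      # share of (row, attribute) cells that are missing


def profile_table(rows: Sequence[Mapping[str, Optional[Any]]]) -> TableProfile:
    """Profile a single-relation table given as a sequence of row mappings.

    Assumption for this sketch: a cell is missing when the attribute is
    absent from the row or its value is None.
    """
    columns = sorted({name for row in rows for name in row})
    total_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for name in columns if row.get(name) is None)
    sparsity = missing / total_cells if total_cells else 0.0
    return TableProfile(len(rows), len(columns), sparsity)


# Hypothetical two-row survey extract, used only for illustration.
example_rows = [
    {"age": 34, "income": None},
    {"age": 51, "income": 42000.0, "region": "NE"},
]
profile = profile_table(example_rows)
```

In the example, two of the six possible cells are missing, so the profile reports two observations, three dimensions, and a sparsity of one third.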
Some Characteristics of Research Measurements
Attribute Type: Examples
Measurement: Unit of Observation
- Individuals
- Groups
- Institutions
- Organizations
- Interactions
Measurement: Measurement type
- Experimental
- Observational
- Synthetic/computational
Measurement: Performance characteristic
- Metadata
- Ontology
- Quality
Some Characteristics of Research Data Use
Attribute Type: Examples
Analysis methods
- Counting
- GLM model family
- MLE model family
- (Constrained) continuous nonlinear optimization
- Blind global optimization
- Discrete optimization
- Bayesian Methods (MCMC)
- Heuristically/algorithmically defined
- Text mining
- Clustering
- Coding and qualitative analysis
- Exploratory Data Analysis
Desired Outputs
- Summary scalars
- Summary table
- Data subset
- Static data publication
- Static visualization
- Dynamic Visualization
Some Characteristics of Use Constraints
[Figure: diagram of use-constraint categories (Contract, Intellectual Property, Access Rights, Confidentiality). Constraints labeled in the figure:]
- Copyright
- Fair Use
- DMCA
- Database Rights
- Moral Rights
- Intellectual Attribution
- Trade Secret
- Patent
- Trademark
- Common Rule (45 CFR 46)
- HIPAA
- FERPA
- EU Privacy Directive
- Privacy Torts (Invasion, Defamation)
- Rights of Publicity
- Sensitive but Unclassified
- Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …)
- Classified
- FOIA
- CIPSEA
- State Privacy Laws
- EAR
- State FOI Laws
- Journal Replication Requirements
- Funder Open Access
- Contract
- License
- Click-Wrap
- TOU
- Export Restrictions
- NDA
Landmarks
(Exemplar Use Cases)
Exemplar: Policy Analysis
Attribute Type: Examples
Data: Structure
- Single relation (table)
Data: Attribute Types
- Continuous/Discrete
- Scale: ratio/interval/ordinal
Data: Performance Characteristics
- 10K-100K observations
- Monthly/annual updates
- Dozens of dimensions/measures
Measurement: Unit of Observation
- Individuals; Organizations; Institutions
Measurement: Measurement type
- Observational
- Repeated cross-sectional/longitudinal over decades
Measurement: Performance characteristic
- High quality measurements
- Systematic and complete metadata
- Controlled ontology
- Regular updates & long-term access
Management Constraints
- Confidentiality; Public Access
Analysis methods
- Counting (contingency tables); GLM family
Desired Outputs
- Summary scalars
- Summary table
- Static visualization (map)
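Counting with contingency tables, named above as an analysis method, turns a single relation into the "summary table" output. A minimal Python sketch follows, offered as an illustration rather than as the method used in the NCSES use case. The function name `contingency_table` and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def contingency_table(
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Dict[Hashable, int]]:
    """Cross-tabulate (row_category, column_category) pairs into counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts}, key=str)
    cols = sorted({c for _, c in counts}, key=str)
    # Fill every cell so that empty combinations appear as explicit zeros.
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


# Hypothetical records: (sector, degree level) for five respondents.
records = [
    ("academic", "PhD"),
    ("industry", "MS"),
    ("academic", "MS"),
    ("industry", "MS"),
    ("academic", "PhD"),
]
table = contingency_table(records)
```

Filling empty cells with zeros keeps the table rectangular, which matters when the counts feed a later GLM fit or a published summary table.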
More Information
• Science and Engineering Indicators:
http://www.nsf.gov/statistics/seind12/
• Details of NCSES use case:
Novak et al. 2011
• Policy data producer perspectives:
Journal of Official Statistics
Exemplar: Media Anthropology Dissertation
Attribute Type: Examples
Data: Structure
- Audio, video
- GIS coverage / GPS trails
- Semi-structured field notes
- Coded qualitative and quantitative data
Data: Attribute Types
- Discrete
- Scale: ordinal/nominal
Data: Performance Characteristics
- Hundreds of observed units
- Longitudinal
- Dozens of dimensions/measures
- Static after publication
Measurement: Unit of Observation
- Individuals; Organizations; Physical environment
Measurement: Measurement type
- Observational; Interaction
Measurement: Performance characteristic
- High quality measurements
- Systematic and complete metadata
- Emergent coding/ontology
Management Constraints
- Confidentiality; social norms
Analysis methods
- Counting; Discourse; CAQDA (Qualitative)
- (Future) AI/Machine learning
Desired Outputs
- Book
- 1-2 hour video / interactive media synthesis
More Information
• Harvard media anthropology Ph.D. Program:
sel.fas.harvard.edu/phd.html
Image Sources: Wikimedia Commons. Pixabay.com, Flickr
Exemplar: Social Message Analysis
Attribute Type: Examples
Data: Structure
- Network
Data: Attribute Types
- Continuous/Discrete
- Scale: ratio/interval/ordinal/nominal
Data: Performance Characteristics
- 10M-1B observations
- Sample from stream of continuously updated corpus
- Dozens of dimensions/measures
Measurement: Unit of Observation
- Individuals; Interactions
Measurement: Measurement type
- Observational
Measurement: Performance characteristic
- High volume
- Complex network structure
- Sparsity
- Systematic and sparse metadata
Management Constraints
- License; Replication
Analysis methods
- Bespoke algorithms (clustering); nonlinear optimization; Bayesian methods
Desired Outputs
- Summary scalars (model coefficients)
- Summary table
- Static/interactive visualization
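For network-structured data, the sparsity noted above is often summarized as graph density: the share of possible ties that are actually observed. The sketch below is one simple way to compute it in Python, assuming an undirected graph with no self-loops, given as an edge list. The function name `network_density` and the sample edges are hypothetical, and nodes with no edges are not counted.

```python
from typing import Hashable, Iterable, Set, Tuple


def network_density(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Density of an undirected simple graph: observed edges / possible edges.

    Duplicate edges (in either direction) are counted once and self-loops
    are ignored. Only nodes that appear in at least one edge are counted.
    """
    nodes: Set[Hashable] = set()
    unique_edges: Set[frozenset] = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nodes.update((u, v))
        unique_edges.add(frozenset((u, v)))
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len(unique_edges) / possible if possible else 0.0


# Hypothetical reply network: who replied to whom, direction discarded.
sample_edges = [("a", "b"), ("b", "c"), ("b", "a")]
density = network_density(sample_edges)
```

A density close to zero signals the sparse structure listed above, where a dense adjacency matrix would be wasteful and an edge-list or sparse representation is the more practical choice.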
More Information
• Grimmer, Justin, and Gary King. "General purpose computer-assisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650.
• King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012.
• Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721.
Trends: More
• More Types of Evidence
• More Collaboration
• More Data
• More Publications, More Filters
• More Learners
• More Open
• More Replication
Some Challenges for Long-Term Replication/Access
• “Messy” human sensors
• Mix of data types, structures, sparsity
• Complex constraints: confidentiality, licensing, NDAs
• Manual/computer-assisted coding
• Niche commercial software (and private bespoke software) integral to analysis
• Very long-term longitudinal data/accessibility requirements
Questions?
E-mail: escience@mit.edu
Web: micahaltman.com
Twitter: @drmaltman
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Characterizing Data and Software for Social Science Research

  • 1. SURVEY OF COMMONALITY WITH OTHER DISCIPLINES
      WORKSHOP 2 – JULY 25, 2013, INDIANAPOLIS, INDIANA
      MICAH ALTMAN, DIRECTOR OF RESEARCH, MIT LIBRARIES, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
      ESCIENCE@MIT.EDU
      PRIMARY RESEARCH OR PRACTICE AREA(S): INFORMATION SCIENCE; SOCIAL SCIENCE
      PREVIOUS EXPERIENCE: DIGITAL LIBRARIES; DIGITAL PRESERVATION; STATISTICAL COMPUTING
      RELATED WORK: PUBLICMAPPING.ORG; INFORMATICS.MIT.EDU
      CONTACT INFORMATION: E25-131, 77 MASSACHUSETTS AVE, MIT, CAMBRIDGE, MA, 02139
  • 2. Prepared for DASPOS Workshop, JCDL 2013
      Characterizing Data and Software for Social Science Research
      Dr. Micah Altman <escience@mit.edu>
      Director of Research, MIT Libraries
      Non-Resident Senior Fellow, Brookings Institution
  • 3. DISCLAIMER
      These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.
      Secondary disclaimer: “It’s tough to make predictions, especially about the future!”
      -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
  • 4. Collaborators & Co-Conspirators
      - Jonathan Crabtree, Nancy McGovern
      - National Digital Stewardship Coordination Committee & Working Group Chairs
      - Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.) http://privacytools.seas.harvard.edu/people
      - Research Support
          – Supported in part by NSF grant CNS-1237235
          – Thanks to the Library of Congress, & the Massachusetts Institute of Technology.
  • 5. Related Work
      - CoData Task Group on Data Citations, 2013 (Forthcoming). Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. CoData Journal (Special Volume).
      - Altman & Jackman, 2012. 19 Ways of Looking at Statistical Software. Journal of Statistical Software.
      - National Digital Stewardship Alliance, 2013. 2014 National Agenda for Digital Stewardship.
      - Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. 201.. Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press.
      - Altman, M., Rogerson, K., & U, D. (2008). Open Research Questions on Information and Technology in Global and Domestic Politics – Beyond “E-.i, 41(4), 1-8. Retrieved from http://www.journals.cambridge.org/abstract_S104909650824093X
      - Altman, Gill & McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist.
      Most reprints available from:
  • 6. This Talk
      - Landscape (dimensions & attributes)
      - Landmarks (sample use cases)
  • 7. Landscape: Characteristics of Social Science Research Data
  • 8. Some Characteristics of Research Data
      Attribute Type | Examples
      Data: Structure | Single relation (table); Fully relational; Network; Geospatial; Semi-structured (e.g. text)
      Data: Attribute Types | Continuous/Discrete; Scale: ratio/interval/ordinal/nominal
      Data: Performance Characteristics | Number of observations; Frequency of updates; Dimensionality; Sparsity; Collection heterogeneity
  • 9. Some Characteristics of Research Measurements
      Attribute Type | Examples
      Measurement: Unit of Observation | Individuals; Groups; Institutions; Organizations; Interactions
      Measurement: Measurement type | Experimental; Observational; Synthetic/computational
      Measurement: Performance characteristic | Metadata; Ontology; Quality
  • 10. Some Characteristics of Research Data Use
      Attribute Type | Examples
      Analysis methods | Counting; GLM model family; MLE model family; (Constrained) continuous nonlinear optimization; Blind global optimization; Discrete optimization; Bayesian methods (MCMC); Heuristically/algorithmically defined; Text mining; Clustering; Coding and qualitative analysis; Exploratory data analysis
      Desired Outputs | Summary scalars; Summary table; Data subset; Static data publication; Static visualization; Dynamic visualization
  • 11. Some Characteristics of Use Constraints
      (Diagram; labels as extracted.) Contract; Intellectual Property; Access Rights; Confidentiality; Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap TOU; Export Restrictions; NDA
  • 12. Landmarks (Exemplar Use Cases)
  • 13. Exemplar: Policy Analysis
      Attribute Type | Examples
      Data: Structure | Single relation (table)
      Data: Attribute Types | Continuous/Discrete; Scale: ratio/interval/ordinal
      Data: Performance Characteristics | 10K-100K observations; Monthly/annual updates; Dozens of dimensions/measures
      Measurement: Unit of Observation | Individuals; Organizations; Institutions
      Measurement: Measurement type | Observational; Repeated cross-sectional/longitudinal over decades
      Measurement: Performance characteristic | High quality measurements; Systematic and complete metadata; Controlled ontology; Regular updates & long-term access
      Management Constraints | Confidentiality; Public Access
      Analysis methods | Counting (contingency tables); GLM family
      Desired Outputs | Summary scalars; Summary table; Static visualization (map)
      More Information
      - Science and Engineering Indicators: http://www.nsf.gov/statistics/seind12/
      - Details of NCSES use case: Novak et al. 2011
      - Policy data producer perspectives: Journal of Official Statistics
  • 14. Exemplar: Media Anthropology Dissertation
      Attribute Type | Examples
      Data: Structure | Audio, video; GIS coverage / GPS trails; Semi-structured field notes; Coded qualitative and quantitative data
      Data: Attribute Types | Discrete; Scale: ordinal/nominal
      Data: Performance Characteristics | 100s of observed units; Longitudinal; Dozens of dimensions/measures; Static after publication
      Measurement: Unit of Observation | Individuals; Organizations; Physical environment
      Measurement: Measurement type | Observational; Interaction
      Measurement: Performance characteristic | High quality measurements; Systematic and complete metadata; Emergent coding/ontology
      Management Constraints | Confidentiality; social norms
      Analysis methods | Counting; Discourse; CAQDA (Qualitative); (Future) AI/Machine learning
      Desired Outputs | Book; 1-2 hour video / interactive media synthesis
      More Information
      - Harvard media anthropology Ph.D. Program: sel.fas.harvard.edu/phd.html
      Image Sources: Wikimedia Commons, Pixabay.com, Flickr
  • 15. Exemplar: Social Message Analysis
      Attribute Type | Examples
      Data: Structure | Network
      Data: Attribute Types | Continuous/Discrete; Scale: ratio/interval/ordinal/nominal
      Data: Performance Characteristics | 10M-1B observations; Sample from stream of continuously updated corpus; Dozens of dimensions/measures
      Measurement: Unit of Observation | Individuals; Interactions
      Measurement: Measurement type | Observational
      Measurement: Performance characteristic | High volume; Complex network structure; Sparsity; Systematic and sparse metadata
      Management Constraints | License; Replication
      Analysis methods | Bespoke algorithms (clustering); Nonlinear optimization; Bayesian methods
      Desired Outputs | Summary scalars (model coefficients); Summary table; Static/interactive visualization
      More Information
      - Grimmer, Justin, and Gary King. "General purpose computer-assisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650.
      - King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012.
      - Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721.
  • 16. Trends: More
      - More Types of Evidence
      - More Collaboration
      - More Data
      - More Publications, More Filters
      - More Learners
      - More Open
      - More Replication
  • 17. Some Challenges for Long-Term Replication/Access
      - “Messy” human sensors
      - Mix of data types, structures, sparsity
      - Complex constraints: confidentiality, licensing, NDAs
      - Manual/computer-assisted coding
      - Niche commercial software (and private bespoke software) integral to analysis
      - Very long-term longitudinal data/accessibility requirements
  • 18. Questions?
      E-mail: escience@mit.edu
      Web: micahaltman.com
      Twitter: @drmaltman
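
The attribute tables in slides 8-15 amount to a small schema for describing a research use case. As a rough illustration only, the sketch below records two of the exemplars (Policy Analysis and Social Message Analysis) as Python records using values taken from those slides. The class name `UseCaseProfile`, its field names, and the helper `largest_by_scale` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class UseCaseProfile:
    """Hypothetical record mirroring the attribute types on the slides."""
    name: str
    structure: Tuple[str, ...]           # Data: Structure
    observations: Tuple[int, int]        # (low, high) range of observation counts
    units_of_observation: Tuple[str, ...]
    measurement_type: Tuple[str, ...]
    constraints: Tuple[str, ...]         # Management Constraints
    analysis_methods: Tuple[str, ...]


# Values copied from slides 13 and 15; "10K-100K" and "10M-1B" are
# expressed here as integer ranges.
policy = UseCaseProfile(
    name="Policy Analysis",
    structure=("single relation (table)",),
    observations=(10_000, 100_000),
    units_of_observation=("individuals", "organizations", "institutions"),
    measurement_type=("observational",),
    constraints=("confidentiality", "public access"),
    analysis_methods=("counting (contingency tables)", "GLM family"),
)

social = UseCaseProfile(
    name="Social Message Analysis",
    structure=("network",),
    observations=(10_000_000, 1_000_000_000),
    units_of_observation=("individuals", "interactions"),
    measurement_type=("observational",),
    constraints=("license", "replication"),
    analysis_methods=("bespoke algorithms (clustering)",
                      "nonlinear optimization", "Bayesian methods"),
)


def largest_by_scale(profiles: Sequence[UseCaseProfile]) -> UseCaseProfile:
    """Return the profile with the highest upper bound on observations."""
    return max(profiles, key=lambda p: p.observations[1])


if __name__ == "__main__":
    print(largest_by_scale([policy, social]).name)
```

Writing the exemplars down this way makes it easy to compare them along one dimension at a time, for example scale or management constraints, which is the comparison the landscape/landmark framing invites.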

Editor's Notes

  1. This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Any images included in derivative works must be individually attributed to their original sources, as indicated in the notes.
  2. The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area. (I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier's reliability claims. For more on that, see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )
  3. Survey image source, licensed under CC-SA-NC: http://gaithersburgbookfestival.org/take-our-survey-and-win/ Students image source: commons.wikimedia.org. Other images source: nsf.gov.
  4. File icon is licensed under CC0 on pixabay.com: http://pixabay.com/en/spreadsheet-excel-table-diagram-98491/ Dissertation image is licensed under CC-BY-SA by Victoria Catterson: http://www.flickr.com/photos/cowlet/354911838/ Other images available through commons.wikimedia.org.
  5. Other image source: Wikimedia Commons.
  6. The LHC produces a PB every 2 weeks, Sloan Galaxy Zoo has hundreds of thousands of “authors”, 50K people attend a class from the University of Michigan, and to understand public opinion, instead of surveying 100s of people per month, we can analyze 10,000 tweets per second.
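
To put the scale figures in the last note on a common footing, the short sketch below converts them to per-day and per-month rates. It is a back-of-envelope calculation under stated assumptions that are not in the slides: decimal units (1 PB = 1000 TB), a 30-day month, and the tweet figure read as 10,000 per second. The function names are illustrative.

```python
# Back-of-envelope rates for the figures quoted in the note above.
# Assumptions (not from the slides): 1 PB = 1000 TB, a 30-day month,
# and a tweet rate of 10,000 per second.

SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(petabytes: float, days: float) -> float:
    """Average data rate in terabytes per day."""
    return petabytes * 1000 / days


def items_per_month(per_second: float, days_in_month: int = 30) -> float:
    """Number of items arriving in one month at a constant per-second rate."""
    return per_second * SECONDS_PER_DAY * days_in_month


lhc_rate = tb_per_day(petabytes=1, days=14)   # "a PB every 2 weeks"
tweets_month = items_per_month(10_000)        # tweets per second -> per month

if __name__ == "__main__":
    print(f"LHC: about {lhc_rate:.1f} TB/day")
    print(f"Tweets: about {tweets_month:,.0f} per month")
```

Under these assumptions the tweet stream yields tens of billions of observations per month, compared with the hundreds of respondents per month of a traditional survey mentioned in the note.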