Data, Responsibly:
The Next Decade of Data Science
Bill Howe, PhD
Associate Professor, Information School
Associate Director, eScience Institute
Adjunct Associate Professor, Computer Science & Engineering
University of Washington
My goals this evening
• Describe emerging topics in data science research
and practice around a technical interpretation of ethics
• Describe some specific thrusts we are pursuing
• Encourage you to get involved
11/10/2016 Data, Responsibly / SciTech NW 2
How much time do you spend “handling
data” as opposed to “doing science”?
Mode answer: “90%”
1) Upload data “as is”
Cloud-hosted; no need to
install or design a database;
no pre-defined schema
2) Analyze data with SQL
Right in your browser,
writing queries on top of
queries on top of queries ...
SELECT hit, COUNT(*) AS cnt
FROM tigrfam_surface
GROUP BY hit
ORDER BY cnt DESC
3) Share the results
Click on the science question,
see the SQL that answers it
[Figure: architecture diagram. Multiple languages (SQL, Datalog, MyriaL, SPARQL) pass through compilers to multiple big data systems (MyriaX, RDBMS, Hadoop via layers, GEMS, serial C++, PGAS/HPC), and are reached from multiple GUIs/apps.]
Making it easier to do data science
• SQLShare: Easier to use a database
• Myria: Easier to use a bunch of different
systems at once, at scale
• Worked great in the physical sciences
• But some collaborators weren’t that excited…
Data Science Kickoff Session:
137 posters from 30+ departments and units
Data, data, data
Kevin Merritt
CEO
Socrata
Deep Dhillon
CTO
Socrata
• Pursue transformative interdisciplinary urban research
• Facilitate translation from UW to .gov stakeholders
• Position Seattle/UW as a leader in applied urban research
• 80+ faculty from 20+ departments around campus
Assessing Community Well-Being
Third-Place Technologies
Optimization of King County Metro Paratransit
Computer Science & Engineering
Predictors of Permanent Housing for Homeless Families
Bill and Melinda Gates Foundation
Open Sidewalk Graph for Accessible Trip Planning
Computer Science & Engineering
Inaugural 2015 program:
16 spots
140 applicants
…from 20+ departments
Mining Online Data to Detect Unsafe Food Products
Elaine Nsoesie, Institute for Health Metrics and Evaluation
ORCA data for improved transit system planning and operation
Washington State Transportation Center (TRAC)
Global Open Sidewalks: Creating a shared open data layer
Taskar Center for Accessible Technology
CrowdSensing Census: A tool for estimating poverty
Bell Labs, Nokia
2016 program:
16 spots
190 applicants
New in 2016: An explicit emphasis on data ethics
July 2016
“Data, Responsibly”
Dagstuhl Workshop
Gerhard Weikum
Serge Abiteboul
Julia Stoyanovich
Gerome Miklau
Cathy O’Neil
September 2016
Three properties of a WMD (“Weapon of Math Destruction”):
Opacity
Scale
Damage
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?
The way I think about this… (1)
The way I think about this… (2)
Decisions are based on two sources of information:
1. Past examples
e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints
e.g., “we must avoid racial discrimination”
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints
The way I think about this… (3)
How do we apply societal constraints to algorithmic
decision-making?
Option 1: Keep a human in the loop
Ex: EU General Data Protection Regulation requires that a
human be involved in legally binding algorithmic decision-making
Ex: Wisconsin Supreme Court says a human must review
algorithmic decisions made by recidivism models
Option 2: Build them into the algorithms themselves
I’ll talk about some approaches for this
The way I think about this… (4)
On transparency vs. accountability:
• For human decision-making, sometimes explanations are
required, improving transparency
– Supreme Court decisions
– Employee reprimands/termination
• But when transparency is difficult, accountability takes over
– medical emergencies, business decisions
• As we shift decisions to algorithms, we lose both
transparency AND accountability
• “The buck stops where?”
Some Facets of “Data, Responsibly”
• Privacy (I won’t be talking about this)
• Fairness (I’ll give a taste of the work here)
• Transparency (I won’t be talking about this)
• Reproducibility (towards automatic scientific claim-checking)
• Ethics (vignette on teaching data ethics)
FAIRNESS
Ex: Staples online pricing
Reasoning: Offer deals to people who live near competitors’ stores
Effect: lower prices offered to buyers who live in more affluent
neighborhoods
[Latanya Sweeney; CACM 2013]
Racially identifying names trigger
ads suggestive of an arrest record
slide adapted from Stoyanovich, Miklau
ProPublica, May 2016
The Special Committee on Criminal Justice Reform's
hearing on reducing the pre-trial jail population.
Technical.ly, September 2016
Philadelphia is grappling with the prospect of a racist computer algorithm
Any background signal in the
data of institutional racism is
amplified by the algorithm
operationalized by the algorithm
legitimized by the algorithm
“Should I be afraid of risk assessment tools?”
“No, you gotta tell me a lot more about yourself.
At what age were you first arrested?
What is the date of your most recent crime?”
“And what’s the culture of policing in the
neighborhood in which I grew up in?”
Towards a precise characterization of fairness…
Positive Outcomes      Negative Outcomes
offered employment     denied employment
accepted to school     rejected from school
offered a loan         denied a loan
offered a discount     not offered a discount
Label outcomes to individuals as positive or negative
Fairness is concerned with how outcomes are
assigned to a population
slide adapted from Stoyanovich, Miklau
Statistical parity
[Figure: individuals grouped by race (black, white), each marked with a positive (⊕) or negative (⊖) outcome. Positive outcomes go to 40% of the whole population: 20% of black and 60% of white individuals.]
Statistical parity: demographics of the individuals receiving any outcome are the same
as demographics of the underlying population
slide adapted from Stoyanovich, Miklau
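A minimal sketch of how the statistical parity check on this slide could be computed. The ten records and the helper name `positive_rate` are hypothetical; the counts were chosen only so that the rates reproduce the slide's percentages.

```python
# Toy records: (race, received_positive_outcome). Counts are hypothetical,
# chosen so the rates match the slide: 20% of black, 60% of white, 40% overall.
population = (
    [("black", True)] * 1 + [("black", False)] * 4
    + [("white", True)] * 3 + [("white", False)] * 2
)

def positive_rate(records, group=None):
    """Fraction of records with a positive outcome, optionally within one group."""
    selected = [outcome for race, outcome in records if group is None or race == group]
    return sum(selected) / len(selected)

overall = positive_rate(population)
by_group = {g: positive_rate(population, g) for g in ("black", "white")}

# Statistical parity holds when every group's rate equals the overall rate,
# i.e. when the gap between the best- and worst-treated group is zero.
parity_gap = max(by_group.values()) - min(by_group.values())

print(f"overall: {overall:.0%}")
for group, rate in by_group.items():
    print(f"{group}: {rate:.0%}")
print(f"parity gap: {parity_gap:.0%}")
```

A gap of zero would mean the people receiving positive outcomes have the same demographics as the underlying population; here the gap is large, so parity is violated.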
First attempt: Ignore sensitive information
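[Figure: individuals arranged by zip code (10025, 10027) and race (black, white), each marked with a positive (⊕) or negative (⊖) outcome. Positive outcomes go to 20% of black and 60% of white individuals.]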
Removing race from the vendor’s assignment process
does not prevent discrimination
Assessing disparate impact
Discrimination is assessed by the effect on the protected sub-population,
not by the input or by the process that led to the effect.
slide adapted from Stoyanovich, Miklau
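The point of this slide can be sketched in a few lines: a rule that never sees race can still produce a racially skewed effect when another attribute acts as a proxy. The applicants, the rule `race_blind_rule`, and the correlation between zip code and race are all assumptions made up for illustration; the counts are picked to reproduce the 20% / 60% split shown on the slide.

```python
# Hypothetical applicants: (race, zip_code). Race is never shown to the rule,
# but in this toy data zip code is correlated with race.
applicants = (
    [("black", "10025")] * 1 + [("black", "10027")] * 4
    + [("white", "10025")] * 3 + [("white", "10027")] * 2
)

def race_blind_rule(zip_code):
    """Vendor's rule (assumed for illustration): offer the deal in one zip code only."""
    return zip_code == "10025"

def positive_rate_by_race(records, rule):
    """Apply the rule to every record and report the positive rate per race."""
    totals, positives = {}, {}
    for race, zip_code in records:
        totals[race] = totals.get(race, 0) + 1
        positives[race] = positives.get(race, 0) + int(rule(zip_code))
    return {race: positives[race] / totals[race] for race in totals}

rates = positive_rate_by_race(applicants, race_blind_rule)
# Disparate impact is judged on the effect: compare the groups' positive rates.
impact_ratio = rates["black"] / rates["white"]

print(rates)
print(f"impact ratio (black / white): {impact_ratio:.2f}")
```

The rule's input contains no sensitive attribute, yet the measured effect on the protected sub-population is unchanged, which is why the assessment looks at outcomes rather than inputs.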
More directly: Impose statistical parity
[Figure: individuals arranged by credit score (good, bad) and race (black, white); a positive outcome (⊕) means being offered a loan. Positive outcomes go to 40% of black and 40% of white individuals.]
Tradeoff between (perceived) accuracy and fairness;
may be contrary to the goals of the vendor
slide adapted from Stoyanovich, Miklau
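One simple way to impose statistical parity, sketched under stated assumptions: give the same fraction of each group a positive outcome, choosing the best credit scores within each group. The applicants, their scores, and the function names are hypothetical, and this per-group quota is only one of several possible ways to enforce parity.

```python
# Hypothetical applicants: (name, race, credit_score).
applicants = [
    ("a1", "black", 710), ("a2", "black", 640), ("a3", "black", 600),
    ("a4", "black", 580), ("a5", "black", 550),
    ("b1", "white", 760), ("b2", "white", 730), ("b3", "white", 700),
    ("b4", "white", 650), ("b5", "white", 590),
]

def top_by_score(candidates, k):
    """The k candidates with the highest credit scores."""
    return sorted(candidates, key=lambda a: a[2], reverse=True)[:k]

def select_with_parity(candidates, rate):
    """Offer loans to the same fraction of each group, best scores first."""
    chosen = []
    for race in sorted({a[1] for a in candidates}):
        members = [a for a in candidates if a[1] == race]
        chosen += top_by_score(members, round(rate * len(members)))
    return chosen

unconstrained = top_by_score(applicants, 4)       # vendor's score-only choice
with_parity = select_with_parity(applicants, 0.4)  # 40% of each group

print("score only :", [a[0] for a in unconstrained])
print("with parity:", [a[0] for a in with_parity])
```

Both selections make four offers, but the parity version picks a2 (score 640) instead of b3 (score 700). That swap is the tradeoff between perceived accuracy and fairness that the slide describes.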
A systems approach:
FairTest: fairness test suite for data analysis apps
• Tests for unintentional discrimination according to several
representative discrimination measures.
• Automates search for context-specific associations between
protected variables and application outputs
• Reports findings, ranked by association strength and
affected population size
[F. Tramèr et al., arXiv:1510.02377 (2015)]
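The idea of searching contexts for associations and ranking the findings can be illustrated with a toy sketch. This is not FairTest's API or its discrimination measures: the log format, the function `rank_contexts`, and the use of a simple rate gap as the association measure are all assumptions chosen to keep the example short.

```python
from collections import defaultdict

# Hypothetical application log: (context, protected_group, output_was_favorable).
log = (
    [("urban", "A", True)] * 30 + [("urban", "A", False)] * 10
    + [("urban", "B", True)] * 10 + [("urban", "B", False)] * 30
    + [("rural", "A", True)] * 5 + [("rural", "A", False)] * 5
    + [("rural", "B", True)] * 4 + [("rural", "B", False)] * 6
)

def rank_contexts(records):
    """Rank contexts by the gap in favorable-output rate between groups."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [favorable, total]
    for context, group, favorable in records:
        cell = counts[context][group]
        cell[0] += int(favorable)
        cell[1] += 1
    findings = []
    for context, groups in counts.items():
        rates = {g: fav / total for g, (fav, total) in groups.items()}
        gap = max(rates.values()) - min(rates.values())
        size = sum(total for _, total in groups.values())
        findings.append((context, gap, size))
    # Strongest association first; larger affected population breaks ties.
    return sorted(findings, key=lambda f: (f[1], f[2]), reverse=True)

ranking = rank_contexts(log)
for context, gap, size in ranking:
    print(f"{context}: rate gap {gap:.2f} over {size} users")
```

A real tool would use statistically grounded association measures and search many candidate contexts automatically; the sketch only shows the shape of the report.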
As a corporation, should I care?
Compliance
Jacobson, Scientific American, 2013
Customer Retention
Employee Retention
Eichler, Huffington Post, 2012
CNET, May 2016
REPRODUCIBILITY
Science is a complete mess
• Reproducibility
– Begley & Ellis, Nature 2012: 6 out of 53 cancer studies reproducible
– Only about half of 100 psychology studies had effect sizes that approximated
the original result (Science, 2015)
– Ioannidis 2005: Why most published research findings are false
– Reinhart & Rogoff: global economic policy based on spreadsheet fuck ups
Science, 2015
Retractions are increasing…
Why is this happening? (1)
Why is this happening? (2)
Publication Bias!
“DEEP CURATION”
TOWARDS AUTOMATIC SCIENTIFIC CLAIM CHECKING
Vision: Validate scientific claims automatically
– Check for manipulation (manipulated images, Benford’s Law)
– Extract claims from papers
– Check claims against the authors’ data
– Check claims against related data sets
– Automatic meta-analysis across the literature + public datasets
• First steps
– Automatic curation: Validate and attach metadata to public datasets
– Longitudinal analysis of the visual literature
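As an illustration of the Benford's Law check mentioned in the vision above, here is a minimal sketch that compares the leading digits of a set of numbers with the distribution the law predicts, P(d) = log10(1 + 1/d). The function names and the two sample datasets are assumptions for illustration; a real manipulation check would also need a significance threshold and enough data to justify it.

```python
import math
from collections import Counter

def first_digit(value):
    """Leading nonzero digit of a nonzero number."""
    digits = f"{abs(value):.15e}"  # scientific notation, e.g. '3.1400...e+02'
    return int(digits[0])

def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = Counter(first_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (digits[d] - expected) ** 2 / expected
    return statistic

# Powers of 2 are known to follow Benford's Law closely; uniformly spread
# three-digit numbers do not.
natural = [2 ** k for k in range(1, 201)]
uniform = list(range(100, 1000))

print(f"powers of 2 : {benford_chi_square(natural):.1f}")
print(f"100..999    : {benford_chi_square(uniform):.1f}")
```

A small statistic means the digits look Benford-like; a large one flags a dataset for closer inspection, not proof of manipulation.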
Microarray experiments
Microarray samples submitted to the Gene Expression Omnibus
Curation is fast becoming the
bottleneck to data sharing
Maxim Grechkin, Hoifung Poon
No growth in number of
datasets used per paper!
Majority of samples are
one-time-use only!
color = labels supplied as metadata
clusters = 1st two PCA dimensions on the gene expression data itself
Can we curate algorithmically?
The expression data
and the text labels
appear to disagree
[Figure: Deep Curation pipeline. Inputs: expression data, free-text metadata, and domain knowledge (ontology). Components: 2 deep networks (text, expr) and an SVM. Output: better tissue type labels.]
Deep Curation
Distant supervision and co-learning between a text-based
classifier and an expression-based classifier: Both
models improve by training on each other’s results.
Free-text classifier
Expression classifier
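The co-learning loop described above can be sketched with two deliberately tiny stand-in models. This is not the Deep Curation implementation: the samples, the word-vote text classifier, the nearest-centroid expression classifier, and the confidence margin are all hypothetical simplifications. Only the overall shape (seed labels from an ontology, then each model trains on labels the other produced) follows the slide.

```python
from collections import Counter, defaultdict

def seed_labels(samples, ontology):
    """Distant supervision: label a sample only if its text names one ontology term."""
    labels = {}
    for i, (text, _) in enumerate(samples):
        hits = [term for term in ontology if term in text.lower().split()]
        if len(hits) == 1:
            labels[i] = hits[0]
    return labels

def train_text(samples, labels):
    """Count, for every word, how often it appears with each label."""
    word_votes = defaultdict(Counter)
    for i, label in labels.items():
        for word in samples[i][0].lower().split():
            word_votes[word][label] += 1
    return word_votes

def predict_text(word_votes, text):
    """Label with the most word votes, or None if there are no votes or a tie."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(word_votes.get(word, {}))
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def train_expr(samples, labels):
    """Mean expression value per label (a one-dimensional centroid)."""
    grouped = defaultdict(list)
    for i, label in labels.items():
        grouped[label].append(samples[i][1])
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def predict_expr(centroids, value, margin=1.0):
    """Nearest centroid, or None when the two closest are within the margin."""
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    if len(ranked) > 1:
        gap = abs(value - centroids[ranked[1]]) - abs(value - centroids[ranked[0]])
        if gap < margin:
            return None
    return ranked[0]

def co_train(samples, ontology, rounds=3):
    """Each round, both models retrain on all labels, including each other's."""
    labels = seed_labels(samples, ontology)
    for _ in range(rounds):
        word_votes = train_text(samples, labels)
        centroids = train_expr(samples, labels)
        new = {}
        for i, (text, value) in enumerate(samples):
            if i not in labels:
                guess = predict_text(word_votes, text) or predict_expr(centroids, value)
                if guess:
                    new[i] = guess
        if not new:
            break
        labels.update(new)
    return labels

# Hypothetical samples: (free-text metadata, one expression measurement).
samples = [
    ("liver biopsy", 9.0), ("brain cortex", 1.0), ("hepatic biopsy tissue", 8.5),
    ("sample 17", 1.5), ("cortex section", 0.5), ("sample 17 repeat", 5.2),
]
labels = co_train(samples, ["liver", "brain"])
for i, label in sorted(labels.items()):
    print(samples[i][0], "->", label)
```

In this toy run, "sample 17" has uninformative text and is labeled from its expression value; in the next round the text model has learned from that label and uses it to classify "sample 17 repeat", whose expression value alone was too ambiguous.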
Deep Curation:
Our stuff wins, with no training data
[Figure: results plotted against amount of training data used. Series: state of the art; our reimplementation of the state of the art; our dueling pianos NN.]
VIGNETTE ON TEACHING
DATA ETHICS
Alcohol Study, Barrow Alaska, 1979
Native leaders and city officials,
worried about drinking and associated
violence in their community, invited a
group of sociology researchers to
assess the problem and work with
them to devise solutions.
Methods
• 10% representative sample
(N=88) of everyone over the age
of 15 using a 1972 demographic
survey
• Interviewed on attitudes and
values about use of alcohol
• Obtained psychological histories
including drinking behavior
• Given the Michigan Alcoholism
Screening Test (Seltzer, 1971)
• Asked to draw a picture of a
person
– Used to determine cultural identity
Results announced unilaterally and publicly
At the conclusion of the study, researchers formulated a report entitled “The
Inupiat, Economics and Alcohol on the Alaskan North Slope”, which was released
simultaneously to the press and to the Barrow community. The press
release was picked up by the New York Times, which ran a front-page story
entitled “Alcohol Plagues Eskimos”.
The results of the Barrow Alcohol Study in Alaska were revealed in the context of a
press conference that was held far from the Native village, and without the
presence, much less the knowledge or consent, of any community member who
might have been able to present any context concerning the socioeconomic
conditions of the village. Study results suggested that nearly all adults in the
community were alcoholics. In addition to the shame felt by community members,
the town’s Standard & Poor’s bond rating suffered as a result, which in turn
decreased the tribe’s ability to secure funding for much needed projects.
Backlash
Methodological Problems
“The authors once again met with the Barrow Technical
Advisory Group, who stated their concern that only Natives
were studied, and that outsiders in town had not been
included.”
“The estimates of the frequency of intoxication based on
association with the probability of being detained were
termed ‘ludicrous, both logically and statistically.’”
Edward F. Foulks, M.D., Misalliances In The Barrow Alcohol Study
Ethical Problems
• Participants were not in control of their data nor
the context in which they were presented.
• Easy to demonstrate specific, significant harms:
– Social: Stigmatization
– Financial: Bond rating lowered
• Important: Nothing to do with individual privacy
– No PII revealed at any point, to anyone
– No violations of best practices in data handling
– But even those who did not participate in the study
incurred harm
Two Topics
• Social Component: Codes of Conduct
• Technical Component: Managing Sensitive Data
Ethical principles vs. ethical rules
• In the Barrow example, ethical rules
were generally followed
• But ethical principles were violated: The
researchers appear to have placed their
own interests ahead of those of the
research subjects, the client, and society
Principles: Codes of Conduct
• American Statistical Association
– http://www.amstat.org/committees/ethics/
• Certified Analytics Professional
– https://www.certifiedanalytics.org/ethics.php
• Data Science Association
– http://www.datascienceassn.org/code-of-conduct.html
Recap
• There’s a sea change underway in how we will teach
and practice data science
• No longer only about what can be done, but about
what should be done
• This is not just a policy/behavior/culture issue – there
are technical problems to solve
• If you’re not thinking about this stuff, you will be facing
retention issues and compliance issues very soon
– Witness privacy, which is a few years ahead

Mais conteúdo relacionado

Mais procurados

From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationUniversity of South Africa (Unisa)
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroData ScienceTech Institute
 
What Can Happen when Genome Sciences Meets Data Sciences?
What Can Happen when Genome Sciences Meets Data Sciences?What Can Happen when Genome Sciences Meets Data Sciences?
What Can Happen when Genome Sciences Meets Data Sciences?Philip Bourne
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for SciencePaul Groth
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Research Metadata Mechanics - Simon Porter
Research Metadata Mechanics - Simon PorterResearch Metadata Mechanics - Simon Porter
Research Metadata Mechanics - Simon PorterCASRAI
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebStefan Dietze
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveMicah Altman
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)James Hendler
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeLiz Lyon
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data SciencePhilip Bourne
 

Mais procurados (20)

From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) Education
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
What Can Happen when Genome Sciences Meets Data Sciences?
What Can Happen when Genome Sciences Meets Data Sciences?What Can Happen when Genome Sciences Meets Data Sciences?
What Can Happen when Genome Sciences Meets Data Sciences?
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Research Metadata Mechanics - Simon Porter
Research Metadata Mechanics - Simon PorterResearch Metadata Mechanics - Simon Porter
Research Metadata Mechanics - Simon Porter
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the Web
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspective
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
 
Hands-on Introduction to Machine Learning
Hands-on Introduction to Machine LearningHands-on Introduction to Machine Learning
Hands-on Introduction to Machine Learning
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
The UVA School of Data Science
The UVA School of Data ScienceThe UVA School of Data Science
The UVA School of Data Science
 

Destaque

10 JUN 10 Tenant Executive Council
10 JUN 10 Tenant Executive Council10 JUN 10 Tenant Executive Council
10 JUN 10 Tenant Executive CouncilWashington, DC
 
BIM, Collaboration and Green Building
BIM, Collaboration and Green BuildingBIM, Collaboration and Green Building
BIM, Collaboration and Green Buildinguwcomm
 
Practical Ethnography: doing ethnography in the private sector
Practical Ethnography: doing ethnography in the private sectorPractical Ethnography: doing ethnography in the private sector
Practical Ethnography: doing ethnography in the private sectorSam Ladner
 
Abi's Washington, D.C. Presentation
Abi's Washington, D.C. PresentationAbi's Washington, D.C. Presentation
Abi's Washington, D.C. PresentationLyn Hilt
 
Washington Powerpoint
Washington PowerpointWashington Powerpoint
Washington Powerpointablack
 

Destaque (6)

10 JUN 10 Tenant Executive Council
10 JUN 10 Tenant Executive Council10 JUN 10 Tenant Executive Council
10 JUN 10 Tenant Executive Council
 
BIM, Collaboration and Green Building
BIM, Collaboration and Green BuildingBIM, Collaboration and Green Building
BIM, Collaboration and Green Building
 
Practical Ethnography: doing ethnography in the private sector
Practical Ethnography: doing ethnography in the private sectorPractical Ethnography: doing ethnography in the private sector
Practical Ethnography: doing ethnography in the private sector
 
Abi's Washington, D.C. Presentation
Abi's Washington, D.C. PresentationAbi's Washington, D.C. Presentation
Abi's Washington, D.C. Presentation
 
Washington Powerpoint
Washington PowerpointWashington Powerpoint
Washington Powerpoint
 
Washington d.c
Washington d.cWashington d.c
Washington d.c
 

Semelhante a Data, Responsibly: The Next Decade of Data Science

Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceUniversity of Washington
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureUniversity of Washington
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptxRahulTr22
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AlonePhilip Bourne
 
Nicholas Jewell MedicReS World Congress 2014
Nicholas Jewell MedicReS World Congress 2014Nicholas Jewell MedicReS World Congress 2014
Nicholas Jewell MedicReS World Congress 2014MedicReS
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Richard Bookman
 
Questions for knowledge creators
Questions for knowledge creatorsQuestions for knowledge creators
Questions for knowledge creatorsRichard Bookman
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Academia Sinica
 
Real-time applications of Data Science.pptx
Real-time applications  of Data Science.pptxReal-time applications  of Data Science.pptx
Real-time applications of Data Science.pptxshalini s
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedPhilip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangePhilip Bourne
 
Blockchain: Information Tracking - Manion AFCEA/GMU C4i
Blockchain: Information Tracking - Manion AFCEA/GMU C4iBlockchain: Information Tracking - Manion AFCEA/GMU C4i
Blockchain: Information Tracking - Manion AFCEA/GMU C4iSean Manion PhD
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxssuser1a4f0f
 
Methodology 2.pptx
Methodology 2.pptxMethodology 2.pptx
Methodology 2.pptxMarcCollazo1
 
How NOT to Aggregrate Polling Data
How NOT to Aggregrate Polling DataHow NOT to Aggregrate Polling Data
How NOT to Aggregrate Polling DataDataCards
 
AAPOR 2012 Langer Probability
AAPOR 2012 Langer ProbabilityAAPOR 2012 Langer Probability
AAPOR 2012 Langer ProbabilityLangerResearch
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7Scott Edmunds
 
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.Josh Cowls
 
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfData_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfvishal choudhary
 

Semelhante a Data, Responsibly: The Next Decade of Data Science (20)

Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data science
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State Legislature
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
Nicholas Jewell MedicReS World Congress 2014
Nicholas Jewell MedicReS World Congress 2014Nicholas Jewell MedicReS World Congress 2014
Nicholas Jewell MedicReS World Congress 2014
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014
 
Questions for knowledge creators
Questions for knowledge creatorsQuestions for knowledge creators
Questions for knowledge creators
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
 
Real-time applications of Data Science.pptx
Real-time applications  of Data Science.pptxReal-time applications  of Data Science.pptx
Real-time applications of Data Science.pptx
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Blockchain: Information Tracking - Manion AFCEA/GMU C4i
Blockchain: Information Tracking - Manion AFCEA/GMU C4iBlockchain: Information Tracking - Manion AFCEA/GMU C4i
Blockchain: Information Tracking - Manion AFCEA/GMU C4i
 
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptxData_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
 
Methodology 2.pptx
Methodology 2.pptxMethodology 2.pptx
Methodology 2.pptx
 
How NOT to Aggregrate Polling Data
How NOT to Aggregrate Polling DataHow NOT to Aggregrate Polling Data
How NOT to Aggregrate Polling Data
 
AAPOR 2012 Langer Probability
AAPOR 2012 Langer ProbabilityAAPOR 2012 Langer Probability
AAPOR 2012 Langer Probability
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7
 
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
 
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdfData_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pdf
 

Data, Responsibly: The Next Decade of Data Science

  • 1. Data, Responsibly: The Next Decade of Data Science Bill Howe, PhD Associate Professor, Information School Associate Director, eScience Institute Adjunct Associate Professor, Computer Science & Engineering University of Washington
  • 2. My goals this evening • Describe emerging topics in data science research and practice around a technical interpretation of ethics • Describe some specific thrusts we are pursuing • Encourage you to get involved 11/10/2016 Data, Responsibly / SciTech NW 2
  • 3. How much time do you spend “handling data” as opposed to “doing science”? Mode answer: “90%”
  • 4. 1) Upload data “as is”: Cloud-hosted; no need to install or design a database; no pre-defined schema. 2) Analyze data with SQL: Right in your browser, writing queries on top of queries on top of queries ... SELECT hit, COUNT(*) AS cnt FROM tigrfam_surface GROUP BY hit ORDER BY cnt DESC 3) Share the results: Click on the science question, see the SQL that answers it
  • 5. [Diagram] Multiple GUIs/apps; multiple languages (SQL, Datalog, MyriaL, SPARQL), each passed through a compiler; multiple big data systems (MyriaX, RDBMS, Hadoop (via layers), Serial C++, PGAS/HPC, GEMS)
  • 6. Making it easier to do data science • SQLShare: Easier to use a database • Myria: Easier to use a bunch of different systems at once, at scale • Worked great in the physical sciences • But some collaborators weren’t that excited…
  • 7. Data Science Kickoff Session: 137 posters from 30+ departments and units
  • 8. Data, data, data. Kevin Merritt, CEO, Socrata; Deep Dhillon, CTO, Socrata
  • 9. • Pursue transformative interdisciplinary urban research • Facilitate translation from UW to .gov stakeholders • Position Seattle/UW as a leader in applied urban research • 80+ faculty from 20+ departments around campus
  • 10. Assessing Community Well-Being (Third-Place Technologies); Optimization of King County Metro Paratransit (Computer Science & Engineering); Predictors of Permanent Housing for Homeless Families (Bill and Melinda Gates Foundation); Open Sidewalk Graph for Accessible Trip Planning (Computer Science & Engineering). Inaugural 2015 program: 16 spots, 140 applicants from 20+ departments
  • 11. Mining Online Data to Detect Unsafe Food Products (Elaine Nsoesie, Institute for Health Metrics and Evaluation); ORCA data for improved transit system planning and operation (Washington State Transportation Center (TRAC)); Global Open Sidewalks: Creating a shared open data layer (Taskar Center for Accessible Technology); CrowdSensing Census: A tool for estimating poverty (Bell Labs, Nokia). 2016 program: 16 spots, 190 applicants. New in 2016: An explicit emphasis on data ethics
  • 13. July 2016: “Data, Responsibly” Dagstuhl Workshop. Gerhard Weikum, Serge Abiteboul, Julia Stoyanovich, Gerome Miklau
  • 14. Cathy O’Neil, September 2016. Three properties of a WMD: Opacity, Scale, Damage
  • 15. The way I think about this… (1) First decade of Data Science research and practice: What can we do with massive, noisy, heterogeneous datasets? Next decade of Data Science research and practice: What should we do with massive, noisy, heterogeneous datasets?
  • 16. The way I think about this… (2) Decisions are based on two sources of information: 1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests” 2. Societal constraints, e.g., “we must avoid racial discrimination”. We’ve become very good at automating the use of past examples. We’ve only just started to think about incorporating societal constraints.
  • 17. The way I think about this… (3) How do we apply societal constraints to algorithmic decision-making? Option 1: Keep a human in the loop. Ex: EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making. Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models. Option 2: Build them into the algorithms themselves. I’ll talk about some approaches for this.
  • 18. The way I think about this… (4) On transparency vs. accountability: • For human decision-making, sometimes explanations are required, improving transparency – Supreme Court decisions – Employee reprimands/termination • But when transparency is difficult, accountability takes over – medical emergencies, business decisions • As we shift decisions to algorithms, we lose both transparency AND accountability • “The buck stops where?”
  • 19. Some Facets of “Data, Responsibly” • Privacy (I won’t be talking about this) • Fairness (I’ll give a taste of the work here) • Transparency (I won’t be talking about this) • Reproducibility (Towards automatic scientific claim-checking) • Ethics (Vignette on teaching data ethics)
  • 21. Ex: Staples online pricing. Reasoning: Offer deals to people that live near competitors’ stores. Effect: lower prices offered to buyers who live in more affluent neighborhoods
  • 22. [Latanya Sweeney; CACM 2013] Racially identifying names trigger ads suggestive of an arrest record (slide adapted from Stoyanovich, Miklau)
  • 24. Technical.ly, September 2016: Philadelphia is grappling with the prospect of a racist computer algorithm. The Special Committee on Criminal Justice Reform's hearing on reducing the pre-trial jail population. Any background signal in the data of institutional racism is amplified by the algorithm, operationalized by the algorithm, legitimized by the algorithm. “Should I be afraid of risk assessment tools?” “No, you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?” “And what’s the culture of policing in the neighborhood in which I grew up in?”
  • 25. Towards a precise characterization of fairness… Positive outcomes / negative outcomes: offered employment / denied employment; accepted to school / rejected from school; offered a loan / denied a loan; offered a discount / not offered a discount. Label the outcomes assigned to individuals as positive or negative. Fairness is concerned with how outcomes are assigned to a population (slide adapted from Stoyanovich, Miklau)
  • 26. Statistical parity: the demographics of the individuals receiving any outcome are the same as the demographics of the underlying population. [Figure, race: black, white] Positive outcomes go to 40% of the whole population, but to 20% of black and 60% of white individuals (slide adapted from Stoyanovich, Miklau)
  • 27. First attempt: Ignore sensitive information. [Figure, zip code: 10025, 10027; race: black, white] Positive outcomes still go to 20% of black and 60% of white individuals. Removing race from the vendor’s assignment process does not prevent discrimination. Assessing disparate impact: Discrimination is assessed by the effect on the protected sub-population, not by the input or by the process that led to the effect. (slide adapted from Stoyanovich, Miklau)
  • 28. More directly: Impose statistical parity. [Figure, credit score: good, bad; race: black, white; positive outcome: offered a loan] Positive outcomes go to 40% of black and 40% of white individuals. Tradeoff between (perceived) accuracy and fairness; may be contrary to the goals of the vendor (slide adapted from Stoyanovich, Miklau)
  • 29. A systems approach: FairTest: fairness test suite for data analysis apps • Tests for unintentional discrimination according to several representative discrimination measures. • Automates search for context-specific associations between protected variables and application outputs • Reports findings, ranked by association strength and affected population size [F. Tramèr et al., arXiv:1510.02377 (2015)]
  • 30. As a corporation, should I care? Compliance; Customer Retention; Employee Retention. Sources shown: Jacobson, Scientific American, 2013; Eichler, Huffington Post, 2012; CNET, May 2016
  • 32. Science is a complete mess • Reproducibility – Begley & Ellis, Nature 2012: 6 out of 53 cancer studies reproducible – Only about half of 100 psychology studies had effect sizes that approximated the original result (Science, 2015) – Ioannidis 2005: Why most published research findings are false – Reinhart & Rogoff: global economic policy based on spreadsheet fuck ups
  • 34. Retractions are increasing…
  • 36. Why is this happening? (1)
  • 37. Why is this happening? (2)
  • 38. Why is this happening? (2) Publication Bias!
  • 39. “DEEP CURATION” TOWARDS AUTOMATIC SCIENTIFIC CLAIM CHECKING
  • 40. Vision: Validate scientific claims automatically – Check for manipulation (manipulated images, Benford’s Law) – Extract claims from papers – Check claims against the authors’ data – Check claims against related data sets – Automatic meta-analysis across the literature + public datasets • First steps – Automatic curation: Validate and attach metadata to public datasets – Longitudinal analysis of the visual literature
  • 42. Microarray samples submitted to the Gene Expression Omnibus. Curation is fast becoming the bottleneck to data sharing. (Maxim Grechkin, Hoifung Poon)
  • 43. No growth in number of datasets used per paper! (Maxim Grechkin, Hoifung Poon)
  • 45. color = labels supplied as metadata; clusters = first two PCA dimensions on the gene expression data itself. The expression data and the text labels appear to disagree. Can we curate algorithmically? (Maxim Grechkin, Hoifung Poon)
  • 46. Better tissue type labels. [Diagram] Inputs: domain knowledge (ontology), expression data, free-text metadata. Components: 2 deep networks (text, expr), SVM. (Maxim Grechkin, Hoifung Poon)
  • 47. Deep Curation: Distant supervision and co-learning between a text-based classifier and an expression-based classifier: Both models improve by training on each other’s results. [Diagram: free-text classifier, expression classifier] (Maxim Grechkin, Hoifung Poon)
  • 48. Deep Curation: Our stuff wins, with no training data. [Chart] Compares the state of the art, our reimplementation of the state of the art, and our dueling pianos NN against the amount of training data used. (Maxim Grechkin, Hoifung Poon)
  • 49. VIGNETTE ON TEACHING DATA ETHICS
  • 50. Alcohol Study, Barrow Alaska, 1979. Native leaders and city officials, worried about drinking and associated violence in their community, invited a group of sociology researchers to assess the problem and work with them to devise solutions.
  • 51. Methods • 10% representative sample (N=88) of everyone over the age of 15 using a 1972 demographic survey • Interviewed on attitudes and values about use of alcohol • Obtained psychological histories including drinking behavior • Given the Michigan Alcoholism Screening Test (Seltzer, 1971) • Asked to draw a picture of a person – Used to determine cultural identity
  • 52. Results announced unilaterally and publicly. At the conclusion of the study, researchers formulated a report entitled “The Inupiat, Economics and Alcohol on the Alaskan North Slope”, which was released simultaneously to the press and to the Barrow community. The press release was picked up by the New York Times, which ran a front-page story entitled “Alcohol Plagues Eskimos”
  • 53. Backlash: The results of the Barrow Alcohol Study in Alaska were revealed in the context of a press conference that was held far from the Native village, and without the presence, much less the knowledge or consent, of any community member who might have been able to present any context concerning the socioeconomic conditions of the village. Study results suggested that nearly all adults in the community were alcoholics. In addition to the shame felt by community members, the town’s Standard & Poor’s bond rating suffered as a result, which in turn decreased the tribe’s ability to secure funding for much needed projects.
  • 54. Methodological Problems “The authors once again met with the Barrow Technical Advisory Group, who stated their concern that only Natives were studied, and that outsiders in town had not been included.” “The estimates of the frequency of intoxication based on association with the probability of being detained were termed ‘ludicrous, both logically and statistically.’” Edward F. Foulks, M.D., Misalliances In The Barrow Alcohol Study
  • 55. Ethical Problems • Participants were not in control of their data nor the context in which they were presented. • Easy to demonstrate specific, significant harms: – Social: Stigmatization – Financial: Bond rating lowered • Important: Nothing to do with individual privacy – No PII revealed at any point, to anyone – No violations of best practices in data handling – But even those who did not participate in the study incurred harm
  • 56. Two Topics • Social Component: Codes of Conduct • Technical Component: Managing Sensitive Data
  • 57. Ethical principles vs. ethical rules • In the Barrow example, ethical rules were generally followed • But ethical principles were violated: The researchers appear to have placed their own interests ahead of those of the research subjects, the client, and society
  • 58. Principles: Codes of Conduct • American Statistical Association – http://www.amstat.org/committees/ethics/ • Certified Analytics Professional – https://www.certifiedanalytics.org/ethics.php • Data Science Association – http://www.datascienceassn.org/code-of-conduct.html
  • 59. Recap • There’s a sea change underway in how we will teach and practice data science • No longer only about what can be done, but about what should be done • This is not just a policy/behavior/culture issue – there are technical problems to solve • If you’re not thinking about this stuff, you will be facing retention issues and compliance issues very soon – Witness privacy, which is a few years ahead

Editor's Notes

  1. In each of these fields, my research interests are driven by this question. We like to ask researchers how much time they spend “handling data” as opposed to “doing science.” They say things like 90%, and they don’t even blink. So my overarching research question is: how can we reduce this “data science overhead”?
  2. One effort in this area was to develop SQLShare, where we emphasize a very simple workflow: 1) You can upload data “as is” from spreadsheets or anything else. There’s no need to install software or design a schema. 2) Then you can immediately begin writing queries, right in your browser, and define queries on top of queries on top of queries to express even complex workflows. 3) Then you can share the results online: your colleagues can browse science questions in English and see the SQL that answers each one. ---- Key ideas to get data in: a) Use the cloud to avoid having to install and run a database. b) Give up on the schema: just throw your data in “as is” and do “lazy integration.” c) Use some magic to automate parsing, integration, recommendations, and more. Key ideas to get data out: a) Associate science questions (in English) with each SQL query, which makes the queries easy to understand and easy to find. b) Saving and reusing queries is a first-class requirement. Given an example query, it’s easy to modify it into an “adjacent” query. c) Expose the whole system through a REST API to make it easy to bring new client applications online.
  3. You’ll see these applications in the demo
  4. Solutions are emerging, powered by the open data movement. Socrata, a local Seattle company, has built a very successful business helping cities jailbreak their data, and is now climbing the application stack to support analytics and visualization. Essentially every URL of the form data.yourcity.gov is powered by Socrata’s technology. Data, People, and Infrastructure
  5. Following a 2014 report entitled “Big Data: Seizing Opportunities, Preserving Values”
  6. On which projects should we engage? How can we ensure fairness, accountability, and transparency for algorithmic decision-making? How do we ensure privacy? How do we avoid junk science?
  7. From WSJ article: A Wall Street Journal investigation found that the Staples Inc. website displays different prices to people after estimating their locations. More than that, Staples appeared to consider the person's distance from a rival brick-and-mortar store, either OfficeMax Inc. or Office Depot Inc. If rival stores were within 20 miles or so, Staples.com usually showed a discounted price. In what appears to be an unintended side effect of Staples' pricing methods—likely a function of retail competition with its rivals—the Journal's testing also showed that areas that tended to see the discounted prices had a higher average income than areas that tended to see higher prices.
  8. Users and regulators must be able to understand how raw data was selected, and what operations were performed during analysis. Users want to control what is recorded about them and how that information is used. Users must be able to access their own information and correct any errors (US Fair Credit Reporting Act). Transparency facilitates accountability: verifying that a service performs as it should, and that data is used according to contract.
  9. LSI-R model: 25 states use it, most for targeted programs. Idaho and Colorado use it for sentencing. “As a Black male,” Cobb asked Penn statistician and resident expert Richard Berk, “should I be afraid of risk assessment tools?” “No,” Berk said, without skipping a beat. “You gotta tell me a lot more about yourself. … At what age were you first arrested? What is the date of your most recent crime? What are you charged with?” Cobb interjected: “And what’s the culture of policing in the neighborhood in which I grew up in?” (emphasis mine) That's exactly the point (and to Michael: this is what I was arguing about with the guy from Comcast): a little bit of institutional racism has a triple effect: a) institutional racism is amplified by the algorithm (a small signal can now dominate the model); b) institutional racism is operationalized by the algorithm (it's far easier now to make impactful decisions based on bad data); c) institutional racism is legitimized by the algorithm (so that everyone thinks "it's just data" and actively defends the algorithm's assumed objectivity, even when the racist results are staring you right in the face. This vigorous defense doesn't happen when a human is shown to be correlating their decisions perfectly with race.)
  10. More formal definition; cite laws, 80% rule, statistical significance.
  11. Redundant encoding was actually used to disguise discrimination: redlining
  12. The point here is that avoiding discrimination may be directly in conflict with the vendor’s (or classifier’s) utility goals. It is therefore a constraint on the assignment of outcomes that must be balanced with the vendor’s interests. In reality, it is impossible to predict loan payback accurately, so we use past information. Then the question arises whether that past information is biased.
  13. You can’t roll the dice a bunch of times then yell “Yahtzee!”
  14. Google knowledge graph Specialized Ontologies
  15. Google knowledge graph Specialized Ontologies
  16. Google knowledge graph Specialized Ontologies
  17. “HeLa”, “K562”, “MCF-7” and “brain tumor”; PCA on expression values
  18. Google knowledge graph: common knowledge, high redundancy, possibly crowdsourcing (visual: question answering via Google). Text features: presence of ontology terms; sibling of ontology term. Expression features
  19. Native leaders and city officials in Barrow, Alaska, worried about drinking and associated violence and accidental deaths in their community, invited a group of sociology researchers to assess the problem and work with them to devise solutions. At the conclusion of the study researchers formulated a report entitled “The Inupiat, Economics and Alcohol on the Alaskan North Slope” which was released simultaneously at a press release and to the Barrow community. The press release was picked up by the New York Times, which ran a front-page story entitled Alcohol Plagues
  20. Responsibility to which parties? * Society * Employers and Clients * Colleagues * Research Subjects. ASA: Professionalism; Responsibilities to Funders, Clients, Employers; Responsibilities in Publications and Testimony; Responsibilities to Research Subjects; Responsibilities to Research Team Colleagues; Responsibilities to Other Statisticians or Statistical Practitioners; Responsibilities Regarding Allegations of Misconduct; Responsibilities of Employers. Code of Conduct rules: Competence; Do what your client asks, unless it violates the law; Communication with clients; Confidential information; Conflicts of interest; Rule 7: More on conflicts of interest and confidentiality; Rule 8: Scientific integrity. +++ Interesting: If a data scientist reasonably believes a client is misusing data science to communicate a false reality or promote an illusion of understanding, the data scientist shall take reasonable remedial measures, including disclosure to the client, and including, if necessary, disclosure to the proper authorities. The data scientist shall take reasonable measures to persuade the client to use data science appropriately. Rule 9: Misconduct (follow the rules)
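
Note 10 mentions the “80% rule” without spelling it out. As a rough illustration only: under the four-fifths rule of thumb, a group whose selection rate is below 80% of the most-favored group's rate is commonly treated as showing evidence of adverse impact. The sketch below is not from the talk; the function names and the counts are hypothetical, chosen only to show the arithmetic.

```python
def selection_rate(selected: int, total: int) -> float:
    """Fraction of a group that received the favorable outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return selected / total


def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's selection rate to the reference (highest) rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


def passes_four_fifths(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio meets the four-fifths (80%) threshold."""
    return ratio >= threshold


# Hypothetical counts, for illustration only (not data from the talk).
reference_rate = selection_rate(selected=50, total=100)  # most-favored group
group_rate = selection_rate(selected=30, total=100)      # comparison group

ratio = impact_ratio(group_rate, reference_rate)
print(f"impact ratio: {ratio:.2f}, passes 80% rule: {passes_four_fifths(ratio)}")
```

A ratio below the threshold is a screening signal rather than proof of discrimination; as the note says, it is usually read alongside the relevant law and a test of statistical significance.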