Data ethics and machine learning
Discrimination, algorithmic bias, and
how to discover them.
DINO PEDRESCHI
KDDLAB, DIPARTIMENTO DI INFORMATICA, UNIVERSITÀ DI PISA
Opportunities of
big data
Spot business trends
Prevent diseases
Fight crime
Improve transportation
Personalised services
Improve wellbeing
Event Detection
Detecting events in a geographic area and
classifying the different kinds of users.
City of Rome
Metropolitan area
Covered geographical region: city of Rome
Dataset size per snapshot: ≈ 1.2 GBytes per day
Number of records: ≈ 5.6 million lines per day
8 months between 2015 and 2016
San Pietro
San Giovanni
Circo Massimo
Stadio Olimpico
End users
Traveler
Mobility
Manager
City
Personal mobility assistant
Carpooling
Network
Estimating wellbeing with mobility data
AI and Big Data 13
Predicting GDP with Retail Market data
generic utility function (rationality)
personal utility function (diversity)
Product
Price
Quantity
Needed
Sophistication
R² = 17.25%   R² = 32.38%   R² = 85.72%
Risks of big data
Big Data, Big Risks
Big data is algorithmic, therefore it cannot be biased! And yet…
• All traditional evils of social discrimination, and many new ones, exhibit
themselves in the big data ecosystem
• Because of its tremendous power, massive data analysis must be used
responsibly
• Technology alone won’t do: we also need policy, user involvement and
education efforts
By 2018, 50% of business ethics
violations will occur through
improper use of big data analytics
[source: Gartner, 2016]
The danger of black boxes - 1
The COMPAS score (Correctional Offender Management Profiling for
Alternative Sanctions)
A 137-question questionnaire and a predictive model for “risk of
crime recidivism.” The model is a proprietary secret of Northpointe,
Inc.
The data journalists at propublica.org have shown that
• the prediction accuracy of recidivism is rather low (around 60%)
• the model has a strong ethnic bias
◦ blacks who did not reoffend were classified as high risk twice as often as
whites who did not reoffend
◦ whites who did reoffend were classified as low risk twice as often as
blacks who did reoffend.
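The two findings above are statements about group-wise error rates: the false positive rate (non-reoffenders labelled high risk) and the false negative rate (reoffenders labelled low risk). A minimal sketch of such a comparison follows. The records are invented toy data, not COMPAS data, and the function name `error_rates` is hypothetical.

```python
from collections import defaultdict

def error_rates(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is a tuple (group, predicted_high_risk, reoffended).
    Assumes every group contains both reoffenders and non-reoffenders.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in records:
        if reoffended:
            counts[group]["tp" if high_risk else "fn"] += 1
        else:
            counts[group]["fp" if high_risk else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        fpr = c["fp"] / (c["fp"] + c["tn"])  # non-reoffenders flagged high risk
        fnr = c["fn"] / (c["fn"] + c["tp"])  # reoffenders flagged low risk
        rates[group] = (fpr, fnr)
    return rates

# Invented toy data (NOT real COMPAS records):
# (group, predicted high risk, actually reoffended)
toy = (
    [("a", True, False)] * 4 + [("a", False, False)] * 6 +
    [("a", False, True)] * 2 + [("a", True, True)] * 8 +
    [("b", True, False)] * 2 + [("b", False, False)] * 8 +
    [("b", False, True)] * 4 + [("b", True, True)] * 6
)

rates = error_rates(toy)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"group {group}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

In this toy setting both groups have the same overall accuracy (14 of 20 correct), yet group "a" has twice the false positive rate and group "b" twice the false negative rate, which is the shape of disparity described above.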
The danger of black boxes -2
The three major US credit bureaus, Experian, TransUnion, and
Equifax, providing credit scoring for millions of individuals, are
often discordant.
In a study of 500,000 records, 29% of consumers received credit
scores that differ by at least fifty points between credit bureaus, a
difference that may mean tens of thousands of dollars over the life of
a mortgage [CRS+16].
The danger of black boxes - 3
In 2010, some homeowners with a regular payment
history of their mortgage reported a sudden drop of forty
points in their credit score, soon after their own enquiry.
The danger of black boxes - 4
During the 1970s and 1980s, St. George’s Hospital
Medical School in London used a computer program for
initial screening of job applicants.
The program used information from applicants’ forms,
which contained no reference to ethnicity.
The program was found to unfairly discriminate against
female applicants and ethnic minorities (inferred from
surnames and place of birth), who were less likely to be selected for
interview [LM88].
The danger of black boxes - 5
In a recent paper at SIGKDD 2016 [RSG16] the authors
show how an accurate but untrustworthy classifier may
result from an accidental bias in the training data.
In a task of discriminating wolves from huskies in a
dataset of images, the resulting deep learning model is
shown to classify a wolf in a picture based solely on …
the presence of snow in the background!
[RSG16] “Why Should I Trust You?” Explaining the Predictions of Any Classifier
SIGKDD 2016 Conference Paper
Deep learning is creating computer
systems we don't fully understand
www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic-black-boxes
Is AI Permanently Inscrutable?
nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable
The danger of black boxes - 6
In a recent study at Princeton University, the authors show
how the semantics derived automatically from large
text/web corpora contain human biases
◦ E.g., names associated with white people were found to be
significantly easier to associate with pleasant than
unpleasant terms, compared to names associated with
black people.
Therefore, any machine learning model trained on text
data for, e.g., sentiment or opinion mining has a strong
chance of inheriting the prejudices reflected in the
human-produced training data.
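One way to picture this kind of measurement is to compare how close a word's vector lies to "pleasant" versus "unpleasant" term vectors. The sketch below uses cosine similarity on made-up two-dimensional vectors. The `association` score is a simplified illustration, not necessarily the exact statistic used in the study, and all vectors and names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word, pleasant, unpleasant):
    """Mean similarity to pleasant terms minus mean similarity to unpleasant terms."""
    p = sum(cosine(word, t) for t in pleasant) / len(pleasant)
    u = sum(cosine(word, t) for t in unpleasant) / len(unpleasant)
    return p - u

# Made-up 2-d "embeddings" (real embeddings have hundreds of dimensions).
pleasant = [(1.0, 0.0), (0.9, 0.1)]
unpleasant = [(0.0, 1.0), (0.1, 0.9)]
name_x = (0.8, 0.2)  # hypothetical name vector lying near the pleasant terms
name_y = (0.2, 0.8)  # hypothetical name vector lying near the unpleasant terms

score_x = association(name_x, pleasant, unpleasant)
score_y = association(name_y, pleasant, unpleasant)
print(score_x > 0, score_y < 0)
```

A model that consumes such vectors as features starts from these differences in association, which is how a bias in the corpus can reach a downstream classifier.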
Human Bias
Human Bias can be Learned - 7
As we stated in our 2008 SIGKDD paper that started the field of
discrimination-aware data mining [PRT08]:
“learning from historical data recording human decision making
may mean to discover traditional prejudices that are endemic in
reality, and to assign to such practices the status of general rules,
maybe unconsciously, as these rules can be deeply hidden within
the learned classifier.”
Policies
BIG DATA ETHICS
Satya Nadella's rules for AI
www.theverge.com/2016/6/29/12057516/satya-nadella-ai-robot-laws
U.S. – F.T.C.
Salvatore Ruggieri 34
www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf (Sept. 2014)
U.S. – White House
www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf (May 2014)
U.S. – White House
www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf (May 2016)
U.S. – White House
www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf (October 2016)
E.U. - EDPS
secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-11-19_Big_Data_EN.pdf
E.U. - EDPS
secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf
Netherlands
www.knaw.nl/en/news/publications/ethical-and-legal-aspects-of-informatics-research (September 2016)
Big Data Ethics
informationaccountability.org/big-data-ethics-initiative/
Value-Sensitive Design
Design for privacy
Design for security
Design for inclusion
Design for sustainability
Design for democracy
Design for safety
Design for transparency
Design for accountability
Design for human capabilities
EU Projects: SoBigData.eu
Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015,
duration: 2015-2019), www.sobigdata.eu
Second-level University Master's programme
BigData Technology
BigData Sensing & Procurement
BigData Mining
BigData StoryTelling
BigData Ethics
The Big Data Master's programme aims to train “data scientists”:
professionals with a mix of multidisciplinary skills who can
not only acquire data and extract knowledge from it, but also
tell “stories” through these data, in support of decision making,
creativity and the development of innovative services, and who
can manage the ethical and legal repercussions of Big Data,
which often contain personal information and raise issues of
privacy, transparency and awareness.
Areas of socio-economic innovation:
BigData for Social Good
BigData for Business
Big Data Analytics & Social Mining
SoBigData
Data Ethics Literacy
MIUR report on Big Data, 28 July 2016
◦ www.istruzione.it/allegati/2016/bigdata.pdf
Master UNIPI in Big Data Analytics & Social Mining
◦ masterbigdata.it
Data ethics
technologies
DISCRIMINATION DISCOVERY FROM DATA
Discrimination discovery
Given:
◦ a historical database of decision records, each describing
features of an applicant to a benefit
◦ e.g., a credit request to a bank and the corresponding credit approval/denial
◦ some designated categories of applicants, such as groups
protected by anti-discrimination laws,
find whether, and in which circumstances, there is
evidence of discrimination against the designated categories
that emerges from the data.
DCUBE: Discrimination Discovery in Databases 47
German Credit dataset
How? Fight with the same weapons
Idea: use data mining to discover discrimination
◦ the decision policies hidden in a database can be represented by
decision rules and discovered by frequent pattern mining
◦ Once all such decision rules are found, highlight all potential niches
of discrimination by filtering the rules using a measure that
quantifies the discrimination risk.
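The two steps above (enumerate decision rules, then filter them by a risk measure) can be sketched on a toy dataset. This is a brute-force illustration under stated assumptions, not the DCUBE implementation: a real system would use a frequent pattern miner such as Apriori instead of enumerating every item combination. The risk measure is assumed to be the ratio between the rule's confidence with and without the protected item. The function names, thresholds and records are hypothetical.

```python
from itertools import combinations

def conf(rows, premise, outcome):
    """Confidence of premise -> outcome, plus the count of rows matching both."""
    covered = [r for r in rows if premise <= r]
    hits = sum(1 for r in covered if outcome in r)
    return (hits / len(covered) if covered else 0.0), hits

def risky_rules(rows, protected, outcome, min_supp=2, min_elift=1.5):
    """Enumerate rules 'protected & context -> outcome' and keep the risky ones."""
    items = sorted({i for r in rows for i in r} - {protected, outcome})
    found = []
    for k in range(1, len(items) + 1):
        for context in combinations(items, k):
            ctx = frozenset(context)
            c_prot, supp = conf(rows, ctx | {protected}, outcome)
            c_ctx, _ = conf(rows, ctx, outcome)
            # Keep frequent rules whose confidence rises sharply once the
            # protected item is added to the context.
            if supp >= min_supp and c_ctx > 0 and c_prot / c_ctx >= min_elift:
                found.append((context, supp, c_prot, c_prot / c_ctx))
    return found

# Invented toy decision records, each a set of attribute=value items.
rows = (
    [frozenset({"foreign=yes", "purpose=new_car", "credit=bad"})] * 3 +
    [frozenset({"foreign=yes", "purpose=new_car", "credit=good"})] * 1 +
    [frozenset({"foreign=no", "purpose=new_car", "credit=bad"})] * 1 +
    [frozenset({"foreign=no", "purpose=new_car", "credit=good"})] * 5
)

found = risky_rules(rows, protected="foreign=yes", outcome="credit=bad")
for context, supp, c, ratio in found:
    print(" & ".join(context), f"supp={supp} conf={c:.2f} ratio={ratio:.3f}")
```

On this toy data the only surviving rule has context `purpose=new_car`: 3 of the 4 foreign applicants are refused, against 4 of all 10 applicants in the same context.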
Discrimination discovery from data
FOREIGN_WORKER=yes
& PURPOSE=new_car & HOUSING=own
→ CREDIT=bad
◦ elift = 5.19, supp = 56, conf = 0.37
elift = 5.19 means that foreign workers are more than 5
times as likely to be refused credit as the average applicant
in the same context (a new-car loan requested by a home owner).
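For reference, a sketch of the arithmetic behind this rule, assuming the extended-lift definition of [PRT08]. The symbols are introduced here for illustration: A is the protected item (FOREIGN_WORKER=yes), B the context (PURPOSE=new_car & HOUSING=own), and C the decision (CREDIT=bad).

```latex
\mathrm{elift}(A, B \rightarrow C)
  = \frac{\mathrm{conf}(A, B \rightarrow C)}{\mathrm{conf}(B \rightarrow C)}
\qquad\Longrightarrow\qquad
\mathrm{conf}(B \rightarrow C) = \frac{0.37}{5.19} \approx 0.07
```

Under this reading, roughly 7% of all applicants in the context B are refused credit, against 37% of the foreign workers among them.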
Case study: grant evaluation
Outcome:
◦ Funded
◦ Not funded
◦ Conditionally funded
Dataset attributes
Features of the PI
Project costs
Research Area
Project Evaluation
A potentially discriminatory rule
Antecedent
◦ Project proposals in “Physical and Analytical
Chemical Sciences”
◦ Young females
◦ Total cost of 1,358,000 Euros or above
Possible interpretation
◦ “Peer-reviewers of panel PE4 trusted young females
requiring high budgets less than males leading
similar projects”
Case study: US Harmonized Tariff System
US Harmonized Tariff System (HTS)
https://hts.usitc.gov/
Detailed tariff classification system for
merchandise imported into the US
Chapters 61, 62, 64, 65: apparel
◦ Different taxes for the same garments
produced separately for men and women
◦ Descriptions are in semi-structured form
Different taxes for the same apparel for men and women:
                   Coats              Fur felt hats      Cotton pajamas
Women and girls    64.4¢/kg + 18.8%   96¢/doz + 1.4%     8.5%
Men and boys       38.6¢/kg + 10%     0                  8.9%
Women: 14%
Men: 9%
1.3 billion USD!!!
Totes-Isotoner Corp. v. U.S.
Rack Room Shoes Inc. and
Forever 21 Inc. vs U.S.
Court of International Trade
U.S. Court of Appeals for the Federal
Circuit (2014)
“[…] the courts may have concluded that
Congress had no discriminatory intent when
ruling the HTS, but there is little
doubt that gender-based tariffs have
discriminatory impact”
Sample rule from the HTS dataset
Soccer Player Ratings
How do humans evaluate sports performance?
[Figure: machine performance against the human evaluation line, using technical features]
[Figure: machine performance against the human evaluation line, using technical features and technical + contextual features]
Wrapping up
Right of explanation
• Applying AI within many domains requires
transparency and responsibility:
• health care
• finance
• surveillance
• autonomous vehicles
• Government
• EU General Data Protection Regulation (April
2016) establishes (?) a right of explanation
for all individuals to obtain “meaningful
explanations of the logic involved” when
automated (algorithmic) individual decision-
making, including profiling, takes place.
• In sharp contrast, (big) data-driven AI/ML
models are often black boxes.
Accountability
“Why exactly was my loan application rejected?”
“What could I have done differently so that my application
would not have been rejected?”
Social Mining & Big Data Ecosystem
www.sobigdata.eu
Knowledge Discovery
& Data Mining Lab
http://kdd.isti.cnr.it
Special thanks
• Salvatore Ruggieri
• Franco Turini
• Fosca Giannotti
• Anna Monreale
• Luca Pappalardo
SMARTCATs

Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
 



Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi

  • 1. Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi, KDDLab, Dipartimento di Informatica, Università di Pisa
  • 5. Spot business trends • Prevent diseases • Fight crime • Improve transportation • Personalised services • Improve wellbeing
  • 6. Event Detection: detecting events in a geographic area, classifying the different kinds of users. City of Rome, metropolitan area • Covered geographical region: city of Rome • Dataset size per snapshot: ≈ 1.2 GB per day • Number of records: ≈ 5.6 million lines per day • Period: 8 months between 2015 and 2016
  • 13. Estimating wellbeing with mobility data. [Figure; labels: A, B, C, H, W] AI and Big Data
  • 14. Predicting GDP with Retail Market data. Generic utility function (rationality) vs. personal utility function (diversity). [Figure; panels: Product Price, Quantity Needed, Sophistication; reported fits: R² = 17.25%, R² = 32.38%, R² = 85.72%]
  • 15. Risks of big data
  • 16. Big Data, Big Risks. Big data is algorithmic, therefore it cannot be biased! And yet… • All traditional evils of social discrimination, and many new ones, exhibit themselves in the big data ecosystem • Because of its tremendous power, massive data analysis must be used responsibly • Technology alone won’t do: policy, user involvement and education efforts are also needed
  • 17. By 2018, 50% of business ethics violations will occur through improper use of big data analytics [source: Gartner, 2016]
  • 20. The danger of black boxes - 1. The COMPAS score (Correctional Offender Management Profiling for Alternative Sanctions): a 137-question questionnaire and a predictive model for “risk of crime recidivism.” The model is a proprietary secret of Northpointe, Inc. The data journalists at propublica.org have shown that • the prediction accuracy of recidivism is rather low (around 60%) • the model has a strong ethnic bias ◦ Black defendants who did not reoffend were classified as high risk twice as often as white defendants who did not reoffend ◦ white defendants who did reoffend were classified as low risk twice as often as Black defendants who did reoffend.
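The two disparities reported on slide 20 correspond to per-group false positive and false negative rates. The following is a minimal sketch of how such rates could be computed; the records, the group labels `A` and `B`, and the function name `error_rates` are invented for illustration and are not ProPublica's data or code.

```python
from collections import Counter

# Hypothetical toy records: (group, predicted_high_risk, reoffended).
# Invented for illustration only; not the COMPAS or ProPublica data.
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, True), ("B", False, True), ("B", False, True),
]

def error_rates(rows, group):
    """Return (false_positive_rate, false_negative_rate) for one group."""
    counts = Counter()
    for g, predicted_high, reoffended in rows:
        if g == group:
            counts[(predicted_high, reoffended)] += 1
    # People who did not reoffend: how many were still flagged high risk?
    negatives = counts[(True, False)] + counts[(False, False)]
    # People who did reoffend: how many were labelled low risk?
    positives = counts[(True, True)] + counts[(False, True)]
    fpr = counts[(True, False)] / negatives
    fnr = counts[(False, True)] / positives
    return fpr, fnr

for group in ("A", "B"):
    print(group, error_rates(records, group))
```

In this toy data group A has twice the false positive rate of group B, and group B has twice the false negative rate of group A, which is the shape of disparity the slide describes.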
  • 21. The danger of black boxes - 2. The three major US credit bureaus, Experian, TransUnion, and Equifax, which provide credit scoring for millions of individuals, are often discordant. In a study of 500,000 records, 29% of consumers received credit scores that differed by at least fifty points between credit bureaus, a difference that may mean tens of thousands of dollars over the life of a mortgage [CRS+16].
  • 22. The danger of black boxes - 3. In 2010, some homeowners with a regular mortgage payment history reported a sudden drop of forty points in their credit score, soon after their own enquiry.
  • 23. The danger of black boxes - 4. During the 1970s and 1980s, St. George’s Hospital Medical School in London used a computer program for initial screening of job applicants. The program used information from applicants’ forms, which contained no reference to ethnicity. The program was found to unfairly discriminate against female applicants and ethnic minorities (inferred from surnames and place of birth), who were less likely to be selected for interview [LM88].
  • 25. The danger of black boxes - 5. In a recent paper at SIGKDD 2016 [RSG16] the authors show how an accurate but untrustworthy classifier may result from an accidental bias in the training data. In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on the presence of snow in the background! [RSG16] “Why Should I Trust You?” Explaining the Predictions of Any Classifier, SIGKDD 2016 conference paper.
  • 26. Deep learning is creating computer systems we don't fully understand: www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic-black-boxes
  • 27. Is AI Permanently Inscrutable? nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable
  • 28. The danger of black boxes - 6. In a recent study at Princeton University, the authors show how the semantics derived automatically from large text/web corpora contain human biases ◦ E.g., names associated with white people were found to be significantly easier to associate with pleasant than unpleasant terms, compared to names associated with Black people. Therefore, any machine learning model trained on text data for, e.g., sentiment or opinion mining has a strong chance of inheriting the prejudices reflected in the human-produced training data.
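One way an association between a name and pleasant or unpleasant terms could be measured in an embedding space is as a difference of mean cosine similarities. The sketch below assumes that approach; the slide does not spell out the study's method, and the two-dimensional vectors and the names `association`, `PLEASANT`, `UNPLEASANT`, `name_x`, `name_y` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, pleasant_vecs, unpleasant_vecs):
    """Mean similarity to pleasant terms minus mean similarity to unpleasant terms."""
    pleasant = sum(cosine(word_vec, p) for p in pleasant_vecs) / len(pleasant_vecs)
    unpleasant = sum(cosine(word_vec, u) for u in unpleasant_vecs) / len(unpleasant_vecs)
    return pleasant - unpleasant

# Hypothetical 2-d "embeddings", invented for illustration only.
PLEASANT = [(1.0, 0.0), (0.9, 0.1)]
UNPLEASANT = [(0.0, 1.0), (0.1, 0.9)]
name_x = (0.8, 0.2)   # a name whose vector lies close to the pleasant terms
name_y = (0.2, 0.8)   # a name whose vector lies close to the unpleasant terms

print(association(name_x, PLEASANT, UNPLEASANT))
print(association(name_y, PLEASANT, UNPLEASANT))
```

A positive score means the name sits closer to the pleasant terms, a negative score closer to the unpleasant ones. A model trained on top of such vectors can pick up that difference without any explicit group attribute in its input.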
  • 29. Human Bias
  • 30. Human Bias can be Learned - 7
  • 31. As we stated in our 2008 SIGKDD paper that started the field of discrimination-aware data mining [PRT08]: “learning from historical data recording human decision making may mean to discover traditional prejudices that are endemic in reality, and to assign to such practices the status of general rules, maybe unconsciously, as these rules can be deeply hidden within the learned classifier.”
  • 33. Satya Nadella's rules for AI: www.theverge.com/2016/6/29/12057516/satya-nadella-ai-robot-laws
  • 34. U.S. – F.T.C. (Salvatore Ruggieri): www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf (Sept. 2014)
  • 35. U.S. – White House: www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf (May 2014)
  • 36. U.S. – White House: www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf (May 2016)
  • 37. U.S. – White House: www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf (October 2016)
  • 38. E.U. - EDPS: secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-11-19_Big_Data_EN.pdf
  • 39. E.U. - EDPS: secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf
  • 42. Value-Sensitive Design • Design for privacy • Design for security • Design for inclusion • Design for sustainability • Design for democracy • Design for safety • Design for transparency • Design for accountability • Design for human capabilities
  • 43. EU Projects: SoBigData.eu. Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015, duration: 2015-2019), www.sobigdata.eu
  • 44. Second-level university Master's programme (Master Universitario di II Livello): BigData Technology • BigData Sensing & Procurement • BigData Mining • BigData StoryTelling • BigData Ethics. The Big Data Master aims to train “data scientists”: professionals with a mix of multidisciplinary skills that allow them not only to acquire data and extract knowledge from it, but also to tell “stories” through these data in support of decision making, creativity, and the development of innovative services, and to manage the ethical and legal repercussions of Big Data, which often contain personal information and raise issues of privacy, transparency, and awareness. Areas of socio-economic innovation: BigData for Social Good • BigData for Business. Big Data Analytics & Social Mining, SoBigData. Data Ethics Literacy • MIUR report on Big Data, 28 July 2016 ◦ www.istruzione.it/allegati/2016/bigdata.pdf • UNIPI Master in Big Data Analytics & Social Mining ◦ masterbigdata.it
  • 47. Discrimination discovery. Given: ◦ a historical database of decision records, each describing features of an applicant to a benefit ◦ e.g., a credit request to a bank and the corresponding credit approval/denial decision ◦ some designated categories of applicants, such as groups protected by anti-discrimination laws, find whether, and in which circumstances, there is evidence of discrimination against the designated categories that emerges from the data. DCUBE: Discrimination Discovery in Databases
  • 48. German Credit dataset
  • 49. How? Fight with the same weapons. Idea: use data mining to discover discrimination ◦ the decision policies hidden in a database can be represented by decision rules and discovered by frequent pattern mining ◦ Once all such decision rules have been found, highlight all potential niches of discrimination by filtering the rules with a measure that quantifies the discrimination risk.
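The two steps on slide 49 (extract decision rules, then filter them by a discrimination measure) can be sketched as follows. This is a brute-force stand-in for a real frequent pattern miner, under stated assumptions: the records, the attribute names, the thresholds `min_supp` and `min_elift`, and the function `risky_rules` are invented for illustration, and the measure used is the extended lift, taken here as conf(A, B → C) / conf(B → C).

```python
from itertools import combinations

# Hypothetical decision records, invented for illustration only.
RECORDS = [
    {"foreign_worker": "yes", "housing": "own",  "credit": "bad"},
    {"foreign_worker": "yes", "housing": "own",  "credit": "bad"},
    {"foreign_worker": "yes", "housing": "rent", "credit": "good"},
    {"foreign_worker": "no",  "housing": "own",  "credit": "good"},
    {"foreign_worker": "no",  "housing": "own",  "credit": "good"},
    {"foreign_worker": "no",  "housing": "own",  "credit": "bad"},
    {"foreign_worker": "no",  "housing": "rent", "credit": "good"},
    {"foreign_worker": "no",  "housing": "rent", "credit": "bad"},
]

PROTECTED = ("foreign_worker", "yes")  # designated category A (assumption)
OUTCOME = ("credit", "bad")            # negative decision C (assumption)

def count(rows, items):
    """Number of rows containing every (attribute, value) pair in items."""
    return sum(all(r.get(a) == v for a, v in items) for r in rows)

def risky_rules(rows, min_supp=2, min_elift=1.5):
    """Enumerate rules A, B -> C and keep those with enough support and elift.

    min_supp must be at least 1 so that the confidences are well defined.
    """
    context_items = sorted({(a, v) for r in rows for a, v in r.items()
                            if a not in (PROTECTED[0], OUTCOME[0])})
    found = []
    for size in range(len(context_items) + 1):
        for context in combinations(context_items, size):
            # Skip contexts that assign two values to the same attribute.
            if len({a for a, _ in context}) < len(context):
                continue
            body = (PROTECTED,) + context
            supp = count(rows, body + (OUTCOME,))
            if supp < min_supp:
                continue
            conf = supp / count(rows, body)                  # conf(A, B -> C)
            base = count(rows, context + (OUTCOME,)) / count(rows, context)  # conf(B -> C)
            elift = conf / base
            if elift >= min_elift:
                found.append((context, supp, round(conf, 2), round(elift, 2)))
    return found

print(risky_rules(RECORDS))
```

On this toy data only the context `housing=own` survives the filter. A production system would replace the exhaustive enumeration with an Apriori-style miner, since the number of candidate contexts grows exponentially with the number of attributes.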
  • 50. Discrimination discovery from data. FOREIGN_WORKER=yes & PURPOSE=new_car & HOUSING=own → CREDIT=bad ◦ elift = 5.19, supp = 56, conf = 0.37. elift = 5.19 means that foreign workers are more than 5 times as likely to be refused credit as the average population (even if they own their house).
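To make the figures on slide 50 concrete, the relation between confidence and extended lift can be written out. This assumes the extended-lift definition from [PRT08], which the slide does not restate: A is FOREIGN_WORKER=yes, B is the context PURPOSE=new_car & HOUSING=own, and C is CREDIT=bad.

```latex
\[
\mathrm{elift}_{B}(A, B \rightarrow C)
  = \frac{\mathrm{conf}(A, B \rightarrow C)}{\mathrm{conf}(B \rightarrow C)}
\qquad\Longrightarrow\qquad
\mathrm{conf}(B \rightarrow C) = \frac{0.37}{5.19} \approx 0.071
\]
```

Under that reading, roughly 7% of all applicants in the same context are refused credit, against 37% of the foreign workers in that context.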
  • 51. Case study: grant evaluation. Outcome: • Funded • Not funded • Conditionally funded
  • 52. Dataset attributes • Features of the PI • Project costs • Research area • Project evaluation
  • 53. A potentially discriminatory rule. Antecedent ◦ Project proposals in “Physical and Analytical Chemical Sciences” ◦ Young females ◦ Total cost of 1,358,000 Euros or above. Possible interpretation ◦ “Peer-reviewers of panel PE4 trusted young females requiring high budgets less than males leading similar projects”
  • 54. Case study: US Harmonized Tariff System (HTS), https://hts.usitc.gov/ • Detailed tariff classification system for merchandise imported into the US • Chapters 61, 62, 64, 65: apparel ◦ Different taxes for the same garments separately produced for male and female ◦ Descriptions are in semi-structured form • Different taxes for the same apparel for men and women: ◦ Coats: women and girls 64.4¢/kg + 18.8%, men and boys 38.6¢/kg + 10% ◦ Fur felt hats: women and girls 96¢/doz + 1.4%, men and boys 0 ◦ Cotton pajamas: women and girls 8.5%, men and boys 8.9% • Women: 14% • Men: 9% • 1.3 billion USD!
  • 55. Totes-Isotoner Corp. v. U.S.; Rack Room Shoes Inc. and Forever 21 Inc. v. U.S. Court of International Trade; U.S. Court of Appeals for the Federal Circuit (2014): “[…] the courts may have concluded that Congress had no discriminatory intent when ruling the HTS, but there is little doubt that gender-based tariffs have discriminatory impact”
  • 56. Sample rule from the HTS dataset
  • 58. Soccer Player Ratings: how do humans evaluate sports performance?
  • 62. Wrapping up
  • 63. Right of explanation • Applying AI within many domains requires transparency and responsibility: ◦ health care ◦ finance ◦ surveillance ◦ autonomous vehicles ◦ government • EU General Data Protection Regulation (April 2016) establishes (?) a right of explanation for all individuals to obtain “meaningful explanations of the logic involved” when automated (algorithmic) individual decision-making, including profiling, takes place. • In sharp contrast, (big) data-driven AI/ML models are often black boxes.
  • 64. Accountability: “Why exactly was my loan application rejected?” “What could I have done differently so that my application would not have been rejected?”
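The second question on slide 64 asks for what is often called a counterfactual explanation: a small change to the applicant's situation that would have flipped the decision. The sketch below illustrates the idea on a toy model; the scoring rule in `approve`, the feature names, and the `CANDIDATE_CHANGES` table are all hypothetical stand-ins for a real black-box model and are not taken from the talk.

```python
def approve(applicant):
    """Hypothetical scoring rule standing in for a black-box loan model."""
    score = 0
    score += 2 if applicant["income"] >= 30000 else 0
    score += 1 if applicant["years_employed"] >= 2 else 0
    score -= 2 if applicant["missed_payments"] > 0 else 0
    return score >= 2

# Candidate single-feature changes to try (assumed, for illustration).
CANDIDATE_CHANGES = {
    "income": [30000, 40000],
    "years_employed": [2, 5],
    "missed_payments": [0],
}

def counterfactuals(applicant):
    """Return single-feature changes that turn a rejection into an approval."""
    if approve(applicant):
        return []
    flips = []
    for feature, values in CANDIDATE_CHANGES.items():
        for value in values:
            changed = dict(applicant, **{feature: value})
            if approve(changed):
                flips.append((feature, applicant[feature], value))
                break  # keep only the first listed value that works
    return flips

rejected = {"income": 35000, "years_employed": 3, "missed_payments": 1}
print(counterfactuals(rejected))
```

For the rejected applicant above, the only single change that flips the outcome is clearing the missed payment. The search only queries the model, so the same loop could in principle be run against a model whose internals are not available.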
  • 65. Social Mining & Big Data Ecosystem www.sobigdata.eu
  • 66. Knowledge Discovery & Data Mining Lab: http://kdd.isti.cnr.it
  • 67. Special thanks • Salvatore Ruggieri • Franco Turini • Fosca Giannotti • Anna Monreale • Luca Pappalardo SMARTCATs