DATA, AI & TOKENS:
A GLIMPSE OF WHAT’S TO COME
Managing Digital Transformation
AGENDA
1. What are “Big Data”?
2. Data and Data Science
3. Machine learning at scale
4. Still to come
WHAT ARE BIG DATA?
• Big Data is a broad term used to describe data sets that are large, complex, and cannot be
addressed by traditional IT methodologies and applications (Davenport, 2013)
• New technologies—both hardware and software—have had to be designed to manage the
volume
DIGITAL TRACE DATA
Taxonomy of Digital Data
Understanding different data types is crucial to correctly address
problematic areas relating to the use and collection of digital trace data
Data that we leave behind
Content Data
- User’s name and address
- Substantial and personal: can be identified/linked to a person
- Explicitly shared or traced through content shared on social media like Facebook
Metadata
- User’s IP address, time of login (data about data)
- Strength is in scale: companies can use it to recognise user patterns
- Potentially problematic if it reveals things that we don’t want to reveal; for example, the presence of a mobile device at a protest might reveal the identities of protesters
Entrusted Data: content we post on a medium not controlled by us (e.g. Facebook). We don’t control what firms do with our traces
Incidental Data: data about us shared by others (e.g. tagged photos). We neither influence nor control our data traces
Service Data: information we provide to be able to use a service
Disclosed Data: content that we post online, but on a medium that we control (e.g. blogs), limiting our data traces
Behavioural Data: unintentionally shared; captured by services from our devices (e.g. time spent on a site)
Derived Data: data inferred about us from other data (e.g. credit profiles built by firms using personal data)
DIGITAL TRACES
• Make existing services more efficient
• Create new services
• Access (or create?) new markets
“The loan amounts users are initially presented with currently
tend to be either £111 or £265, although I have also achieved
figures of £350 and £361. In my informal survey, those using
Apple products (a Safari browser, or say an iPhone or an
iPad) seemed to be most consistently offered £265. Although
tests with some obscure browsers suggest that it is likely that
it is less that you are ‘uprated’ by using Apple products, than
you are ‘down rated’ by using less niche browsers like Firefox
and Internet explorer.” (Deville 2013)
“The firm has found that people who
immediately shove the slider up to the
maximum amount on offer, currently £400
for 30 days for a first-time applicant for a
personal loan, are more likely than others
to default.” (Pollock 2012)
STRUCTURED VS UNSTRUCTURED DATA
• Structured: clean, organised, in a database format. Has relational properties and can be
divided into fields (e.g. what you have been working with in SQL)
• Thought to be 5-10% of all data
• Semi-structured: unstructured data that has some organisational properties that make it
easier to query, but not enough to be considered structured (e.g. your CSV files)
• Also around 5-10% of data
• Unstructured data: no structure, no clear relational properties (e.g. images, multimedia,
business documents)
• Around 80% of all data
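The three categories above can be illustrated with a short sketch using only Python's standard library. The table name, column names, and sample values are invented for illustration; treating CSV as semi-structured follows the slide's own example.

```python
import csv
import io
import sqlite3

# Structured: rows in a relational table with named, typed fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (applicant TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loans VALUES (?, ?)", [("a1", 111), ("a2", 265)])
total = conn.execute("SELECT SUM(amount) FROM loans").fetchone()[0]

# Semi-structured (in the slide's sense): a CSV file has columns, but no
# enforced types or relations, so every value arrives as a string.
csv_text = "applicant,amount\na1,111\na2,265\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
csv_total = sum(int(row["amount"]) for row in rows)

# Unstructured: free text has no fields at all; structure has to be imposed.
note = "Applicant a1 asked for 111 pounds; a2 asked for 265."
tokens = [tok.strip(".;,") for tok in note.split()]
numbers = [int(tok) for tok in tokens if tok.isdigit()]

print(total, csv_total, numbers)
```

The same two amounts are recovered in all three cases, but the amount of work needed to get at them grows as the structure disappears.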
SUGGEST SOME SOURCES
HAVE YOU EVER TRIED TO GET DATA?
“TRAINING” AN ALGO
“A computer program is said to learn from
experience (E) with some class of tasks (T) and
a performance measure (P) if its performance at
tasks in T as measured by P improves with E”
[Diagram: training data → feature extraction → ML algorithm → model; test data → model (learnt during training phase) → predictions]
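The definition quoted on this slide can be made concrete with a toy learner. In the sketch below, the task T is deciding whether a number lies at or above a hidden threshold, the experience E is a growing list of labelled examples, and the performance measure P is accuracy on a fixed set of test points. The threshold, the example values, and all function names are assumptions chosen for illustration.

```python
TRUE_THRESHOLD = 50  # hidden rule the learner has to discover (assumed)


def label(x):
    """Ground truth for task T: is x at or above the hidden threshold?"""
    return x >= TRUE_THRESHOLD


def learn_threshold(examples):
    """Estimate the threshold as the midpoint between the two classes seen so far."""
    negatives = [x for x in examples if not label(x)]
    positives = [x for x in examples if label(x)]
    return (max(negatives) + min(positives)) / 2


def accuracy(threshold, test_points):
    """Performance measure P: share of test points classified correctly."""
    correct = sum((x >= threshold) == label(x) for x in test_points)
    return correct / len(test_points)


experience = [10, 60, 30, 90, 45, 70, 49, 52]  # experience E, in arrival order
test_points = range(101)

# Measure P after the learner has seen 2, 4, 6 and 8 examples.
scores = [accuracy(learn_threshold(experience[:n]), test_points) for n in (2, 4, 6, 8)]
print(scores)
```

With this particular data the accuracy rises each time more examples are seen, which is what "learning" means in the quoted sense. Real data will not always improve so smoothly.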
TERMINOLOGY
• Features: distinct traits that can be used to describe each item in a quantitative manner.
• Sample: item(s) to process (e.g. classify). It can be a document, a picture, a sound, a
video, a row in a database or CSV file, or anything else you can describe with a fixed set of
quantitative traits.
• Feature extraction: reduces samples to a simpler representation, e.g. vectors.
• Training data: data used to discover potentially predictive relationships.
• Test data: separate data, not used in training, used to evaluate the model that was built.
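The terms above fit together as in the following sketch. It uses a nearest-centroid classifier, which is one simple choice among many and not necessarily what the slides had in mind. The messages, the class names "spam" and "ham", and the two features (word count and number of exclamation marks) are all invented for illustration.

```python
def extract_features(text):
    """Feature extraction: turn a raw sample into a fixed-length vector."""
    return [len(text.split()), text.count("!")]


def centroid(vectors):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(samples):
    """Training: compute one mean feature vector per class."""
    by_class = {}
    for text, cls in samples:
        by_class.setdefault(cls, []).append(extract_features(text))
    return {cls: centroid(vecs) for cls, vecs in by_class.items()}


def predict(model, text):
    """Assign the class whose centroid is closest to the sample's features."""
    vec = extract_features(text)

    def squared_distance(centre):
        return sum((a - b) ** 2 for a, b in zip(vec, centre))

    return min(model, key=lambda cls: squared_distance(model[cls]))


training_data = [
    ("win cash now!!!", "spam"),
    ("free prize!!", "spam"),
    ("meeting moved to three this afternoon", "ham"),
    ("please send the report when you can", "ham"),
]
# Test data is kept separate from the training data.
test_data = [
    ("claim reward!!!", "spam"),
    ("see you at the office tomorrow morning", "ham"),
]

model = train(training_data)
predictions = [predict(model, text) for text, _ in test_data]
accuracy = sum(p == cls for p, (_, cls) in zip(predictions, test_data)) / len(test_data)
print(predictions, accuracy)
```

Each message is a sample, the two numbers are its features, and the model is just the pair of class centroids learnt from the training data.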
CATEGORIES
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
• Reinforcement Learning
SUPERVISED LEARNING
• The correct classes (labels) of the training data are known
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
UNSUPERVISED LEARNING
• The correct classes (labels) of the training data are not known
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
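As a sketch of learning without known classes, the code below runs plain k-means on one-dimensional data. k-means is one common unsupervised method, chosen here as an assumption; the slides do not name a specific algorithm. The visit durations and starting centres are made up.

```python
def kmeans_1d(points, centres, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest centre, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# Minutes spent on a site per visit (made-up values); no labels are provided.
minutes = [1, 2, 3, 2, 40, 42, 45, 41]
centres, clusters = kmeans_1d(minutes, centres=[0.0, 10.0])
print(centres, clusters)
```

The algorithm separates short visits from long ones without ever being told that those two groups exist. Naming the groups is left to a human.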
SEMI-SUPERVISED LEARNING
• A mix of supervised and unsupervised learning: a small amount of labelled data is combined with a larger amount of unlabelled data
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
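One way to combine labelled and unlabelled data is self-training, sketched below: a model trained on the few labelled points assigns a label to the unlabelled point it is most confident about, adds it to the training set, and repeats. Self-training is an assumption here, as the slides do not say which semi-supervised method is meant, and the data and class names are invented.

```python
def class_means(labelled):
    """Mean value per class for a list of (value, class) pairs."""
    groups = {}
    for x, cls in labelled:
        groups.setdefault(cls, []).append(x)
    return {cls: sum(xs) / len(xs) for cls, xs in groups.items()}


def self_train(labelled, unlabelled):
    """Self-training: repeatedly pseudo-label the point nearest to a class mean."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    while unlabelled:
        means = class_means(labelled)
        # The (point, class) pair with the smallest distance is the prediction
        # the current model is most confident about.
        x, cls = min(
            ((x, cls) for x in unlabelled for cls in means),
            key=lambda pair: abs(pair[0] - means[pair[1]]),
        )
        labelled.append((x, cls))
        unlabelled.remove(x)
    return class_means(labelled)


labelled = [(1.0, "low"), (9.0, "high")]      # only two labelled samples
unlabelled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]   # the rest have no labels
means = self_train(labelled, unlabelled)
print(means)
```

The final class means are shaped mostly by data that was never labelled by hand, which is the point of the approach. A wrong early pseudo-label would propagate, so in practice this needs care.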
REINFORCEMENT LEARNING
• Allows a machine or software agent to learn its behaviour from feedback from the
environment.
• This behaviour can be learnt once and for all, or keep adapting as time goes by.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
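Both points on this slide, learning from feedback and continuing to adapt, can be sketched with a small multi-armed bandit agent. The environment, its payoffs, and the parameter names are hypothetical. Exploration here follows a fixed schedule rather than the more usual random (epsilon-greedy) choice, purely so that the run is reproducible.

```python
def run_agent(payoff_at, steps, n_arms=3, step_size=0.5, explore_every=5):
    """Bandit agent: mostly exploit the best-looking arm, periodically explore."""
    estimates = [0.0] * n_arms
    choices = []
    for t in range(steps):
        if t % explore_every == 0:
            arm = (t // explore_every) % n_arms  # scheduled exploration
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = payoff_at(t, arm)  # feedback from the environment
        # A constant step size keeps weighting recent feedback, so the agent
        # can adapt if the environment changes.
        estimates[arm] += step_size * (reward - estimates[arm])
        choices.append(arm)
    return estimates, choices


def payoff_at(t, arm):
    """Hypothetical environment: arm 2 pays best at first, arm 0 after step 100."""
    before, after = [0.2, 0.5, 0.8], [0.9, 0.5, 0.1]
    return (before if t < 100 else after)[arm]


estimates, choices = run_agent(payoff_at, steps=200)
print(choices[99], choices[199])
```

Before the change the agent settles on arm 2; after the payoffs shift it moves to arm 0. An agent that stopped updating its estimates would correspond to behaviour "learnt once and for all".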
GAMIFYING IT
Duolingo, Wordrobe, Foldit, GalaxyZoo
IMPLICATIONS
• Consent
• Ownership
• Decontextualisation
• Third party services
MANAGERIAL CHALLENGES
• Leadership
Set clear goals, define success, ask the right questions, be creative, create a vision,
deal with stakeholders …
• Talent management
Obvious: Data scientists, computer scientists.
Also: Those who can reframe questions so that data can answer them, design
experiments, visualize and interpret data, speak the language of business.
• Technology
Commonly used: Hadoop. IT departments will need to adapt.
• Decision making
Bring people who understand the problem together with the relevant data.
• Company culture
Stop relying on hunches. Ask yourself “What do we know?”, not “What do we think?”
RECOMMENDATIONS
• Self-regulate
• Be transparent / educate your customers
• Need for clear rules around ownership
• Public infrastructure?
• Is data collection anti-competitive?
• Trust?
DIFFERENCE BETWEEN ML AND AI?
ARTIFICIAL INTELLIGENCE
• “[The automation of] activities that we associate with human thinking, activities such
as decision-making, problem solving, learning ...” (Bellman, 1978)
• "A field of study that seeks to explain and emulate intelligent behavior in terms of
computational processes" (Schalkoff, 1990)
• Turing Test: “Is a machine able to exhibit intelligent behavior equivalent to, or
indistinguishable from, that of a human?”
Blockchain?
Graphic from: http://www.i-scoop.eu/internet-of-things/
The Rise and Development of FinTech Crowds, Coins and Communities
Dr. Claire Ingram Bogusz
Stockholm School of Economics
claire@clairebogusz.com
@Claire_EBI
slides.clairebogusz.com
