Machine Learning for Big Data
Alexander Gammerman
Computer Learning Research Centre
Royal Holloway, University of London
Trends in Big Data
STFC/RUSI: Big Data for Security and Resilience
March 7th, 2014
Layout
1 Debunking the myth
2 Machine Learning (Data Analytics)
3 Trends in Machine Learning for Big Data
4 Conclusions
"Fashionable" pursuit
AI, Cybernetics, Neural Networks, Expert Systems,
Big Data?
Big Data, small data, any data – what we need is Data Analysis or
Data Analytics or Machine Learning
Machine Learning: what is it?
ML is the intersection of Statistics and Computer Science.
Statistics deals with inferences to obtain valid conclusions from data under
various models and assumptions.
Computer Science considers what is computable, develops efficient
algorithms and is concerned with data storage and manipulation.
ML takes past data, "learns" from it by finding rules and regularities, and
uses them to make predictions for future examples. Efficient
algorithms have to be developed to make valid predictions.
Computer Learning Research Centre (CLRC) at Royal
Holloway, University of London
Established in 1998 to develop machine learning theory, including design of
efficient algorithms for data analysis.
CLRC Fellows include several prominent researchers: Vapnik and
Chervonenkis (the two founders of statistical learning theory), Shafer
(co-founder of the Dempster-Shafer theory), Rissanen (inventor of the
Minimum Description Length principle), and Levin (one of the three founders of
the theory of NP-completeness, who also made fundamental contributions to
Kolmogorov complexity).
Recent years: explosion of interest in machine-learning methods, in
particular statistical learning theory. Statistical learning theory: similar
goals to statistical science, but
it is nonparametric and
concerned with the problem of prediction.
Problems and Current Techniques
Classical techniques: small scale, low-dimensional data. But conceptual
and computational difficulties for high-dimensional data. Validity of
predictions. Confidence measures. Online prediction.
Current techniques for dimensionality problem: Support Vector Machine
(Vapnik, 1995, 1998; Vapnik and Chervonenkis, 1974); Kernel Methods.
New technique for validity problem: Conformal Predictors.
Projects
Compact Descriptors for Automatic Target Identification (with
QinetiQ).
Statistical profiling of offenders (with the Home Office).
Material identification with atmospheric corrections (with Waterfall
Solutions).
Unmixing spectra (with QinetiQ).
Anomaly detection (vehicles) (with Thales).
Fault Diagnosis (with Marconi Instruments).
Projects – cont’d
Abdominal Pain (with Western General Hospital, Edinburgh).
Ovarian Cancer (with Institute for Women's Health, UCL).
Depression (with Institute of Psychiatry, King's College).
Child Leukemia (with Royal London Hospital).
Heart Diseases (with Institute for Women's Health, UCL).
Analysis of microarrays (with Veterinary Laboratories Agency –
DEFRA).
Protein-Protein Interaction (EU project).
How much data do we need to answer our questions?
Big Data: V^3
Volume: Gigabyte (10^9); Terabyte (10^12); Petabyte (10^15); Exabyte
(10^18); Zettabyte (10^21).
Variety: structured, semi-structured, unstructured; text, image, audio,
video.
Velocity: dynamic; time-varying, etc.
Plus: high-dimensionality
But: if the answer is a Zettabyte what is the question?
The global data supply reached 2.8 zettabytes (ZB) in 2012 - or 2.8
trillion GB - but just 0.5% of this is used for analysis, according to the
Digital Universe Study. Volumes of data are projected to reach 40ZB by
2020, or 5,247 GB per person.
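The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units as listed on the slide; the variable names are illustrative and not taken from the Digital Universe Study.

```python
# Decimal (SI) byte units, as listed on the slide.
GB = 10**9
EB = 10**18
ZB = 10**21

supply_2012_zb = 2.8        # global data supply quoted for 2012
analysed_fraction = 0.005   # "just 0.5%" of the supply is analysed
projected_2020_zb = 40      # projection quoted for 2020
per_person_gb = 5247        # per-person figure quoted for 2020

# 1 ZB = 10**12 GB, so 2.8 ZB is 2.8 trillion GB.
supply_2012_gb = supply_2012_zb * ZB / GB

# Volume actually analysed in 2012, expressed in exabytes.
analysed_eb = supply_2012_zb * analysed_fraction * ZB / EB

# Population implied by "40 ZB, or 5,247 GB per person".
implied_population = projected_2020_zb * ZB / (per_person_gb * GB)

print(f"2012 supply: {supply_2012_gb:.2e} GB")
print(f"analysed in 2012: about {analysed_eb:.0f} EB")
print(f"implied 2020 population: {implied_population / 1e9:.2f} billion")
```

Even the small analysed fraction is on the order of exabytes, which is why the question being asked matters more than the raw volume.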
We don't need big data per se: we need to have a problem first and
then decide how much data we need to solve it.
A child learning the concept of a car doesn't need to see a million or a
billion cars to learn the concept: 10 or 100 are enough.
If we want to predict digits, we can learn on the first 100 or 1000 digits
and then identify the next one confidently and with high accuracy.
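The idea of learning from the first examples and then predicting the next one can be sketched as an online protocol: predict, see the true label, update, and track cumulative errors. The sketch below is only an illustration under stated assumptions: it uses a one-nearest-neighbour rule and a made-up one-dimensional stream with two well-separated classes, not the USPS digits, and `online_nearest_neighbour` is a hypothetical helper name.

```python
def online_nearest_neighbour(stream):
    """Predict each label before seeing it, then add the example to memory."""
    memory = []              # (x, label) pairs observed so far
    cumulative_errors = []   # running error count after each example
    errors = 0
    for x, label in stream:
        if memory:
            # Copy the label of the closest example seen so far.
            _, guess = min(memory, key=lambda m: abs(m[0] - x))
        else:
            guess = None     # nothing has been learned yet
        errors += guess != label
        cumulative_errors.append(errors)
        memory.append((x, label))
    return cumulative_errors


# Hypothetical stream: class 0 lies near 0.0, class 1 lies near 10.0.
stream = [(10.0 * (i % 2) + 0.1 * (i % 7), i % 2) for i in range(100)]
curve = online_nearest_neighbour(stream)
print("errors after 5 examples:", curve[4])
print("errors after 100 examples:", curve[-1])
```

On such an easy stream the mistakes happen only at the very start, before each class has been seen once; real data such as handwritten digits need more examples, but the shape of the argument is the same.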
Figure: USPS data
Figure: Conformal predictors on USPS data: online cumulative multiple
predictions at different confidence levels ("Hedging Predictions in Machine
Learning" by A. Gammerman and V. Vovk, The Computer Journal (2007) 50 (2):
151-163).
In fact, this is a well-known idea in machine learning. In the past,
people thought that the larger the training set, the more
accurate the results. But the founders of statistical learning
theory, V. Vapnik and A. Chervonenkis, showed that it is not just the length
of the training data that matters: another characteristic, called
"capacity", is more important.
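To make the role of capacity concrete, the sketch below evaluates the capacity term from one commonly quoted form of the Vapnik-Chervonenkis bound: with probability at least 1 - eta, the true risk is at most the empirical risk plus sqrt((h (ln(2n/h) + 1) - ln(eta/4)) / n), where n is the training set size and h is the VC dimension. The formula is not given on the slide; it is brought in here as an assumption for illustration, and `vc_confidence_term` is a hypothetical helper name.

```python
import math


def vc_confidence_term(n, h, eta=0.05):
    """Capacity term sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n) of the VC bound."""
    if not 0 < h <= n:
        raise ValueError("expected 0 < h <= n")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


# Same capacity with more data, then more data with much higher capacity.
for n, h in [(1_000, 10), (100_000, 10), (100_000, 10_000)]:
    print(f"n={n:>7}  h={h:>6}  term={vc_confidence_term(n, h):.3f}")
```

With h fixed, more data tightens the bound, but a hundred times more data with a thousand times the capacity gives a looser bound than the small, low-capacity setting. What matters is the ratio between capacity and training set size, not the size alone.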
Trends in Machine Learning for Big Data
How do we make machine learning algorithms scale to large datasets?
There are two main approaches: (1) developing parallelizable ML
algorithms and integrating them with large parallel systems and (2)
developing more efficient algorithms.
Data growth is driving the need for parallel and online algorithms and
models that can handle this "Big Data".
We need to explore the computational foundations of performing
these analyses on parallel and cloud architectures.
Large-scale modeling techniques and algorithms include
transductive and inductive models,
online compression models (extension of conformal predictors),
graphical models,
deep learning and semi-supervised learning algorithms,
clustering algorithms,
parallel learning algorithms.
The computational techniques provide a basic foundation in large-scale
programming, ranging from the basic "parfor" to parallel abstractions
such as MapReduce (Hadoop) and GraphLab.
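The map-and-reduce pattern behind such abstractions can be sketched in plain Python. This is not Hadoop or GraphLab code: it is a single-process illustration, with hypothetical function names, of how a statistic is computed from per-shard summaries that can be produced independently and then combined.

```python
from functools import reduce


def map_shard(shard):
    """Map step: summarise one shard as (count, sum, sum of squares)."""
    return (len(shard), sum(shard), sum(x * x for x in shard))


def combine(a, b):
    """Reduce step: summaries of two shards add component-wise."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(shards):
    """Mean and population variance from per-shard summaries only."""
    n, total, total_sq = reduce(combine, map(map_shard, shards))
    mean = total / n
    return mean, total_sq / n - mean * mean


# Hypothetical data split unevenly across three shards.
shards = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
mean, variance = mean_and_variance(shards)
print(mean, variance)
```

Because `combine` is associative and the map step touches each shard independently, the built-in `map` could in principle be replaced by a parallel map over workers without changing the result.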
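Transduction
Figure: Induction and transduction [V. Vapnik, 1995]. The diagram relates
data to general knowledge: induction (learning) goes from the particular
(past examples) to the general, deduction goes from the general to the
particular (future examples), and transduction goes directly from past
examples to future examples.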
Why use conformal predictions?
Why, after 100 years of research in statistics, do we need yet another
method of prediction?
It is simple and rigorous.
Given any of a wide range of learning/statistical prediction methods,
conformal prediction can be used as a wrapper to provide a measure
of confidence.
It is valid under weak assumptions.
It limits the fraction of prediction mistakes from the start. (Crudely, a
predictor can either make a prediction or else say "don't know", possibly
in a graded way, such as giving a wide prediction interval.)
It works in practice.
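The wrapper idea can be illustrated with a minimal sketch of one variant, inductive (split) conformal prediction for classification. Everything specific here is an assumption made for illustration and does not come from the talk: the nearest-neighbour nonconformity score, the toy two-dimensional data, and the function names. Under the usual exchangeability assumption, the prediction set at significance level epsilon misses the true label with probability at most epsilon.

```python
import math


def nearest(x, points):
    """Distance from x to the closest point in a non-empty list."""
    return min(math.dist(x, p) for p in points)


def nonconformity(x, y, train):
    """Distance to nearest same-class point over distance to nearest other-class point."""
    same = [p for p, label in train if label == y]
    other = [p for p, label in train if label != y]
    return nearest(x, same) / (nearest(x, other) + 1e-12)


def p_value(x, y, train, cal_scores):
    """Fraction of calibration scores at least as nonconforming as (x, y)."""
    score = nonconformity(x, y, train)
    return (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)


def predict_set(x, labels, train, cal_scores, epsilon):
    """All labels whose p-value exceeds the significance level epsilon."""
    return {y for y in labels if p_value(x, y, train, cal_scores) > epsilon}


# Hypothetical toy data: a proper training set and a separate calibration set.
train = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
         ((10.0, 0.0), "b"), ((11.0, 0.0), "b")]
calibration = [((0.5, 0.5), "a"), ((10.5, 0.5), "b"),
               ((2.0, 0.0), "a"), ((9.0, 0.0), "b")]
cal_scores = [nonconformity(x, y, train) for x, y in calibration]
labels = ["a", "b"]

for epsilon in (0.2, 0.1):
    print(epsilon, predict_set((0.5, 0.0), labels, train, cal_scores, epsilon))
```

With only four calibration examples the smallest possible p-value is 1/5, so at a demanding confidence level the predictor cannot exclude any label and returns the whole label set. This is the graded "don't know" described above; more calibration data allows sharper sets at higher confidence.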
Conclusions
"It took Deep Thought 7.5 million years to answer the ultimate question.
As nobody knew what the ultimate question to Life, The Universe and
Everything actually was, nobody knows what to make of the answer (42)".
Nowadays, as John Poppelaars noticed, many people think that Big
Data will help to find the ultimate question.
But I already know that it is not Big Data, and the answer is not 42, but
Machine Learning.
19 / 19

More Related Content

What's hot

Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training pptHRJEETSINGH
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Frank Kienle
 
Mba lab with CO
Mba lab with COMba lab with CO
Mba lab with COcsehod2
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Intel® Software
 
Parametric and nonparametric
Parametric and nonparametricParametric and nonparametric
Parametric and nonparametricSivapriyaS12
 
Practitioner Integration in Computational Thinking Education
Practitioner Integration in Computational Thinking EducationPractitioner Integration in Computational Thinking Education
Practitioner Integration in Computational Thinking EducationMartin Ebner
 
Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Rehan Guha
 

What's hot (9)

Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training ppt
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science
 
Mba lab with CO
Mba lab with COMba lab with CO
Mba lab with CO
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
 
Parametric and nonparametric
Parametric and nonparametricParametric and nonparametric
Parametric and nonparametric
 
Practitioner Integration in Computational Thinking Education
Practitioner Integration in Computational Thinking EducationPractitioner Integration in Computational Thinking Education
Practitioner Integration in Computational Thinking Education
 
Academic paper - Final
Academic paper - FinalAcademic paper - Final
Academic paper - Final
 
Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)
 

Similar to Alexander Gammerman - Machine Learning for Big Data

Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...Dr. Amir Mosavi, PhD., P.Eng.
 
machine_learning_section1_ebook.pdf
machine_learning_section1_ebook.pdfmachine_learning_section1_ebook.pdf
machine_learning_section1_ebook.pdfagfi
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learningHarshitBarde
 
Detecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsDetecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsfatimabenjelloun1
 
Guy Riese Literature Review
Guy Riese Literature ReviewGuy Riese Literature Review
Guy Riese Literature Reviewguyrie
 
Lecture1 introduction to machine learning
Lecture1 introduction to machine learningLecture1 introduction to machine learning
Lecture1 introduction to machine learningUmmeSalmaM1
 
Ml topic1 a
Ml topic1 aMl topic1 a
Ml topic1 abosycs1
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation AyanaRukasar
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 
Quant university MRM and machine learning
Quant university MRM and machine learningQuant university MRM and machine learning
Quant university MRM and machine learningQuantUniversity
 
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion ApproachEnhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion ApproachIJCI JOURNAL
 
Machine learning - session 1
Machine learning - session 1Machine learning - session 1
Machine learning - session 1Luis Borbon
 
Machine Learning Overview.pptx
Machine Learning Overview.pptxMachine Learning Overview.pptx
Machine Learning Overview.pptxRushikeshChikane2
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsIstituto nazionale di statistica
 

Similar to Alexander Gammerman - Machine Learning for Big Data (20)

Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...Rapid COVID-19 Diagnosis Using Deep Learning  of the Computerized Tomography ...
Rapid COVID-19 Diagnosis Using Deep Learning of the Computerized Tomography ...
 
machine_learning_section1_ebook.pdf
machine_learning_section1_ebook.pdfmachine_learning_section1_ebook.pdf
machine_learning_section1_ebook.pdf
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learning
 
Detecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsDetecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streams
 
Guy Riese Literature Review
Guy Riese Literature ReviewGuy Riese Literature Review
Guy Riese Literature Review
 
Lecture1 introduction to machine learning
Lecture1 introduction to machine learningLecture1 introduction to machine learning
Lecture1 introduction to machine learning
 
Research Proposal
Research ProposalResearch Proposal
Research Proposal
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
 
Ml topic1 a
Ml topic1 aMl topic1 a
Ml topic1 a
 
PREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptxPREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptx
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation
 
Intro to AI.pptx
Intro to AI.pptxIntro to AI.pptx
Intro to AI.pptx
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
Quant university MRM and machine learning
Quant university MRM and machine learningQuant university MRM and machine learning
Quant university MRM and machine learning
 
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion ApproachEnhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
 
Machine learning - session 1
Machine learning - session 1Machine learning - session 1
Machine learning - session 1
 
Machine Learning Overview.pptx
Machine Learning Overview.pptxMachine Learning Overview.pptx
Machine Learning Overview.pptx
 
AI Presentation 1
AI Presentation 1AI Presentation 1
AI Presentation 1
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statistics
 

More from Royal United Services Institute for Defence and Security Studies

More from Royal United Services Institute for Defence and Security Studies (20)

NCSC Speaker
NCSC Speaker NCSC Speaker
NCSC Speaker
 
Dr Stuart Eves
Dr Stuart Eves   Dr Stuart Eves
Dr Stuart Eves
 
Professor Steve Roberts
Professor Steve RobertsProfessor Steve Roberts
Professor Steve Roberts
 
Air Vice Marshal Stubbs
Air Vice Marshal StubbsAir Vice Marshal Stubbs
Air Vice Marshal Stubbs
 
Air Marshal Leo Davies
Air Marshal Leo DaviesAir Marshal Leo Davies
Air Marshal Leo Davies
 
Colonel (Retd) Thomas X Hammes USMC
Colonel (Retd) Thomas X Hammes USMC Colonel (Retd) Thomas X Hammes USMC
Colonel (Retd) Thomas X Hammes USMC
 
Professor John Louth
Professor John Louth Professor John Louth
Professor John Louth
 
Clive Wright
Clive Wright Clive Wright
Clive Wright
 
Andrew Wilson
Andrew WilsonAndrew Wilson
Andrew Wilson
 
Dr Christina Balis
Dr Christina BalisDr Christina Balis
Dr Christina Balis
 
Mr Simon Fovargue - RUSI Land Warfare Conference 2015
Mr Simon Fovargue - RUSI Land Warfare Conference 2015Mr Simon Fovargue - RUSI Land Warfare Conference 2015
Mr Simon Fovargue - RUSI Land Warfare Conference 2015
 
Mr Claes-Peter Cederlöf - RUSI Land Warfare Conference 2015
Mr Claes-Peter Cederlöf - RUSI Land Warfare Conference 2015Mr Claes-Peter Cederlöf - RUSI Land Warfare Conference 2015
Mr Claes-Peter Cederlöf - RUSI Land Warfare Conference 2015
 
Lieutenant General Timothy Evans - RUSI Land Warfare Conference 2015
Lieutenant General Timothy Evans - RUSI Land Warfare Conference 2015Lieutenant General Timothy Evans - RUSI Land Warfare Conference 2015
Lieutenant General Timothy Evans - RUSI Land Warfare Conference 2015
 
Major General William Hix - RUSI Land Warfare Conference 2015
Major General William Hix - RUSI Land Warfare Conference 2015Major General William Hix - RUSI Land Warfare Conference 2015
Major General William Hix - RUSI Land Warfare Conference 2015
 
Brigadier Richard Toomey - RUSI Land Warfare Conference 2015
Brigadier Richard Toomey - RUSI Land Warfare Conference 2015Brigadier Richard Toomey - RUSI Land Warfare Conference 2015
Brigadier Richard Toomey - RUSI Land Warfare Conference 2015
 
Mr Allan Mallinson - RUSI Land Warfare Conference 2015
Mr Allan Mallinson - RUSI Land Warfare Conference 2015Mr Allan Mallinson - RUSI Land Warfare Conference 2015
Mr Allan Mallinson - RUSI Land Warfare Conference 2015
 
Professor Malcolm Chalmers
Professor Malcolm ChalmersProfessor Malcolm Chalmers
Professor Malcolm Chalmers
 
Professor Trevor taylor
Professor Trevor taylorProfessor Trevor taylor
Professor Trevor taylor
 
Professor Peter Dutton
Professor Peter DuttonProfessor Peter Dutton
Professor Peter Dutton
 
Michael Keegan
Michael KeeganMichael Keegan
Michael Keegan
 

Recently uploaded

PEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
PEO AVRIL POUR LA COMMUNE D'ORGERUS INFOPEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
PEO AVRIL POUR LA COMMUNE D'ORGERUS INFOMAIRIEORGERUS
 
Press Freedom in Europe - Time to turn the tide.
Press Freedom in Europe - Time to turn the tide.Press Freedom in Europe - Time to turn the tide.
Press Freedom in Europe - Time to turn the tide.Christina Parmionova
 
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...narwatsonia7
 
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
How to design healthy team dynamics to deliver successful digital projects.pptx
How to design healthy team dynamics to deliver successful digital projects.pptxHow to design healthy team dynamics to deliver successful digital projects.pptx
How to design healthy team dynamics to deliver successful digital projects.pptxTechSoupConnectLondo
 
办理约克大学毕业证成绩单|购买加拿大文凭证书
办理约克大学毕业证成绩单|购买加拿大文凭证书办理约克大学毕业证成绩单|购买加拿大文凭证书
办理约克大学毕业证成绩单|购买加拿大文凭证书zdzoqco
 
call girls in sector 22 Gurgaon 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in sector 22 Gurgaon  🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in sector 22 Gurgaon  🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in sector 22 Gurgaon 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Enhancing Indigenous Peoples' right to self-determination in the context of t...
Enhancing Indigenous Peoples' right to self-determination in the context of t...Enhancing Indigenous Peoples' right to self-determination in the context of t...
Enhancing Indigenous Peoples' right to self-determination in the context of t...Christina Parmionova
 
Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Sonam Pathan
 
2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdfilocosnortegovph
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfCharlynTorres1
 
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...saminamagar
 
In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...ResolutionFoundation
 
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...yalehistoricalreview
 
Yellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfYellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfAmir Saranga
 
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Christina Parmionova
 
Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Christina Parmionova
 
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...narwatsonia7
 

Recently uploaded (20)

PEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
PEO AVRIL POUR LA COMMUNE D'ORGERUS INFOPEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
PEO AVRIL POUR LA COMMUNE D'ORGERUS INFO
 
Press Freedom in Europe - Time to turn the tide.
Press Freedom in Europe - Time to turn the tide.Press Freedom in Europe - Time to turn the tide.
Press Freedom in Europe - Time to turn the tide.
 
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...
No.1 Call Girls in Basavanagudi ! 7001305949 ₹2999 Only and Free Hotel Delive...
 
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Mayapuri DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
How to design healthy team dynamics to deliver successful digital projects.pptx
How to design healthy team dynamics to deliver successful digital projects.pptxHow to design healthy team dynamics to deliver successful digital projects.pptx
How to design healthy team dynamics to deliver successful digital projects.pptx
 
办理约克大学毕业证成绩单|购买加拿大文凭证书
办理约克大学毕业证成绩单|购买加拿大文凭证书办理约克大学毕业证成绩单|购买加拿大文凭证书
办理约克大学毕业证成绩单|购买加拿大文凭证书
 
call girls in sector 22 Gurgaon 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in sector 22 Gurgaon  🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in sector 22 Gurgaon  🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in sector 22 Gurgaon 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Enhancing Indigenous Peoples' right to self-determination in the context of t...
Enhancing Indigenous Peoples' right to self-determination in the context of t...Enhancing Indigenous Peoples' right to self-determination in the context of t...
Enhancing Indigenous Peoples' right to self-determination in the context of t...
 
Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170
 
2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
 
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in West Patel Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
 
In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...
 
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
 
Yellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfYellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdf
 
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
 
Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.
 
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...
High Class Call Girls Bangalore Komal 7001305949 Independent Escort Service B...
 

Alexander Gammerman - Machine Learning for Big Data

  • 1. Machine Learning for Big Data Alexander Gammerman Computer Learning Research Centre Royal Holloway, University of London Trends in Big Data STFC/RUSI: Big Data for Security and Resillience March 7th, 2014 1 / 19
  • 2. Layout 1 Debunking the myth 2 Machine Learning (Data Analytics) 3 Trends in Machine Learning for Big Data 4 Conclusions 2 / 19
  • 3. ”Fashionable” pursuit AI, Cybernetics, Neural Networks, Expert Systems, Big Data? Big Data, small data, any data – what we need is Data Analysis or Data Analytics or Machine Learning 3 / 19
  • 4. Machine Learning: what is it? ML is intersection of Statistics and Computer Science. Statistics deals with inferences to obtain valid conclusions from data under various models and assumptions. Computer Science considers what is computable, develops efficient algorithms and concerns with data storage and manipulation. ML takes the past data, ”learns”, tries to find some rules, regularities in the data in order to make predictions for the future examples. Efficient algorithms have to be developed to make valid predictions. 4 / 19
  • 5. Computer Learning Research Centre (CLRC) at Royal Holloway, University of London Established in 1998 to develop machine learning theory, including design of efficient algorithms for data analysis. CLRC Fellows, including several prominent ones, such as: Vapnik and Chervonenkis (the two founders of statistical learning theory), Shafer (co-founder of the DempsterShafer theory), Rissanen (inventor of the Minumum Description Length principle), Levin (one of the 3 founders of the theory of NP-completeness, made fundamental contributions to Kolmogorov complexity) 5 / 19
  • 6. Recent years: explosion of interest in machine-learning methods, in particular statistical learning theory. Statistical learning theory: similar goals to statistical science, but it is nonparametric and concerned with the problem of prediction. 6 / 19
  • 7. Problems and Current Techniques. Classical techniques: small-scale, low-dimensional data, but conceptual and computational difficulties for high-dimensional data. Open issues: validity of predictions, confidence measures, online prediction. Current techniques for the dimensionality problem: Support Vector Machines (Vapnik, 1995, 1998; Vapnik and Chervonenkis, 1974); Kernel Methods. New technique for the validity problem: Conformal Predictors.
  • 8. Projects: Compact Descriptors for Automatic Target Identification (with QinetiQ). Statistical profiling of offenders (with the Home Office). Material identification with atmosphere corrections (with Waterfall Solutions). Unmixing spectra (with QinetiQ). Anomaly detection (vehicles) (with Thales). Fault Diagnosis (with Marconi Instruments).
  • 9. Projects – cont'd: Abdominal Pain (with Western General Hospital, Edinburgh). Ovarian Cancer (with Institute for Women's Health, UCL). Depression (with Institute of Psychiatry, King's College). Child Leukemia (with Royal London Hospital). Heart Diseases (with Institute for Women's Health, UCL). Analysis of microarrays (with Veterinary Laboratory Agency – DEFRA). Protein-Protein Interaction (EU project).
  • 10. How much data do we need to answer our questions? Big Data: V^3. Volume: Gigabyte (10^9); Terabyte (10^12); Petabyte (10^15); Exabyte (10^18); Zettabyte (10^21). Variety: structured, semi-structured, unstructured; text, image, audio, video. Velocity: dynamic, time-varying, etc. Plus: high dimensionality. But: if the answer is a Zettabyte, what is the question? The global data supply reached 2.8 zettabytes (ZB) in 2012, or 2.8 trillion GB, but just 0.5% of this is used for analysis, according to the Digital Universe Study. Volumes of data are projected to reach 40 ZB by 2020, or 5,247 GB per person.
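
As a quick check of the unit arithmetic on slide 10, the figures can be reproduced with exact integers. This is only an illustrative sketch: the variable names are invented here, and the "implied population" is simply what the slide's two 2020 figures imply when divided, not a number taken from the study.

```python
# Decimal (SI) byte units as listed on the slide, kept as exact integers.
GB = 10**9
EB = 10**18
ZB = 10**21

total_2012_bytes = 28 * ZB // 10               # 2.8 ZB reported for 2012
total_2012_gb = total_2012_bytes // GB         # the same amount expressed in GB
analysed_bytes = total_2012_bytes * 5 // 1000  # the 0.5% said to be analysed
analysed_eb = analysed_bytes // EB             # expressed in exabytes

# 40 ZB projected for 2020 at 5,247 GB per person implies this many people
# (an inference from the two quoted figures, not a quoted figure itself).
implied_population = (40 * ZB) // (5247 * GB)

print(total_2012_gb, analysed_eb, implied_population)
```

So 2.8 ZB is indeed 2.8 trillion GB, and the analysed 0.5% corresponds to about 14 exabytes.
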
  • 11. We don't need big data per se: we need to have a problem first, and then decide how much data we need to solve it. If a child wants to learn the concept of a car, he or she doesn't need to see a million or a billion cars; 10 or 100 are enough. If we want to predict digits, we can learn on the first 100 or 1000 digits and then identify the next one confidently and with high accuracy.
  • 12. Figure: USPS data.
  • 13. Figure: Conformal Predictors on USPS data: online cumulative multiple predictions at different confidence levels ("Hedging Predictions in Machine Learning" by A. Gammerman and V. Vovk, The Computer Journal (2007) 50 (2): 151-163).
  • 14. In fact, this is a well-known concept in machine learning. In the past, people thought that the larger the training set, the more accurate the results. But the founders of statistical learning theory, V. Vapnik and A. Chervonenkis, showed that it is not just the size of the training set that matters: another characteristic, called "capacity", is more important.
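
The slide does not give a formula, but one standard way to make the role of capacity concrete is the VC generalisation bound for binary classification, as usually quoted from Vapnik (1995). The notation here is an assumption of this note: n is the training-set size, h is the VC dimension (the capacity) of the function class, R is the true risk, and R_emp is the empirical risk on the training set. With probability at least 1 − η, for every function in the class:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The second term depends on the training-set size and the capacity together, roughly through the ratio n/h, so adding data helps only relative to the capacity of the class being fitted.
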
  • 15. Trends in Machine Learning for Big Data. How do we make machine learning algorithms scale to large datasets? There are two main approaches: (1) developing parallelizable ML algorithms and integrating them with large parallel systems, and (2) developing more efficient algorithms. Data growth is driving the need for parallel and online algorithms and models that can handle this "Big Data". We need to explore the computational foundations of performing these analyses on parallel and cloud architectures.
  • 16. Large-scale modeling techniques and algorithms include transductive and inductive models, online compression models (an extension of conformal predictors), graphical models, deep learning and semi-supervised learning algorithms, clustering algorithms, and parallel learning algorithms. The computational techniques provide a basic foundation in large-scale programming, ranging from the basic "parfor" to parallel abstractions such as MapReduce (Hadoop) and GraphLab.
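
To illustrate the MapReduce abstraction mentioned on slide 16, here is a minimal single-process sketch in plain Python. It does not use Hadoop or GraphLab; the function names are hypothetical, and the map calls run sequentially here where a real system would distribute them across workers. Each partition is summarised independently, and the summaries are merged with an associative operation.

```python
from functools import reduce

def map_partition(rows):
    """Map step: summarise one partition as (count, sum, sum of squares)."""
    return (len(rows), sum(rows), sum(x * x for x in rows))

def combine(a, b):
    """Reduce step: merge two partial summaries (associative and commutative)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_and_variance(partitions):
    """Mean and population variance computed from per-partition summaries."""
    n, s, ss = reduce(combine, map(map_partition, partitions))
    mean = s / n
    return mean, ss / n - mean * mean

# Three partitions standing in for data held on three machines.
partitions = [[1.0, 2.0], [3.0, 4.0], [5.0]]
mean, variance = mean_and_variance(partitions)
print(mean, variance)
```

Because `combine` is associative, partial results can be merged in any order, which is what lets the computation be spread over many machines. The sum-of-squares formula is used for brevity; it can lose precision on large or badly scaled data.
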
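  • 17. Transduction. Figure: Induction and Transduction [V. Vapnik, 1995]. The diagram relates data (past examples), general knowledge, and the particular (future examples): induction (learning) leads from data to general knowledge, deduction leads from general knowledge to the particular, and transduction leads directly from data to the particular.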
  • 18. Why use conformal predictions? Why, after 100 years of research in statistics, do we need yet another method of prediction? It is simple and rigorous. Given any of a wide range of learning or statistical prediction methods, conformal prediction can be used as a wrapper to provide a measure of confidence. It is valid under weak assumptions. It limits the fraction of prediction mistakes from the start. (Crudely, a predictor can either make a prediction or else say "don't know", possibly in a graded way, such as giving a wide prediction interval.) It works in practice.
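
The wrapper idea on slide 18 can be sketched with a small transductive conformal classifier. This is an illustrative toy, not the implementation used for the USPS experiments: the nearest-neighbour nonconformity score, the function names, and the data are all assumptions made for this example. For each candidate label, the new example is added to the training set with that label, every example receives a nonconformity score, and the p-value is the fraction of scores at least as large as the new example's.

```python
import math

def nearest(x, points):
    """Distance from x to the closest point in `points` (inf if there are none)."""
    return min((math.dist(x, p) for p in points), default=math.inf)

def nonconformity(i, xs, ys):
    """1-NN score: nearest same-label distance over nearest other-label distance."""
    same = [xs[j] for j in range(len(xs)) if j != i and ys[j] == ys[i]]
    other = [xs[j] for j in range(len(xs)) if ys[j] != ys[i]]
    d_same, d_other = nearest(xs[i], same), nearest(xs[i], other)
    if d_other == 0.0 or d_same == math.inf:
        return math.inf  # as strange as possible under this score
    return d_same / d_other

def p_values(train_x, train_y, x_new, labels):
    """p-value of each candidate label for x_new."""
    result = {}
    for label in labels:
        xs, ys = train_x + [x_new], train_y + [label]
        scores = [nonconformity(i, xs, ys) for i in range(len(xs))]
        result[label] = sum(s >= scores[-1] for s in scores) / len(scores)
    return result

def predict_set(train_x, train_y, x_new, labels, epsilon):
    """All labels whose p-value exceeds the significance level epsilon."""
    pv = p_values(train_x, train_y, x_new, labels)
    return {label for label, p in pv.items() if p > epsilon}

# Toy data: two well-separated clusters in the plane.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
train_y = ["a"] * 4 + ["b"] * 4
x_new = (0.5, 0.5)

pv = p_values(train_x, train_y, x_new, ["a", "b"])
narrow = predict_set(train_x, train_y, x_new, ["a", "b"], epsilon=0.2)
wide = predict_set(train_x, train_y, x_new, ["a", "b"], epsilon=0.05)
print(pv, narrow, wide)
```

With only eight training examples the smallest possible p-value is 1/9, so at the 95% confidence level (epsilon = 0.05) the predictor cannot exclude either label and returns both, which is the graded "don't know" described on the slide. At epsilon = 0.2 it returns the single label "a". The validity guarantee relies on the examples being exchangeable, which is the weak assumption referred to above.
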
  • 19. Conclusions. "It took Deep Thought 7.5 million years to answer the ultimate question. As nobody knew what the ultimate question to Life, The Universe and Everything actually was, nobody knows what to make of the answer (42)". Nowadays, as John Poppelaars noted, many people think that Big Data will help to find the ultimate question. But I already know that it is not Big Data, and that the answer is not 42 but Machine Learning.