Data Science for Advanced Dummies
Introduction to Big Data
What is Big Data?
What makes data “Big” Data?
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it…
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
Exponential increase in collected/generated data
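The two endpoints on this slide imply a yearly growth rate. The sketch below is a rough back-of-the-envelope calculation, assuming smooth compound growth over the 11 years from 2009 to 2020 (that smoothness is an assumption; the slide gives only the start and end values).

```python
# Endpoints taken from the slide; everything else is derived from them.
start_zb = 0.8          # data volume in 2009, in zettabytes
end_zb = 35.0           # projected data volume in 2020, in zettabytes
years = 2020 - 2009     # 11 years

# Overall growth factor (the slide rounds this to "44x").
growth_factor = end_zb / start_zb

# Assumption: growth compounds at a constant rate every year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Overall growth: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under that assumption the volume grows by roughly 40% per year, which is what "increasing exponentially" means in practice.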
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be generating/collecting many types of data
To extract knowledge, all these types of data need to be linked together
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missed opportunities
• Examples
  • E-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
  • Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction
Big Data: 3V’s
Some Make it 4V’s
Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data,
• but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable fashion
What Technology Do We Have for Big Data?
Which Movie Do You Like?
Designing a movie recommendation system
Can you describe the movie you would like?
Recommender Systems
• Movie Problem: Find movies “similar” to my taste.
• Movies have many “features”: Western, Clint Eastwood, Tarantino, 90s, etc.
• A viewer has preferences over these “features”: likes ‘Western’; hates ‘content based filtering movies’
Netflix Prize
From Wikipedia, the free encyclopedia
The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict
user ratings for films, based on previous ratings without any other information about the users or
films, i.e. without the users or the films being identified except by numbers assigned for the contest.
The competition was held by Netflix, an online DVD-rental service, and was open to anyone not
connected with Netflix (current and former employees, agents, close relatives of Netflix employees,
etc.) or a resident of Cuba, Iran, Syria, North Korea, Burma or Sudan.[1] On 21 September 2009, the
grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's
own algorithm for predicting ratings by 10.06%.[2]
A Highly Simple Solution
Comedy | Action | Blockbuster | … | Is Tom Cruise the Lead? | Saurav
6      | 5      | 0           | … | 1                       | 2
7      | 8      | 1           | … | 0                       | 8
…      | …      | …           | … | …                       | …
Saurav’s Score = 0.2*Comedy + 0.1*Action + 10*Blockbuster + … + … - 0.9*Tom Cruise
Comedy | Action | Blockbuster | … | Is Tom Cruise the Lead? | Saurav
2      | 8      | 0           | … | 0                       | 7
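The scoring formula on this slide is a weighted sum of movie features. A minimal sketch of that idea follows, using only the four weights and the feature values that are visible on the slide. The terms hidden behind "…" are left out, so these partial scores are not expected to match the ratings in the Saurav column. The names `WEIGHTS` and `partial_score` are illustrative, not from the deck.

```python
# Weights copied from the slide's formula; the elided ("…") terms are omitted.
WEIGHTS = {
    "comedy": 0.2,
    "action": 0.1,
    "blockbuster": 10.0,
    "tom_cruise_lead": -0.9,
}

def partial_score(movie):
    """Weighted sum over the visible features only (hypothetical helper)."""
    return sum(WEIGHTS[name] * movie[name] for name in WEIGHTS)

# Feature rows as shown on the slide (visible columns only).
movies = [
    {"comedy": 6, "action": 5, "blockbuster": 0, "tom_cruise_lead": 1},
    {"comedy": 7, "action": 8, "blockbuster": 1, "tom_cruise_lead": 0},
    {"comedy": 2, "action": 8, "blockbuster": 0, "tom_cruise_lead": 0},  # new movie
]

scores = [partial_score(m) for m in movies]
for movie, score in zip(movies, scores):
    print(movie, "->", round(score, 2))
```

In a real recommender the weights would be learned from a viewer's past ratings rather than written by hand; the slide only illustrates the shape of the model.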
Quiz #1
• Is Google Search a recommender system?
Supervised Learning
Design an Accurate Vending Machine
This is a classification problem. The line that separates the classes is called the decision boundary, or separating hyperplane.
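To make the idea of a decision boundary concrete, here is a small sketch of a vending machine telling two coin types apart. Everything in it is an assumption for illustration: the slide does not say which features the machine uses, so the coin weights and diameters below are made up, and the nearest-centroid rule is just one simple way to obtain a separating line.

```python
# Hypothetical coin measurements: (weight in grams, diameter in mm).
small_coins = [(2.5, 19.0), (2.3, 18.0), (2.6, 19.5)]
large_coins = [(5.0, 21.2), (5.7, 24.3), (5.2, 22.0)]

def centroid(points):
    """Mean of a list of equally sized tuples."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

c_small = centroid(small_coins)
c_large = centroid(large_coins)

# Points equally far from both centroids satisfy w . x + b = 0.
# That line is the decision boundary of the nearest-centroid classifier.
w = tuple(l - s for l, s in zip(c_large, c_small))
b = -(sum(l * l for l in c_large) - sum(s * s for s in c_small)) / 2

def classify(coin):
    """Return which side of the boundary the coin falls on."""
    side = sum(wi * xi for wi, xi in zip(w, coin)) + b
    return "large" if side > 0 else "small"

print(classify((2.4, 18.5)))  # a coin close to the small group
print(classify((5.5, 23.0)))  # a coin close to the large group
```

With two features the boundary is a line; with more features the same equation describes a hyperplane.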
Quiz #2
• Give an example where you think supervised learning is used.
• Hint: spam vs. ham in emails
Some Common Supervised Algorithms
• Classification
  • Decision Trees
  • Random Forest
  • Support Vector Machine
  • Neural Network
  • Logistic Regression
• Regression
  • Linear Regression
  • Non-linear Regression
  • Logistic Regression (despite the name, it predicts a class probability and is normally used for classification)
• Association Rule Learning
  • Arules
  • Even Sequence Analysis
In Action
• Handwriting Recognition System
• Classification
• Input?
• Output?
Features:
200 200 10 …
200 200 8 …
180 200 20 …
… … … …
Label: 6
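The features/label pairing on this slide can be sketched in a few lines. The sketch assumes that the numbers are pixel intensities of a handwritten digit image and that 6 is the digit it shows; the slide does not state this explicitly. Only the values visible on the slide are used, and the elided ("…") entries are left out.

```python
# Visible part of the grid from the slide (assumed to be pixel intensities).
image_rows = [
    [200, 200, 10],
    [200, 200, 8],
    [180, 200, 20],
]
label = 6  # the class the classifier should output for this image

# Input to the classifier: the grid flattened into one feature vector.
features = [pixel for row in image_rows for pixel in row]

# One training example is a (features, label) pair.
example = (features, label)
print(example)
```

A training set is many such pairs; the classifier learns a mapping from the feature vector (input) to the label (output).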
Note the similarity
Classification algorithms try to separate items into “classes”
Demo
Quiz #3
• Are driverless cars a learning problem?
• What are the features?
• What is the label?
Unsupervised Learning
Flowers
Tetramerous flower of Ludwigia octovalvis showing petals and sepals
Sepal length | Sepal width | Petal length | Petal width
5.1          | 3.5         | 1.4          | 0.2
4.9          | 3.0         | 1.4          | 0.2
4.7          | 3.2         | 1.3          | 0.2
4.6          | 3.1         | 1.5          | 0.2
5.0          | 3.6         | 1.4          | 0.2
5.4          | 3.9         | 1.7          | 0.4
4.6          | 3.4         | 1.4          | 0.3
5.0          | 3.4         | 1.5          | 0.2
4.4          | 2.9         | 1.4          | 0.2
4.9          | 3.1         | 1.5          | 0.1
5.4          | 3.7         | 1.5          | 0.2
Clustering
• Cluster: a collection/group of data objects/points that are
  • similar (or related) to one another within the same group
  • dissimilar (or unrelated) to the objects in other groups
• Cluster analysis
  • finds similarities between data according to characteristics underlying the data and groups similar data objects into clusters
• Unsupervised learning
  • no predefined classes for a training data set
• Two general tasks: identify the “natural” number of clusters and properly group objects into “sensible” clusters
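The grouping task described above can be sketched with k-means, one common clustering algorithm (the slides do not name a specific one, so this choice is an assumption). The first three points are petal length/width pairs from the flower table; the last three are hypothetical values added so that a second group exists. The number of clusters and the starting centroids are also chosen by hand here.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its points, until nothing changes."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# (petal length, petal width); the last three points are hypothetical.
points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2),
          (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]

# Assumption: k = 2, starting from the first and last point.
labels, centroids = k_means(points, [points[0], points[-1]])
print(labels)
print(centroids)
```

No labels are given to the algorithm; the two groups emerge only from the distances between points, which is what makes this unsupervised.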
Plot
Quiz #4
• How many types (species) of flowers are there?
Can you see 3 species?
Examples of Unsupervised Learning
• Clustering
• Dimensionality Reduction
• Feature Extraction
• Self-Organizing Maps
Quiz #5
• Which of the tasks below are supervised and which are unsupervised?
  • Take a collection of 1000 essays written on the US economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow "similar" or "related".
  • Examine a large collection of emails that are known to be spam, to discover if there are sub-types of spam mail.
  • Given historical data of children's ages and heights, predict children's height as a function of their age.
  • Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
  • Given a set of news articles from many different news websites, find out what the main topics covered are.
• Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
Where is Big Data???
Let's start from (Big) Data
• How do you design this system?
• How do you pay for this?
• How do you trust someone to do it right?
• How expensive will such a system be?
I need data. Good reusable data. High quality data. Otherwise all the smarts are wasted.
Here comes BIG Data to help
• Image
• Audio
• Learning
• HUGE data sets
Thank you!
