Big Data



Gaetan Lion
April 5, 2013
Table of Contents

1)   Big Data trends.
2)   How Big is your Data?
3)   Big Data Potential.
4)   Big technologies. New databases.
5)   Big quantitative methods. New stats.
6)   Big Data temperaments.
7)   Is Big always better?



1) Big Data Trends




The cost of data storage has dropped




Social Media (Facebook & Twitter) has grown exponentially

[Chart: Facebook vs Twitter, number of active users (in thousands), quarterly from January 2008, annotated "exponential growth"]

Facebook started in Feb 2004. Has 1 billion active users.
Twitter started in March 2006. Has 500 million users.

Social networks are creating a huge volume of live unstructured data.
Unstructured Data is taking over…




2) How Big is your Data?

•   How Tall is it? How large is your sample (rows)?
•   How Wide is it? How many variables (columns)?
•   What is its Velocity? How frequently is it updated?
•   Does it include unstructured data (documents,
    emails, Social Media)?
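The four questions above can be turned into a quick profile of a dataset. This is a minimal sketch, assuming the data arrives as a list of dictionaries with a timestamp field; the helper name `profile`, the `updated_at` key, and the 200-character cutoff used to flag "unstructured" free text are hypothetical choices for illustration.

```python
from datetime import datetime


def profile(records, timestamp_key="updated_at", text_len_threshold=200):
    """Rough 'how big is it?' profile of a list of dict records (hypothetical helper)."""
    rows = len(records)                                       # tall: number of rows
    columns = {key for record in records for key in record}  # wide: distinct fields
    times = sorted(r[timestamp_key] for r in records if timestamp_key in r)
    if len(times) >= 2:
        span_hours = (times[-1] - times[0]).total_seconds() / 3600
        # velocity: average number of updates per hour over the observed window
        velocity = (len(times) - 1) / span_hours if span_hours > 0 else float("inf")
    else:
        velocity = 0.0
    # unstructured: crude heuristic, flags any long free-text string value
    has_text = any(
        isinstance(value, str) and len(value) >= text_len_threshold
        for record in records
        for value in record.values()
    )
    return {
        "rows": rows,
        "columns": len(columns),
        "updates_per_hour": velocity,
        "has_unstructured_text": has_text,
    }


# Hypothetical records: three updates over one hour, one with a long text note.
records = [
    {"id": 1, "updated_at": datetime(2013, 4, 5, 9, 0), "amount": 10.0},
    {"id": 2, "updated_at": datetime(2013, 4, 5, 9, 30), "amount": 12.5},
    {"id": 3, "updated_at": datetime(2013, 4, 5, 10, 0), "note": "Great product! " * 20},
]
print(profile(records))
```

A real velocity measure would usually look at the arrival rate in the live pipeline rather than timestamps already stored, but the stored timestamps give a first approximation.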




3) Big Data Potential




4) Big Technologies.
   New Databases




Database: Structured vs Unstructured

Structured data (customers, transactions, numbers in rows)
•   Database language: SQL (structured query language)
•   Database type: relational database
•   Database structure: data warehouse, data marts
•   Reporting tool: Oracle Essbase & IBM Cognos (reporting, business intelligence)

Unstructured data (social media, text documents, web services)
•   Database language: NoSQL (not only SQL)
•   Database type: non-relational database
•   Database structure: Hadoop
•   Reporting tool: Hadoop connectors to the reporting / business intelligence tools
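The contrast between the two rows can be sketched with Python's standard library. This is an illustration under stated assumptions, not a real NoSQL system: an in-memory `sqlite3` table stands in for a relational database, and a plain list of JSON strings stands in for a document store. The table name, fields, and sample values are hypothetical.

```python
import json
import sqlite3

# Structured: fixed schema (rows and columns), queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("alice", "diapers", 12.5), ("bob", "beer", 8.0), ("alice", "beer", 7.5)],
)
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM transactions GROUP BY customer ORDER BY customer"
    ).fetchall()
)
conn.close()
print(totals)  # total spend per customer

# Unstructured / semi-structured: documents whose fields vary from one to the next.
# A list of JSON strings stands in (hypothetically) for a NoSQL document store.
posts = [
    json.dumps({"user": "alice", "text": "Loving the new phone!", "tags": ["phone"]}),
    json.dumps({"user": "bob", "text": "Battery died again...", "location": "SF"}),
]
docs = [json.loads(p) for p in posts]
# No shared schema: code must tolerate missing fields (note the .get default).
mentions = [d["user"] for d in docs if "phone" in d.get("tags", [])]
print(mentions)
```

The SQL side can rely on every row having the same columns; the document side cannot, which is one reason unstructured data calls for different storage and processing tools.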




5) Big quantitative methods.
         New Stats




New Stats Map

Predictive Analytics
•   Statistics & Regression
    -  A/B Testing (hypothesis testing)
    -  Regression → Spatial Analysis
    -  Time Series Analysis → Signal Processing
•   Data Mining & Machine Learning (formerly Artificial Intelligence)
    -  Association Rule Learning
    -  Cluster Analysis
    -  Classification → Pattern Recognition
    -  Neural Networks → Pattern Recognition, Optimization → Genetic Algorithms
    -  Natural Language Processing → Sentiment Analysis


Definitions. Part I
Association Rule Learning: a method to uncover interesting relationships
by generating and testing possible rules. One application is “market
basket analysis”, in which a retailer figures out which products are
frequently bought together. A frequently cited example is that shoppers
who buy diapers often buy beer.
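Market basket analysis rests on two measures: support (how often two items appear together) and confidence (how often the second item appears given the first). The sketch below is a minimal brute-force version limited to item pairs, not the Apriori algorithm used at scale; the function name `pair_rules`, the thresholds, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Brute-force 'A -> B' rules between pairs of items (not the Apriori algorithm)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, only the diapers/beer pair clears the support threshold, and it does so in both directions.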
Classification: identifies the category to which new data belongs,
based on an existing data set grouped into predefined categories. It
differs from Cluster Analysis, which starts without predefined categories.
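A minimal way to see "predefined categories" at work is a nearest-centroid classifier: average the labelled examples of each category, then assign a new point to the closest average. This is one simple classification technique chosen for illustration; the function names, features (monthly spend, visits per month), and labels are hypothetical. Cluster analysis would start from the same points without the `labels` list.

```python
from math import dist


def train_centroids(points, labels):
    """Average the labelled training points of each predefined category."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, point):
    """Assign a new point to the category whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical training data: (monthly spend, visits per month) per customer.
points = [(120, 10), (100, 8), (20, 1), (15, 2)]
labels = ["loyal", "loyal", "at_risk", "at_risk"]
centroids = train_centroids(points, labels)
print(classify(centroids, (90, 7)))  # prints loyal
```

In practice features are usually rescaled first, since spend and visit counts sit on very different scales.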
Genetic algorithms: an optimization method inspired by the “survival of
the fittest” process. Potential solutions are encoded as “chromosomes”
that can combine and mutate. The chromosomes are selected for
survival within a modeled “environment.” Example: optimizing the
performance of an investment portfolio.
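The mechanics described above (chromosomes, combination, mutation, selection) can be sketched on a toy problem. This sketch assumes a bit-string encoding and the classic "OneMax" fitness (count of 1-bits) as a stand-in objective; it does not optimize a real portfolio. The function name `genetic_onemax` and all parameter values are hypothetical.

```python
import random


def genetic_onemax(length=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm evolving bit-string chromosomes toward all ones.

    Fitness = number of 1-bits, a hypothetical stand-in for a real objective.
    Returns the best chromosome and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    fitness = sum

    def tournament(population):
        # Selection: the fitter of two random individuals survives to breed.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Elitism: carry the current best chromosome over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        history.append(max(map(fitness, population)))
    best = max(population, key=fitness)
    return best, history


best, history = genetic_onemax()
print(sum(best), history[0], history[-1])
```

Elitism guarantees the best fitness never decreases from one generation to the next; without it, a lucky chromosome can be lost to crossover or mutation.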


Definitions. Part II
Natural language processing (NLP): uses algorithms to analyze text data.
Sentiment Analysis is a common application: it measures customers’
reaction to a product campaign by analyzing social media.
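The simplest form of sentiment analysis counts positive and negative words. The sketch below assumes a tiny hand-made lexicon (hypothetical; real systems use large curated lexicons or trained models that handle negation and sarcasm) and tallies how many campaign-related posts lean positive, negative, or neutral.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"love", "great", "awesome", "happy", "good"}
NEGATIVE = {"hate", "bad", "awful", "broken", "disappointed"}


def sentiment_score(text):
    """+1 per positive word, -1 per negative word (bools subtract as ints)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def campaign_sentiment(posts):
    """Count posts leaning positive, negative, or neutral."""
    scores = [sentiment_score(p) for p in posts]
    positive = sum(s > 0 for s in scores)
    negative = sum(s < 0 for s in scores)
    return {"positive": positive, "negative": negative, "neutral": len(scores) - positive - negative}


# Hypothetical social media posts about a product campaign.
posts = [
    "I love the new ad, great job!",
    "The app is broken and I'm disappointed",
    "Saw the commercial today.",
]
print(campaign_sentiment(posts))
```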
Neural networks: models inspired by the workings of neurons and
synapses within the brain, used to find nonlinear patterns. They can
be used for Pattern Recognition and Optimization. Examples of neural
network applications include identifying customers who may leave and
identifying fraudulent insurance claims.
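Why "nonlinear" matters can be shown with XOR, a pattern no single linear threshold unit can represent but a two-layer network can. In this sketch the weights are hand-picked for illustration; real neural networks learn their weights from data (for example by backpropagation) and usually use smooth activations rather than a hard step.

```python
def step(x):
    """Threshold activation: the neuron fires (1) if its input is positive."""
    return 1 if x > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two threshold neurons with hand-picked weights and biases.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)  # fires only if both inputs are 1
    # Output neuron: "OR but not AND", i.e. XOR.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```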
Signal processing: an electrical engineering method to analyze signals
(radio, etc.) and discern between signal and noise. It is used to extract
the signal from the noise in a set of less precise data [Signal Detection
Theory].
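A moving average is one of the simplest filters for pulling a slow signal out of noise; it illustrates the idea rather than the full machinery of signal detection theory. This sketch assumes a synthetic sine wave with added Gaussian noise (all values hypothetical) and compares the error of the raw and smoothed series against the true signal.

```python
import math
import random


def moving_average(signal, window=5):
    """Centered moving average; near the edges, average the available neighbours."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic example: a slow sine wave buried in Gaussian noise.
rng = random.Random(42)
true_signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]
noisy = [s + rng.gauss(0, 0.3) for s in true_signal]
smoothed = moving_average(noisy, window=9)

print(f"noisy RMSE: {rmse(noisy, true_signal):.3f}")
print(f"smoothed RMSE: {rmse(smoothed, true_signal):.3f}")
```

A wider window removes more noise but also flattens the signal itself, so the window size is a trade-off between noise and distortion.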




Definitions. Part III
Spatial Analysis: analyzes the geographic location encoded within
the data, information that typically comes from GPS. Applications
include spatial regression to estimate a consumer’s willingness to
purchase a product given his or her location.
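A first step in working with GPS data is turning coordinates into a usable feature such as distance to a store. This sketch computes great-circle (haversine) distances and fits an ordinary least-squares line of purchase rate on distance. Note this is plain regression on a spatial feature, a simplification: true spatial regression models also account for spatial autocorrelation between nearby observations. All coordinates and purchase rates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS (latitude, longitude) points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def fit_line(xs, ys):
    """Ordinary least squares: returns (intercept, slope) of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical inputs: one store, four consumers' GPS fixes, and each
# consumer's observed purchase rate (numbers invented for illustration).
store = (37.7749, -122.4194)
consumers = [(37.7790, -122.4170), (37.7849, -122.4094), (37.8049, -122.4094), (37.8349, -122.3894)]
purchase_rate = [0.30, 0.27, 0.22, 0.10]

distances = [haversine_km(*store, *c) for c in consumers]
intercept, slope = fit_line(distances, purchase_rate)
print(f"purchase rate changes by {slope:.3f} per km from the store")
```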




6) Big Data Temperaments




Source: Harvard Business Review, April 2012 by Shvetank Shah, Andrew Horne
and Jaime Capella.

7) Is Big always better?




No! says Nate Silver


• “I came to realize that prediction in the era of Big Data was
not going very well.”
• “If the quantity of information is increasing [exponentially]…
Most of it is just noise.”
• He refers to John P. Ioannidis’s 2005
paper: “Why Most Published
Research Findings Are False.”
Two-thirds of scientific papers’ results
can’t be replicated!

“… numbers have no way of speaking for
themselves. We speak for them.”
Nate’s targets

• Political pundits. Their “intuitive” election predictions have
  been disastrous. Granted, this was not because of Big Data
  but because of No Data. He showed them how to do it using
  Small Data (polls with samples < 1,000);
• Economic forecasters. They have used Big Data with
  poor results. Most of them can’t forecast a recession
  already underway. ECRI predicted with certainty
  a double-dip recession in 2011 using tens of variables they
  did not understand. Instead, the economy improved;
• Stock market & financial market forecasters. Their
  performance is similar to that of economic forecasters;
• Earthquake forecasting. The field is not well understood.

  “… Statistical inferences are much stronger when backed
  up by theory… about their root causes.”
No! says Vincent Granville



• Big Data is huge, but information is very sparse;
• Storing and processing the entire dataset is very inefficient;
• You can do better by smartly sampling only 5% of the
  data.


You don’t need Big Data, you need Smart Data.
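The sampling argument can be illustrated with the simplest possible scheme: estimate a statistic from a uniform random 5% sample instead of the full data. "Smart" sampling as Granville describes it may involve more deliberate designs (for example stratification); this sketch only shows plain random sampling. The function name `sample_estimate` and the synthetic dataset are hypothetical.

```python
import random


def sample_estimate(data, fraction=0.05, seed=0):
    """Estimate the mean from a simple random sample of `fraction` of the data."""
    rng = random.Random(seed)
    k = max(1, round(len(data) * fraction))
    sample = rng.sample(data, k)  # sampling without replacement
    return sum(sample) / k, k


# Hypothetical "big" dataset: 100,000 synthetic purchase amounts.
gen = random.Random(123)
data = [gen.gauss(50, 10) for _ in range(100_000)]
full_mean = sum(data) / len(data)
estimate, k = sample_estimate(data)
print(k, round(full_mean, 2), round(estimate, 2))
```

For a mean, the standard error of a random sample shrinks with the square root of the sample size, so 5,000 well-chosen rows often estimate it nearly as well as 100,000; rare events or small subgroups are where naive sampling breaks down.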


Yes! says Chris Anderson


     • He quotes Peter Norvig, Google’s research director: “All models
     are wrong, and increasingly you can succeed without them.”
     • “… with massive data, [the scientific method] is becoming
     obsolete.”
     • “We can throw the numbers into the biggest computing clusters …
     and let statistical algorithms find patterns where science cannot.”
     He mentions examples such as J. Craig Venter’s gene sequencing,
     Google Search, and Google Translate, among other successes.

“With enough data, the numbers speak for themselves.”

“Correlation supersedes causation, and science can advance without
coherent models, unified theories, or … any … explanation at all.”

Big Data Effectiveness Map

• Fields needing causal understanding, theory not well understood
    -  Tall data: more data, more noise; oversampling.
    -  Wide data: more variables, more false positives; multicollinearity; model overfitting.
    -  Examples: economics, financial markets, earthquake forecasting.
• Fields needing causal understanding, theory well understood
    -  Tall data: more data, more signal; oversampling.
    -  Wide data: more variables, more explanation; multicollinearity; model overfitting.
    -  Examples: weather forecasting, customer behavior.
• Rule-based fields
    -  Tall and wide data: more data, better model performance.
    -  Examples: games & sports [chess, baseball, etc.], politics.
• Fields not needing causal understanding
    -  Tall and wide data: more data, better model performance.
    -  Examples: Google Search, Google Translate, Google Flu Trends, customer behavior.
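"More variables, more false positives" can be demonstrated with a short simulation: generate a target and many candidate variables that are pure noise, then count how many appear "significantly" correlated with the target. This sketch assumes Gaussian noise and uses the common large-sample cutoff |r| > 1.96/√n as an approximate 5% two-sided test; the function names and sizes are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_obs=100, n_vars=1000, seed=1):
    """Count pure-noise variables that pass an approximate 5% significance cutoff."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    threshold = 1.96 / math.sqrt(n_obs)  # approx. two-sided 5% cutoff for |r|
    hits = 0
    for _ in range(n_vars):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to target
        if abs(pearson_r(noise, target)) > threshold:
            hits += 1
    return hits


hits = count_spurious()
print(f"{hits} of 1000 noise variables look 'significant'")
```

Roughly 5% of the noise variables, around 50 here, pass the test even though none is related to the target: widening the data multiplies chances for spurious findings unless theory or multiple-testing corrections constrain the search.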

Mais conteúdo relacionado

Destaque

MapReduce frameworks and methods - Adam Horvath, Google Technology User Grou...
MapReduce frameworks and methods  - Adam Horvath, Google Technology User Grou...MapReduce frameworks and methods  - Adam Horvath, Google Technology User Grou...
MapReduce frameworks and methods - Adam Horvath, Google Technology User Grou...adamhorvath
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
structured and unstructured interview
structured and unstructured interviewstructured and unstructured interview
structured and unstructured interviewahsan mubeen
 
Why Structured Data & Semantic SEO Are Important - SMX East 2013
Why Structured Data & Semantic SEO Are Important - SMX East 2013Why Structured Data & Semantic SEO Are Important - SMX East 2013
Why Structured Data & Semantic SEO Are Important - SMX East 2013Mike Arnesen
 

Destaque (6)

MapReduce frameworks and methods - Adam Horvath, Google Technology User Grou...
MapReduce frameworks and methods  - Adam Horvath, Google Technology User Grou...MapReduce frameworks and methods  - Adam Horvath, Google Technology User Grou...
MapReduce frameworks and methods - Adam Horvath, Google Technology User Grou...
 
Gdg 2013
Gdg 2013Gdg 2013
Gdg 2013
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
structured and unstructured interview
structured and unstructured interviewstructured and unstructured interview
structured and unstructured interview
 
Why Structured Data & Semantic SEO Are Important - SMX East 2013
Why Structured Data & Semantic SEO Are Important - SMX East 2013Why Structured Data & Semantic SEO Are Important - SMX East 2013
Why Structured Data & Semantic SEO Are Important - SMX East 2013
 

Semelhante a Big data

Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesVishy Poosala
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsIJMER
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Mark Heid
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_publicAttila Barta
 
There's no such thing as big data
There's no such thing as big dataThere's no such thing as big data
There's no such thing as big dataAndrew Clegg
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
Sample Paper.doc.doc
Sample Paper.doc.docSample Paper.doc.doc
Sample Paper.doc.docbutest
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentStrategy 2 Market, Inc,
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentationAASTHA PANDEY
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptxRUDRAPRASADSABAR
 

Semelhante a Big data (20)

Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, Opportunities
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
IBM Stream au Hadoop User Group
IBM Stream au Hadoop User GroupIBM Stream au Hadoop User Group
IBM Stream au Hadoop User Group
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
There's no such thing as big data
There's no such thing as big dataThere's no such thing as big data
There's no such thing as big data
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Sample Paper.doc.doc
Sample Paper.doc.docSample Paper.doc.doc
Sample Paper.doc.doc
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product Development
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big Data
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentation
 
Data mining
Data miningData mining
Data mining
 
Workshop_Presentation.pptx
Workshop_Presentation.pptxWorkshop_Presentation.pptx
Workshop_Presentation.pptx
 

Mais de Gaetan Lion

DRU projections testing.pptx
DRU projections testing.pptxDRU projections testing.pptx
DRU projections testing.pptxGaetan Lion
 
Climate Change in 24 US Cities
Climate Change in 24 US CitiesClimate Change in 24 US Cities
Climate Change in 24 US CitiesGaetan Lion
 
Compact Letter Display (CLD). How it works
Compact Letter Display (CLD).  How it worksCompact Letter Display (CLD).  How it works
Compact Letter Display (CLD). How it worksGaetan Lion
 
CalPERS pensions vs. Social Security
CalPERS pensions vs. Social SecurityCalPERS pensions vs. Social Security
CalPERS pensions vs. Social SecurityGaetan Lion
 
Inequality in the United States
Inequality in the United StatesInequality in the United States
Inequality in the United StatesGaetan Lion
 
Housing Price Models
Housing Price ModelsHousing Price Models
Housing Price ModelsGaetan Lion
 
Global Aging.pdf
Global Aging.pdfGlobal Aging.pdf
Global Aging.pdfGaetan Lion
 
Cryptocurrencies as an asset class
Cryptocurrencies as an asset classCryptocurrencies as an asset class
Cryptocurrencies as an asset classGaetan Lion
 
Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Gaetan Lion
 
Can Treasury Inflation Protected Securities predict Inflation?
Can Treasury Inflation Protected Securities predict Inflation?Can Treasury Inflation Protected Securities predict Inflation?
Can Treasury Inflation Protected Securities predict Inflation?Gaetan Lion
 
How overvalued is the Stock Market?
How overvalued is the Stock Market? How overvalued is the Stock Market?
How overvalued is the Stock Market? Gaetan Lion
 
The relationship between the Stock Market and Interest Rates
The relationship between the Stock Market and Interest RatesThe relationship between the Stock Market and Interest Rates
The relationship between the Stock Market and Interest RatesGaetan Lion
 
Comparing R vs. Python for data visualization
Comparing R vs. Python for data visualizationComparing R vs. Python for data visualization
Comparing R vs. Python for data visualizationGaetan Lion
 
Will Stock Markets survive in 200 years?
Will Stock Markets survive in 200 years?Will Stock Markets survive in 200 years?
Will Stock Markets survive in 200 years?Gaetan Lion
 
Is Tom Brady the greatest quarterback?
Is Tom Brady the greatest quarterback?Is Tom Brady the greatest quarterback?
Is Tom Brady the greatest quarterback?Gaetan Lion
 
Regularization why you should avoid them
Regularization why you should avoid themRegularization why you should avoid them
Regularization why you should avoid themGaetan Lion
 
Basketball the 3 pt game
Basketball the 3 pt gameBasketball the 3 pt game
Basketball the 3 pt gameGaetan Lion
 

Mais de Gaetan Lion (20)

DRU projections testing.pptx
DRU projections testing.pptxDRU projections testing.pptx
DRU projections testing.pptx
 
Climate Change in 24 US Cities
Climate Change in 24 US CitiesClimate Change in 24 US Cities
Climate Change in 24 US Cities
 
Compact Letter Display (CLD). How it works
Compact Letter Display (CLD).  How it worksCompact Letter Display (CLD).  How it works
Compact Letter Display (CLD). How it works
 
CalPERS pensions vs. Social Security
CalPERS pensions vs. Social SecurityCalPERS pensions vs. Social Security
CalPERS pensions vs. Social Security
 
Recessions.pptx
Recessions.pptxRecessions.pptx
Recessions.pptx
 
Inequality in the United States
Inequality in the United StatesInequality in the United States
Inequality in the United States
 
Housing Price Models
Housing Price ModelsHousing Price Models
Housing Price Models
 
Global Aging.pdf
Global Aging.pdfGlobal Aging.pdf
Global Aging.pdf
 
Cryptocurrencies as an asset class
Cryptocurrencies as an asset classCryptocurrencies as an asset class
Cryptocurrencies as an asset class
 
Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?Can you Deep Learn the Stock Market?
Can you Deep Learn the Stock Market?
 
Can Treasury Inflation Protected Securities predict Inflation?
Can Treasury Inflation Protected Securities predict Inflation?Can Treasury Inflation Protected Securities predict Inflation?
Can Treasury Inflation Protected Securities predict Inflation?
 
How overvalued is the Stock Market?
How overvalued is the Stock Market? How overvalued is the Stock Market?
How overvalued is the Stock Market?
 
The relationship between the Stock Market and Interest Rates
The relationship between the Stock Market and Interest RatesThe relationship between the Stock Market and Interest Rates
The relationship between the Stock Market and Interest Rates
 
Life expectancy
Life expectancyLife expectancy
Life expectancy
 
Comparing R vs. Python for data visualization
Comparing R vs. Python for data visualizationComparing R vs. Python for data visualization
Comparing R vs. Python for data visualization
 
Will Stock Markets survive in 200 years?
Will Stock Markets survive in 200 years?Will Stock Markets survive in 200 years?
Will Stock Markets survive in 200 years?
 
Standardization
StandardizationStandardization
Standardization
 
Is Tom Brady the greatest quarterback?
Is Tom Brady the greatest quarterback?Is Tom Brady the greatest quarterback?
Is Tom Brady the greatest quarterback?
 
Regularization why you should avoid them
Regularization why you should avoid themRegularization why you should avoid them
Regularization why you should avoid them
 
Basketball the 3 pt game
Basketball the 3 pt gameBasketball the 3 pt game
Basketball the 3 pt game
 

Último

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Último (20)

Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...

Big data

  • 2. Table of Contents: 1) Big Data trends. 2) How Big is your Data? 3) Big Data Potential. 4) Big technologies. New databases. 5) Big quantitative methods. New stats. 6) Big Data temperaments. 7) Is Big always better?
  • 3. 1) Big Data Trends
  • 4. Cost of data storage has dropped
  • 5. Social Media (Facebook & Twitter) has grown exponentially. [Chart: Facebook vs Twitter, number of active users in thousands (0 to 1,200,000), January 2008 to January 2013, showing exponential growth.] Facebook started in February 2004 and has 1 billion active users. Twitter started in March 2006 and has 500 million users. Social networks are creating a huge volume of live, unstructured data.
  • 6. Unstructured Data is taking over…
  • 7. 2) How Big is your Data?
      • How Tall is it? How large is your sample (rows)?
      • How Wide is it? How many variables (columns)?
      • What is its Velocity? How frequently is it updated?
      • Does it include unstructured data (documents, emails, Social Media)?
  • 8. 3) Big Data Potential
  • 9. 4) Big Technologies. New Databases
  • 11. Database: Structured vs Unstructured
      • Structured data (customers, transactions, numbers in rows). Database type: relational. Database language: SQL (structured query language). Database structure: data warehouse, data marts. Reporting tool: Oracle Essbase & IBM Cognos (reporting, business intelligence).
      • Unstructured data (Social Media, text documents, Web services). Database type: non-relational. Database language: NoSQL (not only SQL). Database structure: Hadoop. Reporting tool: Hadoop connectors.
  • 12. 5) Big quantitative methods. New Stats
  • 13. New Stats Map
      • Statistics & Regression: A/B Testing (hypothesis testing), Spatial Analysis, Regression, Time Series Analysis, Signal Processing
      • Predictive Analytics
      • Data Mining & Machine Learning (formerly Artificial Intelligence): Association Rule Learning, Cluster Analysis, Classification, Pattern Recognition, Neural Networks, Optimization, Genetic Algorithms
      • Natural Language Processing: Sentiment Analysis
  • 14. Definitions. Part I
      • Association Rule Learning: a method for uncovering interesting relationships by generating and testing possible rules. One application is “market basket analysis”, in which a retailer determines which products are frequently bought together. A commonly cited example is that shoppers who buy diapers often buy beer.
      • Classification: identifies the category to which new data belongs, based on an existing data set grouped into predefined categories. It differs from Cluster Analysis, which starts without predefined categories.
      • Genetic algorithms: an optimization method inspired by the “survival of the fittest” process. Potential solutions are encoded as “chromosomes” that can combine and mutate. The chromosomes are selected for survival within a modeled “environment.” Example: optimizing the performance of an investment portfolio.
  • 15. Definitions. Part II
      • Natural language processing (NLP): uses algorithms to analyze text data. Sentiment Analysis is a common application: it measures customers’ reaction to a product campaign by analyzing social media.
      • Neural networks: models inspired by the workings of neurons and synapses within the brain, used for finding nonlinear patterns. They can be used for Pattern Recognition and Optimization. Example applications include identifying customers who may leave and identifying fraudulent insurance claims.
      • Signal processing: an electrical engineering method for analyzing signals (radio, etc.) and distinguishing signal from noise. It is used to extract the signal from noisy, less precise data [Signal Detection Theory].
  • 16. Definitions. Part III
      • Spatial Analysis: analyzes the geographic location encoded within the data, information that often comes from GPS. Applications include spatial regression to estimate a consumer’s willingness to purchase a product given their location.
  • 17. 6) Big Data Temperaments. Source: Harvard Business Review, April 2012, by Shvetank Shah, Andrew Horne and Jaime Capella.
  • 18. 7) Is Big always better?
  • 19. No! says Nate Silver
      • “I came to realize that prediction in the era of Big Data was not going very well.”
      • “If the quantity of information is increasing [exponentially]… Most of it is just noise.”
      • He refers to John P. Ioannidis’s 2005 paper “Why Most Published Research Findings Are False.” Two-thirds of scientific papers’ results can’t be replicated!
      “… numbers have no way of speaking for themselves. We speak for them.”
  • 20. Nate’s targets
      • Political pundits. Their “intuitive” election predictions have been disastrous. Granted, that was not because of Big Data but because of No Data. He showed them how to do it using Small Data (polls with samples < 1,000).
      • Economic forecasters. They have used Big Data with poor results. Most of them can’t even call a recession that is already underway. ECRI predicted with certainty a double-dip recession in 2011 using tens of variables they did not understand. Instead, the economy improved.
      • Stock market & financial market forecasters. Their performance is similar to that of economic forecasters.
      • Earthquake forecasting. The field is not well understood. “… Statistical inferences are much stronger when backed up by theory… about their root causes.”
  • 21. No! says Vincent Granville
      • Big Data is huge, but its information is very sparse.
      • Storing and processing the entire data set is very inefficient.
      • You can do better by smartly sampling only 5% of the data.
      You don’t need Big Data, you need Smart Data.
  • 22. Yes! says Chris Anderson
      • He quotes Peter Norvig, Google’s research director: “All models are wrong, and increasingly you can succeed without them.”
      • “… with massive data, [the scientific method] is becoming obsolete.”
      • “We can throw the numbers into the biggest computing clusters … and let statistical algorithms find patterns where science cannot.”
      He mentions examples such as J. Craig Venter’s gene sequencing, Google Search, and Google Translate, among other successes.
      “With enough data, the numbers speak for themselves.”
      “Correlation supersedes causation, and science can advance without coherent models, unified theories, or … any … explanation at all.”
  • 23. Big Data Effectiveness Map
      • Field needing causal understanding, theory not well understood. Tall data: more data, more noise; oversampling. Wide data: more variables, more false positives; multicollinearity; model overfitting. Examples: Economics, Financial markets, Earthquake forecasting.
      • Field needing causal understanding, theory well understood. Tall data: more data, more signal; oversampling. Wide data: more variables, more explanation; multicollinearity; model overfitting. Examples: Weather forecasting, Customer behavior, Politics.
      • Field not needing causal understanding. More data, better model performance. Examples: Google Search, Google Translate, Google Flu-trends, Customer behavior.
      • Rule based. More data, better model performance. Examples: Games & Sports [Chess, Baseball, etc…].
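Slide 13 lists A/B Testing (hypothesis testing) among the new stats. A minimal sketch, assuming the test compares the conversion rates of two page variants with a standard two-proportion z-test (one common choice, not necessarily the one the author had in mind); the function name and the visitor counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test of H0: variants A and B share the same conversion rate.

    Returns (z statistic, p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical test: 200 of 5,000 visitors convert on page A, 250 of 5,000 on page B.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.41, p = 0.016
```

A p-value below the usual 0.05 threshold suggests the difference between a 4% and a 5% conversion rate is unlikely to be chance at this sample size; with a few hundred visitors per page the same rates would not be significant.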
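The market basket analysis described on slide 14 rests on three quantities: support (how often items appear together), confidence (how often the rule holds when its left side appears) and lift (confidence relative to the right side's baseline rate). The sketch below, assuming toy baskets and hypothetical thresholds, brute-forces rules between item pairs only; real tools use algorithms such as Apriori or FP-Growth to search larger itemsets efficiently.

```python
from itertools import combinations

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for rules {x} -> {y}."""
    sets = [set(b) for b in baskets]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Share of baskets that contain every item in itemset.
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue  # prune pairs that are rarely bought together
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})  # P(y in basket | x in basket)
            if confidence >= min_confidence:
                lift = confidence / support({y})  # > 1: x raises the likelihood of y
                rules.append((x, y, s_ab, confidence, lift))
    return rules

# Hypothetical baskets, for illustration only.
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["diapers", "bread"],
    ["beer", "chips"],
    ["diapers", "beer", "chips"],
    ["milk", "bread"],
]
for x, y, s, c, l in pair_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.3f}")
```

With these toy baskets, diapers -> beer comes out with support 0.5, confidence 0.75 and lift 1.125: beer is somewhat more likely in a basket that contains diapers than in baskets overall.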
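Slide 14 distinguishes Classification (predefined categories) from Cluster Analysis (no categories). A minimal sketch of that contrast, assuming two-dimensional numeric data: a nearest-centroid classifier stands in for classification and k-means for cluster analysis. Both are common members of their families rather than methods named on the slide, and the customer data is hypothetical.

```python
import math
from typing import Callable

Point = tuple[float, float]

def centroid(points: list[Point]) -> Point:
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

# Classification: categories are known up front; new points are assigned to one of them.
def nearest_centroid_classifier(labeled: dict[str, list[Point]]) -> Callable[[Point], str]:
    names = list(labeled)
    centers = [centroid(labeled[name]) for name in names]
    return lambda p: names[nearest(p, centers)]

# Cluster analysis: no categories are given; k-means discovers k groups on its own.
def k_means(points: list[Point], k: int, iterations: int = 10) -> list[int]:
    """Cluster points into k groups; returns the cluster index of each point."""
    centers = points[:k]  # naive initialization: the first k points
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(centroid(members) if members else centers[i])
        centers = new_centers
    return [nearest(p, centers) for p in points]

# Hypothetical customers described by (visits per month, average basket in $10s).
labeled = {
    "occasional": [(1, 1), (1, 2), (2, 1)],
    "loyal": [(8, 8), (9, 8), (8, 9)],
}
classify = nearest_centroid_classifier(labeled)
print(classify((2, 2)), classify((7, 9)))  # occasional loyal

points = [p for group in labeled.values() for p in group]  # same points, labels dropped
print(k_means(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

The classifier needs the labels up front; k-means recovers the same two groups from the unlabeled points, but its cluster numbers carry no meaning until someone names them.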
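The genetic algorithm definition on slide 14 can be sketched on a toy version of its portfolio example. Assumptions, all hypothetical: five candidate assets with made-up expected returns and costs, a fixed budget, tournament selection, one-point crossover, bit-flip mutation and elitism. These are standard GA operators, but the slide does not specify which ones it means.

```python
import random

# Hypothetical toy data: expected return and capital needed for five candidate assets.
RETURNS = [12.0, 10.0, 8.0, 7.0, 4.0]
COSTS = [50, 40, 30, 30, 10]
BUDGET = 100

def fitness(chromosome: list[int]) -> float:
    """Expected return of the chosen assets; portfolios over budget score 0."""
    cost = sum(c for c, gene in zip(COSTS, chromosome) if gene)
    if cost > BUDGET:
        return 0.0
    return float(sum(r for r, gene in zip(RETURNS, chromosome) if gene))

def tournament(population: list[list[int]], rng: random.Random) -> list[int]:
    """Selection: the fitter of two randomly drawn chromosomes survives."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(pop_size: int = 20, generations: int = 50,
           mutation_rate: float = 0.1, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    n = len(RETURNS)
    # Each chromosome is a bit string: 1 = hold the asset, 0 = skip it.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            mom, dad = tournament(population, rng), tournament(population, rng)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Bit-flip mutation on each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

With only 32 possible portfolios, brute force would be trivial here; a GA earns its keep when the search space is too large to enumerate. Because it is a heuristic, it is not guaranteed to return the optimum.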
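Slide 15 names Sentiment Analysis as a common NLP application. The simplest form scores text against a word list; the sketch below assumes a tiny hypothetical lexicon and a one-word negation rule, which is far cruder than the curated lexicons or trained models used in practice.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
LEXICON = {"love": 1, "great": 1, "good": 1, "excellent": 2,
           "bad": -1, "poor": -1, "hate": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}

def sentiment(text: str) -> int:
    """Sum of word polarities; a negator flips the sign of the next scored word."""
    score, negate = 0, False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATORS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score

# Hypothetical social media posts about a product campaign.
posts = [
    "I love the new phone, the camera is excellent",
    "Battery life is not good and support was terrible",
    "Great ad campaign!",
]
scores = [sentiment(p) for p in posts]
print(scores)  # [3, -3, 1]
print(sum(scores) / len(scores))  # average reaction to the campaign
```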
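Slide 15 describes signal processing as separating signal from noise. One of the simplest tools for that is a low-pass filter; this sketch uses a centered moving average on a hypothetical noisy sine wave. It illustrates noise reduction only and does not implement Signal Detection Theory; the window size and noise level are arbitrary choices.

```python
import math
import random

def moving_average(samples: list[float], window: int) -> list[float]:
    """Centered moving average (a simple low-pass filter); the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed

def mse(estimate: list[float], truth: list[float]) -> float:
    """Mean squared error between two equal-length series."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)

# Hypothetical data: a slow sine wave (the signal) buried in Gaussian noise.
rng = random.Random(42)
signal = [math.sin(2 * math.pi * t / 100) for t in range(500)]
noisy = [s + rng.gauss(0, 0.5) for s in signal]
smoothed = moving_average(noisy, window=9)

print(f"error before filtering: {mse(noisy, signal):.3f}")
print(f"error after filtering:  {mse(smoothed, signal):.3f}")
```

A wider window removes more noise but also flattens genuine fast changes in the signal, which is the basic trade-off of any low-pass filter.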
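Slide 16 mentions spatial regression on GPS-derived locations. A full spatial regression model (for example a spatial lag model) also accounts for correlation between neighboring observations; the sketch below is a deliberately simplified stand-in that converts GPS coordinates into a distance from a store with the haversine formula and fits ordinary least squares. The store, customer coordinates and purchase shares are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two GPS coordinates given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical store location and customers: (latitude, longitude, share who bought).
STORE = (45.0, 5.0)
customers = [(45.01, 5.0, 0.30), (45.05, 5.0, 0.25), (45.10, 5.0, 0.18), (45.20, 5.0, 0.05)]

distances = [haversine_km(*STORE, lat, lon) for lat, lon, _ in customers]
intercept, slope = ols(distances, [share for _, _, share in customers])
print(f"purchase share ≈ {intercept:.3f} + ({slope:.4f}) × km")  # slope < 0: farther away, fewer buyers
```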
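Slide 21 argues for sampling about 5% of the data instead of processing all of it. The slide does not say how Granville samples "smartly"; as a stand-in, this sketch uses plain uniform reservoir sampling (Algorithm R), which draws a fixed-size sample in one pass over a stream without storing the stream. The data set is hypothetical.

```python
import random

def reservoir_sample(stream, k: int, rng: random.Random) -> list:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical "big" data set: 200,000 purchase amounts with a mean of about 50.
rng = random.Random(7)
population = [rng.expovariate(1 / 50) for _ in range(200_000)]
sample = reservoir_sample(iter(population), k=len(population) // 20, rng=rng)  # 5%

full_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"full data mean: {full_mean:.2f}   5% sample mean: {sample_mean:.2f}")
```

Here the full list is kept only so the two means can be compared; in practice the stream would be read once and discarded.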
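Slide 23's claim that, where theory is weak, more variables mean more false positives (and the replication problem cited on slide 19) can be demonstrated with a small simulation, assuming a 5% significance threshold: correlate 1,000 pure-noise variables with a pure-noise target and count how many look "significant". The sample size, variable count and seed are arbitrary.

```python
import random
import statistics

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def count_spurious(n_rows: int = 50, n_vars: int = 1000,
                   r_crit: float = 0.279, seed: int = 1) -> int:
    """Count pure-noise variables whose correlation with a noise target looks 'significant'.

    r_crit = 0.279 is roughly the two-sided 5% critical value of |r| for 50 rows;
    change it if n_rows changes.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]  # unrelated to target by construction
        if abs(pearson(candidate, target)) > r_crit:
            hits += 1
    return hits

print(count_spurious())  # expect roughly 50 "discoveries", all of them false
```

Every hit is spurious by construction. About 5% of the noise variables clear the bar, so adding variables adds false discoveries at a steady rate.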