Big Data Concepts & Practice

Vladimir Suvorov
vladimir.suvorov@emc.com

EMC & DataScienceSquad.com


Non-commercial education only. Corresponding information belongs to its respective owners. These include EMC, IBM, Microsoft, Oracle, Gartner, etc.
About myself




Why Big Data
    How We Got Here




February 16, 2013     © 2012 IBM Corporation
In 2005 there were 1.3 billion RFID tags in circulation…
…by the end of 2011, this was about 30 billion and growing even faster




An increasingly sensor-enabled and instrumented
       business environment generates HUGE volumes of
          data with MACHINE SPEED characteristics…




            1 BILLION lines of code
  EACH engine generating 10 TB every 30 minutes!
• Meter reads every 15 min.
• 120M meter reads/month
• 3.65B meter reads/day
• 350B transactions/year

In August of 2010, Adam Savage, of “MythBusters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account, including the phrase “Off to work.”

Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location where the photo was taken.

By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
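
As a rough illustration of why such metadata is revealing: GPS information in photo metadata is commonly stored as degrees, minutes, and seconds plus a hemisphere letter, and turning it into map-ready decimal coordinates takes only a few lines. The sketch below is a minimal example under that assumption. The function name `dms_to_decimal` and the sample values are hypothetical and are not taken from the photo described above.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -value if ref in ("S", "W") else value


# Hypothetical sample values, for illustration only.
lat = dms_to_decimal(10, 30, 0.0, "N")
lon = dms_to_decimal(20, 15, 0.0, "W")
print(lat, lon)
```

Once coordinates are in this form, they can be pasted directly into any mapping tool, which is what makes an innocuous photo a location disclosure.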

The Social Layer in an Instrumented Interconnected World
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 76 million smart meters in 2009… 200M by 2014
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end 2011
Twitter Tweets per Second Record Breakers of 2011




Extract Intent, Life Events, Micro Segmentation Attributes

[Figure: entries for Pauline, Tom Sit, Tina Mu, and Jo Jobs, annotated with the labels "Name, Birthday, Family", "Location", "Relocation", "Monetizable Intent", "Wishful Thinking", "Not Relevant - Noise", and "SPAMbots"]
Big Data Includes Any of the following Characteristics
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible

Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text

Velocity: Streaming data and large volume data movement

Volume: Scale from Terabytes to Petabytes (1K TBs) to Zettabytes (1B TBs)
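
The volume scale above can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal (SI) units, where each step up is a factor of 1,000. The helper name `to_terabytes` is hypothetical.

```python
# Decimal (SI) storage units expressed in terabytes.
TB_PER_UNIT = {
    "TB": 1,
    "PB": 10**3,  # 1 petabyte  = 1,000 TB ("1K TBs")
    "EB": 10**6,  # 1 exabyte   = 1,000,000 TB
    "ZB": 10**9,  # 1 zettabyte = 1,000,000,000 TB ("1B TBs")
}


def to_terabytes(amount, unit):
    """Convert an amount in TB, PB, EB, or ZB to terabytes."""
    return amount * TB_PER_UNIT[unit]


print(to_terabytes(1, "PB"))
print(to_terabytes(1, "ZB"))
```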
Bigger and Bigger Volumes of Data
• Retailers collect click-stream data from Web site interactions and loyalty card data
    – This traditional POS information is used by retailers for shopping basket analysis,
        inventory replenishment, and more
    – But the data is also being provided to suppliers for customer buying analysis

• Healthcare has traditionally been dominated by paper-based systems, but this information is
  getting digitized

• Science is increasingly dominated by big science initiatives
    – Large-scale experiments generate over 15 PB of data a year, which can’t be stored within
       the data center and is instead sent to laboratories

• Financial services are seeing larger and larger volumes through smaller trading sizes,
  increased market volatility, and technological improvements in automated and algorithmic
  trading

• Improved instrument and sensory technology
    – The Large Synoptic Survey Telescope’s gigapixel camera generates 6 PB+ of image data per
       year; consider also the oil and gas industry



The Big Data Conundrum

• The percentage of available data an enterprise can analyze is decreasing
  in proportion to the data available to it

Quite simply, this means as enterprises, we are getting
 “more naive” about our business over time

We don’t know what we could already know….



[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS]
Why Not All of Big Data Before: Didn’t have the Tools?




Applications for Big Data Analytics
• Smarter Healthcare
• Multi-channel sales
• Finance
• Log Analysis
• Homeland Security
• Traffic Control
• Telecom
• Search Quality
• Manufacturing
• Trading Analytics
• Fraud and Risk
• Retail: Churn, NBO




Most Requested Uses of Big Data
• Log Analytics & Storage

• Smart Grid / Smarter Utilities

• RFID Tracking & Analytics

• Fraud / Risk Management & Modeling

• 360° View of the Customer

• Warehouse Extension

• Email / Call Center Transcript Analysis

• Call Detail Record Analysis
What companies & analysts think of Big Data




Gartner & McKinsey




Hype Cycle of Big Data




Priority matrix




Key vision
• Predictive modeling is gaining momentum with property
  and casualty (P&C) companies, which are using it to
  support claims analysis, CRM, risk management, pricing
  and actuarial workflows, quoting, and underwriting.
• Social content is the fastest growing category of new
  content in the enterprise and will eventually attain 20%
  market penetration.
• Gartner reports that 45% of sales management teams
  identify sales analytics as a priority to help them
  understand sales performance, market conditions and
  opportunities.
• Over 80% of Web Analytics solutions are delivered via
  Software-as-a-Service (SaaS).
Big Data deliverables by McKinsey




Intel


Intel Big Data Cluster Example
   Application | Big Data | Algorithms | Compute Style
   Scientific study (e.g. earthquake study) | Ground model | Earthquake simulation, thermal conduction, … | HPC
   Internet library search | Historic web snapshots | Data mining | MapReduce
   Virtual world analysis | Virtual world database | Data mining | TBD
   Language translation | Text corpora, audio archives, … | Speech recognition, machine translation, text-to-speech, … | MapReduce & HPC
   Video search | Video data | Object/gesture identification, face recognition, … | MapReduce
  There has been more video uploaded to YouTube in the last 2 months than if ABC,
  NBC, and CBS had been airing content 24/7/365 continuously since 1948. - Gartner


Example Motivating Application:
            Online Processing of Archival Video
             •    Research project: Develop a context recognition system that is 90% accurate over
                  90% of your day
                      • Leverage a combination of low- and high-rate sensing for perception
                      • Federate many sensors for improved perception
                      • Big Data: Terabytes of archived video from many egocentric cameras
             •    Example query 1: “Where did I leave my briefcase?”
                      • Sequential search through all video streams [Parallel Camera]
             •    Example query 2: “Now that I’ve found my briefcase, track it”
                      • Cross-cutting search among related video streams [Parallel Time]
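
The two query styles partition the work differently, and a small sketch can make the contrast concrete. The code below assumes that "Parallel Camera" means one task per camera stream and that "Parallel Time" means one task per time slice of a stream; both readings are interpretations of the slide, not confirmed by it. The toy data, function names, and chunk size are all hypothetical, and frames are reduced to sets of object labels rather than real video.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each camera stream is a list of frames,
# and each frame is the set of object labels visible in it.
streams = {
    "cam_a": [{"desk"}, {"desk", "briefcase"}, {"desk"}],
    "cam_b": [{"door"}, {"door"}, {"door", "briefcase"}],
}


def find_in_frames(frames, target, offset=0):
    """Return the frame indices (shifted by offset) whose labels include target."""
    return [offset + i for i, labels in enumerate(frames) if target in labels]


def search_parallel_camera(streams, target):
    """One task per camera; each task scans its whole stream sequentially."""
    with ThreadPoolExecutor() as pool:
        futures = {cam: pool.submit(find_in_frames, frames, target)
                   for cam, frames in streams.items()}
        return {cam: future.result() for cam, future in futures.items()}


def search_parallel_time(frames, target, chunk_size=2):
    """One task per time slice of a single stream; results are merged in order."""
    chunks = [(start, frames[start:start + chunk_size])
              for start in range(0, len(frames), chunk_size)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda c: find_in_frames(c[1], target, c[0]), chunks)
        return [index for part in parts for index in part]


by_camera = search_parallel_camera(streams, "briefcase")
by_time = search_parallel_time(streams["cam_b"], "briefcase")
print(by_camera, by_time)
```

On a real cluster the tasks would run on separate machines holding the video data, but the partitioning choice (by camera or by time) is the same.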




                                                                                                            Big Data Cluster


Oracle


Big Data Use Cases
   Today’s Challenge | New Data | What’s Possible
   Healthcare: Expensive office visits | Remote patient monitoring | Preventive care, reduced hospitalization
   Manufacturing: In-person support | Product sensors | Automated diagnosis, support
   Location-Based Services: Based on home zip code | Real time location data | Geo-advertising, traffic, local search
   Public Sector: Standardized services | Citizen surveys | Tailored services, cost reductions
   Retail: One size fits all marketing | Social media | Sentiment analysis, segmentation


What’s in Big Data for Public Sector
• Operational efficiency and productivity
• Fraud detection and prevention
• Close tax gaps
• Value for money for citizens
• Prevent crime waves
• Customize actions based on population segments
• Public utilities to reduce consumption
• Produce safety from farm to fork




Microsoft


New opportunities




• Increases ad revenue by processing 3.5 billion events per day
  Massive Volumes: Processes 464 billion rows per quarter, with average query time under 10 secs.

• Measures and ranks online user influence by processing 3 billion signals per day
  Cloud Connectivity: Connects across 15 social networks via the cloud for data and API access

• Improving investigation time by analyzing large volume & variety of data
  Real-Time Insight: Cut investigation time from 2 years to 15 days




Microsoft’s Approach to Big Data




A Holistic Big Data Solution from Microsoft




Data Scientist Job
Sexy Job of Data Scientist

                                                                      Tom Davenport, who is teaching an executive
                                                                      program in Big Data and analytics at Harvard
                                                                      University, said some data scientists are
                                                                      earning annual salaries as high as $300,000,
                                                                      which is “pretty good for somebody that
                                                                      doesn't have anyone else working for them.”
                                                                      Davenport also said such workers are
                                                                      motivated by the problems and opportunities
                                                                      data provides.




What EMC Thinks of Data Scientists




Job evolution




What Forbes Thinks of Data Scientists




Data Science Courses
Course Modules and Navigation Icons
                                  Data Science and Big Data Analytics
             1.      Introduction to Big Data Analytics
             2.      Data Analytics Lifecycle + Lab
             3.      Review of Basic Data Analytics Methods Using R +
                     Labs
             4.      Advanced Analytics - Theory & Methods + Labs
             5.      Advanced Analytics - Technology & Tools + Labs
             6.      The Endgame, or Putting it All Together + Final Lab




Course Topics: Data Science and Big Data Analytics

Introduction to Big Data Analytics + Data Analytics Lifecycle
     • Big Data Overview
     • State of the Practice in Analytics
     • The Data Scientist
     • Big Data Analytics in Industry Verticals
     • Data Analytics Lifecycle

Review of Basic Data Analytic Methods Using R
     • Using R to Look at Data - Introduction to R
     • Analyzing and Exploring the Data
     • Statistics for Model Building and Evaluation

Advanced Analytics - Theory and Methods
     • K-means Clustering
     • Association Rules
     • Linear Regression
     • Logistic Regression
     • Naive Bayesian Classifier
     • Decision Trees
     • Time Series Analysis
     • Text Analysis

Advanced Analytics - Technology and Tools
     • Analytics for Unstructured Data (MapReduce and Hadoop)
     • The Hadoop Ecosystem
     • In-database Analytics - SQL Essentials
     • Advanced SQL and MADlib for In-database Analytics

The Endgame, or Putting it All Together + Final Lab on Big Data Analytics
     • Operationalizing an Analytics Project
     • Creating the Final Deliverables
     • Data Visualization Techniques
     • Final Lab - Application of the Data Analytics Lifecycle to a Big Data Analytics Challenge
Hadoop


Top companies need Hadoop




What is Hadoop and Where did it start?

• Created by Doug Cutting, formerly of Yahoo!,
  now at Cloudera
         – HDFS (storage) & MapReduce (compute)

         – Inspired by Google’s MapReduce and Google
           File System (GFS) papers

• Much of the initial work on Hadoop was done
  by Yahoo!

• It is now a top-level Apache project backed by
  a large open source development community




What is Hadoop?
                                            Two Core Components


                  HDFS: Storage in the Hadoop Distributed File System

                  MapReduce: Compute via the MapReduce distributed processing platform



• Storage & Compute in 1 Framework
• Open Source Project of the Apache Software Foundation
• Written in Java



Hadoop cluster architecture




MapReduce example
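
The figure for this slide did not survive extraction. As a stand-in, here is a minimal single-process sketch of the MapReduce pattern using the classic word-count task. It is not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are assumptions made for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands for one input split.
counts = word_count(["big data big ideas", "data at machine speed"])
```

The map and reduce functions only ever see one record or one key at a time, which is what lets a framework distribute them across machines.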




Hadoop versions




Hadoop Wave Report




“EMC Greenplum is the first mover in Hadoop appliances. EMC Greenplum [is] the
first EDW vendor to provide a full-featured enterprise-grade Hadoop appliance
and roll out an appliance family that integrates its Hadoop, EDW, and data
integration in a single rack. It provides its own open source Hadoop
distribution software, integrates EMC’s strong storage product portfolio in
its appliances, and has an extensive professional services force of EMC
technical consultants and data scientists with Hadoop expertise.”




Hadoop Players Today




Get Started With Hadoop Today
Data Scientists & Hadoop Architecture teams deliver customer success


• Hadoop Architecture Services
        – POC planning and deployment
        – Installation and best practices
        – Educate the team
• Greenplum Analytics Labs
        – Leverage the expertise of Greenplum’s
          Data Scientists
        – Packaged solutions that produce business
          value and actionable results
        – Accelerate Hadoop capabilities on your
          data with your analysts
• Establish a strategic vision
        – Roadmap for Hadoop and unified analytics



The Greenplum Unified Analytics Platform




NoSQL


Definition
from nosql-database.org
• Next Generation Databases mostly addressing
  some of the points: being non-relational,
  distributed, open-source and horizontally
  scalable. The original intention has been modern
  web-scale databases. The movement began
  early 2009 and is growing rapidly. Often more
  characteristics apply, such as: schema-free, easy
  replication support, simple API, eventually
  consistent / BASE (not ACID), a huge amount
  of data, and more. So the misleading term "nosql"
  (the community now translates it mostly with "not
  only sql") should be seen as an alias to
  something like the definition above.

NoSQL
http://nosql-database.org/
• Non-relational
• Scalability
        – Vertically
                • Add more resources (CPU, memory, disk) to a single machine
        – Horizontally
                • Add more machines (nodes) to the cluster
• Collection of structures
        – Hashtables, maps, dictionaries
• No pre-defined schema
• No join operations
• CAP, not ACID
        – CAP: Consistency, Availability and Partition tolerance (but not all
          three at once!)
        – ACID: Atomicity, Consistency, Isolation and Durability
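
To make the horizontal scaling idea concrete, here is a minimal sketch of hash partitioning: each key is hashed and assigned to one of several nodes, so data spreads across machines instead of growing on one. The node names, sample keys and the node_for_key helper are hypothetical and not taken from any particular NoSQL product.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node for a key by hashing it (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical cluster of three nodes and a handful of keys.
nodes = ["node-a", "node-b", "node-c"]
keys = ["user:1", "user:2", "user:3", "user:4"]
placement = {key: node_for_key(key, nodes) for key in keys}
```

Modulo partitioning is the simplest possible scheme: changing the number of nodes moves most keys, which is why production systems often use consistent hashing instead.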

Advantages of NoSQL
• Cheap, easy to implement
• Data are replicated and can be partitioned
• Easy to distribute
• Don't require a schema
• Can scale up and down
• Quickly process large amounts of data
• Relax the data consistency requirement (CAP)
• Can handle web-scale data, which relational
  DBs typically struggle to scale out to




Disadvantages of NoSQL
• New and sometimes buggy
• Data is generally duplicated, potential for
  inconsistency
• No standardized schema
• No standard format for queries
• No standard language
• Difficult to impose complicated structures
• Depend on the application layer to enforce data
  integrity
• No guarantee of support
• Too many options: which one (or ones) to pick?

NoSQL Options
Key-Value Stores
• A technology you already know, love and use all the
  time
        – Hashmap for example
• Put(key,value)
• value = Get(key)
• Examples
        – Redis (my favorite!!) – in-memory store
        – Memcached
        – and 100s more
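
The Put/Get interface above can be sketched in a few lines. This is an in-memory toy built on a Python dictionary, not the client API of Redis or Memcached; the KeyValueStore class and the sample key are assumptions for illustration.

```python
class KeyValueStore:
    """Toy in-memory key-value store exposing only put and get."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store (or overwrite) the value under the given key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for key, or default if the key is absent."""
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", "alice")
value = store.get("session:42")
```

The store treats the value as opaque: it can be looked up by key only, never searched by its contents.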



Column Stores

  • Not to be confused with the relational-db version
    of this
          – Sybase-IQ etc.
  • Multi-dimensional map
  • Not all entries are relevant each time
          – Column families
  • Examples
          – Cassandra
          – HBase
          – Amazon SimpleDB
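
One way to picture the "multi-dimensional map" is as nested maps: row key, then column family, then column name, then value. The sketch below is a toy model under that assumption, not the data model or API of Cassandra or HBase; the ColumnFamilyTable class and the sample rows are hypothetical.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column-family table: row key -> family -> column -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Write one cell; rows may have entirely different columns."""
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Read one column family of a row without touching the others."""
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))


table = ColumnFamilyTable()
table.put("user1", "profile", "name", "Ann")
table.put("user1", "profile", "city", "Oslo")
table.put("user1", "activity", "last_login", "2013-02-16")
table.put("user2", "profile", "name", "Bob")

profile = table.get_family("user1", "profile")
```

Reading only the "profile" family reflects the point that not all entries are relevant each time: the "activity" family is never loaded for this query.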



Document Stores

   • Key-document stores
            – However, the document can be seen as a value, so
              you can consider this a superset of key-value
   • The big difference is that in document stores one can
     also query on the document, i.e. the document
     portion is structured (not just a blob of data)
   • Examples
            – MongoDB
            – CouchDB
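
The difference from a plain key-value store can be sketched as follows: documents are still stored under a key, but the store can also filter on fields inside them. This toy is not the MongoDB or CouchDB query API; the DocumentStore class, its find method and the sample products are assumptions for illustration.

```python
class DocumentStore:
    """Toy document store: key -> structured document (a dict)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        """Store a copy of the document under the given key."""
        self._docs[key] = dict(document)

    def get(self, key):
        """Key-value style lookup; returns None if the key is absent."""
        return self._docs.get(key)

    def find(self, **criteria):
        """Return keys of documents whose fields match all criteria."""
        return [
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == wanted for field, wanted in criteria.items())
        ]


docs = DocumentStore()
docs.put("p1", {"name": "laptop", "category": "computers", "price": 900})
docs.put("p2", {"name": "mouse", "category": "accessories"})
docs.put("p3", {"name": "desktop", "category": "computers"})

computers = sorted(docs.find(category="computers"))
```

Note that "p2" and "p3" have no price field at all: documents in the same store do not need to share a schema.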




Graph Stores

   • Use a graph structure
            – Labeled, directed, attributed multi-graph
                     •   Label for each edge
                     •   Directed edges
                     •   Multiple attributes per node
                     •   Multiple edges between nodes
            – Relational DBs can model graphs, but an edge
              requires a join which is expensive
   • Example: Neo4j
            – http://www.infoq.com/articles/graph-nosql-neo4j
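
A labeled, directed, attributed multi-graph can be modelled with two maps: one for node attributes and one for outgoing edges. The sketch below is a toy, not Neo4j's API; the PropertyGraph class, the edge labels and the sample people are hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy labeled, directed, attributed multi-graph."""

    def __init__(self):
        self.nodes = {}                # node id -> attribute dict
        self._out = defaultdict(list)  # node id -> [(label, target, attributes)]

    def add_node(self, node_id, **attributes):
        """Create a node with any number of attributes."""
        self.nodes[node_id] = attributes

    def add_edge(self, source, label, target, **attributes):
        """Add a directed, labeled edge; parallel edges are allowed."""
        self._out[source].append((label, target, attributes))

    def neighbors(self, node_id, label=None):
        """Follow outgoing edges, optionally restricted to one label."""
        return [
            target
            for edge_label, target, _ in self._out.get(node_id, [])
            if label is None or edge_label == label
        ]


graph = PropertyGraph()
graph.add_node("alice", role="analyst")
graph.add_node("bob", role="engineer")
graph.add_edge("alice", "KNOWS", "bob", since=2010)
graph.add_edge("alice", "WORKS_WITH", "bob")

known = graph.neighbors("alice", label="KNOWS")
```

Following an edge here is a direct list lookup on the source node, whereas a relational model would resolve the same step with a join against an edge table.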



Vladimir_Suvorov_Big_data

  • 1. Big Data Concepts & Practice Vladimir Suvorov vladimir.suvorov@emc.com EMC & DataScienceSquad.com Non-commercial education only. Corresponding information belongs to its respectful owner. These includes EMC, IBM, Microsoft, Oracle, Gartner etc 1
  • 2. About myself Non-commercial education only. Corresponding information belongs to its respectful owner. These includes EMC, IBM, Microsoft, Oracle, Gartner etc 2
  • 3. Why Big Data How We Got Here February 16, 2013 © 2012 IBM Corporation
  • 4. …by the end of 2011, this was about 30 In 2005 there were 1.3 billion RFID billion and growing even faster tags in circulation… 4 Non-commercial education only. Corresponding information belongs to its respectful owner. These includes EMC, IBM, Microsoft, Oracle, Gartner etc 4
  • 5. An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics… 1 BILLION lines of code EACH engine generating 10 TB every 30 minutes! Non-commercial education only. Corresponding information belongs to its respectful owner. These includes EMC, IBM, Microsoft, Oracle, Gartner etc 5
  • 6. 350B transactions/year (meter reads every 15 min.); 120M (meter reads/month); 3.65B (meter reads/day)
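One reading under which the three figures on this slide agree is a single fleet of meters read at three different frequencies, with each figure taken as a yearly total. The sketch below checks that arithmetic. The fleet size of 10 million meters is an assumption (the slide does not state it), as are all variable names.

```python
# Assumption: a hypothetical utility with 10 million meters.
# The slide gives only the totals, not the fleet size.
METERS = 10_000_000
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12
READS_PER_DAY_AT_15_MIN = 24 * 4  # one read every 15 minutes = 96 per day

# Yearly read totals for each metering frequency.
monthly_reads = METERS * MONTHS_PER_YEAR
daily_reads = METERS * DAYS_PER_YEAR
quarter_hourly_reads = METERS * READS_PER_DAY_AT_15_MIN * DAYS_PER_YEAR

print(f"monthly reads:   {monthly_reads:,} per year")
print(f"daily reads:     {daily_reads:,} per year")
print(f"15-minute reads: {quarter_hourly_reads:,} per year")
```

Under that assumption the totals come out at 120 million, 3.65 billion, and about 350 billion reads per year, which matches the slide's 120M, 3.65B, and 350B.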
  • 7. In August of 2010, Adam Savage, of “MythBusters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account, including the phrase “Off to work.” Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location where the photo was taken. By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
  • 8. The Social Layer in an Instrumented Interconnected World • 30 billion RFID tags today (1.3B in 2005) • 4.6 billion camera phones worldwide • 100s of millions of GPS-enabled devices sold annually • 2+ billion people on the Web by end 2011 • 76 million smart meters in 2009… 200M by 2014 • 12+ TBs of tweet data every day • 25+ TBs of log data every day • ? TBs of data every day
  • 9. Twitter Tweets per Second Record Breakers of 2011
  • 10. Extract Intent, Life Events, Micro Segmentation Attributes (Figure: social media posts from sample users Pauline, Tom Sit, Tina Mu and Jo Jobs, annotated with the labels Name, Birthday, Family; Not Relevant - Noise; Monetizable Intent; Location; Wishful Thinking; Relocation; SPAMbots)
  • 11. Big Data Includes Any of the Following Characteristics. Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible • Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text • Velocity: Streaming data and large volume data movement • Volume: Scale from Terabytes to Petabytes (1K TBs) to Zettabytes (1B TBs)
  • 12. Bigger and Bigger Volumes of Data • Retailers collect click-stream data from Web site interactions and loyalty card data – This traditional POS information is used by retailers for shopping basket analysis, inventory replenishment, +++ – But data is being provided to suppliers for customer buying analysis • Healthcare has traditionally been dominated by paper-based systems, but this information is getting digitized • Science is increasingly dominated by big science initiatives – Large-scale experiments generate over 15 PB of data a year, which can’t be stored within the data center and is sent to laboratories • Financial services are seeing larger and larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading • Improved instrument and sensory technology – Large Synoptic Survey Telescope’s GPixel camera generates 6PB+ of image data per year; or consider the Oil and Gas industry
  • 13. The Big Data Conundrum • The percentage of available data an enterprise can analyze is decreasing proportionately to the data available to it. Quite simply, this means that as enterprises, we are getting “more naive” about our business over time. We don’t know what we could already know…. (Figure: data AVAILABLE to an organization vs. data an organization can PROCESS)
  • 14. Why Not All of Big Data Before: Didn’t have the Tools?
  • 15. Applications for Big Data Analytics • Smarter Healthcare • Multi-channel sales • Finance • Log Analysis • Homeland Security • Traffic Control • Telecom • Search Quality • Manufacturing • Trading Analytics • Fraud and Risk • Retail: Churn, NBO
  • 16. Most Requested Uses of Big Data • Log Analytics & Storage • Smart Grid / Smarter Utilities • RFID Tracking & Analytics • Fraud / Risk Management & Modeling • 360° View of the Customer • Warehouse Extension • Email / Call Center Transcript Analysis • Call Detail Record Analysis
  • 17. What companies & analysts think of Big Data
  • 18. Gartner & McKinsey
  • 19. Hype Cycle of Big Data
  • 20. Priority matrix
  • 21. Key vision • Predictive modeling is gaining momentum with property and casualty (P&C) companies, which are using it to support claims analysis, CRM, risk management, pricing and actuarial workflows, quoting, and underwriting. • Social content is the fastest growing category of new content in the enterprise and will eventually attain 20% market penetration. • Gartner reports that 45% of sales management teams identify sales analytics as a priority to help them understand sales performance, market conditions and opportunities. • Over 80% of Web Analytics solutions are delivered via Software-as-a-Service (SaaS).
  • 22. Big Data deliverables by McKinsey
  • 24. Intel
  • 25. Intel Big Data Cluster. Table columns: Example Application; Big Data; Algorithms; Compute Style
      – Scientific study (e.g. earthquake study); Ground model; Earthquake simulation, thermal conduction, …; HPC
      – Internet library search; Historic web snapshots; Data mining; MapReduce
      – Virtual world analysis; Virtual world database; Data mining; TBD
      – Language translation; Text corpuses, audio archives, …; Speech recognition, machine translation, text-to-speech, …; MapReduce & HPC
      – Video search; Video data; Object/gesture identification, face recognition, …; MapReduce
    “There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since 1948.” - Gartner
  • 26. Example Motivating Application: Online Processing of Archival Video • Research project: Develop a context recognition system that is 90% accurate over 90% of your day • Leverage a combination of low- and high-rate sensing for perception • Federate many sensors for improved perception • Big Data: Terabytes of archived video from many egocentric cameras • Example query 1: “Where did I leave my briefcase?” • Sequential search through all video streams [Parallel Camera] • Example query 2: “Now that I’ve found my briefcase, track it” • Cross-cutting search among related video streams [Parallel Time] (Figure: Big Data Cluster)
  • 27. Oracle
  • 28. Big Data Use Cases. Table columns: Sector; Today’s Challenge; New Data; What’s Possible
      – Healthcare; Expensive office visits; Remote patient monitoring; Preventive care, reduced hospitalization
      – Manufacturing; In-person support; Product sensors; Automated diagnosis, support
      – Location-Based Services; Based on home zip code; Real time location data; Geo-advertising, traffic, local search
      – Public Sector; Standardized services; Citizen surveys; Tailored services, cost reductions
      – Retail; One size fits all marketing; Social media; Sentiment analysis, segmentation
  • 29. What’s in Big Data for Public Sector • Operational efficiency and productivity • Fraud detection and prevention • Close tax gaps • Value for money for citizens • Prevent crime waves • Customize actions based on population segments • Public utilities to reduce consumption • Produce safety from farm to fork
  • 30. Microsoft
  • 31. New opportunities • Massive Volumes: Increases ad revenue by processing 3.5 billion events per day. Processes 464 billion rows per quarter, with average query time under 10 secs. • Cloud Connectivity: Measures and ranks online user influence by processing 3 billion signals per day. Connects across 15 social networks via the cloud for data and API access. • Real-Time Insight: Improving investigation time by analyzing large volume & variety of data. Cut investigation time from 2 years to 15 days.
  • 32. Microsoft’s Approach to Big Data
  • 33. A Holistic Big Data Solution from Microsoft
  • 34. Data Scientist Job
  • 35. Sexy Job of Data Scientist. Tom Davenport, who is teaching an executive program in Big Data and analytics at Harvard University, said some data scientists are earning annual salaries as high as $300,000, which is “pretty good for somebody that doesn't have anyone else working for them.” Davenport also said such workers are motivated by the problems and opportunities data provides.
  • 36. What EMC Thinks of Data Scientists
  • 37. Job evolution
  • 38. What Forbes Thinks of Data Scientists
  • 39. Data Science Courses
  • 40. Course Modules and Navigation Icons. Data Science and Big Data Analytics: 1. Introduction to Big Data Analytics 2. Data Analytics Lifecycle + Lab 3. Review of Basic Data Analytics Methods Using R + Labs 4. Advanced Analytics - Theory & Methods + Labs 5. Advanced Analytics - Technology & Tools + Labs 6. The Endgame, or Putting it All Together + Final Lab
  • 41. Topics: Data Science and Big Data Analytics Course
      – Introduction to Big Data Analytics: Big Data Overview; State of the Practice in Analytics; The Data Scientist; Big Data Analytics in Industry Verticals
      – Data Analytics Lifecycle
      – Review of Basic Data Analytic Methods Using R: Using R to Look at Data - Introduction to R; Analyzing and Exploring the Data; Statistics for Model Building and Evaluation
      – Advanced Analytics – Theory and Methods: K-means Clustering; Association Rules; Linear Regression; Logistic Regression; Naive Bayesian Classifier; Decision Trees; Time Series Analysis; Text Analysis
      – Advanced Analytics - Technology and Tools: Analytics for Unstructured Data (MapReduce and Hadoop); The Hadoop Ecosystem; In-database Analytics – SQL Essentials; Advanced SQL and MADlib for In-database Analytics
      – The Endgame, or Putting it All Together + Final Lab on Big Data Analytics: Operationalizing an Analytics Project; Creating the Final Deliverables; Data Visualization Techniques; + Final Lab – Application of the Data Analytics Lifecycle to a Big Data Analytics Challenge
  • 42. Hadoop
  • 43. Top companies need Hadoop
  • 44. What is Hadoop and Where Did it Start? • Created by Doug Cutting, formerly of Yahoo!, now at Cloudera – HDFS (storage) & MapReduce (compute) – Inspired by Google’s MapReduce and Google File System (GFS) papers • Much of the initial work on Hadoop was done by Yahoo! • It is now a top-level Apache project backed by a large open source development community
  • 45. What is Hadoop? Two Core Components – HDFS: Storage in the Hadoop Distributed File System – MapReduce: Compute via the MapReduce distributed processing platform • Storage & Compute in 1 Framework • Open Source Project of the Apache Software Foundation • Written in Java
  • 46. Hadoop cluster architecture
  • 47. MapReduce example
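The figure for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the MapReduce idea using the customary word-count task. It is plain single-process Python rather than the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not names from the deck.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big insight", "data at machine speed"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes over blocks stored in HDFS. The sketch keeps only the data flow.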
  • 48. Hadoop versions
  • 49. Hadoop Wave Report. “EMC Greenplum is the first mover in Hadoop appliances. EMC Greenplum is the first EDW vendor to provide a full-featured enterprise-grade Hadoop appliance and roll out an appliance family that integrates its Hadoop, EDW, and data integration in a single rack. It provides its own open source Hadoop distribution software, integrates EMC’s strong storage product portfolio in its appliances, and has an extensive professional services force of EMC technical consultants and data scientists with Hadoop expertise.”
  • 50. Hadoop Players Today
  • 51. Get Started With Hadoop Today. Data Scientists & Hadoop Architecture teams deliver customer success • Hadoop Architecture Services – POC planning and deployment – Installation and best practices – Educate the team • Greenplum Analytics Labs – Leverage the expertise of Greenplum’s Data Scientists – Packaged solutions that produce business value and actionable results – Accelerate Hadoop capabilities on your data with your analysts • Establish a strategic vision – Roadmap for Hadoop and unified analytics
  • 52. The Greenplum Unified Analytics Platform
  • 53. NoSQL
  • 54. Definition from nosql-database.org • Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
  • 55. NoSQL http://nosql-database.org/ • Non relational • Scalability – Vertically • Add more data – Horizontally • Add more storage • Collection of structures – Hashtables, maps, dictionaries • No pre-defined schema • No join operations • CAP not ACID – Consistency, Availability and Partition tolerance (but not all three at once!) – Atomicity, Consistency, Isolation and Durability
  • 56. Advantages of NoSQL • Cheap, easy to implement • Data are replicated and can be partitioned • Easy to distribute • Don't require a schema • Can scale up and down • Quickly process large amounts of data • Relax the data consistency requirement (CAP) • Can handle web-scale data, whereas Relational DBs cannot
  • 57. Disadvantages of NoSQL • New and sometimes buggy • Data is generally duplicated, potential for inconsistency • No standardized schema • No standard format for queries • No standard language • Difficult to impose complicated structures • Depend on the application layer to enforce data integrity • No guarantee of support • Too many options: which one, or ones, to pick?
  • 58. NoSQL Options: Key-Value Stores • This technology you know and love and use all the time – Hashmap for example • Put(key,value) • value = Get(key) • Examples – Redis (my favorite!!) – in memory store – Memcached – and 100s more
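The Put/Get interface on this slide can be sketched as a thin wrapper around an in-memory hash map. This is a minimal illustration only: the class name `KeyValueStore` and its methods are hypothetical, and it does not reproduce the API of Redis, Memcached, or any other product named above.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """In-memory key-value store: values are opaque, lookup is by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        """Store (or overwrite) the value held under the key."""
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        """Return the value for the key, or the default if the key is absent."""
        return self._data.get(key, default)


store = KeyValueStore()
store.put("user:42", "Pauline")
name = store.get("user:42")
missing = store.get("user:99")
print(name, missing)
```

The store never inspects the value, so the only way to retrieve data is to know its key.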
  • 59. Column Stores • Not to be confused with the relational-db version of this – Sybase-IQ etc. • Multi-dimensional map • Not all entries are relevant each time – Column families • Examples – Cassandra – HBase – Amazon SimpleDB
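The "multi-dimensional map" with column families can be pictured as nested maps: row key, then column family, then column name, then value. The sketch below is a toy in-memory model under that assumption. The class `ColumnFamilyTable` and its methods are hypothetical and do not mirror the Cassandra or HBase client APIs.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are fixed up front; columns inside them are free-form.
        self._families = set(families)
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        """Write one cell; rows need not share the same set of columns."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        """Read only the requested family of one row, ignoring the others."""
        return dict(self._rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyTable(["profile", "activity"])
table.put("u1", "profile", "name", "Tina")
table.put("u1", "activity", "last_login", "2013-02-16")
table.put("u2", "profile", "name", "Tom")  # u2 has no activity columns at all

u1_profile = table.get_family("u1", "profile")
u2_activity = table.get_family("u2", "activity")
print(u1_profile, u2_activity)
```

Reading one family at a time reflects the slide's point that not all entries are relevant to each request.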
  • 60. Document Stores • Key-document stores – However, the document can be seen as a value, so you can consider this a superset of key-value • The big difference is that in document stores one can also query on the document, i.e. the document portion is structured (not just a blob of data) • Examples – MongoDB – CouchDB
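The difference from a plain key-value store can be sketched by adding one operation: a query over fields inside the stored documents. The class `DocumentStore` and its `find` method are hypothetical names for illustration and are not the MongoDB or CouchDB query API.

```python
from typing import Any, Dict, List, Optional


class DocumentStore:
    """Key-document store whose documents are structured, hence queryable."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, document: Dict[str, Any]) -> None:
        """Store a document under a key, as in a key-value store."""
        self._docs[key] = document

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        """Fetch a document by key, as in a key-value store."""
        return self._docs.get(key)

    def find(self, **criteria: Any) -> List[str]:
        """Return keys of documents whose fields equal every given criterion."""
        return [
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


people = DocumentStore()
people.put("p1", {"name": "Pauline", "city": "Boston"})
people.put("p2", {"name": "Tom", "city": "Austin"})
people.put("p3", {"name": "Tina", "city": "Boston"})

in_boston = people.find(city="Boston")
print(in_boston)
```

`get` and `put` behave like the key-value case, while `find` works only because the store can look inside each document.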
  • 61. Graph Stores • Use a graph structure – Labeled, directed, attributed multi-graph • Label for each edge • Directed edges • Multiple attributes per node • Multiple edges between nodes – Relational DBs can model graphs, but an edge requires a join which is expensive • Example Neo4j – http://www.infoq.com/articles/graph-nosql-neo4j
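The four properties listed on this slide (labeled edges, directed edges, attributes on nodes, multiple edges between the same pair of nodes) can be sketched with a small in-memory structure. The names `Edge` and `PropertyGraph` are hypothetical, and this is not the Neo4j API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Edge:
    """A directed, labeled edge carrying its own attributes."""
    source: str
    target: str
    label: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Labeled, directed, attributed multi-graph held in memory."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: List[Edge] = []  # a list, so parallel edges are allowed

    def add_node(self, node_id: str, **attributes: Any) -> None:
        """Add a node with any number of attributes."""
        self.nodes[node_id] = attributes

    def add_edge(self, source: str, target: str, label: str, **attributes: Any) -> None:
        """Add a directed edge; both endpoints must already exist."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, label, attributes))

    def neighbors(self, node_id: str, label: Optional[str] = None) -> List[str]:
        """Follow outgoing edges from a node, optionally filtered by label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


graph = PropertyGraph()
graph.add_node("alice", age=30)
graph.add_node("bob")
graph.add_edge("alice", "bob", "KNOWS", since=2010)
graph.add_edge("alice", "bob", "WORKS_WITH")  # second edge, same node pair

all_out = graph.neighbors("alice")
known = graph.neighbors("alice", label="KNOWS")
print(all_out, known)
```

This sketch scans the whole edge list on each lookup for brevity. A graph database would typically keep each node's edges next to the node, so a traversal follows references instead of performing the join the slide describes.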