SlideShare uma empresa Scribd logo
1 de 18
Hadoop for DBA’s



Michael Naumov
DB Expert
michaeln@liveperson.com
LP Facts




           8500+ customers

           450M unique visitors
           per month

           1.3B visits per month

           60TB of new data
           every month
LP Facts

                    Databases in Liveperson

     Oracle – B2B                 SQL Server – B2C



     Hadoop – Raw Data            Vertica – Forecasting / BI



     Cassandra – Application HA   MySQL - Segmentation


     MySQL NDB - ETL              MongoDB – Predictive Targeting
Hadoop in Liveperson



 • 2TB Of data streamed into Hadoop each day

 • 100+ Data Nodes serving our data needs

 • DataNodes are 36GB RAM, 1TBx12 SATA disks, 2
   quad CPU

 • Dozens (and growing) of daily MR jobs

 • 5 different project (and growing) based on Hadoop
   eco-system
Hadoop

What is Hadoop?

•   Open Source project from Apache
•   Able to store and process large amounts of data. Including not only structured
    data but also complex, unstructured data as well.

•   Hadoop is not actually a single product but instead
    a collection of several components

•   Commodity hardware. Low software and hardware costs

•   Shared nothing machines - Scalability
Example Comparison: RDBMS vs. Hadoop


                  Typical Traditional RDBMS   Hadoop

 Data Size        Gigabytes                   Petabytes

 Access           Interactive and Batch       Batch – NOT Interactive

 Updates          Read / Write many times     Write once, Read many times

 Structure        Static Schema               Dynamic Schema

 Scaling          Nonlinear                   Linear

 Query Response   Can be near immediate       Has latency (due to batch processing)
 Time
Hadoop Distributed File System - HDFS


•    HDFS has three types of Nodes

•    Namenode (MasterNode)
      • Distribute files in the cluster
      • Responsible for the replication
        between the datanodes and for file
        blocks location

 •   Datanodes
     • Responsible for actual file store
     • Serving data from files(data) to client

•    BackupNode (version 0.23 and up)
      • It‟s a backup of the NameNode
Hadoop Ecosystem




     •   MapReduce - Framework for writing/executing algorithms
     •   Hbase – Distributed, versioned, column-oriented database
     •   Hive - SQL like interface for large datasets stored in HDFS.
     •   Pig - Pig consists on a high-level language (Pig Latin) for
         expressing data analysis programs
     •   Sqoop - (“SQL-to-Hadoop”) is a straightforward tool. Able to
         Imports/export tables or databases in HDFS/Hive/Hbase
MapReduce

 What is MapReduce?

 • Runs programs (jobs) across many computers
 • Protects against single server failure by re-run failed steps.
 • MR jobs can be written in Java, C, Phyton, Ruby and etc
 • Users only write Map and Reduce functions
    • MAP - Takes a large problem and divides into sub problems.
              Performs the same function on all subsystems
    • REDUCE - Combine the output from all sub-problems


 Example:   $HADOOP_HOME/bin/hadoop jar @HADOOP_HOME/hadoop-streaming.jar 
                    - input myInputDirs 
                    - output myOutputDir 
                    - mapper /bin/cat 
                    - reducer /bin/wc
Hbase

What is Hbase and why should you use HBase?

• Huge volumes of randomly accessed data.
• There is no restrictions on column numbers for rows it‟s dynamic.
• Consider HBase when you‟re loading data by key, searching data by key (or
  range), serving data by key, querying data by key or when storing data by
  row that doesn‟t conform well to a schema.

Hbase dont’s?
• It doesn‟t talk SQL, have an optimizer, support in transactions or joins. If
  you don‟t use any of these in your database application then HBase could
  very well be the perfect fit.

Example:     create ‘blogposts’, ‘post’, ‘image’                          ---create table
             put „blogposts‟, „id1′, „post:title‟, „Hello World‟          ---insert value
             put „blogposts‟, „id1′, „post:body‟, „This is a blog post‟   ---insert value
             put „blogposts‟, „id1′, „image:header‟, „image1.jpg‟         ---insert value
             get „blogposts‟, „id1′                                       ---select records
Hive

What is Hive?

• Built on the MapReduce framework so it generates M/R jobs behind
• Hive is a data warehouse that enables easy data summarization and
  ad-hoc queries via an SQL-like interface for large datasets stored in
  HDFS/Hbase.
• Have partitioning and partition swapping
• Good for random sampling
Example:    CREATE EXTERNAL TABLE vs_hdfs (           select session_id,
            site_id string,                                  get_json_object(concat(tttt, "}"), '$.BY'),
            session_id string,                               get_json_object(concat(tttt, "}"), '$.TEXT') from
            time_stamp bigint,                        (
            visitor_id bigint,                              select session_id,concat("{",
            row_unit string,                                regexp_replace(event, "[{|}]", ""), "}") tttt
            evts string,                               from (
            biz string,                                      select session_id,get_json_object(plne,
            plne string,                                             '$.PLine.evts[*]') pln
            dims string)                                           from vs_hdfs_v1 where site='6964264'
            partitioned by (site string,day string)         and day='20120201' and plne!='{}' limit 10 ) t
            ROW FORMAT DELIMITED FIELDS TERMINATED         LATERAL VIEW explode(split(pln, "},{"))
            BY '001'                                 adTable AS event )t2
            STORED AS SEQUENCEFILE LOCATION
            '/home/data/';
Pig
 What is Pig?
 • Data flow processing
 • Uses Pig Latin query language
 • Highly parallel in order to distribute data processing across many
   servers
 • Combining multiple data sources (Files, Hbase, Hive)

 Example:
Sqoop

 What is Sqoop?

 • It‟s a command line tool for moving data between HDFS and
   relational database systems.
 • You can download drivers for Sqoop from Microsoft and
      •   Import Data/Query results from SQL Server to Hadoop.
      •   Export Data from Hadoop to SQL Server.
 • It‟s like BCP


 Example:
 $bin/sqoop import --connect
    'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' 
     --table lineitem --hive-import

 $bin/sqoop export --connect
     'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table
     lineitem --export-dir /data/lineitemData
Other projects in Hadoop


 There is many other projects we didn‟t talk about

 •   Chukwa
 •   Mahout
 •   Avro
 •   Zookeeper
 •   Fuse
 •   Flume
 •   Oozie
 •   Hue
 •   Hiho
 •   ………………..
Hadoop and Microsoft




 • Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel
 • Build a PowerPivot/PowerView on top of Hive
 • Instead of manually refreshing a PowerPivot workbook based on Hive on
   their desktop, users can use PowerPivot for SharePoint to schedule a data
   refresh feature to refresh a central copy shared with others, without worrying
   about the time or resources it takes.
 • BI Professionals can build BI Semantic Model or Reporting Services
    Reports on Hive in SQL Server Data tools
HOW DO I GET STARTED

• Microsoft
    •   https://www.hadooponazure.com/
    •   http://www.microsoft.com/download/en/details.aspx?id=27584 (sqoop driver)


• Open Source: http://hadoop.apacha.org
• Vendors
    •   http://www.cloudera.com
    •   http://hortonworks.com
    •   http://mapr.com
@




Demo
Questions?

Mais conteúdo relacionado

Mais procurados

Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationSameer Tiwari
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 

Mais procurados (20)

Hadoop
HadoopHadoop
Hadoop
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
HDFS
HDFSHDFS
HDFS
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hadoop - Introduction to Hadoop
Hadoop - Introduction to HadoopHadoop - Introduction to Hadoop
Hadoop - Introduction to Hadoop
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Hive sq lfor-hadoop
Hive sq lfor-hadoopHive sq lfor-hadoop
Hive sq lfor-hadoop
 
Nextag talk
Nextag talkNextag talk
Nextag talk
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 

Destaque

Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012sqlserver.co.il
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum versionsqlserver.co.il
 
Query handlingbytheserver
Query handlingbytheserverQuery handlingbytheserver
Query handlingbytheserversqlserver.co.il
 
Sql server user group news january 2013
Sql server user group news   january 2013Sql server user group news   january 2013
Sql server user group news january 2013sqlserver.co.il
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cachesqlserver.co.il
 
Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013sqlserver.co.il
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 

Destaque (8)

Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum version
 
Query handlingbytheserver
Query handlingbytheserverQuery handlingbytheserver
Query handlingbytheserver
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Sql server user group news january 2013
Sql server user group news   january 2013Sql server user group news   january 2013
Sql server user group news january 2013
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cache
 
Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 

Semelhante a מיכאל

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it bettergvernik
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?gvernik
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxvishwasgarade1
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 

Semelhante a מיכאל (20)

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 
Hive
HiveHive
Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it better
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptx
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
6.hive
6.hive6.hive
6.hive
 

Mais de sqlserver.co.il

SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3sqlserver.co.il
 
SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2sqlserver.co.il
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended EventsSQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Eventssqlserver.co.il
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoresqlserver.co.il
 
SQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACSQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACsqlserver.co.il
 
SQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: SpatialSQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: Spatialsqlserver.co.il
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkelsqlserver.co.il
 
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...sqlserver.co.il
 
Extreme performance - IDF UG
Extreme performance - IDF UGExtreme performance - IDF UG
Extreme performance - IDF UGsqlserver.co.il
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd sqlserver.co.il
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd sqlserver.co.il
 
4 extreme performance - part ii
4   extreme performance - part ii4   extreme performance - part ii
4 extreme performance - part iisqlserver.co.il
 
2 extreme performance - smaller is better
2   extreme performance - smaller is better2   extreme performance - smaller is better
2 extreme performance - smaller is bettersqlserver.co.il
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part isqlserver.co.il
 

Mais de sqlserver.co.il (20)

SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3
 
SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended EventsSQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStore
 
SQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACSQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DAC
 
SQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: SpatialSQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: Spatial
 
נועם
נועםנועם
נועם
 
עדי
עדיעדי
עדי
 
מיכאל
מיכאלמיכאל
מיכאל
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
 
DBCC - Dubi Lebel
DBCC - Dubi LebelDBCC - Dubi Lebel
DBCC - Dubi Lebel
 
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
 
ISUG 113: File stream
ISUG 113: File streamISUG 113: File stream
ISUG 113: File stream
 
Extreme performance - IDF UG
Extreme performance - IDF UGExtreme performance - IDF UG
Extreme performance - IDF UG
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd
 
4 extreme performance - part ii
4   extreme performance - part ii4   extreme performance - part ii
4 extreme performance - part ii
 
2 extreme performance - smaller is better
2   extreme performance - smaller is better2   extreme performance - smaller is better
2 extreme performance - smaller is better
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part i
 

Último

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Último (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

מיכאל

  • 1. Hadoop for DBA’s Michael Naumov DB Expert michaeln@liveperson.com
  • 2. LP Facts 8500+ customers 450M unique visitors per month 1.3B visits per month 60TB of new data every month
  • 3. LP Facts Databases in Liveperson Oracle – B2B SQL Server – B2C Hadoop – Raw Data Vertica – Forecasting / BI Cassandra – Application HA MySQL - Segmentation MySQL NDB - ETL MongoDB – Predictive Targeting
  • 4. Hadoop in Liveperson • 2TB Of data streamed into Hadoop each day • 100+ Data Nodes serving our data needs • DataNodes are 36GB RAM, 1TBx12 SATA disks, 2 quad CPU • Dozens (and growing) of daily MR jobs • 5 different project (and growing) based on Hadoop eco-system
  • 5. Hadoop What is Hadoop? • Open Source project from Apache • Able to store and process large amounts of data. Including not only structured data but also complex, unstructured data as well. • Hadoop is not actually a single product but instead a collection of several components • Commodity hardware. Low software and hardware costs • Shared nothing machines - Scalability
  • 6. Example Comparison: RDBMS vs. Hadoop Typical Traditional RDBMS Hadoop Data Size Gigabytes Petabytes Access Interactive and Batch Batch – NOT Interactive Updates Read / Write many times Write once, Read many times Structure Static Schema Dynamic Schema Scaling Nonlinear Linear Query Response Can be near immediate Has latency (due to batch processing) Time
  • 7. Hadoop Distributed File System - HDFS • HDFS has three types of Nodes • Namenode (MasterNode) • Distribute files in the cluster • Responsible for the replication between the datanodes and for file blocks location • Datanodes • Responsible for actual file store • Serving data from files(data) to client • BackupNode (version 0.23 and up) • It‟s a backup of the NameNode
  • 8. Hadoop Ecosystem • MapReduce - Framework for writing/executing algorithms • Hbase – Distributed, versioned, column-oriented database • Hive - SQL like interface for large datasets stored in HDFS. • Pig - Pig consists on a high-level language (Pig Latin) for expressing data analysis programs • Sqoop - (“SQL-to-Hadoop”) is a straightforward tool. Able to Imports/export tables or databases in HDFS/Hive/Hbase
  • 9. MapReduce What is MapReduce? • Runs programs (jobs) across many computers • Protects against single server failure by re-run failed steps. • MR jobs can be written in Java, C, Phyton, Ruby and etc • Users only write Map and Reduce functions • MAP - Takes a large problem and divides into sub problems. Performs the same function on all subsystems • REDUCE - Combine the output from all sub-problems Example: $HADOOP_HOME/bin/hadoop jar @HADOOP_HOME/hadoop-streaming.jar - input myInputDirs - output myOutputDir - mapper /bin/cat - reducer /bin/wc
  • 10. Hbase What is Hbase and why should you use HBase? • Huge volumes of randomly accessed data. • There is no restrictions on column numbers for rows it‟s dynamic. • Consider HBase when you‟re loading data by key, searching data by key (or range), serving data by key, querying data by key or when storing data by row that doesn‟t conform well to a schema. Hbase dont’s? • It doesn‟t talk SQL, have an optimizer, support in transactions or joins. If you don‟t use any of these in your database application then HBase could very well be the perfect fit. Example: create ‘blogposts’, ‘post’, ‘image’ ---create table put „blogposts‟, „id1′, „post:title‟, „Hello World‟ ---insert value put „blogposts‟, „id1′, „post:body‟, „This is a blog post‟ ---insert value put „blogposts‟, „id1′, „image:header‟, „image1.jpg‟ ---insert value get „blogposts‟, „id1′ ---select records
  • 11. Hive What is Hive? • Built on the MapReduce framework so it generates M/R jobs behind • Hive is a data warehouse that enables easy data summarization and ad-hoc queries via an SQL-like interface for large datasets stored in HDFS/Hbase. • Have partitioning and partition swapping • Good for random sampling Example: CREATE EXTERNAL TABLE vs_hdfs ( select session_id, site_id string, get_json_object(concat(tttt, "}"), '$.BY'), session_id string, get_json_object(concat(tttt, "}"), '$.TEXT') from time_stamp bigint, ( visitor_id bigint, select session_id,concat("{", row_unit string, regexp_replace(event, "[{|}]", ""), "}") tttt evts string, from ( biz string, select session_id,get_json_object(plne, plne string, '$.PLine.evts[*]') pln dims string) from vs_hdfs_v1 where site='6964264' partitioned by (site string,day string) and day='20120201' and plne!='{}' limit 10 ) t ROW FORMAT DELIMITED FIELDS TERMINATED LATERAL VIEW explode(split(pln, "},{")) BY '001' adTable AS event )t2 STORED AS SEQUENCEFILE LOCATION '/home/data/';
  • 12. Pig What is Pig? • Data flow processing • Uses Pig Latin query language • Highly parallel in order to distribute data processing across many servers • Combining multiple data sources (Files, Hbase, Hive) Example:
  • 13. Sqoop What is Sqoop? • It‟s a command line tool for moving data between HDFS and relational database systems. • You can download drivers for Sqoop from Microsoft and • Import Data/Query results from SQL Server to Hadoop. • Export Data from Hadoop to SQL Server. • It‟s like BCP Example: $bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --hive-import $bin/sqoop export --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --export-dir /data/lineitemData
  • 14. Other projects in Hadoop There is many other projects we didn‟t talk about • Chukwa • Mahout • Avro • Zookeeper • Fuse • Flume • Oozie • Hue • Hiho • ………………..
  • 15. Hadoop and Microsoft • Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel • Build a PowerPivot/PowerView on top of Hive • Instead of manually refreshing a PowerPivot workbook based on Hive on their desktop, users can use PowerPivot for SharePoint to schedule a data refresh feature to refresh a central copy shared with others, without worrying about the time or resources it takes. • BI Professionals can build BI Semantic Model or Reporting Services Reports on Hive in SQL Server Data tools
  • 16. HOW DO I GET STARTED • Microsoft • https://www.hadooponazure.com/ • http://www.microsoft.com/download/en/details.aspx?id=27584 (sqoop driver) • Open Source: http://hadoop.apacha.org • Vendors • http://www.cloudera.com • http://hortonworks.com • http://mapr.com

Notas do Editor

  1. https://www.hadooponazure.com/Account