Replacing Telco DB/DW to
                           Hadoop and Hive


                                JunHo Cho

                          Data Analysis Platform Team




Friday, July 1, 2011
•   Cloud Computing Platform - Xen

•   Cloud Storage Platform - Hadoop

•   Massive Email Archiving Solution - Hadoop, Lucene

    •   Hive: social network analysis using email

•   Log Archiving Solution - Hadoop

•   Data Analysis - data mining, machine learning, statistics

•   Data Platform - Hadoop, Lucene, Hive

•   Cloud Architecture - KT Cloud

Telco Data




OpenSource

•   Storage & Computing

•   Collection

•   Search

•   Analysis

•   Coordination

Hive Internal



Hive Architecture

                       UI         Driver   select col1 from tab1 where ...

                       DDL           HQL
                                                    Execution
                                              Works
                                                     Engine
            MetaStore             Compiler
                            ORM                       Hadoop
                                               Result

                       Result:
                                    a 123344
                                    b 121211
                                    c 342434

Hive Internal

•   Web UI, Hive CLI, JDBC - Browse, Query, DDL

•   MetaStore - Thrift API

•   Hive QL - Parser, Plan, Optimizer, Task

•   Map Reduce - TSOperator, SELOperator, FSOperator, ExecMapper/ExecReducer

•   User Script, UDF/UDAF - substr, sum, average

•   SerDe, Input/OutputFormat

•   HDFS, RCFile

•   StorageHandler - DB, ..., HBase

Parser
                       Select col1,col2 From tab1 Where col3 > 5

    TOK_QUERY
        TOK_FROM
            TOK_TABNAME
                tab1
        TOK_INSERT
            TOK_DESTINATION
                TOK_DIR
                    TOK_TMP_FILE
            TOK_SELECT
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        col1
                TOK_SELEXPR
                    TOK_TABLE_OR_COL
                        col2
            TOK_WHERE
                >
                    TOK_TABLE_OR_COL
                    5

    QB collects, in order: tab1 (TOK_TABNAME), insclause-0 (TOK_DESTINATION), col1 and col2 (TOK_SELEXPR), then TOK_WHERE

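The parse tree for the sample query can be written as nested data and walked to collect what a query block (QB) needs. This is a minimal sketch, not Hive's code: the tuple-based node encoding and the `build_qb` and `leaves` helpers are hypothetical, and the `col3` leaf under TOK_WHERE is filled in from the query text because the slide does not draw it.

```python
# Hypothetical AST encoding: (token, [children]); a leaf has no children.
def node(token, *children):
    return (token, list(children))

AST = node("TOK_QUERY",
    node("TOK_FROM", node("TOK_TABNAME", node("tab1"))),
    node("TOK_INSERT",
        node("TOK_DESTINATION", node("TOK_DIR", node("TOK_TMP_FILE"))),
        node("TOK_SELECT",
            node("TOK_SELEXPR", node("TOK_TABLE_OR_COL", node("col1"))),
            node("TOK_SELEXPR", node("TOK_TABLE_OR_COL", node("col2")))),
        node("TOK_WHERE",
            node(">", node("TOK_TABLE_OR_COL", node("col3")), node("5")))))

def leaves(n):
    """Return the leaf tokens below a node, left to right."""
    token, children = n
    if not children:
        return [token]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def build_qb(ast):
    """Walk the tree once and record table, destination, select list and filter."""
    qb = {"tables": [], "dest": None, "select": [], "where": None}

    def walk(n):
        token, children = n
        if token == "TOK_TABNAME":
            qb["tables"].extend(leaves(n))
        elif token == "TOK_DESTINATION":
            qb["dest"] = "insclause-0"   # clause name shown on the slide
        elif token == "TOK_SELEXPR":
            qb["select"].extend(leaves(n))
        elif token == "TOK_WHERE":
            op, operands = children[0]
            qb["where"] = (op, [leaves(o)[0] for o in operands])
        else:
            for child in children:
                walk(child)

    walk(ast)
    return qb

qb = build_qb(AST)
print(qb)
```

The walk visits TOK_FROM before TOK_INSERT, so the QB fields are filled in the same order the slide shows: table, destination, select columns, then the predicate.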
Plan
                       Select col1,col2 From tab1 Where col3 > 5

    QB

            TOK_FROM                             TableScanOperator

            TOK_WHERE                            FilterOperator

            TOK_SELECT                           SelectOperator

            TOK_DESTINATION                      FileSinkOperator

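The clause-to-operator mapping can be sketched as a small function that turns a QB into an operator chain, plus a toy executor that pushes rows through it. Everything here is an illustrative assumption: the dict-based QB, the `gen_plan` and `run` helpers, and the sample rows are hypothetical and are not Hive classes or data from the talk.

```python
import operator

# Comparison operators supported by this toy FilterOperator.
CMP = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def gen_plan(qb):
    """Map QB clauses to an operator chain, scan first and sink last."""
    ops = [{"op": "TableScanOperator", "table": qb["from"]}]       # TOK_FROM
    if qb.get("where") is not None:
        ops.append({"op": "FilterOperator", "predicate": qb["where"]})  # TOK_WHERE
    ops.append({"op": "SelectOperator", "columns": list(qb["select"])})  # TOK_SELECT
    ops.append({"op": "FileSinkOperator", "dest": qb["dest"]})     # TOK_DESTINATION
    return ops

def run(plan, tables):
    """Push rows through the chain; the sink simply returns what reaches it."""
    rows = []
    for op in plan:
        kind = op["op"]
        if kind == "TableScanOperator":
            rows = list(tables[op["table"]])
        elif kind == "FilterOperator":
            col, cmp, val = op["predicate"]
            rows = [r for r in rows if CMP[cmp](r[col], val)]
        elif kind == "SelectOperator":
            rows = [{c: r[c] for c in op["columns"]} for r in rows]
        elif kind == "FileSinkOperator":
            return rows
    return rows

# Select col1,col2 From tab1 Where col3 > 5
qb = {"from": "tab1", "where": ("col3", ">", 5),
      "select": ["col1", "col2"], "dest": "insclause-0"}
plan = gen_plan(qb)

tables = {"tab1": [
    {"col1": "a", "col2": 1, "col3": 9},
    {"col1": "b", "col2": 2, "col3": 3},
]}
result = run(plan, tables)
print([o["op"] for o in plan])
print(result)
```

Only the first sample row has `col3 > 5`, so it is the only one that reaches the sink, projected down to `col1` and `col2`.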
Optimizer
                       Select col1,col2 From tab1 Where col3 > 5

    tab1 {col1, col2, col3, col4, col5, col6, col7}

    ColumnPruner walks the operator tree from FileSinkOperator up to TableScanOperator and records in the Context the columns each operator needs:

            Context     SEL   col1, col2
                        FIL   col1, col2, col3
                        TS    col1, col2, col3

    Resulting operator tree:

            TableScanOperator

            FilterOperator

            FilterOperator

            SelectOperator

            FileSinkOperator

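The bottom-up column pruning walk can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Hive's ColumnPruner: the list-of-dicts plan, the `predicate_cols` field and the `prune_columns` helper are hypothetical names chosen for the example.

```python
# Table schema from the slide.
TABLE_COLUMNS = {"tab1": ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]}

# Operator chain for: Select col1,col2 From tab1 Where col3 > 5
PLAN = [
    {"op": "TS", "table": "tab1"},
    {"op": "FIL", "predicate_cols": ["col3"]},
    {"op": "SEL", "columns": ["col1", "col2"]},
    {"op": "FS"},
]

def prune_columns(plan, table_columns):
    """Walk from the sink up to the scan, recording the columns each operator needs."""
    needed = []     # columns required by the operators already visited
    context = {}    # operator -> columns it must receive
    for op in reversed(plan):
        kind = op["op"]
        if kind == "FS":
            continue  # the sink writes whatever its parent produces
        if kind == "SEL":
            needed = list(op["columns"])
        elif kind == "FIL":
            # the filter must pass through what SEL needs, plus its own predicate columns
            needed = needed + [c for c in op["predicate_cols"] if c not in needed]
        elif kind == "TS":
            # read only the needed columns, in table order
            needed = [c for c in table_columns[op["table"]] if c in needed]
        context[kind] = list(needed)
    return context

context = prune_columns(PLAN, TABLE_COLUMNS)
print(context)
```

With this plan the scan ends up reading three of the seven columns of `tab1`, which matches the Context entries shown on the slide.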
Task
                       Select col1,col2 From tab1 Where col3 > 5

                       QB               TaskFactory
                                                         TS - GenMRTableScan1
                                                         FS - GenMRFileSink1

                                      MapRedTask                 FetchTask

                                      TableScanOperator

                                      FilterOperator

                                      FilterOperator

                                      SelectOperator

               FileSinkOperator

Task
                         Task   Select col1,col2 From tab1 Where col3 > 5


                                      TaskFactory
                                                       FS - GenMRFileSink1
                       QB
                                    MapRedTask


                                   TableScanOperator



                                   FilterOperator      FetchTask

                                    FilterOperator



                                    SelectOperator



                                   FileSinkOperator




Friday, July 1, 2011
Task
                         Task   Select col1,col2 From tab1 Where col3 > 5


                                      TaskFactory

                       QB
                                    MapRedTask


                                   TableScanOperator



                                   FilterOperator      FetchTask

                                    FilterOperator



                                    SelectOperator



                                   FileSinkOperator




Friday, July 1, 2011
Task
                         Task   Select col1,col2 From tab1 Where col3 > 5


                                      TaskFactory

                       QB
                                    MapRedTask
                                                       MapRedTask
                                   TableScanOperator



                                   FilterOperator       FetchTask

                                    FilterOperator



                                    SelectOperator



                                   FileSinkOperator




Friday, July 1, 2011
Hive Internal

       Web UI / Hive CLI / JDBC     Browse, Query, DDL
       MetaStore                    Thrift API
       Hive QL                      Parser, Plan, Optimizer, Task
       Map Reduce                   TSOperator, FILOperator, SELOperator,
                                    FILOperator, FSOperator
                                    User Script, UDF
                                    ExecMapper/ExecReducer
                                    SerDe
                                    Input/OutputFormat
       HDFS                         RCFile
       StorageHandler               DB, ..., HBase

Oracle Migration to Hive



Understand Oracle SQL

                       • More than 3,000 ETL SQL statements
                       • Understand the data flow
                       • Group similar SQL patterns
                       • Identify the Oracle functions in use

Oracle SQL



Data Model Conversion

                 Oracle             Hive

                 Table              Table
                 Partition          Partition
                 Sampling           Bucket

Data Type Conversion

                 Oracle             Hive

                 NUMBER(n)          TINYINT / INT / BIGINT
                 NUMBER(n,m)        FLOAT / DOUBLE
                 VARCHAR2           STRING
                 DATE               STRING in "yyyy-MM-dd HH:mm:ss" format

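The type mapping above can be sketched as a small helper. This is a minimal illustration, not code from the talk: the class and method names are hypothetical, the precision cut-offs (2, 9 and 18 digits) are assumptions chosen so that every value of that precision fits the Hive type, and choosing DOUBLE rather than FLOAT for NUMBER(n,m) is also an assumption.

```java
import java.util.Locale;

// Hypothetical helper: maps an Oracle column type to a Hive type name.
final class OracleToHiveType {
    private OracleToHiveType() {}

    static String toHiveType(String oracleType, int precision, int scale) {
        switch (oracleType.toUpperCase(Locale.ROOT)) {
            case "NUMBER":
                if (precision < 1) {
                    throw new IllegalArgumentException("precision must be >= 1");
                }
                if (scale > 0) {
                    return "DOUBLE";      // NUMBER(n,m): FLOAT/DOUBLE on the slide
                }
                if (precision <= 2) {
                    return "TINYINT";     // assumed cut-off: up to 99 fits in 8 bits
                }
                if (precision <= 9) {
                    return "INT";         // assumed cut-off: 9 digits fit in 32 bits
                }
                if (precision <= 18) {
                    return "BIGINT";      // assumed cut-off: 18 digits fit in 64 bits
                }
                throw new IllegalArgumentException("no integer mapping for NUMBER(" + precision + ")");
            case "VARCHAR2":
                return "STRING";
            case "DATE":
                return "STRING";          // stored as "yyyy-MM-dd HH:mm:ss" text
            default:
                throw new IllegalArgumentException("unmapped Oracle type: " + oracleType);
        }
    }
}
```

For example, `OracleToHiveType.toHiveType("NUMBER", 9, 0)` yields `INT`, while a wider integer column such as NUMBER(15) yields `BIGINT`.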
HIVE DML

                       • Hive supports a subset of ANSI SQL
                       • Subqueries are supported only in the FROM clause
                       • Join queries : equi-join/inner-join
                                   outer-join
                                   self-join

IN Clause
             IN SubQuery
              SELECT * FROM Employee e
              WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)

             Hive: LEFT SEMI JOIN
              SELECT * FROM Employee e
              LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)

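To illustrate why LEFT SEMI JOIN matches an IN subquery, here is a minimal in-memory sketch. The class and method names are hypothetical, and plain Java lists of DeptNo values stand in for the Employee and Dept tables. Each Employee row is kept at most once, however many Dept rows share its DeptNo, which a plain inner join would not guarantee.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of LEFT SEMI JOIN semantics on a single join key.
final class SemiJoinSketch {
    private SemiJoinSketch() {}

    /** Returns the indexes of the left rows that have at least one matching right key. */
    static List<Integer> leftSemiJoin(List<Integer> leftKeys, List<Integer> rightKeys) {
        Set<Integer> keys = new HashSet<>();
        for (Integer key : rightKeys) {
            if (key != null) {
                keys.add(key);            // a NULL key never equals anything
            }
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < leftKeys.size(); i++) {
            Integer key = leftKeys.get(i);
            if (key != null && keys.contains(key)) {
                kept.add(i);              // emitted once, even with duplicate right keys
            }
        }
        return kept;
    }
}
```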
NOT IN Clause
             NOT IN SubQuery
              SELECT * FROM Employee e
              WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)

             Hive: LEFT OUTER JOIN + IS NULL
              SELECT e.* FROM Employee e
              LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo)
              WHERE d.DeptNo IS NULL

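The two forms above agree when DeptNo is never NULL on either side. The sketch below, with hypothetical names and in-memory lists standing in for the tables, models both so the difference on NULL keys can be checked: under standard SQL three-valued logic, NOT IN drops a row whose key is NULL and returns nothing at all if the subquery yields a NULL, while the outer-join form keeps every unmatched row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch contrasting NOT IN with LEFT OUTER JOIN ... IS NULL.
final class AntiJoinSketch {
    private AntiJoinSketch() {}

    /** LEFT OUTER JOIN ... WHERE right key IS NULL: indexes of left rows with no match. */
    static List<Integer> leftOuterJoinWhereNull(List<Integer> leftKeys, List<Integer> rightKeys) {
        Set<Integer> keys = new HashSet<>();
        for (Integer key : rightKeys) {
            if (key != null) {
                keys.add(key);
            }
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < leftKeys.size(); i++) {
            Integer key = leftKeys.get(i);
            if (key == null || !keys.contains(key)) {
                kept.add(i);
            }
        }
        return kept;
    }

    /** SQL NOT IN: a row is kept only when the predicate is TRUE, not UNKNOWN. */
    static List<Integer> notIn(List<Integer> leftKeys, List<Integer> rightKeys) {
        Set<Integer> keys = new HashSet<>(rightKeys);   // may contain null
        boolean rightHasNull = keys.contains(null);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < leftKeys.size(); i++) {
            Integer key = leftKeys.get(i);
            boolean isTrue = rightKeys.isEmpty()
                    || (key != null && !rightHasNull && !keys.contains(key));
            if (isTrue) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

If DeptNo can be NULL in the source data, adding explicit NULL handling to the rewritten query is one way to keep the results identical; whether that was needed for the queries in this migration is not stated in the slides.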
JOIN Operator
              JOIN
              SELECT *
              FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id

              SELECT *
              FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)

Oracle Function



Functions

                          Oracle                        Hive

     Math Function        round, ceil, mod,             round, ceil, pmod,
                          power, sqrt, sin/cos          power, sqrt, sin/cos

     Character Function   substr, trim, lpad/rpad,      substr, trim, lpad/rpad,
                          ltrim/rtrim, replace          ltrim/rtrim, regexp_replace

     NULL Function        coalesce, nvl, nvl2           coalesce
                                                        (no NVL, NVL2)

Custom UDF Function
                       •   Condition Function
                           •   DECODE, GREATEST
                       •   Null Comparison Function
                           •   NVL / NVL2
                       •   Type Conversion
                           •   TO_NUMBER
                           •   TO_CHAR
                           •   TO_DATE
                           •   INSTR4
                           •   DATE_FORMAT
                           •   LAST_DAY

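The slides do not show the UDF sources, so the following is only a sketch of the logic that NVL, NVL2 and DECODE would need, written as plain static methods with hypothetical names. The Hive wiring (wrapping each function in a UDF class and registering it) is deliberately left out.

```java
import java.util.Objects;

// Hypothetical core logic for Oracle-compatible NULL and condition functions.
final class OracleCompat {
    private OracleCompat() {}

    /** NVL(value, fallback): the fallback replaces a NULL value. */
    static <T> T nvl(T value, T fallback) {
        return value != null ? value : fallback;
    }

    /** NVL2(value, ifNotNull, ifNull): picks a result based on whether value is NULL. */
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    /**
     * DECODE(expr, search1, result1, ..., [default]).
     * As in Oracle's DECODE, a NULL expr matches a NULL search value.
     * Returns null when nothing matches and no default is given.
     */
    static Object decode(Object expr, Object... searchResultPairs) {
        int i = 0;
        for (; i + 1 < searchResultPairs.length; i += 2) {
            if (Objects.equals(expr, searchResultPairs[i])) {
                return searchResultPairs[i + 1];
            }
        }
        // A trailing unpaired argument is the default result.
        return i < searchResultPairs.length ? searchResultPairs[i] : null;
    }
}
```

A call such as `OracleCompat.decode(grade, "A", 1, "B", 2, 0)` mirrors `DECODE(grade, 'A', 1, 'B', 2, 0)`.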
Oracle Analytic Function



Analytic Function
     RANK
      SELECT name, dept, salary,
             RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
      FROM emp

     RANK(arg1,arg2) - Custom UDF
      SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary)
      FROM (SELECT name, dept, salary FROM emp
            DISTRIBUTE BY dept SORT BY dept, salary DESC) e

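The body of the custom RANK UDF is not shown in the slides. One plausible shape, sketched here with hypothetical names, is a small stateful counter that relies on rows arriving grouped by partition key and sorted by the ordering value, which is what DISTRIBUTE BY plus SORT BY provide within each reducer. Ties share a rank and the next distinct value skips ahead, as Oracle's RANK() does.

```java
import java.util.Objects;

// Hypothetical stateful rank counter; input must be grouped and sorted already.
final class RankCounter {
    private Object lastKey;
    private Object lastValue;
    private int rowNumber;
    private int rank;
    private boolean started;

    /** Returns the rank of the current row within its partition. */
    int rank(Object partitionKey, Object orderValue) {
        if (!started || !Objects.equals(partitionKey, lastKey)) {
            // First row overall, or first row of a new partition: restart at 1.
            started = true;
            rowNumber = 1;
            rank = 1;
        } else {
            rowNumber++;
            if (!Objects.equals(orderValue, lastValue)) {
                rank = rowNumber;   // new value: rank jumps to the row position
            }
        }
        lastKey = partitionKey;
        lastValue = orderValue;
        return rank;
    }
}
```

Because the counter keeps state between calls, it gives correct results only when a single instance sees all rows of a partition in order.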
Analytic Aggregation Function
      MIN
      SELECT dept, MIN(salary) OVER (PARTITION BY dept)
      FROM emp

      Aggregation + JOIN
      SELECT emp.dept, tmp.m
      FROM emp JOIN (SELECT dept, MIN(salary) m
                     FROM emp GROUP BY dept) tmp
      ON emp.dept = tmp.dept

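The aggregation-plus-join rewrite can be pictured as two passes over the data: one pass computes the minimum per dept (the GROUP BY subquery), and a second pass attaches that minimum to every original row (the JOIN). The sketch below uses hypothetical names and in-memory lists in place of the emp table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass equivalent of MIN(salary) OVER (PARTITION BY dept).
final class MinOverPartition {
    private MinOverPartition() {}

    /** Returns, for each input row, the minimum salary of that row's dept. */
    static List<Integer> minSalaryPerRow(List<String> depts, List<Integer> salaries) {
        // Pass 1: GROUP BY dept, MIN(salary).
        Map<String, Integer> minByDept = new HashMap<>();
        for (int i = 0; i < depts.size(); i++) {
            minByDept.merge(depts.get(i), salaries.get(i), Integer::min);
        }
        // Pass 2: join the per-dept minimum back onto every row.
        List<Integer> result = new ArrayList<>();
        for (String dept : depts) {
            result.add(minByDept.get(dept));
        }
        return result;
    }
}
```

The output has one entry per input row, which is the property that distinguishes an analytic aggregate from a plain GROUP BY.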
Hive Internal



Merge Join Tree Bug

                       • select * from a join b on a.v1 = b.v1
                         join c on a.v1 = c.v1
                         join d on a.v1 = d.v1
                         join e on a.v2 = e.v2
                                                        MapReduce #3

                       • select * from a join e on a.v2 = e.v2
                         join c on a.v1 = c.v1
                         join d on a.v1 = d.v1
                         join b on a.v1 = b.v1
                                                        MapReduce #2

Merge Join Tree Bug Fix
                       • SemanticAnalyzer (original code)
                          private void mergeJoinTree(QB qb) {
                              QBJoinTree root = qb.getQbJoinTree();
                              QBJoinTree parent = null;
                              while (root != null) {
                                  boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());

                                  if (parent == null) {
                                      if (merged) {
                                          // Tree changed at the top: start again from the root.
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = root;
                                          root = root.getJoinSrc();
                                      }
                                  } else {
                                      // Always moves down, whether or not a merge happened.
                                      parent = parent.getJoinSrc();
                                      root = parent.getJoinSrc();
                                  }
                              }
                          }

Merge Join Tree Bug Fix
                       • SemanticAnalyzer (fixed code)
                          private void mergeJoinTree(QB qb) {
                              QBJoinTree root = qb.getQbJoinTree();
                              QBJoinTree parent = null;
                              while (root != null) {
                                  boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());

                                  if (parent == null) {
                                      if (merged) {
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = root;
                                          root = root.getJoinSrc();
                                      }
                                  } else {
                                      if (merged) {
                                          // Fix: after a merge below the top, restart from the root.
                                          root = qb.getQbJoinTree();
                                      } else {
                                          parent = parent.getJoinSrc();
                                          root = parent.getJoinSrc();
                                      }
                                  }
                              }
                          }

New HQL Syntax
      INSERT INTO
      INSERT INTO table VALUES(col1 ... coln)
      SELECT ... FROM tmp ...

          • INSERT [OVERWRITE] destination
              • grammar
              • modify FileSinkPlan
          • New Feature - HIVE-306
              • INSERT INTO destination

Tuning
              • Hadoop Tuning
                  •    mapred.job.reuse.jvm.num.tasks
                  •    mapred.child.java.opts
                  •    mapred.min.split.size / mapred.max.split.size
                  •    dfs.block.size

              • Hive Tuning
                  •    hive.input.format = CombineHiveInputFormat
                  •    Query tuning - reduce the number of MapReduce jobs
                                      using the HQL plan

Wrap-Up
             Oracle 2 Hive
                Gain insight into the data flow & model
                Modify Oracle SQL to Hive query syntax
                Use built-in functions
                Develop custom UDF/UDAF/UDTF
                Support analytic functions
                  - distribute by + sort by + udf
                  - join + udf (aggregation)
                Modify Hive internals
                Hadoop + Hive tuning

Questions?

Agile brazil 2011 individuals and interactions over processes and toolsDavid Paniz
 
Erlang: Bult for concurrent, distributed systems
Erlang: Bult for concurrent, distributed systemsErlang: Bult for concurrent, distributed systems
Erlang: Bult for concurrent, distributed systemsKen Pratt
 
iPhone App from concept to product
iPhone App from concept to productiPhone App from concept to product
iPhone App from concept to productjoeysim
 

Semelhante a Replacing Telco DB/DW to Hadoop and Hive (20)

Can Metadata Keep Libraries Relevant?
Can Metadata Keep Libraries Relevant?Can Metadata Keep Libraries Relevant?
Can Metadata Keep Libraries Relevant?
 
Javascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJSJavascript Views, Client-side or Server-side with NodeJS
Javascript Views, Client-side or Server-side with NodeJS
 
Deloit the next step in corporate IT
Deloit the next step in corporate ITDeloit the next step in corporate IT
Deloit the next step in corporate IT
 
The Digital Toolbox - a discussion -Science Online '11
The Digital Toolbox - a discussion -Science Online '11The Digital Toolbox - a discussion -Science Online '11
The Digital Toolbox - a discussion -Science Online '11
 
"Data in the Digital Age" - Hadoop Big Data Meetup
"Data in the Digital Age" - Hadoop Big Data Meetup"Data in the Digital Age" - Hadoop Big Data Meetup
"Data in the Digital Age" - Hadoop Big Data Meetup
 
Project management
Project managementProject management
Project management
 
Gtmf2011 2011.06.07 slideshare
Gtmf2011 2011.06.07 slideshareGtmf2011 2011.06.07 slideshare
Gtmf2011 2011.06.07 slideshare
 
"The Reality of Digital Science"
"The Reality of Digital Science""The Reality of Digital Science"
"The Reality of Digital Science"
 
Puppet camp europe 2011 hackability
Puppet camp europe 2011   hackabilityPuppet camp europe 2011   hackability
Puppet camp europe 2011 hackability
 
iPhone Python love affair
iPhone Python love affairiPhone Python love affair
iPhone Python love affair
 
Slides for millfield
Slides for millfieldSlides for millfield
Slides for millfield
 
Sera que?
Sera que?Sera que?
Sera que?
 
Einstein finalist.nl
Einstein finalist.nlEinstein finalist.nl
Einstein finalist.nl
 
Inspiratiemiddag_Vincent_Everts_Finalist generatie_einstein_komt_eraan_07042011
Inspiratiemiddag_Vincent_Everts_Finalist generatie_einstein_komt_eraan_07042011Inspiratiemiddag_Vincent_Everts_Finalist generatie_einstein_komt_eraan_07042011
Inspiratiemiddag_Vincent_Everts_Finalist generatie_einstein_komt_eraan_07042011
 
Choosing the right Content Management System
Choosing the right Content Management SystemChoosing the right Content Management System
Choosing the right Content Management System
 
Selenium Page Objects101
Selenium Page Objects101Selenium Page Objects101
Selenium Page Objects101
 
Agile brazil 2011 individuals and interactions over processes and tools
Agile brazil 2011   individuals and interactions over processes and toolsAgile brazil 2011   individuals and interactions over processes and tools
Agile brazil 2011 individuals and interactions over processes and tools
 
Erlang: Bult for concurrent, distributed systems
Erlang: Bult for concurrent, distributed systemsErlang: Bult for concurrent, distributed systems
Erlang: Bult for concurrent, distributed systems
 
iPhone App from concept to product
iPhone App from concept to productiPhone App from concept to product
iPhone App from concept to product
 
Google vs Apple
Google vs AppleGoogle vs Apple
Google vs Apple
 

Último

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Último (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Replacing Telco DB/DW to Hadoop and Hive

  • 1. Replacing Telco DB/DW to Hadoop and Hive JunHo Cho Data Analysis Platform Team Friday, July 1, 2011
  • 2. Cloud Computing Platform - Xen • Cloud Storage Platform - Hadoop • Massive Email Archiving Solution - Hadoop, Lucene • Hive: social network analysis using email • Log Archiving Solution - Hadoop • Data Analysis: data mining, machine learning, statistics • Data Platform - Hadoop, Lucene, Hive • Cloud Architecture - KT Cloud
  • 20. OpenSource: Storage & Computing
  • 22. OpenSource: Collection
  • 24. OpenSource: Search
  • 26. OpenSource: Analysis
  • 28. OpenSource: Coordination
  • 34-39. Hive Architecture (diagram built up over six slides). Components: UI, Driver, Compiler, MetaStore, Execution Engine, Hadoop. Flow labels: DDL, HQL, ORM, Works, Result. Example query shown: select col1 from tab1 where ... Example result rows shown: a 123344, b 121211, c 342434
  • 40-41. Hive Internal (diagram). Clients: Web UI, Hive CLI, JDBC (Browse, Query, DDL). MetaStore, Thrift API. Hive QL: Parser, Plan, Optimizer, Task. Map Reduce: ExecMapper/ExecReducer with TSOperator, SELOperator, FSOperator; User Script; UDF/UDAF (substr, sum, average). SerDe, Input/OutputFormat, StorageHandler. Storage: HDFS, RCFile, DB, HBase, ...
  • 42-48. Parser: Select col1,col2 From tab1 Where col3 > 5 is parsed into the tree TOK_QUERY (TOK_FROM (TOK_TABNAME tab1), TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE), TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL col1), TOK_SELEXPR (TOK_TABLE_OR_COL col2)), TOK_WHERE (> TOK_TABLE_OR_COL 5))). Across the builds a QB marker moves through the tree as tab1, insclause-0, col1 and col2 are filled in.
  • 51-59. Plan: for Select col1,col2 From tab1 Where col3 > 5, each clause held in the QB is turned into an operator • TOK_FROM: TableScanOperator • TOK_WHERE: FilterOperator • TOK_SELECT: SelectOperator • TOK_DESTINATION: FileSinkOperator
  • 62-74. Optimizer: for Select col1,col2 From tab1 Where col3 > 5 on tab1 {col1, col2, col3, col4, col5, col6, col7}, the operator tree is TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator. ColumnPruner visits the tree with a Context that tracks TS, FIL and SEL • SEL: col1, col2 • FIL: col1, col2, col3 • TS: col1, col2, col3 • The last build shows a second FilterOperator in the tree
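The column pruning walk on the Optimizer slides can be sketched in a few lines of Python. This is a minimal illustration, not Hive's ColumnPruner: the tuple layout, the `prune_columns` helper and the operator kinds are hypothetical names chosen for the sketch, and only a linear plan with one projection is handled.

```python
def prune_columns(table_columns, plan):
    """Walk a linear plan from sink back to scan and record needed columns.

    plan is a list of (name, kind, columns) ordered from scan to sink.
    The layout is a hypothetical stand-in, not Hive's operator classes.
    """
    needed = {}
    required = None  # None means "every column the child produces"
    for name, kind, cols in reversed(plan):
        if kind == "select":
            # A projection fixes exactly which columns flow above it.
            required = set(cols)
        elif kind == "filter" and required is not None:
            # A filter passes rows through and also reads its predicate columns.
            required = required | set(cols)
        if kind == "scan":
            # Keep the table's column order, but read only what is required.
            needed[name] = [c for c in table_columns
                            if required is None or c in required]
        else:
            needed[name] = None if required is None else sorted(required)
    return needed


TABLE = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]
PLAN = [
    ("TS", "scan", []),
    ("FIL", "filter", ["col3"]),          # Where col3 > 5
    ("SEL", "select", ["col1", "col2"]),  # Select col1,col2
    ("FS", "sink", []),
]
pruned = prune_columns(TABLE, PLAN)
```

With the slide's query, `pruned["SEL"]` holds col1 and col2, while `pruned["FIL"]` and `pruned["TS"]` hold col1, col2 and col3, so the scan no longer reads col4 through col7.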
  • 77-87. Task: for Select col1,col2 From tab1 Where col3 > 5. Labels shown: TaskFactory, QB, the rules TS - GenMRTableScan1 and FS - GenMRFileSink1, the tasks FetchTask and MapRedTask, and the operator tree TableScanOperator, FilterOperator, FilterOperator, SelectOperator, FileSinkOperator
  • 88-89. Hive Internal (diagram repeated; the Map Reduce box now lists TSOperator, FILOperator, FILOperator, SELOperator, FSOperator, with User Script and UDF)
  • 90. Oracle Migration to Hive
  • 91. l l l l Friday, July 1, 2011
  • 92. l l l l l l l l Friday, July 1, 2011
  • 93. l l l l l l l l Friday, July 1, 2011
  • 94. Understand Oracle SQL • More than 3,000 ETL SQL statements • Understand the data flow • Group similar SQL patterns • Investigate which Oracle functions are used
  • 97-102. Data Model Convert (Oracle to Hive) • Table: Table • Partition: Partition • Sampling: Bucket
  • 104-111. DataType Convert (Oracle to Hive) • NUMBER(n): TINYINT, INT or BIGINT • NUMBER(n,m): FLOAT or DOUBLE • VARCHAR2: STRING • DATE: STRING in "yyyy-MM-dd HH:mm:ss" format
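The type mapping above could be applied mechanically when converting table definitions. The sketch below is one possible reading, not the author's tool: the function names are hypothetical, the precision cut-offs (2, 9 and larger) are assumptions derived from the value ranges of TINYINT, INT and BIGINT, and DOUBLE is picked over FLOAT arbitrarily.

```python
import re
from datetime import datetime

# Python spelling of the slide's "yyyy-MM-dd HH:mm:ss" pattern.
HIVE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def oracle_to_hive_type(oracle_type: str) -> str:
    """Map an Oracle column type to a Hive type (hypothetical helper)."""
    t = oracle_type.strip().upper()
    m = re.fullmatch(r"NUMBER\((\d+)(?:\s*,\s*(\d+))?\)", t)
    if m:
        precision = int(m.group(1))
        scale = int(m.group(2)) if m.group(2) is not None else 0
        if scale > 0:
            return "DOUBLE"       # assumption: prefer DOUBLE over FLOAT
        if precision <= 2:
            return "TINYINT"      # assumption: up to 99 fits in -128..127
        if precision <= 9:
            return "INT"          # assumption: 9 digits fit in 32 bits
        return "BIGINT"           # note: more than 18 digits can overflow
    if t.startswith("VARCHAR2"):
        return "STRING"
    if t == "DATE":
        return "STRING"           # stored as formatted text, see below
    raise ValueError(f"no mapping sketched for {oracle_type!r}")


def date_to_hive_string(value: datetime) -> str:
    """Render a DATE value in the string format the slide prescribes."""
    return value.strftime(HIVE_DATE_FORMAT)
```

For instance, `oracle_to_hive_type("NUMBER(10,2)")` yields DOUBLE and `date_to_hive_string(datetime(2011, 7, 1, 9, 5, 0))` yields a string in the slide's format.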
  • 112. Hive DML • Hive supports ANSI-SQL-style queries • Subqueries are supported only in the FROM clause • Join queries: equi-join/inner join, outer join, self-join
  • 114-116. IN Clause • IN SubQuery: SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d) • Hive: SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo=d.DeptNo)
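To see why the rewrite uses a semi-join rather than a plain join, the IN query can be checked on a small dataset. The sketch below runs on SQLite through Python's sqlite3 module, which is an assumption made only to have something runnable. SQLite has no LEFT SEMI JOIN, so an EXISTS subquery stands in for Hive's syntax; the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
    CREATE TABLE Dept (DeptNo INTEGER, Site TEXT);
    INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30);
    -- DeptNo 10 appears twice in Dept on purpose
    INSERT INTO Dept VALUES (10, 'Seoul'), (10, 'Busan'), (20, 'Seoul');
""")

# The Oracle-style query from the slide.
in_rows = conn.execute(
    "SELECT e.Name FROM Employee e "
    "WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d) ORDER BY e.Name"
).fetchall()

# Stand-in for Hive's LEFT SEMI JOIN, which SQLite does not provide:
# each Employee row is kept at most once if any Dept row matches.
semi_rows = conn.execute(
    "SELECT e.Name FROM Employee e "
    "WHERE EXISTS (SELECT 1 FROM Dept d WHERE d.DeptNo = e.DeptNo) "
    "ORDER BY e.Name"
).fetchall()

# A plain inner join is not equivalent: 'kim' repeats per matching Dept row.
inner_rows = conn.execute(
    "SELECT e.Name FROM Employee e JOIN Dept d ON (e.DeptNo = d.DeptNo) "
    "ORDER BY e.Name"
).fetchall()
```

Here `in_rows` and `semi_rows` agree, while `inner_rows` contains one extra copy of the employee whose department appears twice.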
  • 117-120. NOT IN Clause • NOT IN SubQuery: SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d) • Hive: SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo=d.DeptNo) WHERE d.DeptNo IS NULL
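Both forms of the NOT IN query run unchanged on SQLite, so the rewrite can be checked with Python's sqlite3 module. The data below is invented for illustration. The second half shows a caveat worth testing during a migration: under standard SQL NULL rules, a NULL in the subquery makes NOT IN return no rows, while the outer-join form still returns the unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
    CREATE TABLE Dept (DeptNo INTEGER);
    INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30);
    INSERT INTO Dept VALUES (10), (20);
""")

NOT_IN = (
    "SELECT e.Name FROM Employee e "
    "WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)"
)
ANTI_JOIN = (
    "SELECT e.Name FROM Employee e "
    "LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) "
    "WHERE d.DeptNo IS NULL"
)

# Without NULLs the two forms return the same rows.
not_in_rows = conn.execute(NOT_IN).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()

# With a NULL key in Dept the results diverge.
conn.execute("INSERT INTO Dept VALUES (NULL)")
not_in_with_null = conn.execute(NOT_IN).fetchall()
anti_join_with_null = conn.execute(ANTI_JOIN).fetchall()
```

If the source data can contain NULL keys, adding a `d.DeptNo IS NOT NULL` filter or cleaning the keys first is one way to keep the two forms aligned; which behaviour is wanted depends on the original ETL logic.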
  • 122-124. JOIN Operator • JOIN: SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id • Hive: SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)
  • 133. Functions (Oracle → Hive) • Math Function: round, ceil, mod, power, sqrt, sin/cos → round, ceil, pmod, power, sqrt, sin/cos • Character Function: substr, trim, lpad/rpad, ltrim/rtrim, replace → substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace • NULL Function: coalesce, nvl, nvl2 → coalesce (No NVL, NVL2)
  • 134. Custom UDF Function • Condition Function: DECODE, GREATEST • Null Comparison Function: NVL / NVL2 • Type Conversion: TO_NUMBER, TO_CHAR, TO_DATE • INSTR4 • DATE_FORMAT • LAST_DAY
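As a rough illustration of what such custom functions compute, here is a plain-Java sketch of NVL, NVL2 and DECODE. The class name `OracleCompatFunctions` is hypothetical, and this is not the deck's implementation. In Hive of that era the logic would typically live in a class extending `org.apache.hadoop.hive.ql.exec.UDF` with `evaluate` methods; that wrapper is left out here so the sketch runs without Hive on the classpath.

```java
import java.util.Objects;

/** Hypothetical plain-Java versions of a few Oracle functions that Hive lacked. */
class OracleCompatFunctions {
    /** NVL(value, fallback): fallback when value is NULL. */
    static <T> T nvl(T value, T fallback) {
        return value != null ? value : fallback;
    }

    /** NVL2(value, ifNotNull, ifNull). */
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    /** DECODE(expr, search1, result1, ..., [default]); two NULLs compare as equal. */
    static Object decode(Object expr, Object... args) {
        int i = 0;
        for (; i + 1 < args.length; i += 2) {
            if (Objects.equals(expr, args[i])) {
                return args[i + 1];
            }
        }
        // A trailing unpaired argument is the default; otherwise the result is NULL.
        return i < args.length ? args[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "n/a"));
        System.out.println(nvl2("x", "set", "unset"));
        System.out.println(decode("B", "A", 1, "B", 2, 0));
    }
}
```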
  • 135. Oracle Analytic Function
  • 140. Analytic Function: RANK • Oracle: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp • Hive, with RANK(arg1,arg2) as a custom UDF: SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
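The custom RANK(arg1,arg2) function works because the inner query sends all rows of one dept to the same reducer, already sorted by salary, so the function only needs to remember the previous row. The deck does not show the UDF body; the sketch below is one assumed way to write that state machine in plain Java (class name `RankSketch` is hypothetical), giving tied salaries the same rank and leaving a gap afterwards as Oracle's RANK() does.

```java
import java.util.Objects;

/** Hypothetical stateful rank: rows must arrive grouped by dept and sorted by salary. */
class RankSketch {
    private String lastDept;
    private Long lastSalary;
    private long rowsSeen; // rows seen so far in the current dept
    private long rank;

    long rank(String dept, long salary) {
        if (!Objects.equals(dept, lastDept)) {
            // A new partition starts: reset the counters.
            lastDept = dept;
            lastSalary = null;
            rowsSeen = 0;
        }
        rowsSeen++;
        if (lastSalary == null || lastSalary != salary) {
            // A new salary value takes the current row position as its rank.
            rank = rowsSeen;
            lastSalary = salary;
        }
        return rank;
    }

    public static void main(String[] args) {
        RankSketch r = new RankSketch();
        Object[][] rows = {{"A", 300L}, {"A", 300L}, {"A", 100L}, {"B", 50L}};
        for (Object[] row : rows) {
            System.out.println(row[0] + " " + row[1] + " -> " + r.rank((String) row[0], (Long) row[1]));
        }
    }
}
```

The result is only meaningful when the input order matches the `SORT BY` in the query; feeding unsorted rows would produce arbitrary ranks.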
  • 145. Analytic Aggregation Function: MIN • Oracle: SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp • Hive (Aggregation + JOIN): SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON (emp.dept = tmp.dept)
  • 149. Merge Join Tree Bug • select * from a join b on a.v1 = b.v1 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join e on a.v2 = e.v2 (MapReduce #3) • select * from a join e on a.v2 = e.v2 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join b on a.v1 = b.v1 (MapReduce #2)
  • 150. Merge Join Tree Bug Fix • SemanticAnalyzer (before the fix) private void mergeJoinTree(QB qb) { QBJoinTree root = qb.getQbJoinTree(); QBJoinTree parent = null; while (root != null) { boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc()); if (parent == null) { if (merged) { root = qb.getQbJoinTree(); } else { parent = root; root = root.getJoinSrc(); } } else { parent = parent.getJoinSrc(); root = parent.getJoinSrc(); } } }
  • 151. Merge Join Tree Bug Fix • SemanticAnalyzer (after the fix: the outer else branch now also checks merged) private void mergeJoinTree(QB qb) { QBJoinTree root = qb.getQbJoinTree(); QBJoinTree parent = null; while (root != null) { boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc()); if (parent == null) { if (merged) { root = qb.getQbJoinTree(); } else { parent = root; root = root.getJoinSrc(); } } else { if (merged) { root = qb.getQbJoinTree(); } else { parent = parent.getJoinSrc(); root = parent.getJoinSrc(); } } } }
  • 155. New HQL Syntax: INSERT INTO • INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ... • INSERT [OVERWRITE] destination • grammar • modify FileSinkPlan • New Feature - HIVE-306 • INSERT INTO destination
  • 164. Tuning • Hadoop Tuning • mapred.job.reuse.jvm.num.tasks • mapred.child.java.opts • mapred.min.split.size / mapred.max.split.size • dfs.block.size • Hive Tuning • hive.input.format = CombineHiveInputFormat • query tuning - reduce the number of MapReduce jobs using the HQL plan
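To see how the split-size settings and dfs.block.size interact, the sketch below reproduces the formula used by `computeSplitSize` in Hadoop's new-API `FileInputFormat`: the split size is the block size clamped between the minimum and maximum split size. The class `SplitSizeSketch` and the `estimateSplits` helper are hypothetical, and the split count is only an approximation; other input formats, including CombineHiveInputFormat, build their splits differently.

```java
/** Hypothetical helper showing how min/max split size and block size interact. */
class SplitSizeSketch {
    /** Block size clamped to the [minSize, maxSize] range. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Rough estimate of map tasks for one file (ignores Hadoop's split slack). */
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;
        // Raising the minimum split size above the block size yields fewer, larger splits.
        long split = computeSplitSize(block, 128 * mb, Long.MAX_VALUE);
        System.out.println("split size (MB): " + split / mb);
        System.out.println("approx. splits for a 1 GB file: " + estimateSplits(1024 * mb, split));
    }
}
```

Fewer, larger splits mean fewer map tasks to start, which is the same goal that JVM reuse and the combining input format pursue from different directions.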
  • 174. Wrap-Up: Oracle 2 Hive • Gain insight into the data flow & model • Modify Oracle SQL to Hive query syntax • Use built-in functions • Develop custom UDF/UDAF/UDTF • Support analytic functions - distribute by + sort by + udf - join + udf (aggregation) • Modify Hive internals • Hadoop + Hive tuning