SlideShare uma empresa Scribd logo
1 de 12
Baixar para ler offline
APACHE TAJO
By Asis Mohanty,
CBIP, CDMP
asismohanty@gmail.com
Tajo: A Data Warehouse System
• Apache Top Level Project
• Distributed and scalable data warehouse system on Hadoop
• Low latency and long running batch queries in a single system
• Features:
 ANSI/ISO SQL compliance
 Mature SQL features: Join, Group by, Order By, Aggregation
and windows functions
 Supports Partition table
 Supports Java/Python, UDF
 JDBC & Java based asynchronous API
 Supports Read/Write of CSV, JSON, RCFile, Sequential file,
Parquet and ORC
Tajo Architecture
Where to use Tajo
• Extraction, Transformation, Loading (ETL)
• Interactive BI/ Analytics on web-scale Big Data
• Data Discovery/ Exploratory analysis with R and existing SQL tools
• Query federation
• Customer wants a unified system for batch and interactive queries
on Hadoop, Amazon S3 or Hbase
• Customer wants to use mixed use of Hadoop-based DW and
RDBMS-based DW or want to replace RDBMS DW.
• Customer wants to use existing SQL tools on Hadoop DW
Hbase Storage Support
• You can use SQL to access Hbase tables.
• TAJO supports Hbase storage
• CREATE(EXTERNAL)/DROP/INSERT/OVERWRITE
• Create TABLE hbase_t1 (Key TEXT, Col1 TEXT, Col2 Int) USING
HBASE (
‘table’ = ‘t1’,
‘columns’ = ‘:key,cf1:col1,cf2:col2’,
‘hbase.zookeper.quorum’ = ‘host1:2181,host2:2181’
Tajo Shell (TSQL)
Tajo provides a shell utility named Tsql. It is a command-line interface
(CLI) where users can create or drop tables, inspect schema and
query tables, etc.
• Meta Commands
• Executing HDFS commands
• Session Variables
• Administration Commands
• Introducing to TSQL
• Executing a single command
• Executing Queries from Files
• Executing as background process
Refer: http://tajo.apache.org/docs/current/index.html
Tajo SQL Language (DDL)
CREATE DATABASE
CREATE DATABASE [IF NOT EXISTS] <database_name>
DROP DATABASE
DROP DATABASE [IF EXISTS] <database_name>
CREATE TABLE
CREATE TABLE [IF NOT EXISTS] <table_name> [(<column_name> <data_type>, ... )]
[using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>]
CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (<column_name> <data_type>, ... )
using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>'
Compression
L_ORDERKEY bigint,
L_PARTKEY bigint,
...
L_COMMENT text)
USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
DROP TABLE
DROP TABLE [IF EXISTS] <table_name> [PURGE]
CREATE INDEX
CREATE INDEX [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WHERE predicate ]
DROP INDEX
DROP INDEX name
INSERT (OVERWRITE) INTO
INSERT OVERWRITE statement overwrites a table data of an existing table or a data in a given
directory. Tajo’s INSERT OVERWRITE statement follows INSERT INTO SELECT statement of
SQL. The examples are as follows:
create table t1 (col1 int8, col2 int4, col3 float8);
-- when a target table schema and output schema are equivalent to each other
INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM
lineitem;
-- or
INSERT OVERWRITE INTO t1 SELECT * FROM lineitem;
-- when the output schema are smaller than the target table schema
INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem;
-- when you want to specify certain target columns
INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM
lineitem;
In addition, INSERT OVERWRITE statement overwrites table data as well as a specific
directory.
INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem;
Tajo Queries
Sample Query
SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...]
[FROM <table reference> [[AS] <table alias name>] [, ...]]
[WHERE <condition>]
[GROUP BY <expression> [, ...]]
[HAVING <condition>]
[ORDER BY <expression> [ASC|DESC] [NULL FIRST|NULL LAST] [, ...]]
Table and Table Aliases
A temporary name can be given to tables and complex table references to be used for references to
the derived table in the rest of the query. This is called a table alias.
FROM table_reference AS alias or FROM table_reference alias
Window Functions
A window function performs a calculation across multiple table rows that belong to some window
frame.
SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]),
...., FROM
Better SQL support via thin JDBC
ETL Tools BI Tools Reporting Tools
TAJO CLUSTER
Tajo JDBC
HDFS HBase S3 Swift
Dataset Stored in various Formats/Storage
Which one to choose
Impala Vs Presto Vs Drill Vs Tajo

Mais conteúdo relacionado

Mais procurados

Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...Marco Gralike
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesMarco Gralike
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrKai Chan
 
MS SQL Database basic
MS SQL Database basicMS SQL Database basic
MS SQL Database basicwali1195189
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solrmacrochen
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain ExplainedJeremy Coates
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1Marco Gralike
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Rapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFRapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFTilmann Rabl
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 

Mais procurados (20)

Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
 
Moving Data to and From R
Moving Data to and From RMoving Data to and From R
Moving Data to and From R
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New Features
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
IR SQLite Session #1
IR SQLite Session #1IR SQLite Session #1
IR SQLite Session #1
 
MS SQL Database basic
MS SQL Database basicMS SQL Database basic
MS SQL Database basic
 
Sql killedserver
Sql killedserverSql killedserver
Sql killedserver
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Hive commands
Hive commandsHive commands
Hive commands
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain Explained
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Rapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGFRapid Development of Data Generators Using Meta Generators in PDGF
Rapid Development of Data Generators Using Meta Generators in PDGF
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Lab2 ddl commands
Lab2 ddl commandsLab2 ddl commands
Lab2 ddl commands
 

Semelhante a Apache TAJO

Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Hyunsik Choi
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataGruter
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondGruter
 
Introduction to Oracle Database.pptx
Introduction to Oracle Database.pptxIntroduction to Oracle Database.pptx
Introduction to Oracle Database.pptxSiddhantBhardwaj26
 
SQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowSQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowPavithSingh
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveWill Du
 
SQL SERVER Training in Pune Slides
SQL SERVER Training in Pune SlidesSQL SERVER Training in Pune Slides
SQL SERVER Training in Pune Slidesenosislearningcom
 
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfyt
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfytxjtrutdctrd5454drxxresersestryugyufy6rythgfytfyt
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfytWrushabhShirsat3
 
Relational Database Language.pptx
Relational Database Language.pptxRelational Database Language.pptx
Relational Database Language.pptxSheethal Aji Mani
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupRemus Rusanu
 
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Ravikumar Nandigam
 

Semelhante a Apache TAJO (20)

Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
Tajo Seoul Meetup July 2015 - What's New Tajo 0.11
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Vertica-Database
Vertica-DatabaseVertica-Database
Vertica-Database
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its Beyond
 
Introduction to Oracle Database.pptx
Introduction to Oracle Database.pptxIntroduction to Oracle Database.pptx
Introduction to Oracle Database.pptx
 
Les10.ppt
Les10.pptLes10.ppt
Les10.ppt
 
SQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowSQL.pptx for the begineers and good know
SQL.pptx for the begineers and good know
 
Module 3
Module 3Module 3
Module 3
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
SQL SERVER Training in Pune Slides
SQL SERVER Training in Pune SlidesSQL SERVER Training in Pune Slides
SQL SERVER Training in Pune Slides
 
Database
Database Database
Database
 
MySQL Essential Training
MySQL Essential TrainingMySQL Essential Training
MySQL Essential Training
 
unit-ii.pptx
unit-ii.pptxunit-ii.pptx
unit-ii.pptx
 
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfyt
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfytxjtrutdctrd5454drxxresersestryugyufy6rythgfytfyt
xjtrutdctrd5454drxxresersestryugyufy6rythgfytfyt
 
Relational Database Language.pptx
Relational Database Language.pptxRelational Database Language.pptx
Relational Database Language.pptx
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User Group
 
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
 
Sql for dbaspresentation
Sql for dbaspresentationSql for dbaspresentation
Sql for dbaspresentation
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 

Mais de Asis Mohanty

Cloud Data Warehouses
Cloud Data WarehousesCloud Data Warehouses
Cloud Data WarehousesAsis Mohanty
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
Cassandra basics 2.0
Cassandra basics 2.0Cassandra basics 2.0
Cassandra basics 2.0Asis Mohanty
 
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseHadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseAsis Mohanty
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataAsis Mohanty
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteriaAsis Mohanty
 
Cognos vs Hyperion vs SSAS Comparison
Cognos vs Hyperion vs SSAS ComparisonCognos vs Hyperion vs SSAS Comparison
Cognos vs Hyperion vs SSAS ComparisonAsis Mohanty
 
Reporting/Dashboard Evaluations
Reporting/Dashboard EvaluationsReporting/Dashboard Evaluations
Reporting/Dashboard EvaluationsAsis Mohanty
 
Oracle to Netezza Migration Casestudy
Oracle to Netezza Migration CasestudyOracle to Netezza Migration Casestudy
Oracle to Netezza Migration CasestudyAsis Mohanty
 
BI Error Processing Framework
BI Error Processing FrameworkBI Error Processing Framework
BI Error Processing FrameworkAsis Mohanty
 
Netezza vs teradata
Netezza vs teradataNetezza vs teradata
Netezza vs teradataAsis Mohanty
 
Change data capture the journey to real time bi
Change data capture the journey to real time biChange data capture the journey to real time bi
Change data capture the journey to real time biAsis Mohanty
 

Mais de Asis Mohanty (14)

Cloud Data Warehouses
Cloud Data WarehousesCloud Data Warehouses
Cloud Data Warehouses
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Cassandra basics 2.0
Cassandra basics 2.0Cassandra basics 2.0
Cassandra basics 2.0
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseHadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouse
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs Exadata
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteria
 
COGNOS Vs OBIEE
COGNOS Vs OBIEECOGNOS Vs OBIEE
COGNOS Vs OBIEE
 
Cognos vs Hyperion vs SSAS Comparison
Cognos vs Hyperion vs SSAS ComparisonCognos vs Hyperion vs SSAS Comparison
Cognos vs Hyperion vs SSAS Comparison
 
Reporting/Dashboard Evaluations
Reporting/Dashboard EvaluationsReporting/Dashboard Evaluations
Reporting/Dashboard Evaluations
 
Oracle to Netezza Migration Casestudy
Oracle to Netezza Migration CasestudyOracle to Netezza Migration Casestudy
Oracle to Netezza Migration Casestudy
 
BI Error Processing Framework
BI Error Processing FrameworkBI Error Processing Framework
BI Error Processing Framework
 
Netezza vs teradata
Netezza vs teradataNetezza vs teradata
Netezza vs teradata
 
Change data capture the journey to real time bi
Change data capture the journey to real time biChange data capture the journey to real time bi
Change data capture the journey to real time bi
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Apache TAJO

  • 1. APACHE TAJO By Asis Mohanty, CBIP, CDMP asismohanty@gmail.com
  • 2. Tajo: A Data Warehouse System • Apache Top Level Project • Distributed and scalable data warehouse system on Hadoop • Low latency and long running batch queries in a single system • Features:  ANSI/ISO SQL compliance  Mature SQL features: Join, Group by, Order By, Aggregation and windows functions  Supports Partition table  Supports Java/Python, UDF  JDBC & Java based asynchronous API  Supports Read/Write of CSV, JSON, RCFile, Sequential file, Parquet and ORC
  • 4. Where to use Tajo • Extraction, Transformation, Loading (ETL) • Interactive BI/ Analytics on web-scale Big Data • Data Discovery/ Exploratory analysis with R and existing SQL tools • Query federation • Customer wants a unified system for batch and interactive queries on Hadoop, Amazon S3 or Hbase • Customer wants to use mixed use of Hadoop-based DW and RDBMS-based DW or want to replace RDBMS DW. • Customer wants to use existing SQL tools on Hadoop DW
  • 5. Hbase Storage Support • You can use SQL to access Hbase tables. • TAJO supports Hbase storage • CREATE(EXTERNAL)/DROP/INSERT/OVERWRITE • Create TABLE hbase_t1 (Key TEXT, Col1 TEXT, Col2 Int) USING HBASE ( ‘table’ = ‘t1’, ‘columns’ = ‘:key,cf1:col1,cf2:col2’, ‘hbase.zookeper.quorum’ = ‘host1:2181,host2:2181’
  • 6. Tajo Shell (TSQL) Tajo provides a shell utility named Tsql. It is a command-line interface (CLI) where users can create or drop tables, inspect schema and query tables, etc. • Meta Commands • Executing HDFS commands • Session Variables • Administration Commands • Introducing to TSQL • Executing a single command • Executing Queries from Files • Executing as background process Refer: http://tajo.apache.org/docs/current/index.html
  • 7. Tajo SQL Language (DDL) CREATE DATABASE CREATE DATABASE [IF NOT EXISTS] <database_name> DROP DATABASE DROP DATABASE [IF EXISTS] <database_name> CREATE TABLE CREATE TABLE [IF NOT EXISTS] <table_name> [(<column_name> <data_type>, ... )] [using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>] CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (<column_name> <data_type>, ... ) using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>' Compression L_ORDERKEY bigint, L_PARTKEY bigint, ... L_COMMENT text) USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec') LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy'; DROP TABLE DROP TABLE [IF EXISTS] <table_name> [PURGE] CREATE INDEX CREATE INDEX [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WHERE predicate ] DROP INDEX DROP INDEX name
  • 8. INSERT (OVERWRITE) INTO INSERT OVERWRITE statement overwrites a table data of an existing table or a data in a given directory. Tajo’s INSERT OVERWRITE statement follows INSERT INTO SELECT statement of SQL. The examples are as follows: create table t1 (col1 int8, col2 int4, col3 float8); -- when a target table schema and output schema are equivalent to each other INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem; -- or INSERT OVERWRITE INTO t1 SELECT * FROM lineitem; -- when the output schema are smaller than the target table schema INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem; -- when you want to specify certain target columns INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem; In addition, INSERT OVERWRITE statement overwrites table data as well as a specific directory. INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem;
  • 9. Tajo Queries Sample Query SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...] [FROM <table reference> [[AS] <table alias name>] [, ...]] [WHERE <condition>] [GROUP BY <expression> [, ...]] [HAVING <condition>] [ORDER BY <expression> [ASC|DESC] [NULL FIRST|NULL LAST] [, ...]] Table and Table Aliases A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias. FROM table_reference AS alias or FROM table_reference alias Window Functions A window function performs a calculation across multiple table rows that belong to some window frame. SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), ...., FROM
  • 10. Better SQL support via thin JDBC ETL Tools BI Tools Reporting Tools TAJO CLUSTER Tajo JDBC HDFS HBase S3 Swift
  • 11. Dataset Stored in various Formats/Storage
  • 12. Which one to choose Impala Vs Presto Vs Drill Vs Tajo