SlideShare uma empresa Scribd logo
1 de 11
A case study:
Tajo on Big Telco
 
Jeong-shik Jang
System Development  Deployment
Gruter Inc, Seoul, South Korea

©2013 Gruter. All rights reserved.
 
Mobile carriers in S. Korea

2
 
Test setup
Performance test on Hive / Impala / Tajo
H/W
CPU

24 cores (Xeon 2.5 GHz, HT)

Memory

64 GB

Disks

3TB x 6 (NLSAS 7200 RPM)

Network

10G

Size

1 master + 6 data nodes

Versions:
Hadoop

cdh4.3.0

Hive

0.10.0-cdh4.3.0

Impala

impalad_version_1.1.1_RELEASE

Tajo

0.2-SNAPSHOT

Data size: 1.7 TB (4.1B rows, Q1), 8 or less GB (results of Q1, rest of Qs)
3
 
Test setup: Queries
Q1: scan using about 20 text pattern matching filters
Q2: 7 unions with joins
Q3: join
Q4: group by and order by
Q5: 30 text pattern matching filters with OR conditions, group
by, having, and order by

4
 
Results: Q1 – filter scan

• 

• 
1445.69
1400

NB:
* Tajo showed enhanced performance due to
dynamic task scheduling

1200
1000
800

895.96
789.09

Impala

600

Tajo
processing time (sec.)

400
200
0
5
 

Hive

Q1: scan using about 20 text pattern matching filters

Mais conteúdo relacionado

Mais procurados

Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxCloudera, Inc.
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
opentsdb in a real enviroment
opentsdb in a real enviromentopentsdb in a real enviroment
opentsdb in a real enviromentChen Robert
 
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Ontico
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldOliver Hankeln
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBGeoffrey Anderson
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series WorldMapR Technologies
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeologyTomas Vondra
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Gnocchi v3 brownbag
Gnocchi v3 brownbagGnocchi v3 brownbag
Gnocchi v3 brownbagGordon Chung
 
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...DataStax
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Florian Lautenschlager
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools Yulia Shcherbachova
 

Mais procurados (18)

Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at Box
 
Toku DB by Aswin
Toku DB by AswinToku DB by Aswin
Toku DB by Aswin
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
opentsdb in a real enviroment
opentsdb in a real enviromentopentsdb in a real enviroment
opentsdb in a real enviroment
 
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
 
Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed world
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDB
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Gnocchi v3 brownbag
Gnocchi v3 brownbagGnocchi v3 brownbag
Gnocchi v3 brownbag
 
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools
 

Destaque

3 run ka sauda (Hindi Comic - Freelance Talents)
3 run ka sauda (Hindi Comic - Freelance Talents)3 run ka sauda (Hindi Comic - Freelance Talents)
3 run ka sauda (Hindi Comic - Freelance Talents)Mohit Sharma
 
Project #1 - Water Pumping Station - Switchboard Upgrade
Project #1 - Water Pumping Station - Switchboard UpgradeProject #1 - Water Pumping Station - Switchboard Upgrade
Project #1 - Water Pumping Station - Switchboard UpgradeDavid List
 
Project #3 - A Class Treatment Plant - AMIAD Micro-Strainers
Project #3 - A Class Treatment Plant - AMIAD Micro-StrainersProject #3 - A Class Treatment Plant - AMIAD Micro-Strainers
Project #3 - A Class Treatment Plant - AMIAD Micro-StrainersDavid List
 
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...Yogacharya AB Bhavanani
 
Angiosperms 2016
Angiosperms 2016Angiosperms 2016
Angiosperms 2016Jessi Dildy
 
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료여성환경연대
 
함수형사고 3장 양도하라
함수형사고 3장 양도하라함수형사고 3장 양도하라
함수형사고 3장 양도하라Sunggon Song
 
스타트업사례로 본 로그 데이터분석 : Tajo on AWS
스타트업사례로 본 로그 데이터분석 : Tajo on AWS스타트업사례로 본 로그 데이터분석 : Tajo on AWS
스타트업사례로 본 로그 데이터분석 : Tajo on AWSGruter
 
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance SeminarAmazon Web Services Korea
 
DDD Start! - 2장 아키텍처 개요
DDD Start! - 2장 아키텍처 개요DDD Start! - 2장 아키텍처 개요
DDD Start! - 2장 아키텍처 개요Minchul Jung
 
실천하는 에코페미니스트들의 플랫폼 (2015)
실천하는 에코페미니스트들의 플랫폼 (2015)실천하는 에코페미니스트들의 플랫폼 (2015)
실천하는 에코페미니스트들의 플랫폼 (2015)여성환경연대
 
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGA
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGAENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGA
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGAYogacharya AB Bhavanani
 
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)Sang Don Kim
 

Destaque (18)

Evolución del mercado de viviendas usadas
Evolución del mercado de viviendas usadasEvolución del mercado de viviendas usadas
Evolución del mercado de viviendas usadas
 
3 run ka sauda (Hindi Comic - Freelance Talents)
3 run ka sauda (Hindi Comic - Freelance Talents)3 run ka sauda (Hindi Comic - Freelance Talents)
3 run ka sauda (Hindi Comic - Freelance Talents)
 
El político soy yo
El político soy yoEl político soy yo
El político soy yo
 
Project #1 - Water Pumping Station - Switchboard Upgrade
Project #1 - Water Pumping Station - Switchboard UpgradeProject #1 - Water Pumping Station - Switchboard Upgrade
Project #1 - Water Pumping Station - Switchboard Upgrade
 
Project #3 - A Class Treatment Plant - AMIAD Micro-Strainers
Project #3 - A Class Treatment Plant - AMIAD Micro-StrainersProject #3 - A Class Treatment Plant - AMIAD Micro-Strainers
Project #3 - A Class Treatment Plant - AMIAD Micro-Strainers
 
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...
EFFECT OF YOGA THERAPY ON REACTION TIME, BIOCHEMICAL PARAMETERS AND WELLNESS ...
 
Angiosperms 2016
Angiosperms 2016Angiosperms 2016
Angiosperms 2016
 
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료
2016 bpa 검출시험 관련 환경부, 이마트 논평 반박자료
 
함수형사고 3장 양도하라
함수형사고 3장 양도하라함수형사고 3장 양도하라
함수형사고 3장 양도하라
 
Fossils
FossilsFossils
Fossils
 
스타트업사례로 본 로그 데이터분석 : Tajo on AWS
스타트업사례로 본 로그 데이터분석 : Tajo on AWS스타트업사례로 본 로그 데이터분석 : Tajo on AWS
스타트업사례로 본 로그 데이터분석 : Tajo on AWS
 
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar
한국 금융권을 위한 aws cloud 도입 제언 :: 정우진 :: AWS Finance Seminar
 
DDD Start! - 2장 아키텍처 개요
DDD Start! - 2장 아키텍처 개요DDD Start! - 2장 아키텍처 개요
DDD Start! - 2장 아키텍처 개요
 
DDD Repository
DDD RepositoryDDD Repository
DDD Repository
 
실천하는 에코페미니스트들의 플랫폼 (2015)
실천하는 에코페미니스트들의 플랫폼 (2015)실천하는 에코페미니스트들의 플랫폼 (2015)
실천하는 에코페미니스트들의 플랫폼 (2015)
 
DDD Start Ch#3
DDD Start Ch#3DDD Start Ch#3
DDD Start Ch#3
 
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGA
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGAENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGA
ENHANCING MANAGEMENT OF TYPE 2 DIABETES THROUGH YOGA
 
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)
[Td 2015]너에게만 나는 반응해 반응형 응용프로그램(이규원)
 

Semelhante a Tajo case study bay area hug 20131105

Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with TerasortAnhanguera Educacional S/A
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudNicolas Poggi
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortUsing Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortAnhanguera Educacional S/A
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesJun Liu
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoYu Liu
 
On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...Jorge E. López de Vergara Méndez
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBencht_ivanov
 
PostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPGConf APAC
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBencht_ivanov
 
Resume_CQ_Edward
Resume_CQ_EdwardResume_CQ_Edward
Resume_CQ_Edwardcaiqi wang
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016MLconf
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkSigOpt
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Terark Product and Technology
Terark Product and TechnologyTerark Product and Technology
Terark Product and TechnologyXinyuan Fu
 
Profiling And Optimization Of Software Base Network Analysis Applications
Profiling And Optimization Of Software Base Network Analysis ApplicationsProfiling And Optimization Of Software Base Network Analysis Applications
Profiling And Optimization Of Software Base Network Analysis ApplicationsHargyo T. Nugroho
 

Semelhante a Tajo case study bay area hug 20131105 (20)

Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortUsing Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 Slides
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
 
On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
 
PostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability Improvements
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Resume_CQ_Edward
Resume_CQ_EdwardResume_CQ_Edward
Resume_CQ_Edward
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Terark Product and Technology
Terark Product and TechnologyTerark Product and Technology
Terark Product and Technology
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
 
Profiling And Optimization Of Software Base Network Analysis Applications
Profiling And Optimization Of Software Base Network Analysis ApplicationsProfiling And Optimization Of Software Base Network Analysis Applications
Profiling And Optimization Of Software Base Network Analysis Applications
 

Mais de Gruter

MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기Gruter
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseGruter
 
Expanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoExpanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoGruter
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataGruter
 
Introduction to Apache Tajo
Introduction to Apache TajoIntroduction to Apache Tajo
Introduction to Apache TajoGruter
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondGruter
 
Big data analysis with R and Apache Tajo (in Korean)
Big data analysis with R and Apache Tajo (in Korean)Big data analysis with R and Apache Tajo (in Korean)
Big data analysis with R and Apache Tajo (in Korean)Gruter
 
Efficient In­‐situ Processing of Various Storage Types on Apache Tajo
Efficient In­‐situ Processing of Various Storage Types on Apache TajoEfficient In­‐situ Processing of Various Storage Types on Apache Tajo
Efficient In­‐situ Processing of Various Storage Types on Apache TajoGruter
 
Tajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSTajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSGruter
 
Data analysis with Tajo
Data analysis with TajoData analysis with Tajo
Data analysis with TajoGruter
 
Gruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter
 
Gruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter
 
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter
 
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)Gruter_TECHDAY_2014_01_SearchEngine (in Korean)
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)Gruter
 
Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Gruter
 
Elastic Search Performance Optimization - Deview 2014
Elastic Search Performance Optimization - Deview 2014Elastic Search Performance Optimization - Deview 2014
Elastic Search Performance Optimization - Deview 2014Gruter
 
Hadoop security DeView 2014
Hadoop security DeView 2014Hadoop security DeView 2014
Hadoop security DeView 2014Gruter
 
Vectorized processing in_a_nutshell_DeView2014
Vectorized processing in_a_nutshell_DeView2014Vectorized processing in_a_nutshell_DeView2014
Vectorized processing in_a_nutshell_DeView2014Gruter
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopGruter
 

Mais de Gruter (20)

MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기MelOn 빅데이터 플랫폼과 Tajo 이야기
MelOn 빅데이터 플랫폼과 Tajo 이야기
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Expanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with TajoExpanding Your Data Warehouse with Tajo
Expanding Your Data Warehouse with Tajo
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Introduction to Apache Tajo
Introduction to Apache TajoIntroduction to Apache Tajo
Introduction to Apache Tajo
 
What's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its BeyondWhat's New Tajo 0.10 and Its Beyond
What's New Tajo 0.10 and Its Beyond
 
Big data analysis with R and Apache Tajo (in Korean)
Big data analysis with R and Apache Tajo (in Korean)Big data analysis with R and Apache Tajo (in Korean)
Big data analysis with R and Apache Tajo (in Korean)
 
Efficient In­‐situ Processing of Various Storage Types on Apache Tajo
Efficient In­‐situ Processing of Various Storage Types on Apache TajoEfficient In­‐situ Processing of Various Storage Types on Apache Tajo
Efficient In­‐situ Processing of Various Storage Types on Apache Tajo
 
Tajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSTajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWS
 
Data analysis with Tajo
Data analysis with TajoData analysis with Tajo
Data analysis with Tajo
 
Gruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in TelcoGruter TECHDAY 2014 Realtime Processing in Telco
Gruter TECHDAY 2014 Realtime Processing in Telco
 
Gruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigDataGruter TECHDAY 2014 MelOn BigData
Gruter TECHDAY 2014 MelOn BigData
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
 
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)Gruter_TECHDAY_2014_01_SearchEngine (in Korean)
Gruter_TECHDAY_2014_01_SearchEngine (in Korean)
 
Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Apache Tajo - BWC 2014
Apache Tajo - BWC 2014
 
Elastic Search Performance Optimization - Deview 2014
Elastic Search Performance Optimization - Deview 2014Elastic Search Performance Optimization - Deview 2014
Elastic Search Performance Optimization - Deview 2014
 
Hadoop security DeView 2014
Hadoop security DeView 2014Hadoop security DeView 2014
Hadoop security DeView 2014
 
Vectorized processing in_a_nutshell_DeView2014
Vectorized processing in_a_nutshell_DeView2014Vectorized processing in_a_nutshell_DeView2014
Vectorized processing in_a_nutshell_DeView2014
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
 

Último

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Último (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Tajo case study bay area hug 20131105

  • 1. A case study: Tajo on Big Telco
  • 2.   Jeong-shik Jang System Development Deployment Gruter Inc, Seoul, South Korea ©2013 Gruter. All rights reserved.
  • 3.  
  • 4. Mobile carriers in S. Korea 2
  • 5.  
  • 6. Test setup Performance test on Hive / Impala / Tajo H/W CPU 24 cores (Xeon 2.5 GHz, HT) Memory 64 GB Disks 3TB x 6 (NLSAS 7200 RPM) Network 10G Size 1 master + 6 data nodes Versions: Hadoop cdh4.3.0 Hive 0.10.0-cdh4.3.0 Impala impalad_version_1.1.1_RELEASE Tajo 0.2-SNAPSHOT Data size: 1.7 TB (4.1B rows, Q1), 8 or less GB (results of Q1, rest of Qs) 3
  • 7.  
  • 8. Test setup: Queries Q1: scan using about 20 text pattern matching filters Q2: 7 unions with joins Q3: join Q4: group by and order by Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by 4
  • 9.  
  • 10. Results: Q1 – filter scan •  •  1445.69 1400 NB: * Tajo showed enhanced performance due to dynamic task scheduling 1200 1000 800 895.96 789.09 Impala 600 Tajo processing time (sec.) 400 200 0 5
  • 11.   Hive Q1: scan using about 20 text pattern matching filters
  • 12. Results: Q2 – unions, joins •  •  70 63.64 NB: 60 *Tajo materializing all query results to HDFS , as is the main goal *unions are processed in sequence in Tajo n ow (parallel processing is coming soon) 50 38.64 40 Impala 30 Tajo processing time (sec.) 20 10 0 6
  • 14. Results: Q3 – join •  •  101.45 NB: 100 *Tajo has an optimal selection/projection push down 80 Hive 60 Impala 40 36.81 20 0 7
  • 16. Results: Q4 – group by and sort •  •  25 24.7 20 15 Hive Impala 10 Tajo processing time (sec.) 5 0.45 0 8
  • 17.   Q4: group by and order by 0.65
  • 18. Results: Q5 – filters, group by, having and sort •  •  128.78 120 100 80 Hive 60 Impala Tajo 40 20 0 9
  • 19.   processing time (sec.) 17.03 6.03 Q6: Q5: 30 Text pattern matching filters with OR conditions, group by, having, and order by resulting in smaller set of output
  • 20. Results: Wrap up The project is underway; more findings expected in the future Performance enhancement thanks to dynamic task scheduling : some results showed better performan ce than Impala, despite Tajo materializing every qu ery result to HDFS, the project still being in its earl y stages, and Tajo still being an early build. 10
  • 21.  
  • 22. GRUTER: YOUR PARTNER IN THE BIG DATA REVOLUTION Phone Fax +82-70-8129-2950 +82-70-8129-2952 E-mail Web contact@gruter.com www.gruter.com Gruter, Inc. 5F Sehwa Office Building 889-70 Daechi-dong, Gangnam-gu, Seoul, South Korea 135-83 9 Jeong-shik Jang: jsjang@gruter.com ©2013 Gruter. All rights reserved.
  • 23.