CoreBigBench: Benchmarking Big
Data Core Operations
Todor Ivanov (1), Ahmad Ghazal (2), Alain Crolotte (3), Pekka Kostamaa (3), Yoseph Ghazal (4)
1. Frankfurt Big Data Lab, Goethe University, Germany
2. Facebook Corporation, Seattle, WA, USA
3. Teradata Corporation, El Segundo, CA, USA
4. University of California, Irvine, CA, USA
Outline
• Motivation
• Background
• CoreBigBench Specification
• Data Model
• Workload
• Proof of Concept
• Conclusion
DBTest 2020, June 19, 2020 2
Motivation
• Growing number of emerging Big Data systems
--> high number of new Big Data benchmarks
• Micro-benchmarks that focus on testing specific functionality or
single operations:
• WordCount [W1], Pi [P1], Terasort [T1], TestDFSIO [D1]
• HiveBench [A2010], HiBench [H1], AMP Lab Benchmark [A1], HiveRunner [H2]
• SparkBench [S1], Spark-sql-perf [S2]
• End-to-end application benchmarks focus on a business problem and
simulate a real world application with a data model and workload:
• BigBench [G2013] and BigBench V2 [G2017]
End-to-End Application Benchmarks
BigBench/TPCx-BB [G2013]
• Technology agnostic, analytics, application-level Big Data benchmark.
• On top of TPC-DS (decision support on retail
business)
• Adding semi-structured and unstructured data.
• Focus on: Parallel DBMS and MR engines
(Hadoop, Hive, etc.).
• Workload: 30 queries
• Based on big data retail analytics research
• 11 queries from TPC-DS
• Adopted by TPC as TPCx-BB
• Implementation in HiveQL and Spark MLlib.
BigBench V2 [G2017]
• A major rework of BigBench
• Separate from TPC-DS, with support for late binding.
• New simplified data model and late binding
requirements.
• Custom made scale factor-based data
generator for all components.
• Workload:
• All 11 TPC-DS queries are replaced with
new queries in BigBench V2.
• New queries with similar business
questions - focus on analytics on the
semi-structured web-logs.
What is not covered by micro and application
benchmarks?
• Both micro-benchmarks and application benchmarks can be tuned for the specific application they are
testing
• There is a need for Big Data white-box (core engine operations) benchmarking
• Examples of core operations
• Table scans, two-way joins, aggregations and window functions
• Common User Defined Functions (UDFs) like sessionize, path, ...
• Core operator benchmarking also helps with performance regression testing of big data systems
• Not a replacement for application-level benchmarking
• Complements it
• A similar problem for DBMSs was addressed by Crolotte & Ghazal [C&G2010], covering scans, aggregations,
joins and other core relational operators
CoreBigBench Data Model
inspired by BigBench V2 [G2017]
• New simplified (star-schema) data model
• Structured part consisting of 6 tables
• Semi-structured part (JSON)
• Key-value pairs representing user clicks
• Keys corresponding to structured part and random keys
and values
• Example :
<user,user1> <time,t1> <webpage,w1>
<product,p1>
<key1,value1> <key2,value2> ...
<key100,value100>
• Unstructured part (text): Product reviews similar to the one in BigBench
• Custom made scale factor-based data generator for all components.
● 1 – many relationship :
● Semi-structured : key-value WebLog
● Un-structured: Product Reviews
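The shape of one semi-structured click record can be sketched as follows. This is only an illustration of the key-value layout in the example above, not the benchmark's actual data generator; the function name `make_click`, the value format and the seed are assumptions made for the sketch.

```python
import json
import random


def make_click(user, time, webpage, product, n_random=100, seed=0):
    """Build one web-log record: four structured keys plus random filler keys."""
    rng = random.Random(seed)
    # Keys that correspond to the structured part of the data model.
    record = {"user": user, "time": time, "webpage": webpage, "product": product}
    # Random keys and values (key1..key100 in the slide's example).
    for i in range(1, n_random + 1):
        record[f"key{i}"] = f"value{rng.randint(0, 9999)}"
    return json.dumps(record)


line = make_click("user1", "t1", "w1", "p1")
parsed = json.loads(line)
print(len(parsed))  # number of key-value pairs in the record
```

Because every record carries many keys that a given query never reads, the cost of parsing such lines is what the semi-structured scan queries are meant to expose.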
Summary of Workload Queries
• Variety of core operations on structured, semi-structured and unstructured data
• Scans
• 𝑄1 - 𝑄5 cover variations of scans with different selectivities on structured and semi-structured data
• Aggregations
• 𝑄6 - 𝑄12 cover different aggregations on structured and semi-structured data
• Window functions
• 𝑄13 - 𝑄16 cover variations of window functions with different data partitioning
• Joins
• 𝑄17 - 𝑄18 cover binary joins with partitioning variations on structured and unstructured data
• Common Big Data functions
• 𝑄19 - 𝑄22 cover four UDFs (sessionize, path, sentiment analysis and K-means) on structured,
semi-structured and unstructured data
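The difference between the window-function variants (row numbers over the whole table versus row numbers restarted per partition) can be pictured with a small sketch. The sample rows and the helper `row_numbers` are hypothetical and only mimic what a `ROW_NUMBER()` window would compute; they are not taken from the benchmark implementation.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical store sales rows.
sales = [
    {"customer": "c2", "store": 3},
    {"customer": "c1", "store": 2},
    {"customer": "c1", "store": 1},
    {"customer": "c2", "store": 1},
]


def row_numbers(rows, order_key, partition_key=None):
    """Assign 1-based row numbers, restarting per partition when one is given."""
    if partition_key is None:
        ordered = sorted(rows, key=itemgetter(order_key))
        return [dict(r, rn=i) for i, r in enumerate(ordered, start=1)]
    # Sort by partition first so groupby sees each partition as one run.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        out.extend(dict(r, rn=i) for i, r in enumerate(group, start=1))
    return out


global_rn = row_numbers(sales, "store")
per_customer = row_numbers(sales, "store", partition_key="customer")
```

In a distributed engine the partitioned variant typically requires moving rows so that each partition is processed together, which is the cost the partitioning variations aim to isolate.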
Queries Text Descriptions
Q1: List all products (items) sold in stores together with their quantity. This query does a full table scan of the store data.
Q2: List all products (items) sold in stores together with their quantity sold between 2013-04-21 and 2013-07-03. This query tests scans with a low-selectivity (10%) filter.
Q3: List all products (items) together with their quantity sold between 2013-01-21 and 2014-11-10. Similar to 𝑄2 but with high selectivity (90%).
Q4: List names of all visited web pages. This query tests parsing the semi-structured web logs and scanning the parsed results. The query requires only one key from the web logs.
Q5: Similar to 𝑄4 above but returning a bigger set of keys. This variation measures the ability of the underlying system to produce a bigger schema out of the web logs.
Q6: Find the total number of all store sales. This query covers basic aggregation with no grouping. The query involves scanning store sales, so to get the net cost of the aggregation we deduct the run time of 𝑄1 from this query's run time.
Q7: Find the total number of visited web pages. This query requires parsing and scanning the web logs, and therefore it is adjusted by subtracting the run time of 𝑄4 from its run time.
Q8: Find the total number of store sales per product (item). This query is adjusted similarly to 𝑄6.
Q9: Find the number of clicks per product (item). This query also requires parsing the web logs and can be adjusted similarly to 𝑄7.
Q10: Find a list of aggregations from store sales by customer. Aggregations include the number of transactions and the maximum and minimum quantities purchased in an order. This query also finds correlations between stores and products (items) purchased by a customer. The purpose of this query is to test cases with a big set of aggregations.
Q11: This query has an objective similar to 𝑄10 but is applied to web logs. Again, the query needs to be adjusted by removing the parsing and scan cost represented by 𝑄4.
Queries Text Descriptions
Q12: 𝑄12 is the same as 𝑄8 but on store sales partitioned by customer (different from the group key). The shuffle cost is computed as the run time of 𝑄12 minus the run time of 𝑄8.
Q13: Find row numbers of store sales records ordered by store id.
Q14: Find row numbers of web log records ordered by timestamp of clicks.
Q15: Find row numbers of store sales records ordered by store id for each customer. This query is similar to 𝑄13 but computes the row numbers for each customer individually.
Q16: Same as 𝑄14, but row numbers are computed per customer.
Q17: Find all store sales with products that were reviewed. This query is a join between the store sales and product reviews, both partitioned on item ID.
Q18: Same as 𝑄17 with different partitioning. (Table store sales is partitioned on customer ID and there is no partitioning on table product reviews.)
Q19: List all customers that spend more than 10 minutes on the retailer web site. This query involves finding all sessions of all users and filtering them to those which are 10 minutes or less.
Q20: Find the 5 most popular web page paths that lead to a purchase. This query is based on finding paths in clicks that lead to purchases, aggregating the results and finding the top 5.
Q21: For all products, extract sentences from their product reviews that contain positive or negative sentiment and display the sentiment polarity of the extracted sentences.
Q22: Cluster customers into book buddies/club groups based on their in-store book purchasing histories. After the separation model is built, report the "group" to which each analyzed customer was assigned.
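The run-time adjustments mentioned in the descriptions (subtracting a baseline scan from an aggregation, or 𝑄8 from 𝑄12 to obtain the shuffle cost) amount to simple differences. The sketch below assumes made-up run times in seconds and a hypothetical `baseline` mapping that follows the text; none of the numbers are measurements from this work.

```python
# Hypothetical run times in seconds; not results reported in the slides.
runtimes = {
    "Q1": 40.0,   # full scan of store sales
    "Q4": 300.0,  # parse and scan of the web logs
    "Q6": 55.0,
    "Q7": 320.0,
    "Q8": 70.0,
    "Q12": 95.0,
}

# Baseline to deduct from each query, as stated in the text descriptions.
baseline = {"Q6": "Q1", "Q7": "Q4", "Q8": "Q1", "Q12": "Q8"}


def net_cost(query):
    """Run time of the query minus the run time of its baseline query."""
    return runtimes[query] - runtimes[baseline[query]]


aggregation_cost = net_cost("Q6")  # aggregation without the scan
shuffle_cost = net_cost("Q12")     # repartitioning cost relative to Q8
```

Reporting net costs this way keeps a slow scan or JSON parser from masking the operator that a query is actually meant to measure.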
Proof Of Concept
• Objective --> show the feasibility of CoreBigBench (no serious tuning effort)
• Setup
• 4 node cluster (Ubuntu Server)
• Cloudera CDH 5.16.2 + Hive 1.1.0
• Data Generation with Scale Factor = 10
• Late binding on the JSON file
• Query implementation in Hive is available on GitHub: https://github.com/t-ivanov/CoreBigBench
-- Late binding: each JSON record is exposed as one raw string column.
CREATE EXTERNAL TABLE IF NOT EXISTS
web_logs (line string)
ROW FORMAT DELIMITED LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfsPath/web_logs/clicks.json';
Queries on Structured Data
• 𝑄2: List all products (items) sold in stores together with their quantity sold between 2013-04-21 and
2013-07-03. This query tests scans with a low-selectivity (10%) filter.
SELECT ss_item_id, ss_quantity FROM store_sales
WHERE to_date(ss_ts) >= '2013-04-21'
AND to_date(ss_ts) < '2013-07-03';
• 𝑄1 performs a full table scan of the store
data.
• We deduct the 𝑄1 operation time for
queries 𝑄6 to 𝑄15 operating on the
structured data.
• The geometric mean of all query times in
this group is 62.07 seconds.
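For readers unfamiliar with the summary metric, a geometric mean of query times can be computed as below. The sample times are hypothetical and are not the measurements behind the figure reported above.

```python
import math


def geometric_mean(times):
    """Geometric mean via the mean of logarithms (all times must be positive)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


sample_times = [10.0, 40.0, 160.0]  # hypothetical query times in seconds
gm = geometric_mean(sample_times)
print(round(gm, 2))
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single long-running query, which is a common reason for using it to summarise benchmark groups.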
Queries on Semi-structured Data
• 𝑄4: List names of all visited web pages. This
query tests parsing the semi-structured web
logs and scanning the parsed results. The
query requires only one key from the web
logs.
SELECT wl_webpage_name
FROM web_logs
LATERAL VIEW json_tuple(
web_logs.line, 'wl_webpage_name'
) logs AS wl_webpage_name
WHERE wl_webpage_name IS NOT NULL;
• 𝑄4 performs a simple scan operation that involves
parsing all the JSON records on the fly and extracting
only the necessary attributes.
• We deduct 𝑄4 operation time from all other queries
in this group.
• The geometric mean of all query times in this
group is 525.88 seconds.
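The late-binding idea behind the semi-structured scans (parse each raw line at query time and pull out only the requested keys) can be sketched as follows. The helper `extract_keys` is a hypothetical stand-in for this kind of extraction, and the key `wl_customer_id` and the sample lines are assumptions made for the sketch.

```python
import json


def extract_keys(line, *keys):
    """Parse one raw log line and return the requested keys (None if absent)."""
    record = json.loads(line)
    return tuple(record.get(k) for k in keys)


# Hypothetical raw web-log lines, stored as plain strings.
raw_lines = [
    '{"wl_webpage_name": "home", "wl_customer_id": 7, "key1": "v1"}',
    '{"wl_customer_id": 9, "key2": "v2"}',
]

# One key, as in a single-attribute scan.
one_key = [extract_keys(line, "wl_webpage_name") for line in raw_lines]
# A wider schema built from the same lines.
two_keys = [extract_keys(line, "wl_webpage_name", "wl_customer_id") for line in raw_lines]
```

Every line is parsed in full no matter how many keys are requested, which is why the parse-and-scan time serves as the baseline for the other queries on the web logs.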
Queries with UDF Functions
• 𝑄22: Cluster customers into book buddies/club
groups based on their in-store book purchasing
histories. After the separation model is built, report
the "group" to which each analysed customer
was assigned.
set cluster_centers=8;
set clustering_iterations=20;
SELECT kmeans(
collect_list(array(id1, id3, id5, id7, id9,
id11, id13, id15, id2, id4, id6, id8, id10,
id14, id16)),
${hiveconf:cluster_centers},
${hiveconf:clustering_iterations}) AS out
FROM q22_prep_data;
• 𝑄19 and 𝑄20 operate on the semi-structured key-value
data and we deduct the basic key-value scan 𝑄4
operation time.
• 𝑄21 and 𝑄22 operate on the structured and
unstructured data and we deduct the simple table
scan 𝑄1 operation time.
• The geometric mean of all query times in this group is
204.15 seconds.
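As an illustration of what a sessionize function does with click data, the sketch below groups each user's clicks into sessions and keeps users with a session longer than 10 minutes. Splitting sessions on an inactivity gap, the 600-second timeout and the sample clicks are all assumptions for this sketch; the benchmark's own UDF may define sessions differently.

```python
def sessionize(clicks, timeout=600):
    """Group (user, timestamp) clicks into sessions, split on gaps above timeout seconds."""
    sessions = []
    current = {}    # open session per user
    last_seen = {}  # timestamp of the previous click per user
    for user, ts in sorted(clicks):
        if user not in current or ts - last_seen[user] > timeout:
            # Start a new session for this user.
            current[user] = {"user": user, "start": ts, "end": ts}
            sessions.append(current[user])
        current[user]["end"] = ts
        last_seen[user] = ts
    return sessions


# Hypothetical clicks: (user, seconds since some origin).
clicks = [("u1", 0), ("u1", 300), ("u1", 700), ("u2", 0), ("u2", 5000)]
sessions = sessionize(clicks)
long_users = sorted({s["user"] for s in sessions if s["end"] - s["start"] > 600})
```

Here `u1` has one session lasting 700 seconds, while `u2`'s two clicks are far enough apart to fall into separate zero-length sessions.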
Conclusion
• CoreBigBench
• is a benchmark assessing the performance of core (basic) operations of big
data engines, such as scans, two-way joins and UDFs;
• consists of 22 queries applied to sales data, key-value web logs and
unstructured product reviews (inspired by BigBench V2);
• provides textual definitions and a reference implementation in Hive for its queries.
• CoreBigBench can be used
• as a complement to end-to-end benchmarks like BigBench;
• for regression testing of commercial Big Data engines.
• In the future, CoreBigBench can be extended to include ETL, which is a very basic
functionality for Big Data engines.
Thank you for your attention!
• Acknowledgments. This work has been partially funded by the European Commission
H2020 project DataBench - Evidence Based Big Data Benchmarking to Improve Business
Performance, under project No. 780966. This work expresses the opinions of the authors and not
necessarily those of the European Commission. The European Commission is not liable for any
use that may be made of the information contained in this work. The authors thank all the
participants in the project for discussions and common work.
www.databench.eu
References (1)
• [C&G2010] Alain Crolotte and Ahmad Ghazal. 2010. Benchmarking Using Basic DBMS Operations. In 2nd TPC Technology Conference, TPCTC 2010, Singapore, September 13-17, 2010.
• [G2013] Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, and Hans-Arno Jacobsen. 2013. BigBench: Towards An Industry Standard Benchmark for Big Data Analytics. In SIGMOD 2013. 1197–1208.
• [G2017] Ahmad Ghazal, Todor Ivanov, Pekka Kostamaa, Alain Crolotte, Ryan Voong, Mohammed Al-Kateb, Waleed Ghazal, and Roberto V. Zicari. 2017. BigBench V2: The New and Improved BigBench. In ICDE 2017, San Diego, CA, USA, April 19-22.
• [W1] WordCount. https://cwiki.apache.org/confluence/display/HADOOP2/WordCount
• [T1] TeraSort. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
• [P1] Package hadoop.examples.pi. http://hadoop.apache.org/docs/r0.23.11/api/org/apache/hadoop/examples/pi/package-summary.html
• [D1] DFSIO benchmark. http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.13.0/src/test/org/apache/hadoop/fs/TestDFSIO.java
References (2)
• [A2010] Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden,
and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proc. of the
ACM SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. ACM, 165–178
• [A1] AMP Lab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark/
• [S1] SparkBench. https://bitbucket.org/lm0926/sparkbench
• [S2] Spark-SQL-perf. https://github.com/databricks/spark-sql-perf
• [H1] HiBench Suite. https://github.com/intel-hadoop/HiBench
• [H2] HiveRunner. https://github.com/klarna/HiveRunner
DBTest 2020, June 19, 2020 17

Mais conteúdo relacionado

Mais procurados

Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessArsalan Qadri
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6Prithwis Mukerjee
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...ijcsit
 
Data warehouse Project Report
Data warehouse Project ReportData warehouse Project Report
Data warehouse Project ReportHimanshu Yadav
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
HANA Performance Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BIHANA Performance Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BIIBM India Smarter Computing
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 
Use of secondary data in marketing analytics
Use of secondary data in marketing analyticsUse of secondary data in marketing analytics
Use of secondary data in marketing analyticsDebasisMohanty37
 
ETL Testing Training Presentation
ETL Testing Training PresentationETL Testing Training Presentation
ETL Testing Training PresentationApurba Biswas
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional modelGersiton Pila Challco
 
A Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningA Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningEditor IJCATR
 
Building the DataBench Workflow and Architecture
Building the DataBench Workflow and ArchitectureBuilding the DataBench Workflow and Architecture
Building the DataBench Workflow and Architecturet_ivanov
 

Mais procurados (17)

Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail business
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...
DATA MINING MODEL PERFORMANCE OF SALES PREDICTIVE ALGORITHMS BASED ON RAPIDMI...
 
Data warehouse Project Report
Data warehouse Project ReportData warehouse Project Report
Data warehouse Project Report
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
HANA Performance Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BIHANA Performance Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BI
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
Use of secondary data in marketing analytics
Use of secondary data in marketing analyticsUse of secondary data in marketing analytics
Use of secondary data in marketing analytics
 
ETL Testing Training Presentation
ETL Testing Training PresentationETL Testing Training Presentation
ETL Testing Training Presentation
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional model
 
A Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data MiningA Survey on the Clustering Algorithms in Sales Data Mining
A Survey on the Clustering Algorithms in Sales Data Mining
 
Building the DataBench Workflow and Architecture
Building the DataBench Workflow and ArchitectureBuilding the DataBench Workflow and Architecture
Building the DataBench Workflow and Architecture
 
Olap
OlapOlap
Olap
 

Semelhante a CoreBigBench: Benchmarking Big Data Core Operations

Accelerate your Queries with Data Virtualization
Accelerate your Queries with Data VirtualizationAccelerate your Queries with Data Virtualization
Accelerate your Queries with Data VirtualizationDenodo
 
Fundamentals of BI Report Testing - Module 6
Fundamentals of BI Report Testing - Module 6Fundamentals of BI Report Testing - Module 6
Fundamentals of BI Report Testing - Module 6MichaelCalabrese20
 
Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Yehoshua
 
Universal Analytics and Google Tag Manager
Universal Analytics and Google Tag ManagerUniversal Analytics and Google Tag Manager
Universal Analytics and Google Tag ManagerYehoshua
 
Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Analytics Ninja LLC
 
Google Analytics Training - full 2017
Google Analytics Training - full 2017Google Analytics Training - full 2017
Google Analytics Training - full 2017Nate Plaunt
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1guest9529cb
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.pptBsMath3rdsem
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Modelspepeborja
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...DataBench
 
Understanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsUnderstanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsPrathamesh Kulkarni
 
Retail Design
Retail DesignRetail Design
Retail Designjagishar
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationKate Subramanian
 
Tips tricks to speed nw bi 2009
Tips tricks to speed  nw bi  2009Tips tricks to speed  nw bi  2009
Tips tricks to speed nw bi 2009HawaDia
 
Why Big Query is so Powerful - Trusted Conf
Why Big Query is so Powerful - Trusted ConfWhy Big Query is so Powerful - Trusted Conf
Why Big Query is so Powerful - Trusted ConfIn Marketing We Trust
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
Adapting data warehouse architecture to benefit from agile methodologies
Adapting data warehouse architecture to benefit from agile methodologiesAdapting data warehouse architecture to benefit from agile methodologies
Adapting data warehouse architecture to benefit from agile methodologiesTom Breur
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 

Semelhante a CoreBigBench: Benchmarking Big Data Core Operations (20)

Accelerate your Queries with Data Virtualization
Accelerate your Queries with Data VirtualizationAccelerate your Queries with Data Virtualization
Accelerate your Queries with Data Virtualization
 
Fundamentals of BI Report Testing - Module 6
Fundamentals of BI Report Testing - Module 6Fundamentals of BI Report Testing - Module 6
Fundamentals of BI Report Testing - Module 6
 
mod 2.pdf
mod 2.pdfmod 2.pdf
mod 2.pdf
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014
 
Universal Analytics and Google Tag Manager
Universal Analytics and Google Tag ManagerUniversal Analytics and Google Tag Manager
Universal Analytics and Google Tag Manager
 
Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014Universal Analytics and Google Tag Manager - Superweek 2014
Universal Analytics and Google Tag Manager - Superweek 2014
 
Google Analytics Training - full 2017
Google Analytics Training - full 2017Google Analytics Training - full 2017
Google Analytics Training - full 2017
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Models
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
 
Understanding Web Analytics and Google Analytics
Understanding Web Analytics and Google AnalyticsUnderstanding Web Analytics and Google Analytics
Understanding Web Analytics and Google Analytics
 
Retail Design
Retail DesignRetail Design
Retail Design
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence Application
 
Tips tricks to speed nw bi 2009
Tips tricks to speed  nw bi  2009Tips tricks to speed  nw bi  2009
Tips tricks to speed nw bi 2009
 
Why Big Query is so Powerful - Trusted Conf
Why Big Query is so Powerful - Trusted ConfWhy Big Query is so Powerful - Trusted Conf
Why Big Query is so Powerful - Trusted Conf
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 
Adapting data warehouse architecture to benefit from agile methodologies
Adapting data warehouse architecture to benefit from agile methodologiesAdapting data warehouse architecture to benefit from agile methodologies
Adapting data warehouse architecture to benefit from agile methodologies
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 

Mais de DataBench

Welcome to DataBench
Welcome to DataBenchWelcome to DataBench
Welcome to DataBenchDataBench
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksDataBench
 
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...
Session 2 - A Project Perspective on Big Data Architectural Pipelines and Ben...DataBench
 
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...DataBench
 
Session 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxSession 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxDataBench
 
DataBench Toolbox in a Nutshell
DataBench Toolbox in a NutshellDataBench Toolbox in a Nutshell
DataBench Toolbox in a NutshellDataBench
 
Success Stories on Big Data & Analytics
Success Stories on Big Data & AnalyticsSuccess Stories on Big Data & Analytics
Success Stories on Big Data & AnalyticsDataBench
 
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...DataBench
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench
 
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...DataBench
 
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench
 
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
CoreBigBench: Benchmarking Big Data Core Operations

  • 1. CoreBigBench: Benchmarking Big Data Core Operations
      Todor Ivanov (1), Ahmad Ghazal (2), Alain Crolotte (3), Pekka Kostamaa (3), Yoseph Ghazal (4)
      (1) Frankfurt Big Data Lab, Goethe University, Germany
      (2) Facebook Corporation, Seattle, WA, USA
      (3) Teradata Corporation, El Segundo, CA, USA
      (4) University of California, Irvine, CA, USA
  • 2. Outline
      • Motivation
      • Background
      • CoreBigBench Specification
          • Data Model
          • Workload
      • Proof of Concept
      • Conclusion
      DBTest 2020, June 19, 2020
  • 3. Motivation
      • Growing number of emerging Big Data systems --> high number of new Big Data benchmarks
      • Micro-benchmarks that focus on testing specific functionality or single operations:
          • WordCount [W1], Pi [P1], Terasort [T1], TestDFSIO [D1]
          • HiveBench [A2010], HiBench [H1], AMP Lab Benchmark [A1], HiveRunner [H2]
          • SparkBench [S1], Spark-sql-perf [S2]
      • End-to-end application benchmarks focus on a business problem and simulate a real-world application with a data model and workload:
          • BigBench [G2013] and BigBench V2 [G2017]
  • 4. End-to-End Application Benchmarks
      BigBench/TPCx-BB [G2013]
      • Technology-agnostic, analytics, application-level Big Data benchmark.
      • On top of TPC-DS (decision support on retail business)
      • Adding semi-structured and unstructured data.
      • Focus on: Parallel DBMS and MR engines (Hadoop, Hive, etc.).
      • Workload: 30 queries
          • Based on big data retail analytics research
          • 11 queries from TPC-DS
      • Adopted by TPC as TPCx-BB
      • Implementation in HiveQL and Spark MLlib.
      BigBench V2 [G2017]
      • A major rework of BigBench
      • Separate from TPC-DS and takes care of late binding.
      • New simplified data model and late binding requirements.
      • Custom-made scale factor-based data generator for all components.
      • Workload:
          • All 11 TPC-DS queries are replaced with new queries in BigBench V2.
          • New queries with similar business questions, with a focus on analytics on the semi-structured web logs.
  • 5. What is not covered by micro and application benchmarks?
      • Both micro-benchmarks and application benchmarks can be tuned for the specific application they are testing
      • There is a need for Big Data white-box (or core engine operations) benchmarking
      • Examples of core operations
          • Table scans, two-way joins, aggregations and window functions
          • Common User Defined Functions (UDFs) like sessionize, path, ...
      • Core operator benchmarking also helps with performance regression testing of big data systems
      • Not a replacement for application-level benchmarking
          • Complements it
      • A similar problem for DBMS was addressed by Crolotte & Ghazal [C&G2010], covering scans, aggregations, joins and other core relational operators
  • 6. CoreBigBench Data Model inspired by BigBench V2 [G2017]
      • New simplified (star-schema) data model
      • Structured part consisting of 6 tables
      • Semi-structured part (JSON)
          • Key-value pairs representing user clicks
          • Keys corresponding to the structured part, plus random keys and values
          • Example: <user,user1> <time,t1> <webpage,w1> <product,p1> <key1,value1> <key2,value2> ... <key100,value100>
      • Unstructured part (text): product reviews similar to the ones in BigBench
      • Custom-made scale factor-based data generator for all components.
      [Figure: data model diagram. Legend: 1-to-many relationship; semi-structured: key-value WebLog; unstructured: Product Reviews]
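
The shape of one semi-structured click record can be sketched in a few lines of Python. This is only an illustration of the key-value layout described on the slide, not the benchmark's data generator: the function name `make_click`, the `n_random` and `seed` parameters, and the value format are all assumptions.

```python
import json
import random


def make_click(user, time, webpage, product, n_random=3, seed=0):
    """Build one web-log record as a JSON string.

    Four keys refer to the structured part (user, time, webpage, product);
    n_random extra key-value pairs stand in for the random keys and values.
    All names and value formats here are hypothetical.
    """
    rng = random.Random(seed)
    record = {"user": user, "time": time, "webpage": webpage, "product": product}
    for i in range(1, n_random + 1):
        record[f"key{i}"] = f"value{rng.randint(1, 1000)}"
    return json.dumps(record)


line = make_click("user1", "t1", "w1", "p1")
parsed = json.loads(line)
print(sorted(parsed))
```

Because each record carries its own keys, a reader of the log has to parse every line before it can project a column, which is what the late-binding queries later in the deck measure.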
  • 7. Summary of Workload Queries
      • Variety of core operations on structured, semi-structured and unstructured data
      • Scans
          • Q1 - Q5 cover variations of scans with different selectivities on structured and semi-structured data
      • Aggregations
          • Q6 - Q12 cover different aggregations on structured and semi-structured data
      • Window functions
          • Q13 - Q16 cover variations of window functions with different data partitioning
      • Joins
          • Q17 - Q18 cover binary joins with partitioning variations on structured and unstructured data
      • Common Big Data functions
          • Q19 - Q22 cover four UDFs (sessionize, path, sentiment analysis and K-means) on structured, semi-structured and unstructured data
  • 8. Queries Text Descriptions
      Q1: List all store sold products (items) together with their quantity. This query does a full table scan of the store data.
      Q2: List all products (items) sold together in stores with their quantity sold between 2013-04-21 and 2013-07-03. This query tests scans with a low-selectivity (10%) filter.
      Q3: List all products (items) together with their quantity sold between 2013-01-21 and 2014-11-10. Similar to Q2 but with high selectivity (90%).
      Q4: List names of all visited web pages. This query tests parsing the semi-structured web logs and scanning the parsed results. The query requires only one key from the web logs.
      Q5: Similar to Q4 above but returning a bigger set of keys. This variation measures the ability of the underlying system to produce a bigger schema out of the web logs.
      Q6: Find the total number of all store sales. This query covers basic aggregations with no grouping. The query involves scanning store sales, so to get the net cost of the aggregation we deduct the cost of Q1 from this query's run time.
      Q7: Find the total number of visited web pages. This query requires parsing and scanning the web logs and is therefore adjusted by subtracting the run time of Q4 from its run time.
      Q8: Find the total number of store sales per product (item). This query is adjusted similarly to Q6.
      Q9: Find the number of clicks per product (item). This query also requires parsing the web logs and can be adjusted similarly to Q7.
      Q10: Find a list of aggregations from store sales by customer. Aggregations include the number of transactions and the maximum and minimum quantities purchased in an order. This query also finds correlations between stores and products (items) purchased by a customer. The purpose of this query is to test cases with a big set of aggregations.
      Q11: This query has a similar objective to Q10 but is applied to web logs. Again, the query needs to be adjusted by removing the parsing and scan cost represented by Q4.
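
The adjustment rule in these descriptions (net cost = run time of the query minus run time of its baseline scan) can be sketched as follows. This is a minimal sketch, not part of the benchmark kit: the `BASELINE` table only encodes the pairs stated above, the function name is hypothetical, and the run times are made-up placeholders rather than measured results.

```python
# Baseline scan for each adjusted query, as stated in the descriptions above:
# Q6 and Q8 subtract the structured scan Q1; Q7, Q9 and Q11 subtract the
# web-log parse-and-scan Q4.
BASELINE = {"Q6": "Q1", "Q7": "Q4", "Q8": "Q1", "Q9": "Q4", "Q11": "Q4"}


def net_runtime(runtimes, query):
    """Return the run time of `query` minus the run time of its baseline scan.

    Queries without a baseline entry are returned unadjusted.
    """
    base = BASELINE.get(query)
    if base is None:
        return runtimes[query]
    return runtimes[query] - runtimes[base]


# Hypothetical run times in seconds, for illustration only.
runtimes = {"Q1": 40.0, "Q4": 300.0, "Q6": 55.0, "Q7": 320.0}

for q in ("Q1", "Q6", "Q7"):
    print(q, net_runtime(runtimes, q))
```

Subtracting the baseline isolates the cost of the operator under test (for example the aggregation in Q6) from the cost of reading or parsing the input.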
  • 9. Queries Text Descriptions
      Q12: Q12 is the same as Q8 but on store sales partitioned by customer (different from the group key). The shuffle cost is computed as the run time of Q12 minus the run time of Q8.
      Q13: Find row numbers of store sales records ordered by store id.
      Q14: Find row numbers of web log records ordered by timestamp of clicks.
      Q15: Find row numbers of store sales records ordered by store id for each customer. This query is similar to Q13 but computes the row numbers for each customer individually.
      Q16: Same as Q14, where row numbers are computed per customer.
      Q17: Find all store sales with products that were reviewed. This query is a join between store sales and product reviews, both partitioned on item ID.
      Q18: Same as Q17 with different partitioning. (Table store sales is partitioned on customer ID and there is no partitioning on table product reviews.)
      Q19: List all customers that spend more than 10 minutes on the retailer web site. This query involves finding all sessions of all users and filtering them to those which are 10 minutes or less.
      Q20: Find the 5 most popular web page paths that lead to a purchase. This query is based on finding paths in clicks that lead to purchases, aggregating the results and finding the top 5.
      Q21: For all products, extract sentences from their product reviews that contain positive or negative sentiment and display the sentiment polarity of the extracted sentences.
      Q22: Cluster customers into book buddies/club groups based on their in-store book purchasing histories. After the model of separation is built, report for the analyzed customers to which "group" they were assigned.
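
The sessionize step behind Q19 can be illustrated with a small Python sketch. It is not the UDF used in the reference implementation. It assumes that clicks are `(user, epoch_seconds)` pairs, that a new session starts after an inactivity gap of more than 30 minutes (the `gap` value is an assumption, the slides do not state one), and it follows the stated objective of Q19 by keeping users with at least one session longer than 10 minutes.

```python
from collections import defaultdict


def sessionize(clicks, gap=1800):
    """Group clicks into sessions per user.

    clicks: iterable of (user, epoch_seconds) pairs.
    gap: inactivity threshold in seconds that starts a new session (assumed).
    Returns {user: [(session_start, session_end), ...]}.
    """
    by_user = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0], stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][1] > gap:
                user_sessions.append([ts, ts])  # gap exceeded: open a new session
            else:
                user_sessions[-1][1] = ts       # extend the current session
        sessions[user] = [tuple(s) for s in user_sessions]
    return sessions


def long_visitors(clicks, min_seconds=600, gap=1800):
    """Users with at least one session longer than min_seconds."""
    return sorted(
        user
        for user, user_sessions in sessionize(clicks, gap).items()
        if any(end - start > min_seconds for start, end in user_sessions)
    )


# Hypothetical clicks: u1 has one 900-second session, u2 has two short ones.
clicks = [("u1", 0), ("u1", 400), ("u1", 900), ("u2", 0), ("u2", 100), ("u2", 5000)]
print(long_visitors(clicks))
```

In an engine such as Hive the same logic requires sorting clicks per user, which is why sessionization exercises both partitioning and ordering.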
  • 10. Proof of Concept
      • Objective --> show the feasibility of CoreBigBench (no serious tuning effort)
      • Setup
          • 4 node cluster (Ubuntu Server)
          • Cloudera CDH 5.16.2 + Hive 1.1.0
          • Data generation with Scale Factor = 10
      • Late binding on the JSON file

          CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (line string)
          ROW FORMAT DELIMITED LINES TERMINATED BY '\n'
          STORED AS TEXTFILE
          LOCATION 'hdfsPath/web_logs/clicks.json';

      • Query implementation in Hive is available on GitHub: https://github.com/t-ivanov/CoreBigBench
  • 11. Queries on Structured Data
      • Q2: List all products (items) sold together in stores with their quantity sold between 2013-04-21 and 2013-07-03. This query tests scans with a low-selectivity (10%) filter.

          SELECT ss_item_id, ss_quantity
          FROM store_sales
          WHERE to_date(ss_ts) >= '2013-04-21'
            AND to_date(ss_ts) < '2013-07-03';

      • Q1 performs a full table scan of the store data.
      • We deduct the Q1 operation time for queries Q6 to Q15 operating on the structured data.
      • The geometric mean of all query times in this group is 62.07 seconds.
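
The group scores on these slides are geometric means of query times, i.e. the n-th root of the product of n times. A minimal Python sketch of that calculation follows; the input values are made-up placeholders, not the measured query times behind the reported figures.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive run times, computed via logarithms
    to avoid overflow when multiplying many values."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical query times in seconds.
print(geometric_mean([10.0, 40.0]))
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single long-running query, which is a common reason benchmarks use it to summarize a group.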
  • 12. Queries on Semi-structured Data
      • Q4: List names of all visited web pages. This query tests parsing the semi-structured web logs and scanning the parsed results. The query requires only one key from the web logs.

          SELECT wl_webpage_name
          FROM web_logs
          LATERAL VIEW json_tuple(web_logs.line, 'wl_webpage_name') logs AS wl_webpage_name
          WHERE wl_webpage_name IS NULL;

      • Q4 performs a simple scan operation that involves parsing all the JSON records on the fly and extracting only the necessary attributes.
      • We deduct the Q4 operation time from all other queries in this group.
      • The geometric mean of all query times in this group is 525.88 seconds.
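
What "parsing on the fly and extracting only the necessary attributes" means can be sketched outside Hive in a few lines of Python. This is an illustration of late binding, not a reimplementation of Hive's `json_tuple`: the helper name `extract_keys` and the key `wl_customer_id` are assumptions, and unlike Hive the sketch returns JSON-typed values rather than strings.

```python
import json


def extract_keys(line, *keys):
    """Parse one raw JSON log line and project the requested keys.

    Missing keys yield None, so the schema is decided at read time
    (late binding) rather than when the data is loaded.
    """
    record = json.loads(line)
    return tuple(record.get(key) for key in keys)


# Hypothetical raw log lines; the second one has no page name.
lines = [
    '{"wl_webpage_name": "w1", "wl_customer_id": "7", "key1": "v1"}',
    '{"wl_customer_id": "8"}',
]

pages = [extract_keys(line, "wl_webpage_name")[0] for line in lines]
print(pages)
```

Every line is fully parsed even though only one key is returned, which is why Q4 serves as the baseline cost for the other web-log queries.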
  • 13. Queries with UDF Functions
      • Q22: Cluster customers into book buddies/club groups based on their in-store book purchasing histories. After the model of separation is built, report for the analyzed customers to which "group" they were assigned.

          set cluster_centers=8;
          set clustering_iterations=20;
          SELECT kmeans(
                   collect_list(array(id1, id3, id5, id7, id9, id11, id13, id15, id2, id4, id6, id8, id10, id14, id16)),
                   ${hiveconf:cluster_centers},
                   ${hiveconf:clustering_iterations}) AS out
          FROM q22_prep_data;

      • Q19 and Q20 operate on the semi-structured key-value data, and we deduct the basic key-value scan Q4 operation time.
      • Q21 and Q22 operate on the structured and unstructured data, and we deduct the simple table scan Q1 operation time.
      • The geometric mean of all query times in this group is 204.15 seconds.
  • 14. Conclusion
      • CoreBigBench
          • is a benchmark assessing the performance of core (basic) operations of big data engines, such as scans, two-way joins and UDFs;
          • consists of 22 queries applied on sales data, key-value web logs and unstructured product reviews (inspired by BigBench V2);
          • its queries have textual definitions and a reference implementation in Hive.
      • CoreBigBench can be used
          • as a complement to end-to-end benchmarks like BigBench;
          • for regression testing of commercial Big Data engines.
      • In the future, CoreBigBench can be extended to include ETL, which is a very basic functionality for Big Data engines.
  • 15. Thank you for your attention!
      • Acknowledgments. This work has been partially funded by the European Commission H2020 project DataBench - Evidence Based Big Data Benchmarking to Improve Business Performance, under project No. 780966. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work. The authors thank all the participants in the project for discussions and common work.
      www.databench.eu
  • 16. References (1)
      • [C&G2010] Alain Crolotte and Ahmad Ghazal. 2010. Benchmarking Using Basic DBMS Operations. In 2nd TPC Technology Conference, TPCTC 2010, Singapore, September 13-17, 2010.
      • [G2013] Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, and Hans-Arno Jacobsen. 2013. BigBench: Towards An Industry Standard Benchmark for Big Data Analytics. In SIGMOD 2013. 1197–1208.
      • [G2017] Ahmad Ghazal, Todor Ivanov, Pekka Kostamaa, Alain Crolotte, Ryan Voong, Mohammed Al-Kateb, Waleed Ghazal, and Roberto V. Zicari. 2017. BigBench V2: The New and Improved BigBench. In ICDE 2017, San Diego, CA, USA, April 19-22.
      • [W1] WordCount. https://cwiki.apache.org/confluence/display/HADOOP2/WordCount
      • [T1] TeraSort. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
      • [P1] Package hadoop.examples.pi. http://hadoop.apache.org/docs/r0.23.11/api/org/apache/hadoop/examples/pi/package-summary.html
      • [D1] DFSIO benchmark. http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.13.0/src/test/org/apache/hadoop/fs/TestDFSIO.java
  • 17. References (2)
      • [A2010] Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proc. of the ACM SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. ACM, 165–178.
      • [A1] AMP Lab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark/
      • [S1] SparkBench. https://bitbucket.org/lm0926/sparkbench
      • [S2] Spark-SQL-perf. https://github.com/databricks/spark-sql-perf
      • [H1] HiBench Suite. https://github.com/intel-hadoop/HiBench
      • [H2] HiveRunner. https://github.com/klarna/HiveRunner