SlideShare uma empresa Scribd logo
1 de 23
Raghunath Nambiar
Chairman TPC-BD Committee
Distinguished Engineer, Cisco
rnambiar@cisco.com
Introducing TPCx-HS
Industry’s first Standard for Benchmarking Big Data Systems
TPCx–HDTPCx-HSTPCx-HS
 The Big Data technology and services market represents a fast-growing
multibillion-dollar worldwide opportunity (Source: IDC 2013)
 Big Data technology and services market will grow at a 27% compound annual
growth rate (CAGR) to $32.4 billion through 2017 - or about six times the
growth rate of the overall information and communication technology (ICT)
market (Source: IDC 2013)
 Big Data will drive $240 billion of worldwide IT spending in 2016 directly or
indirectly (Source: IDC 2013)
 64% of organizations have invested in or plan to Invest in Big Data. 30% have
already invested in big data technology, 19% plan to invest within the next year,
and an additional 15% plan to invest within two years using (Source: Gartner
2013)
 United States federal agencies spent approximately $5 billion on Big Data
resources in 2012. Estimated annual spending will grow to $6 billion in 2014 and
then to $8 billion by 2017 at a compound annual growth rate of 10 percent
(Source: Biometrics Research Group 2013)
Market Opportunity
TPCx–HDTPCx-HSTPCx-HS
What platform
to pick in terms
of performance,
price-
performance,
and energy
efficiency ?
Top Challenge for Enterprise
Customers
52%
21%
27%
Infrastructure
Software
Services
Big Data IT Spending
Source: IDC 2014
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Motivation for A Big Data Benchmark
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS The Industry Landscape is
Changing…
Virtual Machines
Exceed Physical
Systems
50 Billion devices by 2020. 1 Trillion with IoT
Source: R. Nambiar, VLDB 2013 Vision Paper
The industry landscape is changing
quickly; Big Data (especially Hadoop)
is becoming an integral part of the IT
ecosystem
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Not Easily Verifiable Claims and
Chaos
There are claims which are not
easily variable or comparable, due
to lack of standards
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Remember the 1980’s ?
State of the Nature - early 1980's
the industry began a race that has accelerated over time: automation of daily end-user
business transactions. The first application that received wide-spread focus was
automated teller transactions (ATM), but we've seen this automation trend ripple through
almost every area of business, from grocery stores to gas stations. As opposed to the
batch-computing model that dominated the industry in the 1960's and 1970's, this new
online model of computing had relatively unsophisticated clerks and consumers directly
conducting simple update transactions against an on-line database system. Thus, the on-
line transaction processing industry was born, an industry that now represents billions of
dollars in annual sales.
Early Attempts at Civilized Competition
In the April 1, 1985 issue of Datamation, Jim Gray in collaboration with 24 others from
academy and industry, published (anonymously) an article titled, "A Measure of
Transaction Processing Power." This article outlined a test for on-line transaction
processing which was given the title of "DebitCredit." Unlike the TP1 benchmark, Gray's
DebitCredit benchmark specified a true system-level benchmark where the network and
user interaction components of the workload were included. In addition, it outlined
several other key features of the benchmarking process that were later incorporated into
the TPC process:
The TPC Lays Down the Law
While Gray's DebitCredit ideas were widely praised by industry opinion makers, the
DebitCredit benchmark had the same success in curbing bad benchmarking as the
prohibition did in stopping excessive drinking. In fact, according to industry analysts like
Omri Serlin, the situation only got worse. Without a standards body to supervise the
testing and publishing, vendors began to publish extraordinary marketing claims on both
TP1 and DebitCredit. They often deleted key requirements in DebitCredit to improve their
performance results.
From 1985 through 1988, vendors used TP1 and DebitCredit--or their own interpretation
of these benchmarks--to muddy the already murky performance waters. Omri Serlin had
had enough. He spearheaded a campaign to see if this mess could be straightened out.
On August 10, 1988, Serlin had successfully convinced eight companies to form the
Transaction Processing Performance Council (TPC).
The situation is similar to the 1980’s,
when industry experts were motivated to
establish the TPC and SPEC
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPC Express Benchmark Overview
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPC Express Benchmark Standard
• Easy to implement, run and publish, and less
expensive
• Test sponsor is required to use the TPCx-HS kit
as provided
• The vendor may choose an independent audit
or peer audit
– 60 day review/challenge window apply (as per TPC
policy)
• Approved by super majority of the TPC General
Council
– No Mail Ballot
• All publications must follow the TPC Fair Use Policy
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPCx-HS: The First Vendor-Neutral,
Industry Standard Big Data Benchmark
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPC-Big Data Benchmark Initiative
• TPC-BD Working Group formed in October 2013 to
evaluate big data workload(s) and make
recommendations to the TPC general council
• TPC-BD Subcommittee formed in February 2014 to
develop an Express benchmark based on already
popular TeraSort workload
– TeraSort is part of Apache Hadoop distribution. org.apache.hadoop.examples.terasort
• In July 2014 TPCx-HS became industry’s first
standard for benchmarking Big Data Systems
• TPC to continue work on other benchmark(s)
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
• TPCx-HS can be used to asses a broad range of
system topologies and implementation
methodologies, in a technically rigorous and
directly comparable, vendor-neutral manner
• Historically we have seen that industry standard
benchmarks enable healthy competition that
results in product improvements and the
evolution of new technologies.
TPCx-HS Overview
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPCx-HS Benchmark Defined
• x: Express, H: Hadoop, S:Sort
• Provides verifiable performance,
price/performance, availability, and optional
energy consumption metrics of big data systems
• Enable measurement of both hardware and
software including Hadoop Runtime, Hadoop
Filesystem API compatible systems and
MapReduce layers
• TPC’s first TPC Express Benchmark Standard
• Primary audience is enterprise customers (not
public clouds)
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Scale Factors
• The TPCx-HS follows a stepped Scale factor model
(like in TPC-H and TPC-DS)
• The test dataset must be chosen from the set of
fixed Scale Factors defined as follows:
– 1TB, 3TB, 10TB, 30TB, 100TB, 300TB, 1000TB,
3000TB, 10000TB.
• The corresponding number of records are
– 10B, 30B, 100B, 300B, 1000B, 3000B, 10000B,
30000B, 100000B, where each record is 100 bytes
generated by HSGen
• The TPC will continuously evaluate adding larger
Scale Factors and retiring smaller Scale Factors
based on industry trends
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Metrics
• TPC-xHS defines three primary metrics
– Performance Metric (effective sort throughput of
the SUT)
HSph@SF =
SF
T/3600
Where SF is the Scale Factor, T is the elapsed time for the performance run
(data generation, pre data check, sort, post data check, data validate)
– Price-Performance
$/HSph@SF =
𝑃
𝐻𝑆𝑝ℎ@𝑆𝐹
Where P is the total cost of ownership of the SUT
– Availability Date
The date when the all Components of the system under test are generally
available
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPCx-HS Kit
The TPCx-HS kit contains the following:
• TPCx-HS Specification document
• TPCx-HS Users Guide documentation
• Scripts to run the benchmark
• Java code to execute the benchmark load
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Execution
• Test sponsor is required to use the TPC provides
TPCx-HS kit
• A valid run consists of five separate phases run
sequentially with overlap in their execution
• The benchmark test consists of two runs:
– The performance run is defined as the run with
the lower TPCx-HS Performance Metric
– The repeatability run is defined as the run with
the higher TPCx-HS Performance Metric.
• No configuration or tuning changes or reboot are
allowed between the two runs.
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPCx-H Energy Metric
• The TPCx-HS energy results are expected to be
accurate representations of system performance
and energy consumption, based on TPC-Energy
Specification
• TPCx-HS Energy metric reports the power per
performance and is expressed as
Watts/HSph@SF:
E / (T * HSph@SF)
Where E is the energy consumption for the reported run, T is the elapsed time in seconds for the
reported run, and HSph@SF is the reported performance metric
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Backup
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
TPC Membership 2014
Universities and research organizations are encouraged to join the TPC as Associate Members.
To join the TPC: http://www.tpc.org/information/about/join.asp
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
• The TPC is a non-profit, vendor-neutral organization,
established in August 1988
• Reputation of providing the most credible performance
results to the industry.
• Role of “consumer reports” for the computing industry
• Solid foundation for complete system-level performance
• Methodology for calculating total-system-price and price-
performance
• Methodology for measuring energy efficiency of complete
system
Source: Raghunath Nambiar, Matthew Lanken, Nicholas Wakou, Forrest Carman, Michael Majdalany: Transaction Processing Performance Council (TPC): Twenty Years Later - A Look
Back, a Look Ahead, First TPC Technology Conference, TPCTC 2009, Lyon, France, ISBN 978-3-642-10423-7
TPC Overview
TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS
Staying Relevant ! (and Ahead)
• TPC Technology Conference on Performance
Evaluation and Benchmarking (TPCTC)
• WBDB Series (independent of TPC)
• TPC-Express Class Benchmark Standard vs TPC
Enterprise Class Benchmark Standard
– Long development cycle is not acceptable
• TPC-Big Data Benchmark Initiative
TPCx–HDTPCx-HSTPCx-HS
TPC Benchmark Timeline
Obsolete
Active
Common Specifications
In Progress

Mais conteúdo relacionado

Mais procurados

Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseDataWorks Summit
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveYingjun Wu
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceDataWorks Summit
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016DataStax
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 

Mais procurados (20)

Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 

Destaque

Tpc Energy Publications July 2 10 B
Tpc Energy Publications July 2 10 BTpc Energy Publications July 2 10 B
Tpc Energy Publications July 2 10 BChris O'Neal
 
ドリームインキュベータ(4310)2015年3月期決算説明会資料
ドリームインキュベータ(4310)2015年3月期決算説明会資料ドリームインキュベータ(4310)2015年3月期決算説明会資料
ドリームインキュベータ(4310)2015年3月期決算説明会資料Riho Horiba
 
GlusterD 2.0 - Managing Distributed File System Using a Centralized Store
GlusterD 2.0 - Managing Distributed File System Using a Centralized StoreGlusterD 2.0 - Managing Distributed File System Using a Centralized Store
GlusterD 2.0 - Managing Distributed File System Using a Centralized StoreAtin Mukherjee
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Cloudera, Inc.
 
Taming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemTaming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemCloudera, Inc.
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark Summit
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...Cloudera, Inc.
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in productionbcantrill
 
Nested Types in Impala
Nested Types in ImpalaNested Types in Impala
Nested Types in ImpalaCloudera, Inc.
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB ProjectRakuten Group, Inc.
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopCloudera, Inc.
 
Big Data Benchmarking Tutorial
Big Data Benchmarking TutorialBig Data Benchmarking Tutorial
Big Data Benchmarking TutorialTilmann Rabl
 

Destaque (20)

Tpc Energy Publications July 2 10 B
Tpc Energy Publications July 2 10 BTpc Energy Publications July 2 10 B
Tpc Energy Publications July 2 10 B
 
Gluster d2.0
Gluster d2.0Gluster d2.0
Gluster d2.0
 
ドリームインキュベータ(4310)2015年3月期決算説明会資料
ドリームインキュベータ(4310)2015年3月期決算説明会資料ドリームインキュベータ(4310)2015年3月期決算説明会資料
ドリームインキュベータ(4310)2015年3月期決算説明会資料
 
GlusterD 2.0 - Managing Distributed File System Using a Centralized Store
GlusterD 2.0 - Managing Distributed File System Using a Centralized StoreGlusterD 2.0 - Managing Distributed File System Using a Centralized Store
GlusterD 2.0 - Managing Distributed File System Using a Centralized Store
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 
Hadoop Puzzlers
Hadoop PuzzlersHadoop Puzzlers
Hadoop Puzzlers
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
 
Taming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemTaming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop Ecosystem
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
Data Modeling for Data Science: Simplify Your Workload with Complex Types in ...
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
 
Nested Types in Impala
Nested Types in ImpalaNested Types in Impala
Nested Types in Impala
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
TeraSort
TeraSortTeraSort
TeraSort
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Big Data Benchmarking Tutorial
Big Data Benchmarking TutorialBig Data Benchmarking Tutorial
Big Data Benchmarking Tutorial
 

Semelhante a Introducing the TPCx-HS Benchmark for Big Data

TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsMostafa Mokhtar
 
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfThe_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfDotInsight1
 
TPC TC And TPC-Energy Slide Deck 5.4.09
TPC TC And TPC-Energy Slide Deck 5.4.09TPC TC And TPC-Energy Slide Deck 5.4.09
TPC TC And TPC-Energy Slide Deck 5.4.09forrestcarman
 
Raghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarksRaghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarkshdhappy001
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Big Data: Operational Excellence
Big Data: Operational ExcellenceBig Data: Operational Excellence
Big Data: Operational ExcellenceMatthias Vallaey
 
Business Intelligence in Upstream-Downstream
Business Intelligence in Upstream-DownstreamBusiness Intelligence in Upstream-Downstream
Business Intelligence in Upstream-DownstreamNirav Modh
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...DataBench
 
Implementing cost accounting tools
Implementing cost accounting tools Implementing cost accounting tools
Implementing cost accounting tools Ahmad Nadeem Syed
 
Sesión 6: La experiencia de Trinidad y Tobago
Sesión 6:  La experiencia de Trinidad y TobagoSesión 6:  La experiencia de Trinidad y Tobago
Sesión 6: La experiencia de Trinidad y TobagoIndotel RD
 
TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTilmann Rabl
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Affinitymeterflow Sinkorswimwithsmartmeterdatamanagement
Affinitymeterflow SinkorswimwithsmartmeterdatamanagementAffinitymeterflow Sinkorswimwithsmartmeterdatamanagement
Affinitymeterflow SinkorswimwithsmartmeterdatamanagementTalyam
 
Meterflowsinkorswim
MeterflowsinkorswimMeterflowsinkorswim
Meterflowsinkorswimloichares
 
Replacement of legacy cis with sap cr&b at phi
Replacement of legacy cis with sap cr&b at phiReplacement of legacy cis with sap cr&b at phi
Replacement of legacy cis with sap cr&b at phirobgirvan
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...Big Data Value Association
 
Analogic Link08 Presentation
Analogic Link08 PresentationAnalogic Link08 Presentation
Analogic Link08 Presentationrtgalv
 
Seven Degrees Presentation for 2015 ICEAA
Seven Degrees Presentation for 2015 ICEAASeven Degrees Presentation for 2015 ICEAA
Seven Degrees Presentation for 2015 ICEAAJames Lawlor
 

Semelhante a Introducing the TPCx-HS Benchmark for Big Data (20)

TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
 
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfThe_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
 
TPC TC And TPC-Energy Slide Deck 5.4.09
TPC TC And TPC-Energy Slide Deck 5.4.09TPC TC And TPC-Energy Slide Deck 5.4.09
TPC TC And TPC-Energy Slide Deck 5.4.09
 
Raghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarksRaghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarks
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Big Data: Operational Excellence
Big Data: Operational ExcellenceBig Data: Operational Excellence
Big Data: Operational Excellence
 
Business Intelligence in Upstream-Downstream
Business Intelligence in Upstream-DownstreamBusiness Intelligence in Upstream-Downstream
Business Intelligence in Upstream-Downstream
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
HPC Computing Trends
HPC Computing TrendsHPC Computing Trends
HPC Computing Trends
 
Implementing cost accounting tools
Implementing cost accounting tools Implementing cost accounting tools
Implementing cost accounting tools
 
Sesión 6: La experiencia de Trinidad y Tobago
Sesión 6:  La experiencia de Trinidad y TobagoSesión 6:  La experiencia de Trinidad y Tobago
Sesión 6: La experiencia de Trinidad y Tobago
 
TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data Integration
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Affinitymeterflow Sinkorswimwithsmartmeterdatamanagement
Affinitymeterflow SinkorswimwithsmartmeterdatamanagementAffinitymeterflow Sinkorswimwithsmartmeterdatamanagement
Affinitymeterflow Sinkorswimwithsmartmeterdatamanagement
 
Meterflowsinkorswim
MeterflowsinkorswimMeterflowsinkorswim
Meterflowsinkorswim
 
Replacement of legacy cis with sap cr&b at phi
Replacement of legacy cis with sap cr&b at phiReplacement of legacy cis with sap cr&b at phi
Replacement of legacy cis with sap cr&b at phi
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
 
Analogic Link08 Presentation
Analogic Link08 PresentationAnalogic Link08 Presentation
Analogic Link08 Presentation
 
Compsac 2018
Compsac 2018Compsac 2018
Compsac 2018
 
Seven Degrees Presentation for 2015 ICEAA
Seven Degrees Presentation for 2015 ICEAASeven Degrees Presentation for 2015 ICEAA
Seven Degrees Presentation for 2015 ICEAA
 

Mais de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Mais de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Último

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Último (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Introducing the TPCx-HS Benchmark for Big Data

  • 1. Raghunath Nambiar Chairman TPC-BD Committee Distinguished Engineer, Cisco rnambiar@cisco.com Introducing TPCx-HS Industry’s first Standard for Benchmarking Big Data Systems
  • 2. TPCx–HDTPCx-HSTPCx-HS  The Big Data technology and services market represents a fast-growing multibillion-dollar worldwide opportunity (Source: IDC 2013)  Big Data technology and services market will grow at a 27% compound annual growth rate (CAGR) to $32.4 billion through 2017 - or about six times the growth rate of the overall information and communication technology (ICT) market (Source: IDC 2013)  Big Data will drive $240 billion of worldwide IT spending in 2016 directly or indirectly (Source: IDC 2013)  64% of organizations have invested in or plan to Invest in Big Data. 30% have already invested in big data technology, 19% plan to invest within the next year, and an additional 15% plan to invest within two years using (Source: Gartner 2013)  United States federal agencies spent approximately $5 billion on Big Data resources in 2012. Estimated annual spending will grow to $6 billion in 2014 and then to $8 billion by 2017 at a compound annual growth rate of 10 percent (Source: Biometrics Research Group 2013) Market Opportunity
  • 3. TPCx–HDTPCx-HSTPCx-HS What platform to pick in terms of performance, price- performance, and energy efficiency ? Top Challenge for Enterprise Customers 52% 21% 27% Infrastructure Software Services Big Data IT Spending Source: IDC 2014
  • 5. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS The Industry Landscape is Changing… Virtual Machines Exceed Physical Systems 50 Billion devices by 2020. 1 Trillion with IoT Source: R. Nambiar, VLDB 2013 Vision Paper The industry landscape is changing quickly; Big Data (especially Hadoop) is becoming an integral part of the IT ecosystem
  • 6. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Not Easily Verifiable Claims and Chaos There are claims which are not easily variable or comparable, due to lack of standards
  • 7. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Remember the 1980’s ? State of the Nature - early 1980's the industry began a race that has accelerated over time: automation of daily end-user business transactions. The first application that received wide-spread focus was automated teller transactions (ATM), but we've seen this automation trend ripple through almost every area of business, from grocery stores to gas stations. As opposed to the batch-computing model that dominated the industry in the 1960's and 1970's, this new online model of computing had relatively unsophisticated clerks and consumers directly conducting simple update transactions against an on-line database system. Thus, the on- line transaction processing industry was born, an industry that now represents billions of dollars in annual sales. Early Attempts at Civilized Competition In the April 1, 1985 issue of Datamation, Jim Gray in collaboration with 24 others from academy and industry, published (anonymously) an article titled, "A Measure of Transaction Processing Power." This article outlined a test for on-line transaction processing which was given the title of "DebitCredit." Unlike the TP1 benchmark, Gray's DebitCredit benchmark specified a true system-level benchmark where the network and user interaction components of the workload were included. In addition, it outlined several other key features of the benchmarking process that were later incorporated into the TPC process: The TPC Lays Down the Law While Gray's DebitCredit ideas were widely praised by industry opinion makers, the DebitCredit benchmark had the same success in curbing bad benchmarking as the prohibition did in stopping excessive drinking. In fact, according to industry analysts like Omri Serlin, the situation only got worse. Without a standards body to supervise the testing and publishing, vendors began to publish extraordinary marketing claims on both TP1 and DebitCredit. They often deleted key requirements in DebitCredit to improve their performance results. From 1985 through 1988, vendors used TP1 and DebitCredit--or their own interpretation of these benchmarks--to muddy the already murky performance waters. Omri Serlin had had enough. He spearheaded a campaign to see if this mess could be straightened out. On August 10, 1988, Serlin had successfully convinced eight companies to form the Transaction Processing Performance Council (TPC). The situation is similar to the 1980’s, when industry experts were motivated to establish the TPC and SPEC
  • 9. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPC Express Benchmark Standard • Easy to implement, run and publish, and less expensive • Test sponsor is required to use the TPCx-HS kit as provided • The vendor may choose an independent audit or peer audit – 60 day review/challenge window apply (as per TPC policy) • Approved by super majority of the TPC General Council – No Mail Ballot • All publications must follow the TPC Fair Use Policy
  • 10. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPCx-HS: The First Vendor-Neutral, Industry Standard Big Data Benchmark
  • 11. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPC-Big Data Benchmark Initiative • TPC-BD Working Group formed in October 2013 to evaluate big data workload(s) and make recommendations to the TPC general council • TPC-BD Subcommittee formed in February 2014 to develop an Express benchmark based on already popular TeraSort workload – TeraSort is part of Apache Hadoop distribution. org.apache.hadoop.examples.terasort • In July 2014 TPCx-HS became industry’s first standard for benchmarking Big Data Systems • TPC to continue work on other benchmark(s)
  • 12. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS • TPCx-HS can be used to asses a broad range of system topologies and implementation methodologies, in a technically rigorous and directly comparable, vendor-neutral manner • Historically we have seen that industry standard benchmarks enable healthy competition that results in product improvements and the evolution of new technologies. TPCx-HS Overview
  • 13. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPCx-HS Benchmark Defined • x: Express, H: Hadoop, S:Sort • Provides verifiable performance, price/performance, availability, and optional energy consumption metrics of big data systems • Enable measurement of both hardware and software including Hadoop Runtime, Hadoop Filesystem API compatible systems and MapReduce layers • TPC’s first TPC Express Benchmark Standard • Primary audience is enterprise customers (not public clouds)
  • 14. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Scale Factors • The TPCx-HS follows a stepped Scale factor model (like in TPC-H and TPC-DS) • The test dataset must be chosen from the set of fixed Scale Factors defined as follows: – 1TB, 3TB, 10TB, 30TB, 100TB, 300TB, 1000TB, 3000TB, 10000TB. • The corresponding number of records are – 10B, 30B, 100B, 300B, 1000B, 3000B, 10000B, 30000B, 100000B, where each record is 100 bytes generated by HSGen • The TPC will continuously evaluate adding larger Scale Factors and retiring smaller Scale Factors based on industry trends
  • 15. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Metrics • TPC-xHS defines three primary metrics – Performance Metric (effective sort throughput of the SUT) HSph@SF = SF T/3600 Where SF is the Scale Factor, T is the elapsed time for the performance run (data generation, pre data check, sort, post data check, data validate) – Price-Performance $/HSph@SF = 𝑃 𝐻𝑆𝑝ℎ@𝑆𝐹 Where P is the total cost of ownership of the SUT – Availability Date The date when the all Components of the system under test are generally available
  • 16. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPCx-HS Kit The TPCx-HS kit contains the following: • TPCx-HS Specification document • TPCx-HS Users Guide documentation • Scripts to run the benchmark • Java code to execute the benchmark load
  • 17. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Execution • Test sponsor is required to use the TPC provides TPCx-HS kit • A valid run consists of five separate phases run sequentially with overlap in their execution • The benchmark test consists of two runs: – The performance run is defined as the run with the lower TPCx-HS Performance Metric – The repeatability run is defined as the run with the higher TPCx-HS Performance Metric. • No configuration or tuning changes or reboot are allowed between the two runs.
  • 18. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPCx-H Energy Metric • The TPCx-HS energy results are expected to be accurate representations of system performance and energy consumption, based on TPC-Energy Specification • TPCx-HS Energy metric reports the power per performance and is expressed as Watts/HSph@SF: E / (T * HSph@SF) Where E is the energy consumption for the reported run, T is the elapsed time in seconds for the reported run, and HSph@SF is the reported performance metric
  • 20. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS TPC Membership 2014 Universities and research organizations are encouraged to join the TPC as Associate Members. To join the TPC: http://www.tpc.org/information/about/join.asp
  • 21. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS • The TPC is a non-profit, vendor-neutral organization, established in August 1988 • Reputation of providing the most credible performance results to the industry. • Role of “consumer reports” for the computing industry • Solid foundation for complete system-level performance • Methodology for calculating total-system-price and price- performance • Methodology for measuring energy efficiency of complete system Source: Raghunath Nambiar, Matthew Lanken, Nicholas Wakou, Forrest Carman, Michael Majdalany: Transaction Processing Performance Council (TPC): Twenty Years Later - A Look Back, a Look Ahead, First TPC Technology Conference, TPCTC 2009, Lyon, France, ISBN 978-3-642-10423-7 TPC Overview
  • 22. TPCx–HDTPCx-HSTPCx-HSTPCx-HSPCx-HS Staying Relevant ! (and Ahead) • TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC) • WBDB Series (independent of TPC) • TPC-Express Class Benchmark Standard vs TPC Enterprise Class Benchmark Standard – Long development cycle is not acceptable • TPC-Big Data Benchmark Initiative