Kite SDK: HBase Datasets
Ryan Blue, Software Engineer
What problem is Kite solving?
©2014 Cloudera, Inc. All rights reserved.
• Accessibility
• Hadoop is flexible, but low level
• Should be easy to use, without being an expert
Kite SDK
• A set of off-the-shelf tools
• Based on experience and best practices
• Lets you focus on your problem
• Helps you solve new challenges
Kite Datasets: Motivation
Focus on using your data, not managing it
• You shouldn’t have to maintain data files
• This is the first thing you need
Kite Datasets: Motivation
[Diagram, built up over three slides]
• With a database: the application is your code, the database is provided, and the data files are maintained by the database
• On Hadoop: the application works directly with data files and HBase, so handling them is your code
• With Kite: the application works with Kite Data, and the data files and HBase are maintained by Kite
Kite Datasets: Goals
• Think in terms of data, not files
• Describe your data and Kite does the right thing
• Should work consistently across the platform
• Reliable
Kite Datasets: Compatibility
Project      HDFS (Avro)   HDFS (Parquet)   HBase
Flume Sink   1.0           1.0              1.0
MapReduce    1.0           1.0              1.0
Crunch       1.0           1.0              1.0
Hive         1.0           1.0              1.1
Impala       1.0           1.0              *
* depends on common HBase encoding format
Kite Datasets: What is it?
• A high-level API for data management
• Work with records and datasets
• Not files, directories, or byte arrays
• Standard descriptions for records and storage
• Schemas describe records
• Partition strategies describe layout
• Opinionated
Kite Datasets: Example
1. Describe your data
dataset obj-schema org.movielens.Rating --jar app.jar \
    --output rating.avsc
2. Describe your layout
dataset partition-config ts:year ts:month ts:day \
    --schema rating.avsc --output ymd.json
3. Create a dataset
dataset create ratings --schema rating.avsc \
    --partition-by ymd.json
Kite Datasets: Example
datasets/
└── ratings/
    ├── year=1997/
    │   ├── month=09/
    │   │   ├── day=20/
    │   │   ├── ...
    │   │   └── day=30/
    │   ├── month=10/
    │   │   ├── day=01/
    │   │   ├── ...
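The directory layout above follows from the year, month, day partition strategy. As a rough illustration of how a record's timestamp could turn into such a path, here is a minimal Java sketch. The class and method names are hypothetical (this is not Kite code), UTC is assumed, and the sample timestamp is an illustrative value only.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

// Conceptual sketch only: hypothetical helper, not part of the Kite API.
final class PartitionPathSketch {

    // Derives a year=/month=/day= directory from a millisecond timestamp.
    // UTC is an assumption made for this sketch.
    static String path(String dataset, long timestampMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return String.format(Locale.ROOT, "datasets/%s/year=%d/month=%02d/day=%02d",
                dataset, t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    public static void main(String[] args) {
        // Illustrative timestamp that falls on 1997-09-20 (UTC)
        System.out.println(path("ratings", 874724710000L));
        // prints datasets/ratings/year=1997/month=09/day=20
    }
}
```

Every record rated on the same day maps to the same directory, which is what lets the application think in terms of data rather than files.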
Kite SDK: HBase Datasets
Ryan Blue, Software Engineer
Kite HBase: Background
• Rows identified by keys, managed by HBase
• Columns are organized as cells
• Cells are identified by column family and qualifier
• The catch: everything is a byte array
family           name               ...
row key          last       first   ...
buzz@pixar.com   Lightyear  Buzz    ...
Kite HBase
• Uniform interaction with HBase and HDFS datasets
• Need to make keys from records
• Need configuration to map fields to cells
Kite HBase: Partitioning
• Use partition strategy to define unique keys
• Kite builds the key from each record
• Kite translates keys to HBase row id bytes
Kite HBase: Partitioning
• Partition strategy produces a storage key
• HDFS partitioning uses a group key
1403028411014 => (2014, 6, 17)
• HBase partitioning uses a unique key
• Grouping is done dynamically by HBase
1403028411014 => (1403028411014)
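The two arrows above contrast a group key with a unique key. The following Java sketch reproduces both mappings for the same timestamp. It is a conceptual illustration only: the class and method names are hypothetical, and UTC is assumed for the date fields.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Conceptual sketch only: hypothetical helpers, not Kite classes.
final class StorageKeySketch {

    // HDFS-style group key: many records share one (year, month, day) key,
    // so they are stored together in one partition. UTC is assumed.
    static List<Integer> groupKey(long timestampMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return Arrays.asList(t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    // HBase-style unique key: the value is carried into the key unchanged,
    // so each distinct timestamp identifies its own row.
    static List<Long> uniqueKey(long timestampMillis) {
        return Collections.singletonList(timestampMillis);
    }

    public static void main(String[] args) {
        long ts = 1403028411014L;
        System.out.println(groupKey(ts));  // prints [2014, 6, 17]
        System.out.println(uniqueKey(ts)); // prints [1403028411014]
    }
}
```

A timestamp one second later yields the same group key but a different unique key, which is the distinction the slide draws.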
Kite HBase: Example partitioning
• Define key format from data
$ ./dataset partition-config --schema user.avsc \
    email:copy
[ {
  "source" : "email", "type" : "identity",
  "name" : "email_copy"
} ]
Kite HBase: Example partitioning
$ ./dataset partition-config --schema user.avsc \
    email:hash[16] email:copy
[ {
  "source" : "email", "type" : "hash",
  "buckets" : 16, "name" : "email_hash"
}, {
  "source" : "email", "type" : "identity",
  "name" : "email_copy"
} ]
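To make the hash[16] plus copy strategy more concrete, here is a small Java sketch of a two-part key: a bucket number followed by the email itself. This is an illustration under assumptions, not Kite's partitioner. In particular, reducing String.hashCode modulo the bucket count is a stand-in, and the hash function Kite actually uses may differ.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: hypothetical helper, not Kite's partitioner code.
final class HashedKeySketch {

    // Assumption: the bucket is the field's hash code reduced modulo the
    // bucket count. floorMod keeps the result in [0, buckets).
    static int bucket(String value, int buckets) {
        return Math.floorMod(value.hashCode(), buckets);
    }

    // Mirrors "email:hash[16] email:copy": the bucket comes first,
    // then the unmodified email, which keeps the key unique.
    static List<Object> key(String email) {
        return Arrays.<Object>asList(bucket(email, 16), email);
    }

    public static void main(String[] args) {
        System.out.println(key("buzz@pixar.com"));
    }
}
```

The leading bucket spreads otherwise adjacent emails across the key space, while the identity component still distinguishes every record.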
Kite HBase: Partitioning
• Use partition strategy to define unique keys
• Kite builds the key from each record
• Kite translates keys to HBase row id bytes
• Some operations require keys
Kite HBase: Field mapping
• Configure the column family and qualifier for a field
{ "email": "buzz@pixar.com",
  "firstName": "Buzz", ... }
family           name               ...
row key          last       first   ...
buzz@pixar.com   Lightyear  Buzz    ...
Kite HBase: Basic column mapping
column
{ "source": "firstName", "type": "column",
  "family": "name", "qualifier": "first" }
Kite HBase: Counter mapping
counter (can be incremented)
{ "source": "visits", "type": "counter",
  "family": "counts", "qualifier": "visits" }
Kite HBase: Key mapping
key (stored in the row key using identity)
{ "source": "email", "type": "key" }
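The three mapping types (column, counter, key) can be pictured as a lookup from record fields to a row key and a set of cells. The Java sketch below applies the mappings shown on these slides to a sample record. It is a conceptual illustration with hypothetical names, not Kite's mapping classes, and it keeps values as Java objects where HBase would store byte arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: hypothetical types, not Kite's mapping classes.
final class ColumnMappingSketch {

    // Field name -> "family:qualifier", following the column and counter mappings.
    static final Map<String, String> CELLS = new LinkedHashMap<>();
    static {
        CELLS.put("firstName", "name:first");
        CELLS.put("visits", "counts:visits");
    }

    // The field mapped with "type": "key" goes into the row key, not a cell.
    static final String KEY_FIELD = "email";

    static String rowKey(Map<String, Object> record) {
        return String.valueOf(record.get(KEY_FIELD));
    }

    // Places each mapped field under its family:qualifier.
    // Real HBase cells hold byte arrays; values stay as objects here.
    static Map<String, Object> cells(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : CELLS.entrySet()) {
            if (record.containsKey(mapping.getKey())) {
                out.put(mapping.getValue(), record.get(mapping.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("email", "buzz@pixar.com");
        user.put("firstName", "Buzz");
        user.put("visits", 315L);
        System.out.println(rowKey(user)); // prints buzz@pixar.com
        System.out.println(cells(user));  // prints {name:first=Buzz, counts:visits=315}
    }
}
```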
Kite HBase: Example
Key field: email
{
  "type" : "record",
  "name" : "User",
  "fields" : [ {
    "name" : "email",
    "type" : "string"
  }, ... ]
}
[
  { "source": "email",
    "type": "key" },
  ...
]
Column field: lastName
{
  "type" : "record",
  "name" : "User",
  "fields" : [ {
    "name" : "lastName",
    "type" : "string"
  }, ... ]
}
[
  { "source": "lastName",
    "type": "column",
    "family": "name",
    "qualifier": "last" },
  ...
]
Counter field: visits
{
  "type" : "record",
  "name" : "User",
  "fields" : [ {
    "name" : "visits",
    "type" : "long"
  }, ... ]
}
[
  { "source": "visits",
    "type": "counter",
    "family": "counts",
    "qualifier": "visits" },
  ...
]
Resulting row:
family           name               counts   prefs
row key          last       first   visits   flash
buzz@pixar.com   Lightyear  Buzz    315      true
Kite HBase: Interaction
• Working with a dataset does not change when it is stored in HBase
• Readers / writers are backed by scans
• CLI tools work:
dataset csv-import pixar_users.csv users --use-hbase
• Additional methods on RandomAccessDataset
• get, put, delete, increment
Kite HBase: Interaction using keys
RandomAccessDataset<User> users = ...;
// Build the key for the record to look up
Key buzzEmailKey = new Key.Builder()
    .add("email", "buzz@pixar.com")
    .build();
// Fetch the record, change it, and write it back
User buzz = users.get(buzzEmailKey);
buzz.addPreference("flash", true);
users.put(buzz);
Kite HBase: More features
• Versioning and concurrency
• Additional occVersion type, like a counter
• Rejects a put if the record has changed
• Key-as-column mapping
• Stores maps or records in a column family
• Uses the key or field name as the qualifier
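The versioning bullets describe optimistic concurrency control: a put is rejected if the record changed after it was read. The Java sketch below shows that check in general terms with a hypothetical in-memory store. It does not reproduce how Kite or HBase implement occVersion. The version arithmetic and the treatment of a missing row as version 0 are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: a hypothetical in-memory store, not Kite or HBase code.
final class OccSketch {

    // A stored value plus the version number that guards it.
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    synchronized Versioned get(String key) {
        return rows.get(key);
    }

    // Accepts the write only if the row is still at the version the caller read.
    // A missing row counts as version 0. Returns false when the put is rejected.
    synchronized boolean put(String key, String value, long versionRead) {
        Versioned current = rows.get(key);
        long currentVersion = (current == null) ? 0L : current.version;
        if (currentVersion != versionRead) {
            return false; // the record changed since it was read
        }
        rows.put(key, new Versioned(value, currentVersion + 1));
        return true;
    }

    public static void main(String[] args) {
        OccSketch users = new OccSketch();
        System.out.println(users.put("buzz@pixar.com", "v1", 0L)); // prints true
        System.out.println(users.put("buzz@pixar.com", "v2", 0L)); // prints false (stale version)
        System.out.println(users.put("buzz@pixar.com", "v2", 1L)); // prints true
    }
}
```

A caller whose put is rejected would typically re-read the record, reapply its change, and try again.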
Kite HBase: Conclusion
• Translation between objects and byte arrays in Kite
• Configuration to define key format
• Configuration to define how fields are stored
• Decreases the code and time required to experiment
• Key format and column mappings are hard
• Try out configurations to find the right one
Questions
Ryan Blue: blue@cloudera.com
Kite mailing list: cdk-dev@cloudera.org

Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Último (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

Kite SDK: Working with Datasets

  • 1. Kite SDK: HBase Datasets Ryan Blue, Software Engineer
  • 2. What problem is Kite solving? ©2014 Cloudera, Inc. All rights reserved.
      • Accessibility
      • Hadoop is flexible, but low level
      • Should be easy to use, without being an expert
  • 3. Kite SDK
      • A set of off-the-shelf tools
      • Based on experience and best practices
      • Lets you focus on your problem
      • Helps you solve new challenges
  • 4. Kite Datasets: Motivation
      Focus on using your data, not managing it
      • You shouldn't have to maintain data files
      • This is the first thing you need
  • 5. Kite Datasets: Motivation
      [Diagram: stack of Application, Database, Data files; labels: Your code, Provided, Maintained by the database]
  • 6. Kite Datasets: Motivation
      [Diagram: Application over Database over Data files, beside an Application directly over Data files and HBase; label: Your code]
  • 7. Kite Datasets: Motivation
      [Diagram: as before, plus an Application over Kite Data over Data files and HBase; label: Maintained by Kite]
  • 8. Kite Datasets: Goals
      • Think in terms of data, not files
      • Describe your data and Kite does the right thing
      • Should work consistently across the platform
      • Reliable
  • 9. Kite Datasets: Compatibility
        Project     HDFS (avro)  HDFS (parquet)  HBase
        Flume Sink  1.0          1.0             1.0
        MapReduce   1.0          1.0             1.0
        Crunch      1.0          1.0             1.0
        Hive        1.0          1.0             1.1
        Impala      1.0          1.0             *
        * depends on common HBase encoding format
  • 10. Kite Datasets: What is it?
      • A high-level API for data management
      • Work with records and datasets
      • Not files, directories, or byte arrays
      • Standard descriptions for records and storage
      • Schemas describe records
      • Partition strategies describe layout
      • Opinionated
  • 11. Kite Datasets: Example
      1. Describe your data
        dataset obj-schema org.movielens.Rating --jar app.jar --output rating.avsc
  • 12. Kite Datasets: Example
      1. Describe your data
        dataset obj-schema org.movielens.Rating --jar app.jar --output rating.avsc
      2. Describe your layout
        dataset partition-config ts:year ts:month ts:day --schema rating.avsc --output ymd.json
  • 13. Kite Datasets: Example
      1. Describe your data
        dataset obj-schema org.movielens.Rating --jar app.jar --output rating.avsc
      2. Describe your layout
        dataset partition-config ts:year ts:month ts:day --schema rating.avsc --output ymd.json
      3. Create a dataset
        dataset create ratings --schema rating.avsc --partition-by ymd.json
  • 14. Kite Datasets: Example
        datasets/
        └── ratings/
            ├── year=1997/
            │   ├── month=09/
            │   │   ├── day=20/
            │   │   ├── ...
            │   │   └── day=30/
            │   ├── month=10/
            │   │   ├── day=01/
            │   │   ├── ...
  • 15. Kite SDK: HBase Datasets Ryan Blue, Software Engineer
  • 16. Kite HBase: Background
      [Diagram: Application over Database over Data files; Application directly over Data files and HBase; Application over Kite Data over Data files and HBase; label: Maintained by Kite]
  • 17. Kite HBase: Background
      • Rows identified by keys, managed by HBase
      • Columns are organized as cells
      • Cells are identified by column family, qualifier
      • The catch: everything is a byte array
        family:         name       name   ...
        row key         last       first  ...
        buzz@pixar.com  Lightyear  Buzz   ...
  • 18. Kite HBase
      • Uniform interaction with HBase and HDFS datasets
      • Need to make keys from records
      • Need configuration to map fields to cells
  • 19. Kite HBase: Partitioning
      • Use partition strategy to define unique keys
      • Kite builds the key from each record
      • Kite translates keys to HBase row id bytes
  • 20. Kite HBase: Partitioning
      • Partition strategy produces a storage key
      • HDFS partitioning uses a group key
        1403028411014 => (2014, 6, 17)
      • HBase partitioning uses a unique key
      • Grouping is done dynamically by HBase
        1403028411014 => (1403028411014)
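The difference between a group key and a unique key on slide 20 can be sketched in plain Java. This is an illustration only, not Kite's implementation: the class `StorageKeys` and its methods are hypothetical names, and the date fields are assumed to be computed in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StorageKeys {
    // HDFS-style group key: many records share the same (year, month, day).
    // Assumption: fields are derived in UTC.
    static List<Integer> groupKey(long timestampMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return Arrays.asList(t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    // HBase-style unique key: the source value is kept whole, so it can
    // identify a single row.
    static List<Long> uniqueKey(long timestampMillis) {
        return Collections.singletonList(timestampMillis);
    }

    public static void main(String[] args) {
        long ts = 1403028411014L;
        System.out.println(groupKey(ts));   // [2014, 6, 17]
        System.out.println(uniqueKey(ts));  // [1403028411014]
    }
}
```

Every record written on the same day maps to the same group key, which is what makes it usable as a directory name; the unique key keeps one entry per distinct value and leaves grouping to HBase.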
  • 21. Kite HBase: Example partitioning
      • Define key format from data
        $ ./dataset partition-config --schema user.avsc email:copy
  • 22. Kite HBase: Example partitioning
      • Define key format from data
        $ ./dataset partition-config --schema user.avsc email:copy
        [ {
          "source" : "email",
          "type" : "identity",
          "name" : "email_copy"
        } ]
  • 23. Kite HBase: Example partitioning
        $ ./dataset partition-config --schema user.avsc email:hash[16] email:copy
  • 24. Kite HBase: Example partitioning
        $ ./dataset partition-config --schema user.avsc email:hash[16] email:copy
        [ {
          "source" : "email",
          "type" : "hash",
          "buckets" : 16,
          "name" : "email_hash"
        }, {
          "source" : "email",
          "type" : "identity",
          "name" : "email_copy"
        } ]
  • 25. Kite HBase: Partitioning
      • Use partition strategy to define unique keys
      • Kite builds the key from each record
      • Kite translates keys to HBase row id bytes
      • Some operations require keys
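A rough sketch of what the `email:hash[16] email:copy` strategy from slides 23 and 24 produces for one record. The class `HashedKey` is hypothetical, and the bucket function (non-negative `hashCode` modulo the bucket count) is an assumption made for illustration; the slides do not say which hash Kite uses or how the key is encoded to bytes.

```java
import java.util.Arrays;
import java.util.List;

public class HashedKey {
    // Assumed bucket function: clear the sign bit of hashCode, then take the
    // remainder by the bucket count. Kite's actual hash may differ.
    static int bucket(String value, int buckets) {
        return (value.hashCode() & Integer.MAX_VALUE) % buckets;
    }

    // Two-part storage key: (email_hash, email_copy).
    // The identity part keeps the key unique; the hash part spreads keys
    // across 16 buckets.
    static List<Object> storageKey(String email) {
        return Arrays.<Object>asList(bucket(email, 16), email);
    }

    public static void main(String[] args) {
        System.out.println(storageKey("buzz@pixar.com"));
    }
}
```

Putting the hash first means lexicographically adjacent emails usually land in different key ranges, while the copied email still makes each key unique.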
  • 26. Kite HBase: Field mapping
      • Configure the column family and qualifier for a field
        { "email": "buzz@pixar.com", "firstName": "Buzz", ... }
        family:         name       name   ...
        row key         last       first  ...
        buzz@pixar.com  Lightyear  Buzz   ...
  • 27. Kite HBase: Basic column mapping
      column
        { "source": "firstName", "type": "column", "family": "name", "qualifier": "first" }
  • 28. Kite HBase: Counter mapping
      column
        { "source": "firstName", "type": "column", "family": "name", "qualifier": "first" }
      counter (can be incremented)
        { "source": "visits", "type": "counter", "family": "counts", "qualifier": "visits" }
  • 29. Kite HBase: Key mapping
      key (stored in the row key using identity)
        { "source": "email", "type": "key" }
  • 30. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "email", "type" : "string" }, ... ] }
  • 31. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "email", "type" : "string" }, ... ] }
      Mapping:
        [ { "source": "email", "type": "key" }, ... ]
  • 32. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "email", "type" : "string" }, ... ] }
      Mapping:
        [ { "source": "email", "type": "key" }, ... ]
        family:         name       name   counts  prefs
        row key         last       first  visits  flash
        buzz@pixar.com  Lightyear  Buzz   315     true
  • 33. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "lastName", "type" : "string" }, ... ] }
  • 34. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "lastName", "type" : "string" }, ... ] }
      Mapping:
        [ { "source": "lastName", "type": "column", "family": "name", "qualifier": "last" }, ... ]
  • 35. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "lastName", "type" : "string" }, ... ] }
      Mapping:
        [ { "source": "lastName", "type": "column", "family": "name", "qualifier": "last" }, ... ]
        family:         name       name   counts  prefs
        row key         last       first  visits  flash
        buzz@pixar.com  Lightyear  Buzz   315     true
  • 36. Kite HBase: Example
      Schema:
        { "type" : "record", "name" : "User", "fields" : [ { "name" : "visits", "type" : "long" }, ... ] }
      Mapping:
        [ { "source": "visits", "type": "counter", "family": "counts", "qualifier": "visits" }, ... ]
        family:         name       name   counts  prefs
        row key         last       first  visits  flash
        buzz@pixar.com  Lightyear  Buzz   315     true
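The field-mapping idea on slides 26 to 36 can be sketched without any HBase dependency: each mapped field ends up as a byte array stored under a (family, qualifier) pair, and the key field becomes the row key. Everything below is a hypothetical illustration; `FieldMappingSketch` and `ColumnMapping` are invented names, and plain UTF-8 is assumed for the value encoding, which the slides do not specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldMappingSketch {
    // Hypothetical holder for one "column" mapping entry.
    static final class ColumnMapping {
        final String source;
        final String family;
        final String qualifier;

        ColumnMapping(String source, String family, String qualifier) {
            this.source = source;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Translate record fields into cells addressed as "family:qualifier".
    // Assumption: values are encoded as UTF-8 bytes.
    static Map<String, byte[]> toCells(Map<String, String> record, List<ColumnMapping> mappings) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (ColumnMapping m : mappings) {
            String value = record.get(m.source);
            if (value != null) {
                cells.put(m.family + ":" + m.qualifier, value.getBytes(StandardCharsets.UTF_8));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("email", "buzz@pixar.com");
        user.put("firstName", "Buzz");
        user.put("lastName", "Lightyear");

        List<ColumnMapping> mappings = Arrays.asList(
            new ColumnMapping("lastName", "name", "last"),
            new ColumnMapping("firstName", "name", "first"));

        // The "key" mapping: the email field is stored in the row key.
        byte[] rowKey = user.get("email").getBytes(StandardCharsets.UTF_8);
        System.out.println("row key: " + new String(rowKey, StandardCharsets.UTF_8));
        for (Map.Entry<String, byte[]> cell : toCells(user, mappings).entrySet()) {
            System.out.println(cell.getKey() + " = "
                + new String(cell.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

The point of the configuration is that this translation lives in one declarative place instead of being hand-written byte-array code in every application.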
  • 37. Kite HBase: Interaction
      • Working with a dataset in HBase does not change
      • Readers / writers are backed by scans
      • CLI tools work:
        dataset csv-import pixar_users.csv users --use-hbase
      • Additional methods on RandomAccessDataset
      • get, put, delete, increment
  • 38. Kite HBase: Interaction using keys
        RandomAccessDataset<User> users = ...;
        Key buzzEmailKey = new Key.Builder()
            .add("email", "buzz@pixar.com")
            .build();
        User buzz = users.get(buzzEmailKey);
        buzz.addPreference("flash", true);
        users.put(buzz);
  • 39. Kite HBase: More features
      • Versioning and concurrency
      • Additional occVersion type, like a counter
      • Rejects a put if the record has changed
      • Key-as-column mapping
      • Stores maps or records in a column family
      • Uses the key or field name as the qualifier
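The "rejects a put if the record has changed" behaviour on slide 39 is optimistic concurrency control. A minimal in-memory sketch of the general technique follows; it is not Kite's or HBase's code, and `OccSketch`, `Versioned`, and the explicit `readVersion` argument are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class OccSketch {
    // A stored value together with the version it was written at.
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    synchronized Versioned get(String key) {
        return rows.get(key);
    }

    // Accept the write only if the version the caller read is still the
    // stored version; a missing row counts as version 0.
    synchronized boolean put(String key, String value, long readVersion) {
        Versioned current = rows.get(key);
        long storedVersion = (current == null) ? 0L : current.version;
        if (storedVersion != readVersion) {
            return false; // someone else changed the record since it was read
        }
        rows.put(key, new Versioned(value, storedVersion + 1));
        return true;
    }

    public static void main(String[] args) {
        OccSketch table = new OccSketch();
        System.out.println(table.put("buzz@pixar.com", "v1", 0)); // true
        System.out.println(table.put("buzz@pixar.com", "v2", 0)); // false, stale version
        System.out.println(table.put("buzz@pixar.com", "v2", 1)); // true
    }
}
```

A writer whose put is rejected re-reads the record, reapplies its change, and tries again, so no lock is held between the read and the write.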
  • 40. Kite HBase: Conclusion
      • Translation between objects and byte arrays in Kite
      • Configuration to define key format
      • Configuration to define how fields are stored
      • Decreases the code and time required to experiment
      • Key format and column mappings are hard
      • Try out configurations to find the right one
  • 41. Questions
      Ryan Blue: blue@cloudera.com
      Kite mailing list: cdk-dev@cloudera.org