Kite SDK: It’s for developers
Ryan Blue, Software Engineer
Resources
©2014 Cloudera, Inc. All rights reserved.
• Kite guide
• http://tiny.cloudera.com/KiteGuide
• Dataset overview and intro
• http://tiny.cloudera.com/Datasets
• Command-line tutorial
• http://tiny.cloudera.com/KiteCLI
• Kite repository and examples
• https://github.com/kite-sdk/kite
• https://github.com/kite-sdk/kite-examples
Agenda
• Kite background
• Kite data
What problem does Kite solve?
• Accessibility for getting started
• Easy to get started, without being an expert
• Use before understanding
• Save time for experienced developers
• Off-the-shelf tools for common tasks
• Quickly iterate and test configurations
Kite Datasets: Motivation
• Focus on using data, not managing files
• Developers shouldn’t have to maintain data files
• Use through configuration, not code
• Need consistency across the platform
Kite Datasets: Motivation
[Diagram: an Application (user code) sits on a Database (provided); the data files are maintained by the database.]
Kite Datasets: Motivation
[Diagram: two Applications. One uses a Database and its data files; the other is user code working directly with data files and HBase.]
Kite Datasets: Motivation
[Diagram: three Applications. One uses a Database and its data files, one works directly with data files and HBase, and one uses Kite Data on top of data files and HBase, which are maintained by Kite.]
Kite Datasets: Goals
• Think in terms of data: datasets, views, records
• Describe your data and layout, and Kite does the right thing
• Should work consistently across the platform
• Reliable
Kite Datasets: Compatibility
Project     HDFS (Avro)  HDFS (Parquet)  HBase
Kite        1.0          1.0             1.0
Flume Sink  1.0          1.0             1.0
MapReduce   1.0          1.0             1.0
Crunch      1.0          1.0             1.0
Hive        1.0          1.0             1.1
Impala      1.0          1.0             *
* depends on common HBase encoding format
Current compatibility (0.15.0)
Project     HDFS (Avro)  HDFS (Parquet)  HBase
Kite        1.0          1.0             1.0
Flume Sink  1.0          1.0             1.0
MapReduce   1.0          1.0             1.0
Crunch      1.0          1.0             1.0
Hive        1.0          1.0             1.1
Impala      1.0          1.0             *
* depends on common HBase encoding format
Agenda
• Kite background
• Kite data
[Diagram: an Application on Kite Data, which maintains the data files and HBase.]
Datasets
• A collection of records or entities
• Like a Hive or relational table
• Generic, reflected, or generated objects
• Identified by URI
• dataset:hdfs:/data/ratings
• dataset:hive:/data/ratings
• dataset:hbase:zk1/ratings
ratings = Datasets.load("dataset:hive:/data/ratings")
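The three example URIs share one shape: a `dataset:` prefix, then a storage scheme, then a storage-specific location. The sketch below only illustrates that shape with `java.net.URI`; it is not how Kite resolves URIs, and `DatasetUriSketch`, `storage`, and `location` are hypothetical names.

```java
import java.net.URI;

/** Illustrative only: splits a dataset URI into storage scheme and location. */
final class DatasetUriSketch {

    /** Returns the storage scheme, e.g. "hive" for "dataset:hive:/data/ratings". */
    static String storage(String datasetUri) {
        return inner(datasetUri).getScheme();
    }

    /** Returns the storage-specific part, e.g. "/data/ratings" or "zk1/ratings". */
    static String location(String datasetUri) {
        return inner(datasetUri).getSchemeSpecificPart();
    }

    private static URI inner(String datasetUri) {
        URI outer = URI.create(datasetUri);
        if (!"dataset".equals(outer.getScheme())) {
            throw new IllegalArgumentException("Not a dataset URI: " + datasetUri);
        }
        // Everything after "dataset:" is itself a URI for the storage layer.
        return URI.create(outer.getSchemeSpecificPart());
    }

    public static void main(String[] args) {
        String[] uris = {
            "dataset:hdfs:/data/ratings",
            "dataset:hive:/data/ratings",
            "dataset:hbase:zk1/ratings"
        };
        for (String uri : uris) {
            System.out.println(storage(uri) + " -> " + location(uri));
        }
    }
}
```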
Dataset configuration, JSON
• Schema (Avro)
• Record fields, like a table definition
• Partition strategy
• Layout or key definition from record fields
Configuring partitioning
• Partition strategy
[ {
"source" : "timestamp",
"type" : "year"
}, {
"source" : "timestamp",
"type" : "month"
}, {
"source" : "timestamp",
"type" : "day"
} ]
datasets/
└── ratings/
├── year=1997/
│ ├── month=09/
│ │ ├── day=20/
│ │ ├── ...
│ │ └── day=30/
│ ├── month=10/
│ │ ├── day=01/
│ │ ├── ...
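How a record's timestamp turns into the directory layout above can be sketched in plain Java. This is not Kite's implementation: it assumes the `timestamp` field holds epoch milliseconds and that partition values are computed in UTC, and `PartitionPathSketch` and `pathFor` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

/** Illustrative only: derives a year/month/day partition path from a timestamp. */
final class PartitionPathSketch {

    /** Builds "year=YYYY/month=MM/day=DD" for an epoch-millisecond timestamp (UTC assumed). */
    static String pathFor(long timestampMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC);
        return String.format(Locale.ROOT, "year=%d/month=%02d/day=%02d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    public static void main(String[] args) {
        long ts = ZonedDateTime.of(1997, 9, 20, 12, 0, 0, 0, ZoneOffset.UTC)
                .toInstant().toEpochMilli();
        System.out.println("datasets/ratings/" + pathFor(ts));
    }
}
```

Each entry in the partition strategy contributes one path segment, in the order the entries are listed.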
Configuring key building
• Partition strategy for HBase
[ {
"source" : "email",
"type" : "hash",
"buckets": 32
}, {
"source" : "email",
"type" : "identity"
} ]
(22, "buzz@pixar.com")
\x80\x00\x00\x16buzz@pixar.com\x00\x00
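The byte layout of the key on this slide can be reproduced with a short sketch. The encoding here is inferred from those bytes alone and may differ from what Kite does internally: a 4-byte big-endian bucket number with its sign bit flipped, the UTF-8 email, then two zero bytes. The hash function is not given on the slide, so `String.hashCode` is a stand-in, and `RowKeySketch`, `bucket`, and `encode` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: builds a row key shaped like the one shown on the slide. */
final class RowKeySketch {

    /** Maps a value to a bucket in [0, buckets); the hash function is an assumption. */
    static int bucket(Object value, int buckets) {
        return (value.hashCode() & Integer.MAX_VALUE) % buckets;
    }

    /** Encodes (bucket, email) as sign-flipped int + UTF-8 text + two zero bytes. */
    static byte[] encode(int bucket, String email) {
        byte[] text = email.getBytes(StandardCharsets.UTF_8);
        ByteBuffer key = ByteBuffer.allocate(4 + text.length + 2); // big-endian by default
        key.putInt(bucket ^ 0x80000000);  // flip the sign bit so keys sort as unsigned bytes
        key.put(text);
        key.put((byte) 0).put((byte) 0);  // two-byte terminator, as on the slide
        return key.array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(22, "buzz@pixar.com")) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Putting the hash bucket first spreads writes across regions, while the identity field after it keeps each key unique.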
Dataset configuration, JSON
• Schema (Avro)
• Record fields, like a table definition
• Partition strategy
• Layout or key definition from record fields
• Column mapping (HBase)
• Where to store record fields
{
"type" : "record",
"name" : "User",
"fields" : [ {
"name" : "email",
"type" : "string"
}, ... ]
}
Mapping example
family          name       name   counts  prefs
row key         last       first  visits  flash
buzz@pixar.com  Lightyear  Buzz   315     true
[
{ "source": "email",
"type": "key" },
...
]
{
"type" : "record",
"name" : "User",
"fields" : [ {
"name" : "lastName",
"type" : "string"
}, ... ]
}
Mapping example
family          name       name   counts  prefs
row key         last       first  visits  flash
buzz@pixar.com  Lightyear  Buzz   315     true
[
{ "source": "lastName",
"type": "column",
"family": "name",
"qualifier": "last" },
...
]
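A column mapping pairs a record field with an HBase family and qualifier. The sketch below shows that idea with plain maps and only the `lastName` mapping from the slide; the `family:qualifier` string form, `ColumnMappingSketch`, and `toCells` are assumptions for illustration, not Kite's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: applies a field-to-column mapping to one record. */
final class ColumnMappingSketch {

    /**
     * For each mapped field present in the record, emits a cell keyed by
     * "family:qualifier". Unmapped fields (such as the key field) are skipped.
     */
    static Map<String, String> toCells(Map<String, String> record,
                                       Map<String, String> fieldToColumn) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : fieldToColumn.entrySet()) {
            String value = record.get(mapping.getKey());
            if (value != null) {
                cells.put(mapping.getValue(), value);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("email", "buzz@pixar.com");
        record.put("lastName", "Lightyear");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("lastName", "name:last"); // family "name", qualifier "last"

        System.out.println(toCells(record, mapping));
    }
}
```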
Command-line demo?
1. Describe your data
dataset obj-schema org.movielens.Rating --jar app.jar \
  --output rating.avsc
2. Describe your layout
dataset partition-config ts:year ts:month ts:day \
  --schema rating.avsc --output ymd.json
3. Create a dataset
dataset create ratings --schema rating.avsc \
  --partition-by ymd.json
Command-line tool
• Executable jar download
• Inspects the environment
• Must be used on-cluster
• Classpath for HBase, Hive, etc.
• Debugging:
debug=true ./dataset -v <command>
• Requires MAPRED_HOME variable on CDH5
Questions
Ryan Blue: blue@cloudera.com
Kite mailing list: cdk-dev@cloudera.org
Maven parent POM
• Automatic Kite and Hadoop dependencies
• Inherit from kite-app-parent-cdh4
• CDH4 only, CDH5 support in 0.16.0
<parent>
<groupId>org.kitesdk</groupId>
<artifactId>kite-app-parent-cdh4</artifactId>
<version>0.15.0</version>
</parent>
Maven Plugin
• Maven plugin manages datasets for an application
• Configured by app-parent POM
• Handles create, update, etc. in maven goals
MapReduce
• DatasetKeyInputFormat
• DatasetKeyOutputFormat
• Values are always null
View eventsBeforeToday = Datasets
.load("dataset:hive:/data/events")
.toBefore("timestamp", startOfToday());
DatasetKeyInputFormat.configure(mrJob).readFrom(eventsBeforeToday);
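The `startOfToday()` helper is not defined on the slide. One possible sketch follows, assuming the `timestamp` field holds epoch milliseconds and that day boundaries are taken in UTC; `TimeBounds` is a hypothetical class name.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Illustrative only: one way to compute the bound passed to toBefore(...). */
final class TimeBounds {

    /** Midnight (UTC) of the current day according to the given clock, in epoch millis. */
    static long startOfToday(Clock clock) {
        return LocalDate.now(clock.withZone(ZoneOffset.UTC))
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    /** Convenience overload using the system clock. */
    static long startOfToday() {
        return startOfToday(Clock.systemUTC());
    }

    public static void main(String[] args) {
        System.out.println(startOfToday());
    }
}
```

Taking a `Clock` parameter keeps the helper deterministic in tests.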
Crunch
• CrunchDatasets.asSource
• CrunchDatasets.asTarget
PCollection<Event> events = getPipeline().read(
    CrunchDatasets.asSource(eventsBeforeToday));
• Handle-existing support in 0.16.0
• Configure dependencies with Kite parent POM
DatasetSink
• Write to HDFS Avro and HBase
• http://tiny.cloudera.com/DatasetSink
• Proxy user support
• Automatic partitioning
agent.sinks.name.type = org.apache.flume.sink.kite.DatasetSink
agent.sinks.name.kite.repo.uri = repo:hdfs:/datasets
agent.sinks.name.kite.dataset.name = events
agent.sinks.name.auth.proxyUser = cloudera

Kite SDK introduction for Portland Big Data

  • 1. Kite SDK: It’s for developers Ryan Blue, Software Engineer
  • 2. Resources ©2014 Cloudera, Inc. All rights reserved. • Kite guide • http://tiny.cloudera.com/KiteGuide • Dataset overview and intro • http://tiny.cloudera.com/Datasets • Command-line tutorial • http://tiny.cloudera.com/KiteCLI • Kite repository and examples • https://github.com/kite-sdk/kite • https://github.com/kite-sdk/kite-examples
  • 3. Agenda • Kite background • Kite data
  • 4. What problem does Kite solve? • Accessibility for getting started • Easy to get started, without being an expert • Use before understanding • Save time for experienced developers • Off-the-shelf tools for common tasks • Quickly iterate and test configurations
  • 5. Kite Datasets: Motivation • Focus on using data, not managing files • Developers shouldn’t have to maintain data files • Use through configuration, not code • Need consistency across the platform
  • 6. Kite Datasets: Motivation [Diagram labels: Application, Database, Data files; User code, Provided, Maintained by the database]
  • 7. Kite Datasets: Motivation [Diagram labels: Application, Application; Database, Data files, Data files, HBase; User code]
  • 8. Kite Datasets: Motivation [Diagram labels: Application, Application, Application; Database, Data files, Data files, Kite Data, HBase, Data files, HBase; Maintained by Kite]
  • 9. Kite Datasets: Goals • Think in terms of data: datasets, views, records • Describe data and layout, and Kite does the right thing • Should work consistently across the platform • Reliable
  • 10. Kite Datasets: Compatibility
      Project      HDFS (avro)   HDFS (parquet)   HBase
      Kite         1.0           1.0              1.0
      Flume Sink   1.0           1.0              1.0
      MapReduce    1.0           1.0              1.0
      Crunch       1.0           1.0              1.0
      Hive         1.0           1.0              1.1
      Impala       1.0           1.0              *
      * depends on common HBase encoding format
  • 11. Current compatibility (0.15.0)
      Project      HDFS (avro)   HDFS (parquet)   HBase
      Kite         1.0           1.0              1.0
      Flume Sink   1.0           1.0              1.0
      MapReduce    1.0           1.0              1.0
      Crunch       1.0           1.0              1.0
      Hive         1.0           1.0              1.1
      Impala       1.0           1.0              *
      * depends on common HBase encoding format
  • 12. Agenda • Kite background • Kite data [Diagram labels: Application, Kite Data, Data files, HBase; Maintained by Kite]
  • 13. Datasets • A collection of records or entities • Like a Hive or relational table • Generic, reflected, or generated objects • Identified by URI • dataset:hdfs:/data/ratings • dataset:hive:/data/ratings • dataset:hbase:zk1/ratings
      ratings = Datasets.load("dataset:hive:/data/ratings")
  • 14. Dataset configuration, JSON • Schema (Avro) • Record fields, like a table definition
  • 15. Dataset configuration, JSON • Schema (Avro) • Record fields, like a table definition • Partition strategy • Layout or key definition from record fields
  • 16. Configuring partitioning • Partition strategy
      [ { "source" : "timestamp", "type" : "year" },
        { "source" : "timestamp", "type" : "month" },
        { "source" : "timestamp", "type" : "day" } ]
      datasets/
      └── ratings/
          ├── year=1997/
          │   ├── month=09/
          │   │   ├── day=20/
          │   │   ├── ...
          │   │   └── day=30/
          │   ├── month=10/
          │   │   ├── day=01/
          │   │   ├── ...
  • 17. Configuring key building • Partition strategy for HBase
      [ { "source" : "email", "type" : "hash", "buckets": 32 },
        { "source" : "email", "type" : "identity" } ]
      (22, "buzz@pixar.com")
      \x80\x00\x00\x16buzz@pixar.com\x00\x00
  • 18. Dataset configuration, JSON • Schema (Avro) • Record fields, like a table definition • Partition strategy • Layout or key definition from record fields • Column mapping (HBase) • Where to store record fields
  • 19. Mapping example
      Schema: { "type" : "record", "name" : "User", "fields" : [ { "name" : "email", "type" : "string" }, ... ] }
      family           name        name    counts   prefs
      row key          last        first   visits   flash
      buzz@pixar.com   Lightyear   Buzz    315      true
      Column mapping: [ { "source": "email", "type": "key" }, ... ]
  • 20. Mapping example
      Schema: { "type" : "record", "name" : "User", "fields" : [ { "name" : "lastName", "type" : "string" }, ... ] }
      family           name        name    counts   prefs
      row key          last        first   visits   flash
      buzz@pixar.com   Lightyear   Buzz    315      true
      Column mapping: [ { "source": "lastName", "type": "column", "family": "name", "qualifier": "last" }, ... ]
  • 21. Command-line demo?
      1. Describe your data
         dataset obj-schema org.movielens.Rating --jar app.jar --output rating.avsc
      2. Describe your layout
         dataset partition-config ts:year ts:month ts:day --schema rating.avsc --output ymd.json
      3. Create a dataset
         dataset create ratings --schema rating.avsc --partition-by ymd.json
  • 22. Command-line tool • Executable jar download • Inspects the environment • Must be used on-cluster • Classpath for HBase, Hive, etc. • Debugging: debug=true ./dataset -v <command> • Requires MAPRED_HOME variable on CDH5
  • 23. Resources • Kite guide • http://tiny.cloudera.com/KiteGuide • Dataset overview and intro • http://tiny.cloudera.com/Datasets • Command-line tutorial • http://tiny.cloudera.com/KiteCLI • Kite repository and examples • https://github.com/kite-sdk/kite • https://github.com/kite-sdk/kite-examples
  • 24. Questions • Ryan Blue: blue@cloudera.com • Kite mailing list: cdk-dev@cloudera.org
  • 25. Maven parent POM • Automatic Kite and Hadoop dependencies • Inherit from kite-app-parent-cdh4 • CDH4 only, CDH5 support in 0.16.0
      <parent>
        <groupId>org.kitesdk</groupId>
        <artifactId>kite-app-parent-cdh4</artifactId>
        <version>0.15.0</version>
      </parent>
  • 26. Maven Plugin • Maven plugin manages datasets for an application • Configured by app-parent POM • Handles create, update, etc. as Maven goals
  • 27. MapReduce • DatasetKeyInputFormat • DatasetKeyOutputFormat • Values are always null
      View eventsBeforeToday = Datasets
          .load("dataset:hive:/data/events")
          .toBefore("timestamp", startOfToday());
      DatasetKeyInputFormat.configure(mrJob).readFrom(eventsBeforeToday);
  • 28. Crunch • CrunchDatasets.asSource • CrunchDatasets.asTarget • Handle-existing support in 0.16.0 • Configure dependencies with Kite parent POM
      PCollection<Event> events = getPipeline().read(
          CrunchDatasets.asSource(eventsBeforeToday));
  • 29. DatasetSink • Write to HDFS Avro and HBase • http://tiny.cloudera.com/DatasetSink • Proxy user support • Automatic partitioning
      agent.sinks.name.type = org.apache.flume.sink.kite.DatasetSink
      agent.sinks.name.kite.repo.uri = repo:hdfs:/datasets
      agent.sinks.name.kite.dataset.name = events
      agent.sinks.name.auth.proxyUser = cloudera
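
The two layouts configured in slides 16 and 17 can be illustrated with a small stand-alone sketch. This is not Kite's implementation and uses no Kite classes: the class and method names are hypothetical, the year/month/day fields are assumed to be derived in UTC, `String.hashCode` is only a stand-in for whatever hash Kite applies, and the key encoding (big-endian int with the sign bit flipped, then the string followed by two zero bytes) is inferred from the bytes 80 00 00 16 shown in slide 17 rather than taken from Kite documentation.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

// Hypothetical illustration only; none of these names come from Kite.
class KiteLayoutSketch {

    // Directory path for a year/month/day partition strategy (assumes UTC).
    static String partitionPath(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return String.format(Locale.ROOT, "year=%d/month=%02d/day=%02d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    // Hash partitioner stand-in: maps a value to one of `buckets` buckets.
    // String.hashCode is an assumption, so the result need not match Kite's.
    static int bucket(String value, int buckets) {
        return Math.floorMod(value.hashCode(), buckets);
    }

    // Row key: bucket as a big-endian int with the sign bit flipped, then the
    // UTF-8 string, then two zero bytes (terminator copied from the slide).
    static byte[] rowKey(int bucket, String email) {
        byte[] text = email.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[4 + text.length + 2];
        int flipped = bucket ^ 0x80000000;
        key[0] = (byte) (flipped >>> 24);
        key[1] = (byte) (flipped >>> 16);
        key[2] = (byte) (flipped >>> 8);
        key[3] = (byte) flipped;
        System.arraycopy(text, 0, key, 4, text.length);
        // The last two bytes keep their default value of 0x00.
        return key;
    }

    // Lowercase hex dump of a byte array, for inspecting keys.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format(Locale.ROOT, "%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long ts = Instant.parse("1997-09-20T12:00:00Z").toEpochMilli();
        System.out.println(partitionPath(ts)); // year=1997/month=09/day=20
        System.out.println(bucket("buzz@pixar.com", 32));
        System.out.println(hex(rowKey(22, "buzz@pixar.com")));
    }
}
```

Flipping the sign bit makes signed integers sort correctly when the keys are compared as unsigned bytes, which would explain why bucket 22 (0x16) appears with a leading 0x80 in the slide.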