10. Hadoop on Cloud
1. Features provided by AWS, IDCF, Heroku, etc.
2. Rapidly improving reliability and integrity
Maintainability of Middleware
11. Agenda
• Maintainability of Distributed Systems
• Our Challenges
• Stateless Hive Metastore
• Cloud Storage for Hadoop
• Multiple Hadoop Version Management
• Regression Test for Hive Queries
• REST API for Hadoop
• Workflow Integration
• What we should keep in mind
44. Regression Test for Hive
• New features, version upgrades, and migrations
must be introduced without regressions
• Run integration tests, system tests, and regression tests
for Hive queries
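One way to picture a regression test for Hive queries is to run the same query against the current version and the candidate version, then compare the results. The following is a minimal sketch under stated assumptions, not the method used in the talk: `find_regressions` and the two runner callables are hypothetical names, and each runner is assumed to take a query string and return an iterable of rows. Rows are compared as a multiset because Hive does not guarantee row order without ORDER BY.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence

# Hypothetical runner type: takes a HiveQL string, returns result rows.
Runner = Callable[[str], Iterable[Sequence]]


def normalize(rows: Iterable[Sequence]) -> Counter:
    """Turn rows into a multiset so that row order does not matter."""
    return Counter(tuple(row) for row in rows)


def find_regressions(queries: List[str], baseline: Runner, candidate: Runner) -> List[str]:
    """Return the queries whose results differ between the two runners."""
    failures = []
    for query in queries:
        if normalize(baseline(query)) != normalize(candidate(query)):
            failures.append(query)
    return failures
```

In practice the two runners would wrap whatever client submits queries to the old and new Hive versions; an empty list means no query changed its result.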
57. PerfectQueue
• Highly available distributed queue built on an RDBMS
• Amazon SQS-like API
• Resource scheduling for multi-tenancy
• Graceful and live restarts
https://github.com/treasure-data/perfectqueue
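The idea of an SQS-like queue on top of an RDBMS can be sketched with a single table and a visibility timeout. This is an illustration only, not PerfectQueue's actual schema or API: the `RdbmsQueue` class, the `tasks` table, and the method names are all hypothetical, and SQLite stands in for the RDBMS. A task is persisted on submit, hidden from other workers while one worker holds it, and redelivered if that worker never finishes it.

```python
import sqlite3
import time
from typing import Optional, Tuple


class RdbmsQueue:
    """Hypothetical RDBMS-backed queue with an SQS-style visibility timeout."""

    def __init__(self, conn: sqlite3.Connection, visibility_timeout: float = 30.0):
        self.conn = conn
        self.visibility_timeout = visibility_timeout
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS tasks ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "body TEXT NOT NULL, "
                "visible_at REAL NOT NULL)"
            )

    def submit(self, body: str, now: Optional[float] = None) -> int:
        """Persist a task so that it survives a worker or process restart."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO tasks (body, visible_at) VALUES (?, ?)", (body, now)
            )
        return cur.lastrowid

    def acquire(self, now: Optional[float] = None) -> Optional[Tuple[int, str]]:
        """Take the oldest visible task and hide it for the timeout period."""
        now = time.time() if now is None else now
        with self.conn:
            row = self.conn.execute(
                "SELECT id, body FROM tasks WHERE visible_at <= ? ORDER BY id LIMIT 1",
                (now,),
            ).fetchone()
            if row is None:
                return None
            # The conditional update fails if another worker took the task first.
            cur = self.conn.execute(
                "UPDATE tasks SET visible_at = ? WHERE id = ? AND visible_at <= ?",
                (now + self.visibility_timeout, row[0], now),
            )
            if cur.rowcount != 1:
                return None
        return (row[0], row[1])

    def finish(self, task_id: int) -> None:
        """Delete a completed task so that it is never redelivered."""
        with self.conn:
            self.conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
```

Because an unfinished task simply becomes visible again after the timeout, a worker can be restarted at any time without losing the request, which is the property the graceful and live restart bullet relies on.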
58. What we should keep in mind
• Stateless
Delegate responsibility to Cloud systems
• Mobility
Plan ahead for version upgrades and migrations
• Queueing
Make each request persistent
59. Recap
• Maintainability of Distributed Systems
• Our Challenges
• Stateless Hive Metastore
• Cloud Storage for Hadoop
• Multiple Hadoop Version Management
• Regression Test for Hive Queries
• REST API for Hadoop
• Workflow Integration
• What we should keep in mind