The current Hadoop ecosystem is challenged and slowed by fragmented and duplicated efforts.
An industry standard is required that translates to immediate benefits that will increase stability, capabilities and compatibility among Hadoop distributions. It's also important to include an open data management core with an enterprise focus.
The ODPi is a shared industry effort focused on building such standards and on promoting and advancing the state of Big Data technologies. Linaro is actively involved in this effort and is working to make sure ODPi is ARM compatible.
This talk will go over some of the specifications defined, Linaro's contributions, the roadmap, and a quick demo.
BKK16-400B ODPi - Standardizing Hadoop
1. ODPi (Open Data Platform Initiative)
Standardizing Hadoop Ecosystem
Presented by: Ganesh Raju & Naresh Bhat, Big Data Team, LEG
Date: BKK16-400B, March 10, 2016
Event: Linaro Connect BKK16
2. Why ODPi?
● Hadoop is a collection of 47+ components
○ Compatibility issues
○ Insufficient documentation
○ No proper integrated tests
○ Maintenance nightmare
■ Release cycle timelines
■ API changes
■ Config files
● Help minimize the fragmentation and duplication of effort
within the industry
3. In the table below, you can see who supports what. Other projects supported by all the vendors include HBase,
Hive, Pig, Spark, and Zookeeper – for a total of 8 projects supported by all. Potential for more.
4. What is ODPi?
● A shared industry effort to advance the state of Apache
Hadoop and Big Data technologies for the enterprise
● Provide a well-integrated, tested and stable base
● Align real enterprise demands with the
developer community
● Configuration optimizations
● Best practices and standards
● Platform agnostic (ARM, x86, etc.)
● Certification programs
● Backed by the Linux Foundation
5. Benefits for ISVs
● Eliminates fragmentation, reduces costs, accelerates time to market
○ Cost of maintaining external open source components.
○ Insufficient documentation
● Promotes compatibility between distros.
● Members can focus on innovation and differentiated value-add
● Companies such as Pivotal, Hortonworks, IBM and Altiscale already have big
data solutions based on ODPi specs
7. ODPi Components
● ODPi consists of 2 stacks
○ Runtime stack
○ Management stack
● ODPi currently includes only 3 components in the runtime stack, plus Ambari
in the management stack
8. ODPi - Current status
● Automated CI loop builds are established. A local Nexus repository is set up to store all artifacts
instead of fetching them from Apache
● Using Bigtop as the single tool for CI builds, deployment and automated tests
● Defining more smoke tests
● Spec for the runtime stack is in draft and publicly shared
● Working on certification specs for distros
● Technical progress can be found at: https://github.com/odpi
Challenges:
● Concessions between members
● Management tool like Ambari
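The smoke tests mentioned above could start from something as small as the sketch below. It is only an illustration, not an ODPi test: the variable list is an assumption (common Hadoop environment variables, not the normative list from the runtime spec), and the helper names are hypothetical. The only external command used is `hadoop version`.

```python
import os
import shutil
import subprocess

# Assumption: an illustrative list of common Hadoop environment variables,
# NOT the normative list from the ODPi runtime spec.
ASSUMED_ENV_VARS = ("JAVA_HOME", "HADOOP_CONF_DIR")


def missing_env_vars(env, required=ASSUMED_ENV_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def hadoop_version(hadoop_cmd="hadoop"):
    """Return the first line of `hadoop version`, or None if unavailable."""
    if shutil.which(hadoop_cmd) is None:
        return None  # command is not on PATH
    result = subprocess.run(
        [hadoop_cmd, "version"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]


def smoke_check(env=None):
    """Collect both checks into one report (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "missing_env": missing_env_vars(env),
        "hadoop_version": hadoop_version(),
    }


if __name__ == "__main__":
    print(smoke_check())
```

Run on a node after installation, an empty `missing_env` list and a non-empty `hadoop_version` would suggest the basics are in place. A real certification test would check far more than this.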
9. Linaro is a Member of ODPi
● Enablement of AARCH64
● Tested on Member Platforms
● Participate in Technical Spec
● Provide feedback to the community
● Provide Engineering efforts
10. Linaro’s contribution
● Reference ARM64 build
● Patches to Apache Bigtop
● ODPi tested against multi-node ARM-based clusters
● Benchmarking and performance tuning for ARM
● JDK optimizations
● ARM-based Developer Cloud for ODPi members to set up CI builds and
tests
11. ODPi Installation and Run Instructions
● ODPi specs can be found here:
https://github.com/odpi/specs/blob/v1.0.0-runtime-draft/ODPi-Operations.md
● ODPi deb and rpm packages can be found on Linaro repositories:
○ Debian Jessie - http://repo.linaro.org/ubuntu/linaro-overlay/
○ CentOS7 - http://repo.linaro.org/rpm/linaro-overlay/centos-7/
● ODPi Installation, setup and instructions to run
○ https://github.com/96boards/documentation/wiki/ODPi-Hadoop-Installation
12. ODPi Milestones
● RC of the spec for the runtime stack will be delivered in March
● First official release of ODPi will be by end of March
● Next release will be in October with a cadence of release every 6 months
● Voting procedure to be defined by April to let members choose the next
important component to be added into ODPi (Could be HBase / Hive /
Spark)
13. ODPi Roadmap
● Add more tests
● Certification program with tests that members can validate against
● Expanding tests to applications
● Containerize ODPi
● Make ODPi more cloud-friendly and the Ambari-like management stack
more cloud-integrated
● Grow the footprint and have as many components included as possible