Spring for Apache Hadoop provides a collection of extensions and wrappers that make real-life project development with Hadoop technologies much easier, just as Spring did for J2EE. Moreover, it doesn’t force you to dramatically change the way you develop applications – you still work with the Java stack. In my presentation I’ll describe how we used Spring for Apache Hadoop to speed up and simplify the development of business features using Big Data technologies. Main focus:
High-level APIs for different aspects
Modularity & Testability
Integration with Spring Batch and other Spring projects
Scripting
Extensibility & Caveats
2. Agenda
• Goals of the project
• Hadoop Introduction
• High-level support
• Workflows
• Scripting & Migration
• Alternatives
• Testing & Related
3. Big Data – Why?
Because of Terabytes and Petabytes:
• Smart meter analysis
• Genome processing
• Sentiment & social media analysis
• Network capacity trending & management
• Ad targeting
• Fraud detection
4. Goals
• Provide a programmatic model for working with the Hadoop ecosystem
• Simplify the use of client libraries
• Provide Spring-friendly wrappers
• Enable real-world usage as a part of Spring Batch & Spring Integration
• Leverage Spring features
19. Spring Batch & Spring Integration
• Big Data flows are based on Spring Integration & Spring Batch
• Spring for Hadoop provides:
– Spring Batch tasklets
– Spring Integration support
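The idea of wrapping each Hadoop operation as a step in a batch flow can be illustrated with a minimal plain-Java sketch. It does not use the real Spring Batch or Spring for Apache Hadoop APIs: the `Step` interface, the `runFlow` method, the step names and the `/data/in` path are all hypothetical, chosen only to show how a flow runs steps in order and stops at the first failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaskletFlowSketch {

    /**
     * Hypothetical stand-in for a batch step. This is NOT the Spring Batch
     * Tasklet interface; it only mimics the "one unit of work" idea.
     */
    interface Step {
        boolean execute(Map<String, Object> context) throws Exception;
    }

    /**
     * Runs the named steps in insertion order and stops at the first step
     * that returns false or throws. Returns the names of completed steps.
     */
    static List<String> runFlow(Map<String, Step> steps, Map<String, Object> context) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().execute(context);
            } catch (Exception e) {
                ok = false; // a failing step ends the flow
            }
            if (!ok) {
                break;
            }
            completed.add(entry.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the steps in the order they were registered.
        Map<String, Step> steps = new LinkedHashMap<>();
        // Each lambda stands in for a real operation (file copy, job run, export).
        steps.put("copyToHdfs", ctx -> { ctx.put("input", "/data/in"); return true; });
        steps.put("runJob", ctx -> { ctx.put("output", ctx.get("input") + "-out"); return true; });
        steps.put("export", ctx -> ctx.containsKey("output"));

        Map<String, Object> context = new HashMap<>();
        System.out.println("Completed steps: " + runFlow(steps, context));
    }
}
```

In a real project the framework would supply the step abstraction, ordering, restart and failure handling; the sketch only shows why small, independent steps sharing a context make a flow easy to compose and test.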
30. Spring eXtreme Data (XD)
• Ultimate data processing solution
• Implements the most common approaches; the business logic is up to you
• Built on top of Spring Batch and Spring Integration
• Has a DSL
• Scalable
31. More speedups
• Use a provider's quick start VM for initial development
• Use cloud-based images for production (start/stop)
• Don’t use Map/Reduce without a real need. Start with a higher abstraction.
• Don’t migrate without a real need!
• Invest in DevOps (Chef / Puppet / Vagrant…)