2. This material is protected by copyright. All rights Reserved by Pivotal Software
We have covered the “What” and the “How”... A Quick Recap
3.
Cloud Native Data Processing with Spring - Taxonomy
Spring Cloud Data Flow
• Intuitive DSL
• UI, Shell, REST API
• Repositories for Stream and Task definitions
Spring Cloud Stream
• Messaging via a broker (pluggable Binders)
• Partitioning, consumer groups for scalability
• Package components as Spring Boot apps
• Out-of-the-box common Sources, Processors and Sinks
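The partitioning and consumer-group bullet above can be illustrated with a minimal plain-Java sketch. It is not the Spring Cloud Stream API: the class `PartitionSketch`, its methods, and the `order-*` keys are hypothetical names chosen for illustration. The assumption is that messages sharing a key should always land on the same partition, so that one consumer instance in a group sees all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of key-based partitioning; not the Spring Cloud Stream API.
final class PartitionSketch {

    // Map a partition key to one of partitionCount partitions.
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Distribute keys across partitions; each inner list stands for the
    // messages that one consumer instance in a group would receive.
    static List<List<String>> distribute(List<String> keys, int partitionCount) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, partitionCount)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("order-1", "order-2", "order-1", "order-3");
        // Both "order-1" messages end up in the same partition.
        System.out.println(distribute(keys, 3));
    }
}
```

Adding consumer instances then scales throughput without breaking per-key ordering, because the key, not the instance, decides where a message goes.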
Spring Integration
• Enterprise Integration Patterns
• Wire-up message-based POJOs
• Point-to-point and pub-sub
• Enterprise adapters: JDBC, JMS, ...
Spring Cloud Task
• Deploy, execute and remove short-lived processes on demand
• Results persist beyond the life of the task
• Ready-made common task components
Spring Batch
• File and bulk data processing
• Job tracking, retries and restart
• Parallel and partial processing
• Periodic commit
• Load balancing, scale, allocate resources
• Manage deployments on Cloud Foundry and Kubernetes
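The Spring Batch bullets above (periodic commit, retries and restart) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Spring Batch API: `ChunkSketch`, `Store`, `run` and `processWithRetry` are hypothetical names, and the in-memory `Store` stands in for a transactional data store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of chunk-oriented processing; not the Spring Batch API.
final class ChunkSketch {

    // Stand-in for a transactional store: items are only recorded on commit.
    static final class Store {
        final List<String> committed = new ArrayList<>();
        int commitCount = 0;

        void commit(List<String> chunk) {
            committed.addAll(chunk);
            commitCount++;
        }
    }

    // Process items from startIndex onward, committing every chunkSize items.
    // If an item exhausts its retries, the exception propagates and the
    // current, uncommitted chunk is discarded.
    static void run(List<String> items, int startIndex, int chunkSize, int maxAttempts,
                    Function<String, String> processor, Store store) {
        List<String> chunk = new ArrayList<>();
        for (int i = startIndex; i < items.size(); i++) {
            chunk.add(processWithRetry(items.get(i), maxAttempts, processor));
            if (chunk.size() == chunkSize) {
                store.commit(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            store.commit(chunk); // final partial chunk
        }
    }

    // Retry a single item up to maxAttempts times before giving up.
    static String processWithRetry(String item, int maxAttempts,
                                   Function<String, String> processor) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return processor.apply(item);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("Giving up on item: " + item, last);
    }

    public static void main(String[] args) {
        Store store = new Store();
        run(List.of("a", "b", "c", "d", "e"), 0, 2, 3, String::toUpperCase, store);
        System.out.println(store.committed + " in " + store.commitCount + " commits");
    }
}
```

In this sketch a restart is simply a second call to `run` with `startIndex` set to `store.committed.size()`, so work that was already committed is not repeated.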
4.
Cloud Native Batch Processing Java Stack
Spring Boot
Spring Batch
Spring Cloud Task
Scheduler
5.
Why
6.
Current State of Batch Processing in Enterprise
7.
Technical Challenges
Data Pipeline
Scale
License Cost
Over-run
Upgrade
Interoperate
Integrate
CVE Mitigation
Source → Processing Step → Processing Step → Processing Step → Processing Step → Destination
CI/CD
Cloud Deployment
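The data pipeline on this slide (a source, a series of processing steps, a destination) can be sketched as composed functions. This is only an illustration: `PipelineSketch`, `chain` and `run` are hypothetical names, the source and destination are plain in-memory lists, and the two steps in `main` are arbitrary examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: a source, a chain of processing steps, and a destination.
final class PipelineSketch {

    // Compose the steps, in order, into a single function.
    static <T> Function<T, T> chain(List<Function<T, T>> steps) {
        Function<T, T> pipeline = Function.identity();
        for (Function<T, T> step : steps) {
            pipeline = pipeline.andThen(step);
        }
        return pipeline;
    }

    // Push every record from the source through all steps and collect
    // the results into the destination list.
    static <T> List<T> run(List<T> source, List<Function<T, T>> steps) {
        Function<T, T> pipeline = chain(steps);
        return source.stream().map(pipeline).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Function<String, String>> steps = new ArrayList<>();
        steps.add(String::trim);        // processing step 1
        steps.add(String::toUpperCase); // processing step 2
        System.out.println(run(List.of(" alpha ", "beta"), steps));
    }
}
```

Because each step is an independent function, a step can be replaced, upgraded or tested on its own, which is where several of the challenges listed on this slide (upgrade, interoperate, integrate) surface when the steps are tightly coupled.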
8.
Cultural Challenges
■ Consistency in expertise with ad-hoc tooling
■ Applying distributed software engineering best practices
■ Quality assurance and data governance
■ Pets vs. cattle
9.
Current State: Data is hard to access and integrate with
Desired State: Agility in data processing and access

Current State: Licenses and scaling are expensive
Desired State: Lower provisioning and deployment cost

Current State: IT Ops and DBAs acting as a gate or constantly fighting fires
Desired State: Shared responsibility with DevOps

Current State: Tightly coupled monolithic architecture
Desired State: Loosely coupled and independently deployed microservices
10.
Evolving Tools and Practices
Methodologies, Deployment, Architecture, Technologies, Operations
Change management nightmare!
On-demand provisioning and launch
Do-It-Yourself / legacy tooling
Containers, Public / Private / Hybrid Cloud
Monolithic ETL / curation / distribution
Data Manufacturing
Microservices
Linear / sequential / waterfall changes
Agile, TDD, DevOps
CI/CD Pipelines
Many tools, ad hoc automation
Manage jobs securely at scale
11.
Where
12.
Replatforming
Before
● Jobs written in Java running on server farms with over-provisioned capacity scheduled by a legacy scheduler
After
● Containerized Spring Boot tasks provisioned on-demand and orchestrated in the cloud
Modernization
Before
● Batch processing with traditional ETL tools, nightly SQL stored procedure runs, scheduled jobs on the mainframe, ...
After
● Decomposing data workloads into a microservices architecture and running on a cloud native platform
[Diagram labels: APP, µService; ETL, File processing, Stored-PROC]
13.
Be Bold but not Blind
Where NOT to apply is as important as, if not more important than, where and how to apply
14.
One Size Does NOT Fit All
● Understand the total cost: not all batch processing needs to be custom developed
● Understand the need: do not fix something that is not broken; there are no silver bullets
● Understand the skillset: this is not a tool for business analysts, citizen developers or DBAs
● Understand the scale: too small a problem or too big a data set
● Understand the value: non-strategic projects or those that have marginal cost to operate
How does this fit in the overall value stream?
How much is the ops overhead for operating these legacy batch processes?
What’s the total middleware licensing, OS, VM and hardware cost to run?
What tools & automation are used for these operations?
What are the Service Level Objectives for maintaining, patching and upgrading these environments today?
15.
Who
16.
Industry: Data reconciliation in Financial Services
Use Case: Nightly reconciliation of financial transactions and report generation.

Industry: Bulk file processing in the Insurance and Health Care industries
Use Case: Reliably ingest bulk data into operational data stores

Industry: Mainframe offload in Retail and Telco
Use Case: Orchestration and scheduling of batch jobs
17.
CoreLogic Case Study
A real estate data “value chain” company
This material is protected by copyright. All rights Reserved by Pivotal Software and CoreLogic