Building Beautiful Batch Jobs!
Who says batch jobs can’t be beautiful code?

SouthBay JVM User Group (SBJUG)
Meetup - November 2013
In this tech talk we’ll cover
• Spring Batch Introduction
• Dealer.com Real World Usage
• Some Lessons Learned
About me
• Software Engineer
• Worked on complex integration projects
– CSIS, LAPD, UCLA

• Worked on one high traffic system
– Napster

• Currently at Dealer.com
• Fascinated by all things Engineering
Dealer.com
• Leader in Automotive Marketing
• 10K+ clients, 12K+ Websites
• CRM is our new product offering

• It’s definitely a great place to work. I’d
recommend it to a friend.
Believe it or not – these are actually Dealer.com’s Core Values
Spring Batch Introduction
Background
• Lack of frameworks for Java-based batch
processing
• Proliferation of many one-off, in-house solutions
• SpringSource and Accenture changed this
• June 2008 – production version of Spring Batch
• Spring Batch is the only open source framework
that provides a robust, enterprise-scale solution
• Batch Applications for the Java Platform (JSR 352) is
coming soon
Usage Scenario
A typical batch program reads a large number of
records from a database, file, or queue, processes
the data in some fashion, and then writes back data
in a modified form
• Commit batch process periodically
• Sequential processing of dependent steps
• Partial processing: skip records
• Concurrent batch processing
• Massively parallel batch processing
• Manual or scheduled restart after failure
Domain Language of a Batch
• Job - has one to many steps
• Step - has an item reader, processor or writer
• Item Reader - an abstraction that represents the retrieval of
input for a Step, one item at a time
• Item Processor - an abstraction that represents the business
processing of an item
• Item Writer - an abstraction that represents the output of a
Step, one chunk of items at a time
• Job Launcher - launches jobs
• Job Repository - stores metadata about currently running jobs
• Job Instance - an instance of a job with its unique parameters
• Job Execution - an execution attempt of a job instance
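The difference between a Job Instance and a Job Execution is easy to blur, so here is a minimal plain-Java sketch of it. It does not use the Spring Batch API: the class JobModelSketch, its methods, and the "COMPLETED"/"FAILED" strings are hypothetical names chosen for illustration. It assumes an instance is identified by the job name plus its parameters, and that every launch of that instance adds one more execution attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of the vocabulary above; NOT the Spring Batch API.
final class JobModelSketch {

    /** One attempt to run a job instance. */
    static final class Execution {
        final int attempt;
        final String status; // "COMPLETED" or "FAILED" in this sketch

        Execution(int attempt, String status) {
            this.attempt = attempt;
            this.status = status;
        }
    }

    /** In-memory stand-in for a job repository: instance key -> execution attempts. */
    private final Map<String, List<Execution>> executions = new HashMap<>();

    /** A job instance is identified by the job name plus its (sorted) parameters. */
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<String, String>(parameters);
    }

    /** Records a new execution attempt for the instance and returns it. */
    Execution launch(String jobName, Map<String, String> parameters, boolean succeeds) {
        List<Execution> attempts =
                executions.computeIfAbsent(instanceKey(jobName, parameters), k -> new ArrayList<>());
        Execution execution = new Execution(attempts.size() + 1, succeeds ? "COMPLETED" : "FAILED");
        attempts.add(execution);
        return execution;
    }

    /** Number of distinct job instances seen so far. */
    int instanceCount() {
        return executions.size();
    }

    /** Number of execution attempts recorded for one instance. */
    int executionCount(String jobName, Map<String, String> parameters) {
        List<Execution> attempts = executions.get(instanceKey(jobName, parameters));
        return attempts == null ? 0 : attempts.size();
    }
}
```

Launching the same job twice with the same parameters gives one instance with two executions; changing a parameter gives a second instance.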
Batch Components
Job, Job Instance, Job Execution
Job Parameters
Job – Tasklet
Job – Sequential Flow
Job – Conditional Flow
Job – Chunk Oriented Processing
Item Readers and Writers - Out of the box
Item Readers                   Item Writers
AmqpItemReader                 AmqpItemWriter
FlatFileItemReader             CompositeItemWriter
HibernateCursorItemReader      FlatFileItemWriter
HibernatePagingItemReader      GemfireItemWriter
IbatisPagingItemReader         HibernateItemWriter
ItemReaderAdapter              IbatisBatchItemWriter
JdbcCursorItemReader           ItemWriterAdapter
JdbcPagingItemReader           JdbcBatchItemWriter
JmsItemReader                  JmsItemWriter
JpaPagingItemReader            JpaItemWriter
ListItemReader                 MimeMessageItemWriter
MongoItemReader                MongoItemWriter
Neo4jItemReader                Neo4jItemWriter
RepositoryItemReader           RepositoryItemWriter
StoredProcedureItemReader      PropertyExtractingDelegatingItemWriter
StaxEventItemReader            StaxEventItemWriter
Job Repository Data Model
Let’s look at a couple of examples of building simple
Spring Batch Jobs

Example 1 – Load Flat file contents into database
Example 2 – Load XML file contents into database
Configure DataSource and Spring Batch Core Beans
spring-batch-context.xml :
Example1: Load Flat file contents into database
person-data.csv
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe

Transform Data to Upper Case

PERSON Table
PERSON_ID  FIRST_NAME  LAST_NAME
1          JILL        DOE
2          JOE         DOE
3          JUSTIN      DOE
4          JANE        DOE
5          JOHN        DOE
Example1: Job Config
flat-file-reader-job.xml

Chunk Processing:
• Reader – retrieves input for a Step one item at a time
• Processor – processes an item
• Writer – writes the output, one item or chunk of items at a time
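The read-process-write rhythm of a chunk can be sketched in plain Java without any framework. This is an illustration of the idea only, not the Spring Batch classes or the job configuration from the slides: ChunkSketch, run, process and the commitInterval parameter are hypothetical names, and a list of lists stands in for the database. Items are read and processed one at a time; the writer receives a whole chunk at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration of chunk-oriented processing; not the Spring Batch classes.
final class ChunkSketch {

    /** Reads one item at a time, transforms it, and writes each full chunk in one call. */
    static List<List<String>> run(List<String> csvLines, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<List<String>> writtenChunks = new ArrayList<>(); // stand-in for the database
        List<String> chunk = new ArrayList<>();
        Iterator<String> reader = csvLines.iterator();
        while (reader.hasNext()) {
            String item = reader.next();           // reader: one item at a time
            chunk.add(process(item));              // processor: one item at a time
            if (chunk.size() == commitInterval) {  // writer: a whole chunk, then "commit"
                writtenChunks.add(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(new ArrayList<>(chunk)); // final, partial chunk
        }
        return writtenChunks;
    }

    /** The Example 1 transformation: upper-case the record. */
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Jill,Doe", "Joe,Doe", "Justin,Doe", "Jane,Doe", "John,Doe");
        System.out.println(run(lines, 2));
    }
}
```

With the five records of person-data.csv and a commit interval of 2, the writer is called three times, with chunks of 2, 2 and 1 items.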
Example1: Reader, Processor and Writer
flat-file-reader-job.xml (cont'd)
Example1: Person Item Processor
Example1: Test Case to Execute Flat File Reader Job
Example2: Load XML file contents into database
record-data.xml

AD_PERFORMANCE Table
ID  DATE        IMPRESSION  CLICKS  EARNING
1   06/01/2013  139237      57      220.90
2   06/02/2013  339100      57      320.88
3   06/03/2013  431436      57      27.80
Example2: Job Config
xml-file-reader-job.xml
Example2: Reader, JAXB Unmarshaller, Processor and Writer
xml-file-reader-job.xml (cont'd)
Example2: Record Item Processor
Example2: Ad Performance Writer
Example2: Test Case to Execute XML File Reader Job
Spring Batch Admin Webapp
Jobs
Job Executions
Job Execution Details
Dealer.com Real World Usage
Business Problem
• CRM is entering the dealer’s day-to-day operations
• We need to pull data from the dealer’s DMS systems into CRM
• DMS systems can be ADP, Reynolds, DealerTrack, etc.
Here’s a Small Big Picture
[Diagram: data is extracted from the Dealer’s DMS Systems (ADP, Reynolds, DealerTrack) into Dealer.com’s DMS system, then loaded into Dealer.com’s CRM system.]
Typical Batch Job
• Download data from DMS Provider for a dealership
• Load the data in CRM
• Generate report on how the data was processed
ADP Vehicle Sales ETL Job Configuration
Some requirements
• We need to pull frequently for a lot of dealers
• We need job concurrency control
• We need data flow control for loading into CRM
• We need to know how the data was processed
• We need job resiliency
• We need beautiful batch jobs
Pull Frequently
• We have 100s of dealerships, and each batch job has to be run
for a dealer’s ADP account
• We schedule jobs for each dealership to pull every 4 hours
• Job scheduling is managed via a centralized DDC
Scheduling Server
– Clients issue scheduling requests via a command queue to the server
– The server then fires scheduled events back onto a queue for
clients to consume
– Clients and the DDC Scheduling Server communicate through a single
RabbitMQ exchange. Each client chooses a unique application key and
binds to this exchange to receive messages about its scheduled events
– Named ClockTower: it’s worth a separate talk in itself
Job Concurrency
• 100s of scheduled or manually initiated jobs can all go off at
the same time
• We want to control how many jobs should run in our Cluster
concurrently
• We used basic queuing to solve this
– all job commands go into a queue
– they get processed one at a time
– we can control how many consumers we want to allow
across the cluster
• We use Spring Integration AMQP outbound and inbound
adapters
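The queuing idea can be tried out with nothing but the JDK. The sketch below is an in-memory stand-in, not the Spring Integration AMQP setup the talk describes: a BlockingQueue plays the job queue, a fixed thread pool plays the consumers, and JobQueueSketch, drain and runJob are hypothetical names. It assumes every job command goes through one queue and that the consumer count is the only concurrency knob.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a job queue with a fixed number of competing consumers.
final class JobQueueSketch {
    final AtomicInteger running = new AtomicInteger();    // jobs running right now
    final AtomicInteger maxRunning = new AtomicInteger(); // highest concurrency observed
    final AtomicInteger completed = new AtomicInteger();  // jobs finished

    /** Puts all commands on one queue and lets `consumers` workers drain it. */
    void drain(List<String> jobCommands, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(jobCommands);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                String command;
                // Each consumer takes one command at a time until the queue is empty.
                while ((command = queue.poll()) != null) {
                    int now = running.incrementAndGet();
                    maxRunning.accumulateAndGet(now, Math::max);
                    runJob(command);
                    running.decrementAndGet();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("jobs did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
    }

    /** Placeholder for real work: the sketch just sleeps briefly. */
    private static void runJob(String command) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

However many commands arrive at once, the number of jobs running at the same time never exceeds the number of consumers; raising capacity means raising that one number.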
Running Jobs Concurrently – Competing Consumer Pattern
[Diagram: scheduled and manually initiated job commands come through the same DMS Pull Job Queue (Job3, Job4 and Job5 waiting); DMS Service 01 is running Job1 and DMS Service 02 is running Job2.]
• Each node is configured with multiple concurrent consumers (3 as of now)
• As we take on more tenants we can scale horizontally by adding more nodes
Data Flow Control
[Diagram: ADP -> Extract -> DMS -> Load -> CRM]

• We need to control the load we put on the CRM system
• We don't want to EVER load too much data at the same time
• We debated two ways to solve this
– Synchronous
– Asynchronous (via Queues)
Sync vs Async Loading Data into CRM
[Diagram, SYNC: Job1 on DMS Service 01 calls a CRM Batch Service Load Balancer, which forwards to CRM Batch 01 or CRM Batch 02.]
[Diagram, ASYNC: Job1 on DMS Service 01 publishes to the DMS Data Load Queue, which CRM Batch 01 and CRM Batch 02 consume.]
Synchronous
• HAProxy load balancer - cannot be scaled dynamically
• Remote call needs to be made via REST or Spring Remoting API - tightly coupled
• Client has to fail the batch job or retry the request on failure - not fault tolerant
• Nodes need to throttle the number of incoming requests (via Tomcat threads) - we
have to administer Tomcat threads, and nodes cannot be repurposed
Asynchronous
• AMQP Rabbit queue - can be scaled dynamically
• The only contract is the 'message' being passed - somewhat loosely coupled
• If a node fails, the message will be unacknowledged and another node will execute the
same request - fault tolerant
• Each node can control the number of concurrent queue consumers - this is application
configuration, and nodes can be repurposed
• Message persistence and dynamic reply queues do incur some extra cost
We settled on loading via a queue using Spring Integration AMQP Gateways (which
are bi-directional); the call waits for the response to come back via the reply queue
We need to know how the data was processed
We send out an awesome-looking email notification to an internal mailing list
The CSV report has detailed information on how each row was processed
We are working towards a UI that’ll look like this
Job Resiliency
[Diagram: ADP -> Extract -> DMS -> Load -> CRM]

100s of Jobs could go off at the same time and jobs need to be resilient to
unexpected failures
• While a big job is running, CRM could crash or get restarted for deployment
• While a big job is running, DMS could crash or get restarted for deployment
In such cases, we want to rerun the job after a short while from where it left off.
• We use Spring Batch’s job restartability feature to achieve this
What could go wrong?
[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) consume from the DMS Pull Job Queue and publish to the DMS Data Load Queue, which CRM Batch 01 and CRM Batch 02 consume. Each of the four nodes is marked with an X.]

X = nodes that could crash, or be restarted due to a deployment, while a big job is running.

Our goal is to be able to rerun the job and resume from where things left off.
Spring Batch – Restartability
• Spring Batch maintains job state in the database
– which step is completed, being processed or failed
– which item is being processed during chunk processing
• Jobs can be restarted using the JobExecutionId
• Spring Batch will skip over the completed steps and run the job from
where it left off
• If the job failed during chunk processing, it’ll skip
the items that were already processed and start
from where it left off
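To make "resume from where it left off" concrete, here is a deliberately simplified plain-Java sketch. It is not how Spring Batch implements restart internally: RestartSketch, committedCount and failAtIndex are hypothetical names, a map stands in for the job state kept in the database, and the same id is reused for both runs to keep the example short. The only assumption is that the count of committed items is saved after every chunk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified checkpoint-and-resume illustration; NOT Spring Batch's implementation.
final class RestartSketch {
    /** Stand-in for persisted job state: execution id -> number of committed items. */
    final Map<Long, Integer> committedCount = new HashMap<>();
    /** Stand-in for the target table. */
    final List<String> written = new ArrayList<>();

    /**
     * Processes items in chunks, starting after the last committed item.
     * failAtIndex simulates a crash just before that item; -1 means no failure.
     * Returns true when the job completes.
     */
    boolean run(long executionId, List<String> items, int chunkSize, int failAtIndex) {
        int start = committedCount.getOrDefault(executionId, 0);
        List<String> chunk = new ArrayList<>();
        for (int i = start; i < items.size(); i++) {
            if (i == failAtIndex) {
                return false; // the uncommitted chunk is discarded, as in a rollback
            }
            chunk.add(items.get(i));
            boolean lastItem = (i == items.size() - 1);
            if (chunk.size() == chunkSize || lastItem) {
                written.addAll(chunk);                   // write the chunk
                committedCount.put(executionId, i + 1);  // save the checkpoint with it
                chunk.clear();
            }
        }
        return true;
    }
}
```

If a run over seven items with a chunk size of 3 fails at the fifth item, only the first chunk is committed; the second run starts at the fourth item, so nothing is written twice.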
When CRM goes down
[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) consume from the DMS Pull Job Queue and publish to the DMS Data Load Queue; CRM Batch 01 and CRM Batch 02 are marked X (down).]

• We have a timeout of 5 minutes for the reply from CRM
• When CRM Batch nodes are down, we’ll get a timeout exception, which results
in a new job command message to the DMS Pull Job Queue
• The message includes the JobExecutionId
• Whichever node picks up the message will resume the job from where it left
off
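The timeout-then-requeue behaviour can be sketched with a CompletableFuture standing in for the reply queue. This is an assumption-laden illustration, not the Spring Integration gateway configuration used in production: ReplyTimeoutSketch, awaitReply and the "RESTART jobExecutionId=..." message format are all made up for the example, and the timeout is a parameter rather than the 5 minutes mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for a reply, and on timeout queue a restart command.
final class ReplyTimeoutSketch {
    /** In-memory stand-in for the DMS Pull Job Queue. */
    final Queue<String> pullJobQueue = new ArrayDeque<>();

    /**
     * Waits for the CRM reply. On timeout, queues a new job command that carries
     * the execution id so that whichever node picks it up can resume the job.
     */
    boolean awaitReply(CompletableFuture<String> reply, long jobExecutionId, long timeoutMillis) {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pullJobQueue.add("RESTART jobExecutionId=" + jobExecutionId); // made-up message format
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("reply failed", e.getCause());
        }
    }
}
```

A reply that arrives in time leaves the queue untouched; a reply that never arrives produces exactly one restart command.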

When a DMS Service Node goes down
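[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) consume from the DMS Pull Job Queue and publish to the DMS Data Load Queue, which CRM Batch 01 and CRM Batch 02 consume; a DMS Service node is marked X (down).]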

• When a DMS node executing the job goes down, the message will be
unacknowledged, and will be picked up by any other node connected to the
DMS Pull Job Queue
• The node that picks up the message will check whether this job was already running
and stopped abruptly, and if so it’ll try to resume it from where it left off
• (This is not in production yet; it’s under development)
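Since this part is still under development, the following is only a guess at the shape of the logic, written in plain Java. RedeliverySketch, requeueUnacknowledged, decide and the "STARTED" status string are hypothetical; a Deque stands in for the queue and a map for the job state in the database. It assumes an unacknowledged message goes back on the queue, and that the next node chooses between a fresh start and a resume from the last recorded status.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of redelivery after a node crash; not production code.
final class RedeliverySketch {
    enum Action { START_NEW, RESUME }

    /** Stand-in for persisted job state: job key -> last known status. */
    final Map<String, String> lastStatus = new HashMap<>();
    /** Stand-in for the DMS Pull Job Queue. */
    final Deque<String> queue = new ArrayDeque<>();

    /** A node takes a message; it counts as "in flight" until acknowledged. */
    String receive() {
        return queue.pollFirst();
    }

    /** The consuming node died before acknowledging, so the message is requeued. */
    void requeueUnacknowledged(String message) {
        queue.addFirst(message);
    }

    /** Resume only if the job was already running and stopped abruptly. */
    Action decide(String jobKey) {
        return "STARTED".equals(lastStatus.get(jobKey)) ? Action.RESUME : Action.START_NEW;
    }
}
```

The first node to receive a command starts the job fresh; if it dies mid-run, the node that receives the requeued command sees the leftover status and resumes instead.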
So, what makes it beautiful?
• Simple
– We just used the basic features of Spring Batch

• Easy to understand
– Quick look at spring configurations is all you need

• Less code
– We focused on the business logic

• Low maintenance
– Anybody can maintain it
Some Lessons Learned
On Spring Batch
• Really easy to set up and use
• Highly configurable
• Chunk processing is the bomb!
• Beware of the commit count
• The bean ‘step’ scope comes in handy
• ExecutionContext is limited to 4 data types
On 3rd Party Integration
• Plan for Dev & Live accounts and environments
• Configure anything and everything possible
• Download large files via streaming
• Handle exceptions properly
• Embrace data translation errors
• Build jobs that are repeat runnable
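For the "download large files via streaming" lesson, the core of it is copying through a small fixed-size buffer instead of reading the whole file into memory. A minimal JDK-only sketch, where StreamingCopySketch and copy are hypothetical names and the streams could come from any source (an HTTP response body, an FTP client, a local file):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Minimal streaming copy: memory use is bounded by the buffer, not the file size.
final class StreamingCopySketch {

    /** Copies the stream through a fixed-size buffer and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // write only the bytes actually read
                total += read;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) does the same job; the explicit loop is shown only to make the buffering visible.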
Sources
• Spring Batch Reference Documentation
– http://docs.spring.io/spring-batch/reference/html-single/index.html
• Ad Performance Sample XML taken from
– http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
Questions?
Shameless Plug
Currently we have a few openings in the Manhattan Beach
office
• Java Developers
• UI Developers
• Web Developers
If interested please apply at http://careers.dealer.com/

08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

SBJUG - Building Beautiful Batch Jobs

  • 1. Building Beautiful Batch Jobs ! Who says batch jobs can’t be beautiful code? SouthBay JVM User Group (SBJUG) Meetup - November 2013
  • 2. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 3. About me • Software Engineer • Worked on complex integration projects – CSIS, LAPD, UCLA • Worked on one high traffic system – Napster • Currently at Dealer.com • Fascinated by all things Engineering
  • 4. Dealer.com • Leader in Automotive Marketing • 10K+ clients, 12K+ Websites • CRM is our new product offering • It’s definitely a great place to work. I’d recommend it to a friend.
  • 5. Believe it or not – these are actually Dealer.com’s Core Values
  • 6. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 7. Background • Lack of frameworks for Java-based batch processing • Proliferation of many one-off, in-house solutions • SpringSource and Accenture changed this • June 2008 – production version of Spring Batch • Spring Batch is the only open source framework that provides a robust, enterprise-scale solution • Batch Application for Java Platform is coming soon (JSR 352)
  • 8. Usage Scenario A typical batch program reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form • Commit batch process periodically • Sequential processing of dependent steps • Partial processing: skip records • Concurrent batch processing • Massively parallel batch processing • Manual or scheduled restart after failure
  • 9. Domain Language of a Batch • Job - has one to many steps • Step - has item reader, processor or writer • Item Reader - an abstraction that represents the retrieval of input for a Step, one item at a time • Item Processor - an abstraction that represents the business processing of an item • Item Writer - an abstraction that represents the output of a Step, chunk of items at a time • Job Launcher - launches jobs • Job Repository - stores metadata about currently running jobs • Job Instance - an instance of a job with its unique parameters • Job Execution - an execution attempt of a job instance
  • 11. Job, Job Instance, Job Execution
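The distinction between a Job Instance and a Job Execution can be sketched in plain Java. This is not Spring Batch code: the class `JobRepositorySketch`, its methods, and the job name used in the usage note are hypothetical, and the sketch only assumes what the definitions above state, namely that an instance is a job plus its unique parameters and that each run attempt of that instance is a separate execution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a job repository; not the Spring Batch API.
final class JobRepositorySketch {

    // One entry per job instance (job name + parameters); the list holds
    // the status of every execution attempt of that instance.
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    // Sorting the parameters makes the key independent of map ordering.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<>(parameters);
    }

    // Records one execution attempt and returns how many attempts the
    // instance has had so far.
    int recordExecution(String jobName, Map<String, String> parameters, String status) {
        List<String> executions = executionsByInstance.computeIfAbsent(
                instanceKey(jobName, parameters), key -> new ArrayList<>());
        executions.add(status);
        return executions.size();
    }

    int instanceCount() {
        return executionsByInstance.size();
    }
}
```

Recording a failed run and then a successful rerun of a hypothetical `adpVehicleSales` job with the same parameters yields one instance with two executions; changing any parameter yields a second instance.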
  • 16. Job – Chunk Oriented Processing
  • 17. Item Readers and Writers - Out of the box • Item Readers: AmqpItemReader, FlatFileItemReader, HibernateCursorItemReader, HibernatePagingItemReader, IbatisPagingItemReader, ItemReaderAdapter, JdbcCursorItemReader, JdbcPagingItemReader, JmsItemReader, JpaPagingItemReader, ListItemReader, MongoItemReader, Neo4jItemReader, RepositoryItemReader, StoredProcedureItemReader, StaxEventItemReader • Item Writers: AmqpItemWriter, CompositeItemWriter, FlatFileItemWriter, GemfireItemWriter, HibernateItemWriter, IbatisBatchItemWriter, ItemWriterAdapter, JdbcBatchItemWriter, JmsItemWriter, JpaItemWriter, MimeMessageItemWriter, MongoItemWriter, Neo4jItemWriter, RepositoryItemWriter, PropertyExtractingDelegatingItemWriter, StaxEventItemWriter
  • 19. Let’s look at a couple of examples of building simple Spring Batch Jobs Example 1 – Load Flat file contents into database Example 2 – Load XML file contents into database
  • 20. Configure DataSource and Spring Batch Core Beans spring-batch-context.xml :
  • 21. Example1: Load Flat file contents into database • person-data.csv: Jill,Doe; Joe,Doe; Justin,Doe; Jane,Doe; John,Doe • Transform Data to Upper Case • PERSON Table (PERSON_ID, FIRST_NAME, LAST_NAME): 1 JILL DOE; 2 JOE DOE; 3 JUSTIN DOE; 4 JANE DOE; 5 JOHN DOE
  • 22. Example1: Job Config flat-file-reader-job.xml Chunk Processing: • Reader – retrieves input for a Step one item at a time • Processor – processes an item • Writer – writes the output, one item or chunk of items at a time
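The reader, processor and writer roles described above can be illustrated with a minimal plain-Java sketch. It is not the talk's `flat-file-reader-job.xml` and does not use the Spring Batch API: `ChunkSketch`, `runStep` and `commitInterval` are names assumed for illustration, and the "writer" simply collects each chunk in a list so that the chunk boundaries stay visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing; not Spring Batch code.
final class ChunkSketch {

    // Reads up to commitInterval items one at a time, processes each item,
    // then hands the whole chunk to the "writer". Each inner list in the
    // result stands for one write-and-commit.
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int commitInterval) {
        List<List<O>> committedChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // Reader: one item at a time until the chunk is full.
            List<I> items = new ArrayList<>();
            while (items.size() < commitInterval && reader.hasNext()) {
                items.add(reader.next());
            }
            // Processor: applied to every item of the chunk.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // Writer: receives the chunk as a whole.
            committedChunks.add(processed);
        }
        return committedChunks;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Jill,Doe", "Joe,Doe", "Justin,Doe", "Jane,Doe", "John,Doe");
        System.out.println(runStep(lines.iterator(), String::toUpperCase, 2));
    }
}
```

With the five rows of `person-data.csv` and a commit interval of 2, the sketch produces three chunks: two full ones and a final chunk holding the single remaining row.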
  • 23. Example1: Reader, Processor and Writer flat-file-reader-job.xml (cont..d)
  • 25. Example1: Test Case to Execute Flat File Reader Job
  • 26. Example2: Load XML file contents into database • Input file: record-data.xml • AD_PERFORMANCE Table (ID, DATE, IMPRESSION, CLICKS, EARNING): 1, 06/01/2013, 139237, 57, 220.90; 2, 06/02/2013, 339100, 57, 320.88; 3, 06/03/2013, 431436, 57, 27.80
  • 28. Example2: Reader, JAXB Unmarshaller, Processor and Writer xml-file-reader-job.xml (cont..d)
  • 31. Example2: Test Case to Execute XML File Reader Job
  • 34. Jobs
  • 38. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 39. Business Problem • CRM entering Dealer's day-to-day Operations • We need to Pull data from Dealer’s DMS systems into CRM • DMS Systems can be ADP or Reynolds or DealerTrack etc
  • 40. Here’s a Small Big Picture [Diagram: Dealer’s DMS Systems (ADP, Reynolds, DealerTrack) → Extract → Dealer.com’s DMS → Load → Dealer.com’s CRM]
  • 41. Typical Batch Job • Download data from DMS Provider for a dealership • Load the data in CRM • Generate report on how the data was processed
  • 42. ADP Vehicle Sales ETL Job Configuration
  • 43. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 45. Pull Frequently • We have 100s of Dealerships, so each batch Job has to be run for a Dealer’s ADP Account • We schedule Jobs for each dealership to pull every 4 hours • The Job Scheduling is managed via a centralized DDC Scheduling Server – Clients issue scheduling requests via a command queue to the server – The server then fires scheduled events back onto a queue for clients to consume – Clients and the DDC Scheduling Server communicate through a single Rabbit exchange. Each client chooses a unique application key and binds to this exchange to receive messages about its scheduled events – Named ClockTower: it’s worth a separate talk in itself
  • 47. Job Concurrency • 100s of scheduled or manually initiated jobs can all go off at the same time • We want to control how many jobs should run in our Cluster concurrently • We used basic queuing to solve this – all job commands go into a queue – they get processed one at a time – we can control how many consumers we want to allow across the cluster • We use Spring Integration AMQP OutBound & InBound Adapters
  • 48. Running Jobs Concurrently – Competing Consumer Pattern [Diagram: DMS Pull Job Queue holding Job3, Job4, Job5; DMS Service 01 running Job1; DMS Service 02 running Job2] • Scheduled and Manually Initiated Job Commands come through the same Queue • Each Node is configured with multiple concurrent Consumers (3 as of now) • As we take on more Tenants we could scale horizontally by adding more Nodes
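The competing consumer pattern above can be sketched with a shared in-memory queue and a fixed number of consumer threads. This is a plain-Java illustration only: the talk's setup uses a Rabbit queue with Spring Integration AMQP adapters, whereas `CompetingConsumersSketch`, `drain` and the 5 ms sleep standing in for a job run are assumptions made for the example.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory illustration of competing consumers; not AMQP code.
final class CompetingConsumersSketch {

    static final class Result {
        final int processed;      // job commands that were run
        final int maxConcurrent;  // highest number of jobs running at once

        Result(int processed, int maxConcurrent) {
            this.processed = processed;
            this.maxConcurrent = maxConcurrent;
        }
    }

    // All job commands go into one queue; `consumers` threads compete for them,
    // so at most `consumers` jobs run at the same time.
    static Result drain(List<String> jobCommands, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(jobCommands);
        Queue<String> finished = new ConcurrentLinkedQueue<>();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                String command;
                while ((command = queue.poll()) != null) {
                    maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5);        // stand-in for running the batch job
                        finished.add(command);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        running.decrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(finished.size(), maxRunning.get());
    }
}
```

Raising the consumer count, or adding another node that polls the same queue, increases throughput without changing how commands are submitted, which is the scaling property the slide relies on.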
  • 50. Data Flow Control [Diagram: ADP → Extract → DMS → Load → CRM] • We need to control the load we put on the CRM system • We don't want to EVER load too much data at the same time • We debated two ways to solve this – Synchronous – Asynchronous (via Queues)
  • 51. Sync vs Async Loading Data into CRM [Diagram: (SYNC) DMS Service 01 running Job1 → CRM Batch Service Load Balancer → CRM Batch 01, CRM Batch 02; (ASYNC) DMS Service 01 running Job1 → DMS Data Load Queue → CRM Batch 01, CRM Batch 02]
  • 52. Synchronous • HAProxy load balancer - cannot be scaled dynamically • Remote call needs to be made via REST or Spring Remoting API - tightly coupled • Client has to fail the batch job or retry the request on failure - not fault tolerant • Nodes need to throttle the number of incoming requests (via Tomcat threads) – have to administer Tomcat threads, nodes cannot be repurposed. Asynchronous • AMQP Rabbit Queue - can be scaled dynamically • Only contract is the 'message' being passed – somewhat loosely coupled • If a node fails, the message will be unacknowledged and another node will execute the same request - fault tolerant • Each node can control the number of concurrent queue consumers – application configuration, nodes can be repurposed • Message persistence and dynamic reply queues - extra cost. We settled on loading via a Queue using Spring Integration AMQP Gateways (which are bi-directional); the call waits for the response to come back via a reply queue
  • 54. We send out an awesome looking email notification to an internal mailing list
  • 55. The CSV Report has detailed information on how each row was processed
  • 56. We are working towards a UI that’ll look like this
  • 58. Job Resiliency [Diagram: ADP → Extract → DMS → Load → CRM] 100s of Jobs could go off at the same time, and jobs need to be resilient to unexpected failures • While a big job is running, CRM could crash or get restarted for deployment • While a big job is running, DMS could crash or get restarted for deployment • In such cases, we want to rerun the job after a short while from where it left off • We use Spring Batch’s Job Restart-ability feature to achieve this
  • 59. What could go wrong? [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) on the DMS Pull Job Queue; CRM Batch 01 and CRM Batch 02 on the DMS Data Load Queue; each node marked X] • X marks nodes that could crash or be restarted for a deployment while a big job is running • Our goal is to be able to rerun the job and resume from where things left off
  • 60. Spring Batch – Restartability • Spring Batch maintains Job State in the database – which Step is completed, being processed or failed – Which item is being processed when Chunk processing • Jobs can be restarted using the Job ExecutionId • Spring Batch will skip over the steps and run the job from where it left off before • If the job had failed during Chunk processing it’ll skip processing the items that were already processed and start from where it left off before
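The restart behaviour described above, where already processed items are skipped, can be sketched in plain Java. This is an illustration under stated assumptions rather than Spring Batch's implementation: `RestartSketch`, its in-memory `committedCount` map (standing in for the job state kept in the database) and the `failAtIndex` crash switch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of restartable chunk processing; not Spring Batch code.
final class RestartSketch {

    // Stand-in for persisted job state: job id -> number of items already committed.
    final Map<Long, Integer> committedCount = new HashMap<>();

    // Stand-in for the target table.
    final List<String> written = new ArrayList<>();

    // Processes items chunk by chunk and records progress after each chunk.
    // failAtIndex simulates a crash just before that item is handled (-1 = no crash).
    void run(long jobId, List<String> items, int chunkSize, int failAtIndex) {
        int start = committedCount.getOrDefault(jobId, 0);  // resume point
        for (int from = start; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = from; i < to; i++) {
                if (i == failAtIndex) {
                    throw new IllegalStateException("simulated crash at item " + i);
                }
                chunk.add(items.get(i).toUpperCase());
            }
            written.addAll(chunk);           // write the chunk
            committedCount.put(jobId, to);   // commit progress together with it
        }
    }
}
```

If a run over five items with a chunk size of 2 crashes at index 3, only the first chunk has been committed; calling `run` again with the same id restarts at index 2, so no item is written twice and none is lost.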
  • 61. When CRM goes down [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) on the DMS Pull Job Queue; CRM Batch 01 and CRM Batch 02, both marked X, on the DMS Data Load Queue] • We have a timeout of 5 minutes for the reply from CRM • When CRM Batch Nodes are down, we’ll get a timeout Exception, which results in a new Job Command Message to the DMS Pull Job Queue • The message includes the JobExecutionId • Whichever node picks up the message will resume the job from where it left off
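The timeout-then-requeue flow above can be sketched with a `CompletableFuture` standing in for the reply that arrives on the reply queue. This is a plain-Java illustration, not the Spring Integration AMQP gateway: `ReplyTimeoutSketch`, `awaitReply` and the in-memory `pullJobQueue` are assumed names, and the timeout is a parameter so the example does not need to wait 5 minutes.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Plain-Java illustration of "reply timeout leads to a restart command".
final class ReplyTimeoutSketch {

    // Stand-in for the DMS Pull Job Queue; holds job execution ids to resume.
    final Queue<Long> pullJobQueue = new ArrayDeque<>();

    // Waits for the CRM reply. On timeout, a new job command carrying the
    // job execution id is queued so that some node can resume the job later.
    // Returns true if the reply arrived in time.
    boolean awaitReply(long jobExecutionId, CompletableFuture<String> reply, long timeoutMillis) {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pullJobQueue.add(jobExecutionId);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("reply failed", e);
        }
    }
}
```

A reply that never completes leaves exactly one restart command on the queue, while a reply that has already completed leaves the queue untouched.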
  • 62. When a DMS Service Node goes down [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2), one marked X, on the DMS Pull Job Queue; CRM Batch 01 and CRM Batch 02 on the DMS Data Load Queue] • When a DMS Node executing the Job goes down, the message will be unacknowledged and will be picked up by any other node connected to the DMS Pull Job Queue • The node that picks up the message will check whether this job was already running and stopped abruptly, and if so it’ll try to resume it from where it left off • (This is not in production yet; it’s under development)
  • 64. So, what makes it beautiful? • Simple – We just used the basic features of Spring Batch • Easy to understand – Quick look at spring configurations is all you need • Less code – We focused on the business logic • Low maintenance – Anybody can maintain it
  • 65. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 66. On Spring Batch • Really easy to set up and use • Highly configurable • Chunk Processing is the bomb! • Beware of the commit count • The bean ‘step’ scope comes in handy • ExecutionContext is limited to 4 data types
  • 67. On 3rd Party Integration • Plan for Dev & Live accounts and environments • Configure anything and everything possible • Download large files via streaming • Handle exceptions properly • Embrace data translation errors • Build jobs that are repeat runnable
  • 68. Sources • Spring Batch Reference Documentation – http://docs.spring.io/spring-batch/reference/html-single/index.html • Ad Performance Sample XML taken from – http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
  • 70. Shameless Plug Currently we have a few openings in the Manhattan Beach office • Java Developers • UI Developers • Web Developers If interested please apply at http://careers.dealer.com/