1. Timothy C. Fanelli - Senior IT Specialist
23 September 2013
Three Key Concepts for
Understanding JSR-352:
Batch Applications for the
Java Platform
© 2013 IBM Corporation
2. Important Disclaimers
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION
CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED.
ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED
ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE
DIFFERENCES.
ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT
PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE
OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR
SUPPLIERS AND/OR LICENSORS
3. About me
§ Timothy C. Fanelli
§ Senior IT Specialist, IBM - Mainframe
Workload Modernization
§ Instructor of Software Engineering Clarkson University, Potsdam NY
§ tim@timfanelli.com
§ tfanelli@us.ibm.com, tfanelli@clarkson.edu
§ Visit the IBM booth #5112 and meet other
IBM developers at JavaOne 2013
4. Agenda
§ Background
§ Three Key Concepts
– Implementation
– Orchestration
– Execution
§ An Example JSR 352 Application
§ Advanced Topics
– Splits and Flows
– Partitioning
– Java EE
§ Conclusion and Thoughts on What’s Next...
6. Batch Processing
§ One of the oldest processing paradigms
§ Historically associated with mainframe computing
§ Still incredibly relevant today, with fresh challenges in an OLTP-driven world
7. Java for Batch Processing?
§ Mainframe developers have shied away from Java
– Performance concerns over native languages
– Integration concerns for legacy data
– Disparate developer skill sets between System z and Java
§ Java and Java EE have dominated the Online Transaction Processing world
§ Time to bridge the two worlds
– IBM Java for z/OS, IBM WebSphere, and Spring Batch paved new paths
– Just-in-Time compilation and garbage collection optimizations proved it out
– Adoption is widespread!
§ Only remaining challenge was the lack of a standard
– The need for JSR-352 was obvious
8. JSR 352: Batch Applications for the Java Platform
§ Expert working group formed 29 November 2011
– IBM*, VMware, Red Hat, Oracle, Credit Suisse, independent participants
– Broad range of talent with deep batch experience
§ Final Release 24 May 2013
§ Included in Java EE 7!
10. Three Key Concepts ...
§ JSR 352 defines
– Implementation: A programming model for implementing batch artifacts.
– Orchestration: A Job Specification Language, which orchestrates the execution of batch artifacts within a job.
– Execution: A runtime environment for executing batch applications, according to a defined lifecycle.
§ Note: “key” concepts, not “new” concepts!
– Roles and abstractions should be familiar to
SOA and Java EE developers
11. Anatomy of JSR352
§ Those concepts define the anatomy of JSR 352: Batch
Applications for the Java Platform...
[Diagram: the anatomy of JSR 352. Implement: Chunk (Reader, Processor, Writer), Batchlet, Listeners, Contexts, Partitioning. Orchestrate: Job, Step. Execute: JobOperator, Job Repository.]
12. Implementation: The programming model
§ Chunk and Batchlet provide models for implementing
a step.
§ Contexts provide job- and step-level runtime
information, and provide interim data persistence.
§ Listeners provide callback hooks to respond to
lifecycle events on batch artifacts.
§ Partitioning provides a mechanism for imposing parallel
processing on jobs and steps.
13. Implementation: The programming model
Chunk vs Batchlet
§ Both are implementations of a step within a batch job
§ The chunk model
– Encapsulates a very common pattern: ETL
– Single “reader”, “processor” and “writer”
– Reader/Processor combination is invoked until
an entire “chunk” of data is processed
– Output “chunk” is written atomically
§ Batchlet provides a “roll your own” step type
– Invoked and runs to completion, producing a
return code upon exit.
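The chunk pattern described above can be sketched as a plain loop. This is a simplified illustration, not the runtime's actual implementation: `LoopReader`, `LoopProcessor`, `LoopWriter` and `ChunkLoopSketch` are hypothetical local stand-ins (a real application implements the interfaces in `javax.batch.api.chunk`), and the transaction and checkpoint handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for illustration only. A real application would implement
// the ItemReader, ItemProcessor and ItemWriter interfaces from javax.batch.api.chunk.
interface LoopReader { Object readItem(); }                   // null means end of input
interface LoopProcessor { Object processItem(Object item); }
interface LoopWriter { void writeItems(List<Object> items); }

final class ChunkLoopSketch {
    /** Runs a simplified chunk loop and returns the number of chunks written. */
    static int run(LoopReader reader, LoopProcessor processor, LoopWriter writer, int itemCount) {
        if (itemCount <= 0) throw new IllegalArgumentException("itemCount must be positive");
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<Object> chunk = new ArrayList<>();
            // Read and process one item at a time until the chunk is full.
            for (int read = 0; read < itemCount; read++) {
                Object item = reader.readItem();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                chunk.add(processor.processItem(item));
            }
            // A real runtime would wrap this write in a transaction and take a
            // checkpoint when it commits; that part is omitted here.
            if (!chunk.isEmpty()) {
                writer.writeItems(chunk);
                chunks++;
            }
        }
        return chunks;
    }
}
```

A Batchlet, by contrast, has no such loop: the runtime calls it once and it returns an exit status string when it finishes.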
14. Orchestration: The Job Specification Language (JSL)
§ The JSL defines a batch job as an XML document
§ Describes a step as an assemblage of batch artifacts
§ Provides for the description of steps, step groupings,
and execution sequencing
15. Execution: The JobOperator and Repository
§ JobOperator is the runtime interface for job
management, including start, stop, restart and job
repository related commands
§ The Job Repository holds information about
completed and executing jobs
§ To start a batch job, get a JobOperator instance and use
it to start a job described by a JSL document.
16. Execution: JobInstance, JobExecution, and StepExecution
§ The state of a job is broken down into various parts,
and persisted in the repository
– Submitting a job creates a JobInstance, a
logical representation of a particular “run” of a
job.
– A JobExecution is a single attempt to run a
JobInstance. A restart attempt creates another
JobExecution.
– Similarly, a StepExecution is a single attempt to
run a step within a job. It is created when a step
starts execution.
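The relationship between JobInstance and JobExecution can be illustrated with a toy in-memory model. `ToyJobRepository` is a hypothetical class written for this sketch and is not part of the `javax.batch` API. It only mirrors the bookkeeping described above, under the assumption (true of the real `JobOperator.start` and `restart`) that both calls hand back an execution id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory bookkeeping for illustration; this is not the javax.batch API. */
final class ToyJobRepository {
    private long nextInstanceId = 1;
    private long nextExecutionId = 1;
    private final Map<Long, String> jobNameByInstance = new HashMap<>();
    private final Map<Long, Long> instanceByExecution = new HashMap<>();
    private final Map<Long, List<Long>> executionsByInstance = new HashMap<>();

    /** Submitting a job creates a new JobInstance plus its first JobExecution. */
    long start(String jobName) {
        long instanceId = nextInstanceId++;
        jobNameByInstance.put(instanceId, jobName);
        executionsByInstance.put(instanceId, new ArrayList<Long>());
        return newExecution(instanceId);
    }

    /** A restart adds another JobExecution to the same JobInstance. */
    long restart(long executionId) {
        return newExecution(instanceOf(executionId));
    }

    /** Looks up the JobInstance that an execution belongs to. */
    long instanceOf(long executionId) {
        Long instanceId = instanceByExecution.get(executionId);
        if (instanceId == null) {
            throw new IllegalArgumentException("unknown execution " + executionId);
        }
        return instanceId;
    }

    String jobNameOf(long instanceId) {
        return jobNameByInstance.get(instanceId);
    }

    /** Returns a copy of the execution ids recorded for one JobInstance, oldest first. */
    List<Long> executionsOf(long instanceId) {
        List<Long> ids = executionsByInstance.get(instanceId);
        return ids == null ? new ArrayList<Long>() : new ArrayList<Long>(ids);
    }

    private long newExecution(long instanceId) {
        long executionId = nextExecutionId++;
        instanceByExecution.put(executionId, instanceId);
        executionsByInstance.get(instanceId).add(executionId);
        return executionId;
    }
}
```

Starting the same job name twice yields two separate instances, while restarting a failed execution stays within its original instance.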
[Diagram: Job, Step, JobInstance, JobExecution and StepExecution linked by one-to-many (*) relationships, stored in the Job Repository and managed through the JobOperator]
18. The Application
‣ A typical “batch hello world”:
– Reads strings from an input file
– Performs some validation or transforms
– Writes the validated or transformed strings to an output file
‣ Key capabilities
– If something goes wrong, we don’t want to discard all the
prior work, and we want to pick up where we left off
– We want control over the transaction scoping to prevent
lock contention in high-volume periods
– We want flexibility to “plug and play” where our records
come from
– For unit testing, development testing, and QA testing, records may come from a variety of sources
19. The Design
‣ Let’s implement a string-transform in an extract-transform-load pattern
‣ We’ll use JSR352’s Chunk programming model
– Encapsulates the ETL pattern components as Reader, Processor, and Writer interfaces
– Loosely coupled artifacts will be orchestrated into a single-step job later
– “Free” checkpoint/restart capability
– Transaction scoping imposed externally in the job descriptor
‣ The job will be executed as a Java SE command-line batch application
20. The Code
‣ An ItemReader encapsulates the data
access and deserialization of a record.
‣ No restriction on data access paradigm: use
DAO patterns, JDBC, JPA, Hibernate,
Spring Data, etc!
‣ Checkpoint/restart data is passed in as a
Serializable argument to “open” and returned from
the “checkpointInfo” method.
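A minimal sketch of a reader with checkpoint/restart support follows. `SketchItemReader` is a local stand-in that copies the method shapes of `javax.batch.api.chunk.ItemReader` so the example compiles without a batch runtime. `ListStringReader` and its in-memory record list are assumptions made for illustration and are not the reader from the presentation.

```java
import java.io.Serializable;
import java.util.List;

// Local stand-in mirroring the method shapes of javax.batch.api.chunk.ItemReader,
// declared here only so the sketch compiles without a batch runtime.
interface SketchItemReader {
    void open(Serializable checkpoint) throws Exception;
    Object readItem() throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}

/** Hypothetical reader over an in-memory list; the checkpoint is the index of the next record. */
final class ListStringReader implements SketchItemReader {
    private final List<String> records;
    private int next;

    ListStringReader(List<String> records) {
        this.records = records;
    }

    @Override
    public void open(Serializable checkpoint) {
        // A null checkpoint means a fresh start; otherwise resume from the saved position.
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    @Override
    public Object readItem() {
        // Returning null signals that the input is exhausted.
        return next < records.size() ? records.get(next++) : null;
    }

    @Override
    public Serializable checkpointInfo() {
        return Integer.valueOf(next);
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory list.
    }
}
```

A file or JDBC reader would follow the same shape, saving a byte offset or a last-seen key instead of a list index.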
21. The Code
§ An ItemWriter is the output counterpart to
ItemReader
§ Primary difference is that writeItems
accepts a “chunk” of output objects (as a
list) to serialize.
§ Again, no restriction on data access
paradigm!
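A matching writer might look like the sketch below. `SketchItemWriter` is again a local stand-in with the method shapes of `javax.batch.api.chunk.ItemWriter`. `InMemoryStringWriter` and its truncate-on-restart behavior are assumptions chosen for illustration, since the original slide's code is not reproduced here.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the method shapes of javax.batch.api.chunk.ItemWriter.
interface SketchItemWriter {
    void open(Serializable checkpoint) throws Exception;
    void writeItems(List<Object> items) throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}

/** Hypothetical writer that appends to an in-memory list; the checkpoint is the item count. */
final class InMemoryStringWriter implements SketchItemWriter {
    private final List<String> sink = new ArrayList<>();

    @Override
    public void open(Serializable checkpoint) {
        // On restart, discard anything written after the last checkpoint.
        if (checkpoint != null) {
            int committed = (Integer) checkpoint;
            while (sink.size() > committed) {
                sink.remove(sink.size() - 1);
            }
        }
    }

    @Override
    public void writeItems(List<Object> items) {
        // The whole chunk arrives in one call.
        for (Object item : items) {
            sink.add(String.valueOf(item));
        }
    }

    @Override
    public Serializable checkpointInfo() {
        return Integer.valueOf(sink.size());
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory sink.
    }

    /** Returns a copy of everything written so far. */
    List<String> written() {
        return new ArrayList<>(sink);
    }
}
```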
22. The Code
§ An ItemProcessor encapsulates the
business logic applied to each record
§ The “main” method here demonstrates the invocation of
a batch job, using the JobOperator
§ It would typically not live in the processor
implementation
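A processor for the string-transform example could be as small as the sketch below. `SketchItemProcessor` is a local stand-in with the same single method as `javax.batch.api.chunk.ItemProcessor`. The trim-and-uppercase rule in `TrimUpperCaseProcessor` is an assumed transform, not necessarily the one shown in the talk.

```java
import java.util.Locale;

// Local stand-in mirroring javax.batch.api.chunk.ItemProcessor.
interface SketchItemProcessor {
    Object processItem(Object item) throws Exception;
}

/** Hypothetical business logic: validate that the record is not blank, then normalize it. */
final class TrimUpperCaseProcessor implements SketchItemProcessor {
    @Override
    public Object processItem(Object item) {
        String text = ((String) item).trim();
        if (text.isEmpty()) {
            throw new IllegalArgumentException("record is blank");
        }
        return text.toUpperCase(Locale.ROOT);
    }
}
```

With a JSR 352 runtime on the classpath, a launcher would typically obtain the operator with `BatchRuntime.getJobOperator()` and call `start(jobXmlName, jobParameters)`, which returns the execution id. The job name and parameters depend on the application.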
23. The Batch Descriptor and Job Specification
§ batch.xml defines and names
the batch artifacts in this
application archive
§ sample.xml is an example Job
Specification Language
document for SampleBatchApp
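As a rough idea of what the two descriptors contain, the sketch below holds a possible batch.xml and sample.xml as Java strings and parses them to confirm they are well-formed. The element names and namespace follow the JSR 352 schema. The artifact ids, the `com.example` class names, the step id and the `item-count` value are made up for illustration, and `DescriptorSketch` is a hypothetical helper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DescriptorSketch {
    static final String NS = "http://xmlns.jcp.org/xml/ns/javaee";

    // Hypothetical batch.xml: maps artifact names to (made-up) implementation classes.
    static final String BATCH_XML =
          "<batch-artifacts xmlns=\"" + NS + "\">\n"
        + "  <ref id=\"sampleReader\" class=\"com.example.SampleReader\"/>\n"
        + "  <ref id=\"sampleProcessor\" class=\"com.example.SampleProcessor\"/>\n"
        + "  <ref id=\"sampleWriter\" class=\"com.example.SampleWriter\"/>\n"
        + "</batch-artifacts>\n";

    // Hypothetical sample.xml: a single chunk step that commits every 10 items.
    static final String SAMPLE_JSL =
          "<job id=\"sample\" xmlns=\"" + NS + "\" version=\"1.0\">\n"
        + "  <step id=\"transformStep\">\n"
        + "    <chunk item-count=\"10\">\n"
        + "      <reader ref=\"sampleReader\"/>\n"
        + "      <processor ref=\"sampleProcessor\"/>\n"
        + "      <writer ref=\"sampleWriter\"/>\n"
        + "    </chunk>\n"
        + "  </step>\n"
        + "</job>\n";

    /** Parses a descriptor with the JDK's XML parser; fails if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("descriptor is not well-formed XML", e);
        }
    }
}
```

The `ref` values in the JSL match the `id` values in batch.xml, which is how the job definition stays decoupled from the implementing classes.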
24. The Execution
§ Package the application as a standard JAR or WAR
for deployment in Java SE or Java EE environments
– batch.xml goes in META-INF or WEB-INF/classes/META-INF
– JSL may go in META-INF/batch-jobs, or be
submitted from an external source (up to the
provider!)
27. Job Management - Restart, Stop, Abandon
§ Had something gone wrong, what then?
– The “main” program shown was too simple... only
“started” the job
§ JobOperator exposes APIs for a variety of job
management tasks: start, stop, abandon, restart
– A real application would need these for
advanced job management capabilities.
§ The door is left open for more advanced batch job
management systems to be built!
– Integration into existing enterprise schedulers?
– New Java EE batch scheduling standard?
– Plenty of options, but currently left to the provider
to implement
28. Java EE Integration
§ JSR-352: Java Batch is included in Java EE 7
§ Provides EE clustering, security, resource
management, etc. to Java Batch applications
§ Performance benefits from dispatching into a long-running, reusable container
– JIT compilation through the first couple runs
– Eliminates overhead of starting / stopping JVM
29. Parallel Job Processing
§ Splits and Flows provide a mechanism for executing
job steps concurrently at the orchestration layer
§ A flow is a sequence of one or more steps which
execute sequentially, but as a single unit.
§ A Split is a collection of flows that may execute
concurrently
– A split may only contain “flows”; a step is not
implicitly a flow
§ This is done entirely in the JSL descriptor
– Imposed on the batch application with no code
changes!
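A split with two flows might be written as in the sketch below, where the JSL is held in a Java string so it can be parsed and checked. The `split`, `flow`, `step` and `batchlet` elements come from the JSR 352 schema. All ids and artifact refs are invented, and `SplitFlowSketch` with its `splitsContainOnlyFlows` check is a hypothetical helper that demonstrates the "a split may only contain flows" rule.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SplitFlowSketch {
    static final String NS = "http://xmlns.jcp.org/xml/ns/javaee";

    // Hypothetical job: flowA (two sequential steps) and flowB may run concurrently,
    // and finalStep runs after the whole split completes.
    static final String PARALLEL_JSL =
          "<job id=\"parallelJob\" xmlns=\"" + NS + "\" version=\"1.0\">\n"
        + "  <split id=\"split1\" next=\"finalStep\">\n"
        + "    <flow id=\"flowA\">\n"
        + "      <step id=\"a1\" next=\"a2\"><batchlet ref=\"taskA1\"/></step>\n"
        + "      <step id=\"a2\"><batchlet ref=\"taskA2\"/></step>\n"
        + "    </flow>\n"
        + "    <flow id=\"flowB\">\n"
        + "      <step id=\"b1\"><batchlet ref=\"taskB1\"/></step>\n"
        + "    </flow>\n"
        + "  </split>\n"
        + "  <step id=\"finalStep\"><batchlet ref=\"finish\"/></step>\n"
        + "</job>\n";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("JSL is not well-formed XML", e);
        }
    }

    /** Returns true when every element directly inside a split is a flow. */
    static boolean splitsContainOnlyFlows(Document jsl) {
        NodeList splits = jsl.getElementsByTagNameNS(NS, "split");
        for (int i = 0; i < splits.getLength(); i++) {
            NodeList children = splits.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE && !"flow".equals(child.getLocalName())) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

To run a single step in parallel with others, it has to be wrapped in its own one-step flow, as flowB does here.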
30. Parallel Job Processing
§ Step-level parallelism can be achieved programmatically using
step partitioning
§ A partitioned step runs as multiple instances with distinct
property sets
§ A partition plan defines the number of partitions, and the property
values for each partition
– Can be a fixed set of partitions in JSL
– Can be dynamic using a PartitionMapper implementation
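The dynamic case can be sketched as a function that computes one property set per partition. In a real application this logic would sit inside a `PartitionMapper.mapPartitions()` implementation and be returned in a `PartitionPlan`. Here a plain `Properties[]` stands in for the plan, and the `start` and `end` property names, like `RangePartitionSketch` itself, are assumptions made for illustration.

```java
import java.util.Properties;

final class RangePartitionSketch {
    /**
     * Splits the record range [0, totalRecords) into contiguous sub-ranges and
     * returns one Properties object per partition, with "start" (inclusive)
     * and "end" (exclusive) keys.
     */
    static Properties[] mapPartitions(int totalRecords, int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        if (totalRecords < 0) throw new IllegalArgumentException("totalRecords must not be negative");
        Properties[] plan = new Properties[partitions];
        int base = totalRecords / partitions;
        int remainder = totalRecords % partitions;
        int start = 0;
        for (int i = 0; i < partitions; i++) {
            // The first `remainder` partitions take one extra record each.
            int size = base + (i < remainder ? 1 : 0);
            Properties props = new Properties();
            props.setProperty("start", Integer.toString(start));
            props.setProperty("end", Integer.toString(start + size));
            plan[i] = props;
            start += size;
        }
        return plan;
    }
}
```

Each partition's reader would then read only its own range, using the property values the runtime makes available to that partition.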
33. Batch Processing
§ The oldest “new thing” in Java
§ JSR 352 applies the modern thinking and abstraction of Java EE and SOA to
sequential batch processing
§ The standardized programming model provides application developers vendor portability
§ Inclusion in Java EE 7 ensures widespread availability