[Repost] Spring Batch Tutorial
This tutorial is about Spring Batch, which is part of the Spring framework. Spring Batch provides reusable functions that are essential for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques.
Here you can find a clear explanation of its main components and concepts, along with several working examples. This tutorial is not about the Spring framework in general; it is expected that you are familiar with mechanisms like Inversion of Control and Dependency Injection, which are the main pillars of the Spring framework. It is also assumed that you know how to configure the Spring framework context for basic applications and that you are used to working with both annotation-based and configuration-file-based Spring projects.
If this is not the case, I would really recommend going to the official Spring framework page and working through the basic tutorials before starting to learn what Spring Batch is and how it works. Here is a very good one: http://docs.spring.io/docs/Spring-MVC-step-by-step/.
At the end of this tutorial, you can find a compressed file with all the examples listed and some extras.
The software used to prepare this tutorial is listed below:
- Java 8 Update 31
- Apache Maven 3.2.5
- Eclipse Luna 4.4.1
- Spring Batch 3.0.3 and all its dependencies (I really recommend using Maven or Gradle to resolve all the required dependencies and avoid headaches)
- Spring Boot 1.2.2 and all its dependencies (I really recommend using Maven or Gradle to resolve all the required dependencies and avoid headaches)
- MySQL Community Server version 5.6.22
- MongoDB 2.6.8
- HSQLDB version 1.8.0.10
This tutorial will not explain how to use Maven, although it is used for resolving dependencies and for compiling and executing the examples provided. More information can be found in the following article: http://examples.javacodegeeks.com/enterprise-java/maven/log4j-maven-example/.
The Spring Boot module is also heavily used in the examples; for more information about it, please refer to the official Spring Boot documentation: http://projects.spring.io/spring-boot/.
1. Intro
Spring Batch is an open source framework for batch processing. It is built as a module within the Spring framework and depends on it (among other libraries). Before continuing with Spring Batch, here is a definition of batch processing:
“Batch processing is the execution of a series of programs (“jobs”) on a computer without manual intervention” (from Wikipedia).
So, for our purposes, a batch application executes a series of jobs (sequentially or in parallel), where input data is read, processed and written without any user interaction. We are going to see how Spring Batch can help us with this.
Spring Batch provides mechanisms for processing large amounts of data, such as transaction management, job processing, resource management, logging, tracing, data conversion, interfaces, etc. These functionalities are available out of the box and can be reused by applications that include the Spring Batch framework. By using these techniques, the framework takes care of performance and scalability while processing the records.
Normally a batch application can be divided into three main parts:
- Reading the data (from a database, file system, etc.)
- Processing the data (filtering, grouping, calculating, validating…)
- Writing the data (to a database, reporting, distributing…)
Spring Batch contains features and abstractions (as we will explain in this article) for automating these basic steps and allowing application programmers to configure them, repeat them, retry them, stop them, execute them as a single element or grouped (transaction management), etc.
It also contains classes and interfaces for the main data formats, industry standards and providers like XML, CSV, SQL, MongoDB, etc.
In the next chapters of this tutorial we are going to explain and provide examples of all these steps and the different possibilities that Spring Batch offers.
2. Concepts
Here are the most important concepts in the Spring Batch framework:
Jobs
Jobs are abstractions to represent batch processes, that is, sequences of actions or commands that have to be executed within the batch application.
Spring Batch contains the following interface to represent Jobs: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/Job.html. Simple Jobs contain a list of steps, which are executed sequentially or in parallel.
In order to configure a Job it is enough to initialize the list of steps. This is an example of an XML-based configuration for a dummy Job:
Job launcher
This interface http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/launch/JobLauncher.html represents a Job Launcher. Implementations of its method take care of starting job executions for the given jobs and job parameters.
Job instance
This is an abstraction representing a single run for a given Job. It is unique and identifiable. The class representing this abstraction is http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/JobInstance.html.
Job instances can be restarted if they did not complete successfully and the Job is restartable; otherwise an error is raised.
Steps
Steps are mainly the parts that compose a Job (and a Job instance). A Step is a part of a Job and contains all the necessary information to execute the batch processing actions that are expected to be done at that phase of the job. Steps in Spring Batch are composed of an ItemReader, an ItemProcessor and an ItemWriter and can be very simple or extremely complicated depending on the complexity of their members.
Steps also contain configuration options for their processing strategy, commit interval, transaction mechanism or job repositories that may be used. Spring Batch normally uses chunk-oriented processing: items are read one at a time and collected into a “chunk”, and once the number of items read reaches a preconfigured value, called the commit interval, the whole chunk is written out and the transaction is committed.
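To make the chunk idea concrete, here is a minimal plain-Java sketch of the read/process/write loop. It is not Spring Batch code: the class `ChunkLoopSketch`, its `run` method and the use of `Supplier`, `Function` and `Consumer` as stand-ins for reader, processor and writer are assumptions made only for illustration, and transactions are left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java illustration of chunk-oriented processing; hypothetical names,
// no Spring Batch classes involved.
class ChunkLoopSketch {

    // Reads up to commitInterval items, processes them and hands the
    // resulting chunk to the writer. Returns the number of chunks written.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < commitInterval; i++) {
                I item = reader.get();
                if (item == null) {          // null signals the end of the input
                    exhausted = true;
                    break;
                }
                O processed = processor.apply(item);
                if (processed != null) {     // null means "filter this item out"
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);        // one write per chunk
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            input.add(i);
        }
        Iterator<Integer> it = input.iterator();
        int chunks = ChunkLoopSketch.<Integer, String>run(
                () -> it.hasNext() ? it.next() : null,
                n -> "record-" + n,
                chunk -> System.out.println("writing " + chunk.size() + " items"),
                10);
        System.out.println(chunks + " chunks written");
    }
}
```

In the real framework the write of each chunk happens inside a transaction that is committed per chunk, which is why the chunk size is called the commit interval.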
Here is a very basic example of an XML-based step configuration using a commit interval of 10:
And the following snippet is the annotation based version defining the readers, writers and processors involved, a chunk processing strategy and a commit interval of 10 (this is the one that we are using in the majority of examples in this tutorial):
Job Repositories
Job repositories are abstractions responsible for storing and updating metadata related to Job instance executions and Job contexts. The basic interface that has to be implemented in order to configure a Job Repository is http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/repository/JobRepository.html.
Spring Batch stores as metadata information about Job executions, the results obtained, the Job instances, the parameters used for the executed Jobs and the context in which the processing runs. The table names are very intuitive and similar to their domain class counterparts; this link shows an image with a very good summary of these tables: http://docs.spring.io/spring-batch/reference/html/images/meta-data-erd.png.
For more information about the Spring Batch metadata schema, please visit http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html.
Item Readers
Readers are abstractions responsible for data retrieval. They provide batch processing applications with the needed input data. In this tutorial we will see how to create custom readers and how to use some of the most important predefined Spring Batch ones. Here is a list of some readers provided by Spring Batch:
- AmqpItemReader
- AggregateItemReader
- FlatFileItemReader
- HibernateCursorItemReader
- HibernatePagingItemReader
- IbatisPagingItemReader
- ItemReaderAdapter
- JdbcCursorItemReader
- JdbcPagingItemReader
- JmsItemReader
- JpaPagingItemReader
- ListItemReader
- MongoItemReader
- Neo4jItemReader
- RepositoryItemReader
- StoredProcedureItemReader
- StaxEventItemReader
We can see that Spring Batch already provides readers for many of the common data formats and database providers. It is recommended to use the abstractions provided by Spring Batch in your applications rather than creating your own.
Item Writers
Writers are abstractions responsible for writing the data to the desired output database or system. What we explained for Readers also applies to Writers: Spring Batch already provides classes and interfaces for many of the most widely used databases, and these should be used. Here is a list of some of the provided writers:
- AbstractItemStreamItemWriter
- AmqpItemWriter
- CompositeItemWriter
- FlatFileItemWriter
- GemfireItemWriter
- HibernateItemWriter
- IbatisBatchItemWriter
- ItemWriterAdapter
- JdbcBatchItemWriter
- JmsItemWriter
- JpaItemWriter
- MimeMessageItemWriter
- MongoItemWriter
- Neo4jItemWriter
- StaxEventItemWriter
- RepositoryItemWriter
In this article we will show how to create custom writers and how to use some of the listed ones.
Item Processors
Processors are in charge of modifying the data records, converting them from the input format to the desired output format. The main interface used for Item Processor configuration is http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/ItemProcessor.html.
In this article we will see how to create our custom item processors.
The following picture (from the Spring batch documentation) gives a very good summary of all these concepts and how the basic Spring Batch architecture is designed:
3. Use Cases
Although it is difficult to categorize the use cases where batch processing can be applied in the real world, I am going to try to list in this chapter the most important ones:
- Conversion Applications: These are applications that convert input records into the required structure or format. These applications can be used in all the phases of the batch processing (reading, processing and writing).
- Filtering or validation applications: These are programs with the goal of filtering valid records for further processing. Normally validation happens in the first phases of the batch processing.
- Database extractors: These are applications that read data from a database or input files and write the desired filtered data to an output file or to another database. There are also applications that update large amounts of data in the same database the input records come from. As a real-life example, we can think of a system that analyzes log files with different end-user behaviors and, using this data, produces reports with statistics about the most active users, the most active periods of time, etc.
- Reporting: These are applications that read large amounts of data from a database or input files, process this data and produce formatted documents based on that data that are suitable for printing or sending via other systems. Accounting and Legal Banking systems can be part of this category: at the end of the business day, these systems read information from the databases, extract the data required and write this data into legal documents that may be sent to different authorities.
Spring Batch provides mechanisms to support all these scenarios. With the elements and components listed in the previous chapter, programmers can implement batch applications for data conversion, record filtering, validation, extracting information from databases or input files, and reporting.
4. Controlling flow
Before talking about specific Jobs and Steps, I am going to show what a Spring Batch configuration class looks like. The next snippet contains a configuration class with all the components needed for batch processing using Spring Batch. It contains readers, writers, processors, job flows, steps and all other needed beans.
During this tutorial we will show how to modify this configuration class in order to use different abstractions for our different purposes. The class below is pasted without comments and specific code; for the working class example, please go to the download section of this tutorial, where you can download all the sources:
In order to launch our spring context and execute the configured batch shown before we are going to use Spring Boot. Here is an example of a program that takes care of launching our application and initializing the Spring context with the proper configuration. This program is used with all the examples shown in this tutorial:
I am using Maven to resolve all the dependencies and to launch the application with Spring Boot. Here is the pom.xml used:
And the goal used is:
Now we are going to go through the configuration file shown above step by step. First of all we are going to explain how Jobs and Steps are executed and what rules they follow.
In the example application pasted above we can see how a Job and a first step are configured. Here we extract the related piece of code:
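The original snippet is not reproduced in this copy of the tutorial. The following is a sketch of what such a Java-based configuration typically looks like with the Spring Batch 3.0.x builder factories. The class name `BatchConfigurationSketch` and the `String` item type are assumptions for illustration, and the reader, processor and writer beans are assumed to be defined elsewhere in the application context.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class; item type String is chosen only to keep
// the sketch short.
@Configuration
@EnableBatchProcessing
public class BatchConfigurationSketch {

    // A Job named "job1" whose flow consists of the single step "step1".
    @Bean
    public Job job1(JobBuilderFactory jobs, Step step1) {
        return jobs.get("job1")
                .incrementer(new RunIdIncrementer())
                .flow(step1)
                .end()
                .build();
    }

    // A chunk-oriented step with a commit interval of 10. The reader,
    // processor and writer are injected from beans assumed to exist elsewhere.
    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<String> reader,
                      ItemProcessor<String, String> processor,
                      ItemWriter<String> writer) {
        return steps.get("step1")
                .<String, String>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```

Note that these builder factories belong to the Spring Batch version used in this tutorial; later major versions of the framework changed this part of the API.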
We can observe how a Job with the name “job1” is configured using just one step, in this case a step called “step1”. The class JobBuilderFactory creates a job builder and initializes the job repository. The flow method of the class JobBuilder creates an instance of the class JobFlowBuilder using the step1 method shown. This way the whole context is initialized and the Job “job1” is executed.
The step processes (using the processor), in chunks of 10 units, the records provided by the reader and writes them using the writer passed in. All dependencies are injected at runtime; Spring takes care of that, since the class where all this happens is marked as a configuration class using the annotation @Configuration.
5. Custom Writers, Readers and Processors
As we already mentioned in this tutorial, Spring Batch applications basically consist of three operations: reading data, processing data and writing data. We also explained that in order to support these three operations Spring Batch provides three abstractions in the form of interfaces: ItemReader, ItemProcessor and ItemWriter.
Programmers should implement these interfaces in order to read, process and write data in their batch application jobs and steps. In this chapter we are going to explain how to create custom implementations for these abstractions.
Custom Reader
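As a concrete reference for the description in this section, here is a minimal sketch of a custom reader over an in-memory list. The `ItemReader` interface is declared locally as a stand-in with the same method shape as `org.springframework.batch.item.ItemReader`, only so that the sketch compiles without the library; in a real project the local declaration would be removed and the Spring Batch interface imported instead. The class name `ListBackedReader` is hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Local stand-in mirroring org.springframework.batch.item.ItemReader;
// declared here only so the sketch compiles without Spring Batch.
interface ItemReader<T> {
    T read() throws Exception;
}

// Hypothetical reader that hands out the elements of a list one by one.
class ListBackedReader implements ItemReader<String> {

    // Created once, when the reader is constructed. Re-creating it inside
    // read() would return the first element forever and the job would never end.
    private final Iterator<String> iterator;

    ListBackedReader(List<String> items) {
        this.iterator = items.iterator();
    }

    @Override
    public String read() {
        // Returning null tells the caller that the input is exhausted.
        return iterator.hasNext() ? iterator.next() : null;
    }
}
```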
The abstraction provided by Spring Batch for reading records of data is the interface ItemReader. It only has one method (read()), which is supposed to be executed several times. It does not need to be thread safe, which is very important to know for applications using this method.
The read() method of the interface has to be implemented. This method expects no input parameters; it is supposed to read one record of data from the desired source and return it. It is not supposed to do any transformation or data processing. If null is returned, there is no further data to be read or analyzed.
A custom reader that returns the next element of an internal list on every call only works if its iterator is initialized or injected when the custom reader is created. If the iterator is instantiated every time the read() method is called, the job using this reader will never end and will cause problems.
Custom Processor
The interface provided by Spring Batch for data processing expects one input item and produces one output item. The two types may differ but do not have to. Returning null means that the item is filtered out and is not passed on for further processing (for example, to the next processor in a chain or to the writer).
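A minimal sketch of such a processor follows. As an assumption for illustration, it works on plain `String` items instead of a POJO and reverses each one. The `ItemProcessor` interface is a local stand-in with the same method shape as `org.springframework.batch.item.ItemProcessor`, declared only so the sketch compiles on its own, and `ReverseProcessor` is a hypothetical name.

```java
// Local stand-in mirroring org.springframework.batch.item.ItemProcessor;
// declared here only so the sketch compiles without Spring Batch.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical processor: reverses the input and filters out empty items.
class ReverseProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String item) {
        if (item.isEmpty()) {
            return null; // null means "do not pass this item on"
        }
        return new StringBuilder(item).reverse().toString();
    }
}
```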
In order to implement this interface, it is only necessary to implement the process method. A dummy implementation, for example one that simply reverses the members of the input POJO, may not be useful in any real-life scenario, but it shows how to implement the ItemProcessor interface and do whatever actions are needed in the process method.
Custom Writer
In order to create a custom writer, programmers need to implement the interface ItemWriter. This interface only contains one method, write, which expects a list of input items and returns void. The write method can do whatever actions are wanted: writing to a database, writing to a CSV file, sending an email, creating a formatted document, etc. Implementations of this interface are in charge of flushing the data and leaving structures in a safe state.
Here is an example of a custom writer where the input items are written to the standard console:
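The original snippet is missing from this copy, so what follows is only a sketch of a console writer. The `ItemWriter` interface is a local stand-in that mirrors the list-based `write` signature of `org.springframework.batch.item.ItemWriter` in Spring Batch 3.0.x, declared here so the sketch compiles without the library. `ConsoleWriter` and its counter are hypothetical additions for illustration.

```java
import java.util.List;

// Local stand-in mirroring org.springframework.batch.item.ItemWriter
// (list-based signature as in Spring Batch 3.0.x).
interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

// Hypothetical writer that prints every item of the chunk to standard output.
class ConsoleWriter implements ItemWriter<String> {

    private int writtenCount = 0; // kept only to make the sketch observable

    @Override
    public void write(List<? extends String> items) {
        for (String item : items) {
            System.out.println("writing item = " + item);
            writtenCount++;
        }
    }

    int getWrittenCount() {
        return writtenCount;
    }
}
```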
This example is also not very useful in real life; it is only for learning purposes.
It is also important to mention that for almost all real-life scenarios Spring Batch already provides specific abstractions that cope with most of the problems. For example, Spring Batch contains classes to read data from MySQL databases, to write data to an HSQLDB database, or to convert data from XML to CSV using JAXB, and many others. The code is clean, fully tested, standard and adopted by the industry, so I can only recommend using them.
These classes can also be extended in our applications in order to fulfil our wishes without the need to re-implement the whole logic. Extending the classes provided by Spring may also be useful for testing, debugging, logging or reporting purposes. So before reinventing the wheel again and again, it is worth checking the Spring Batch documentation and tutorials, because we will probably find a better and cleaner way to solve our specific problems.
6. Flat file example
Using the example above, we are going to modify the readers and writers in order to read from a CSV file and write to a flat file as well. The following snippet shows how to configure a reader that extracts the data from a flat file, CSV in this case. For this purpose Spring already provides the class FlatFileItemReader, which needs a resource property pointing to where the data comes from and a line mapper to parse the data contained in that resource. The code is quite intuitive:
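The original snippet is not available here, so the following is a sketch of a reader bean built from the standard Spring Batch flat file classes. The class `CsvReaderConfigSketch`, the item type `CsvRow`, the column names and the file name `input.csv` are all assumptions for illustration.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class CsvReaderConfigSketch {

    // Hypothetical item type. BeanWrapperFieldSetMapper needs a no-arg
    // constructor and setters whose names match the column names below.
    public static class CsvRow {
        private String id;
        private String description;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }
    }

    @Bean
    public FlatFileItemReader<CsvRow> reader() {
        // Splits each line into named fields (comma-delimited by default).
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] {"id", "description"});

        // Copies the named fields into a new CsvRow via its setters.
        BeanWrapperFieldSetMapper<CsvRow> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(CsvRow.class);

        DefaultLineMapper<CsvRow> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<CsvRow> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("input.csv")); // assumed file name
        reader.setLineMapper(lineMapper);
        return reader;
    }
}
```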
The following piece of code shows the modifications that are needed in the writer. In this case we are going to use a writer of the class FlatFileItemWriter that needs an output file to write to and an extractor mechanism. The extractor can be configured as shown in the snippet:
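As with the reader, the original snippet is missing, so this is only a sketch of a writer bean using the standard Spring Batch classes. The class `CsvWriterConfigSketch`, the item type `CsvRow`, the field names and the output path `output.csv` are assumptions for illustration.

```java
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class CsvWriterConfigSketch {

    // Hypothetical item type with getters for the extracted fields.
    public static class CsvRow {
        private String id;
        private String description;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }
    }

    @Bean
    public FlatFileItemWriter<CsvRow> writer() {
        // The extractor pulls the named properties out of each item.
        BeanWrapperFieldExtractor<CsvRow> extractor = new BeanWrapperFieldExtractor<>();
        extractor.setNames(new String[] {"id", "description"});

        // The aggregator joins the extracted values into one output line.
        DelimitedLineAggregator<CsvRow> aggregator = new DelimitedLineAggregator<>();
        aggregator.setDelimiter(",");
        aggregator.setFieldExtractor(extractor);

        FlatFileItemWriter<CsvRow> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output.csv")); // assumed output path
        writer.setLineAggregator(aggregator);
        return writer;
    }
}
```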
7. MySQL example
In this chapter we are going to see how to modify our writer and our data source in order to write processed records to a local MySQL DB.
If we want to read data from a MySQL DB we first need to modify the configuration of the data source bean with the needed connection parameters:
Here is how the writer can be modified using an SQL statement and a JdbcBatchItemWriter that gets initialized with the data source shown above:
It is worth mentioning here that there are problems with the required Jettison library:
http://stackoverflow.com/questions/28627206/spring-batch-exception-cannot-construct-java-util-mapentry.
8. In Memory DB (HSQLDB) example
As a third example we are going to show how to create readers and writers in order to use an in-memory database, which is very useful for testing scenarios. By default, if nothing else is specified, Spring Batch chooses HSQLDB as the data source.
The data source to be used is in this case the same one as for a MySQL DB but with different parameters (containing the HSQLDB configuration):
The writer does not differ (almost) from the MySQL one:
If we want Spring to take care of initializing the DB to be used, we can create a script with the name schema-all.sql (for all providers; schema-hsqldb.sql for HSQLDB, schema-mysql.sql for MySQL, etc.) in the resources folder of our project:
This script is also provided in the download section at the end of the tutorial.
9. Unit testing
In this chapter we are going to see briefly how to test batch applications using the Spring Batch testing capabilities. This chapter does not explain how to test Java applications in general or Spring-based ones in particular. It only covers end-to-end testing of Spring Batch applications, that is, testing of Jobs or Steps. Unit testing of single elements like item processors, readers or writers is excluded, since it does not differ from normal unit testing.
The Spring Batch Test Project contains abstractions that facilitate the unit testing of batch applications.
Two annotations are basic when running unit tests (using Junit in this case) in Spring:
- @RunWith(SpringJUnit4ClassRunner.class): Junit annotation to execute all methods marked as tests. With the class passed as parameter we are indicating that this class can use all spring testing capabilities.
- @ContextConfiguration(locations = {. . .}): we will not use the “locations” property because we are not using xml configuration files but configuration classes directly.
Instances of the class http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/test/JobLauncherTestUtils.html can be used for launching jobs and single steps inside the unit test methods (among many other functionalities). Its launchJob() method executes a Job and its launchStep() method executes a Step from end to end. In the following example you can see how to use these methods in real JUnit tests:
You can assert or validate the tests by checking the status of the Job execution for complete Job unit tests, or by asserting the results of the writer for single Step tests. In the example shown we do not use any XML configuration file; instead we use the already mentioned configuration class. In order to tell the unit test to load this configuration, the annotation @ContextConfiguration with the properties “classes” and “loader” is used:
More information about Spring Batch unit testing can be found in the following tutorial: http://docs.spring.io/spring-batch/trunk/reference/html/testing.html.
10. Error handling and retrying Jobs
Spring provides mechanisms for retrying Jobs, but since release 2.2.0 this functionality is no longer part of the Spring Batch framework; it is included in Spring Retry: http://docs.spring.io/spring-retry/docs/api/current/. A very good tutorial can be found here: http://docs.spring.io/spring-batch/trunk/reference/html/retry.html.
Retry policies, callbacks and recovery mechanisms are part of the framework.
11. Parallel Processing
Spring Batch supports parallel processing in two possible variations (single process and multi process) that we can separate into the following categories. In this chapter we are just going to list these categories and explain briefly how Spring Batch provides solutions to them:
- Multi-threaded Step (single process): Programmers can implement their readers and writers in a thread-safe way, so that multi-threading can be used and the step processing can be executed in different threads. Spring Batch provides several ItemReader and ItemWriter implementations out of the box. Their descriptions normally state whether they are thread safe or not. In case this information is not provided or the implementations clearly state that they are not thread safe, programmers can always synchronize the call to the read() method. This way, several records can be processed in parallel.
- Parallel Steps (single process): If an application's modules can be executed in parallel because their logic does not collide, these modules can be executed in different steps in parallel. This is different from the scenario explained in the previous point, where each step execution processes different records in parallel; here, different steps run in parallel.
Spring Batch supports this scenario with the element split.
Here is an example configuration that may help to understand it better:
- Remote Chunking of Step (multi process): In this mode, steps are split across different processes that communicate with each other using some middleware system (for example JMS). Basically there is a master component running locally and multiple remote processes, called slaves. The master component is a normal Spring Batch Step whose writer knows how to send chunks of items as messages using the middleware mentioned before. The slaves are implementations of item writers and item processors with the ability to process the messages. The master component should not be a bottleneck; the standard way to implement this pattern is to leave the expensive parts in the processors and writers and the light parts in the readers.
- Partitioning a Step (single or multi process): Spring Batch offers the possibility to partition Steps and execute them remotely. The remote instances are Steps.
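The advice in the multi-threaded bullet, synchronizing the call to read() when a reader is not thread safe, can be sketched in plain Java as follows. The `ItemReader` interface is a local stand-in with the same method shape as the Spring Batch one, and `SynchronizedReader` and `SynchronizedReaderDemo` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in mirroring org.springframework.batch.item.ItemReader.
interface ItemReader<T> {
    T read() throws Exception;
}

// Wraps a reader that is not thread safe and serializes access to read().
class SynchronizedReader<T> implements ItemReader<T> {
    private final ItemReader<T> delegate;

    SynchronizedReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

class SynchronizedReaderDemo {

    // Drains one shared reader from several threads and returns the number
    // of distinct items that were read.
    static int drainConcurrently(int itemCount, int threadCount) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            source.add(i);
        }
        Iterator<Integer> iterator = source.iterator();
        // A plain iterator is not safe to share between threads.
        ItemReader<Integer> notThreadSafe = () -> iterator.hasNext() ? iterator.next() : null;
        ItemReader<Integer> reader = new SynchronizedReader<>(notThreadSafe);

        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                try {
                    Integer item;
                    while ((item = reader.read()) != null) {
                        seen.add(item);
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```

Only the read is serialized in this sketch; whatever each thread does with the item afterwards still runs in parallel.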
These are the main options that Spring Batch offers programmers for running their batch applications in parallel in some way. But parallelism in general, and parallelism in batch processing specifically, is a very deep and complicated topic that is out of the scope of this document.
12. Repeating jobs
Spring Batch offers the possibility to repeat Jobs and Tasks in a programmatic and configurable way. In other words, it is possible to configure our batch applications to repeat Jobs or Steps until specific conditions are met (or until specific conditions are not yet met). Several abstractions are available for this purpose:
- Repeat Operations: The interface RepeatOperations is the basis for all the repeat mechanism in Spring Batch. It contains a method to be implemented where a callback is passed. This callback is executed in each iteration. It looks like the following:
The RepeatCallback interface contains the functional logic that has to be repeated in the Batch:
The RepeatStatus returned by their iterate and doInIteration methods respectively should be RepeatStatus.CONTINUABLE in case the Batch should continue iterating, or RepeatStatus.FINISHED in case the Batch processing should be terminated.
Spring already provides some basic implementations for the RepeatCallback interface.
- Repeat Templates: The class RepeatTemplate is a very useful implementation of the RepeatOperations interface that can be used as a starting point in our batch applications. It contains basic functionality and default behavior for error handling and finalization mechanisms. Applications that do not want this default behavior should implement their own Completion Policies.
Here is an example of how to use a repeat template with a fixed chunk termination policy and a dummy iterate method:
In this case the batch will terminate after 10 iterations, since the callback always returns RepeatStatus.CONTINUABLE and leaves the responsibility for termination to the completion policy.
- Repeat Status: Spring contains an enumeration with the possible continuation statuses, RepeatStatus.CONTINUABLE and RepeatStatus.FINISHED, indicating that the processing should continue or that it is finished (which can be successful or unsuccessful).
http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/repeat/RepeatStatus.html
- Repeat Context: It is possible to store transient data in the Repeat Context; this context is passed as a parameter to the Repeat Callback doInIteration method. Spring Batch provides the abstraction RepeatContext for this purpose.
After the iterate method is called, the context no longer exists. A repeat context has a parent context in case iterations are nested; in these cases, it is possible to use the parent context in order to store information that can be shared between different iterations, like counters or decision variables.
- Repeat Policy: The repeat template termination mechanism is determined by a CompletionPolicy. This policy is also in charge of creating a RepeatContext and passing it to the callback in every iteration. Once an iteration is completed, the template calls the completion policy and updates its state, which is stored in the repeat context. After that, the template asks the policy to check whether the processing is complete.
Spring contains several implementations for this interface; one of the simplest is SimpleCompletionPolicy, which offers the possibility to execute the Batch for just a fixed number of iterations.
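The interplay between callback, status and completion policy can be sketched in plain Java as follows. All types are simplified local stand-ins, not the Spring Batch classes: the real doInIteration method receives a RepeatContext, and `FixedCountPolicy` and `RepeatLoopSketch` are hypothetical names that only mimic the idea of a fixed-iteration policy driving a repeat loop.

```java
// Simplified stand-in for the Spring Batch RepeatStatus enumeration.
enum RepeatStatus { CONTINUABLE, FINISHED }

// Simplified stand-in for RepeatCallback; the real method also receives
// a RepeatContext argument.
interface RepeatCallback {
    RepeatStatus doInIteration() throws Exception;
}

// Hypothetical policy: complete after a fixed number of iterations or as
// soon as the callback reports FINISHED.
class FixedCountPolicy {
    private final int maxIterations;

    FixedCountPolicy(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    boolean isComplete(int iterationsDone, RepeatStatus lastStatus) {
        return lastStatus == RepeatStatus.FINISHED || iterationsDone >= maxIterations;
    }
}

class RepeatLoopSketch {

    // Invokes the callback until the policy reports completion and returns
    // the number of iterations that were executed.
    static int iterate(RepeatCallback callback, FixedCountPolicy policy) {
        int done = 0;
        RepeatStatus status = RepeatStatus.CONTINUABLE;
        while (!policy.isComplete(done, status)) {
            try {
                status = callback.doInIteration();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            done++;
        }
        return done;
    }
}
```

With a callback that always returns CONTINUABLE and a policy limit of 10, the loop stops after 10 iterations, which is the behavior described above for the fixed chunk termination policy.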
13. JSR 352 Batch Applications for the Java Platform
Since Java EE 7, batch processing is included in the Java Platform. JSR 352 (Batch Applications for the Java Platform) specifies a model for batch applications and a runtime for scheduling and executing jobs. At the time of writing this tutorial, the Spring Batch implementation (3.0) completely implements the JSR 352 specification.
The domain model and the vocabulary used are pretty similar to the ones used in Spring Batch.
Concepts such as Job, Step, ItemReader, ItemProcessor and ItemWriter are present in the Java Platform JSR 352 model as well. The differences between both frameworks are minor and the configuration files look almost the same.
This is a good thing for both programmers and the industry. The industry profits from the fact that a standard has been created in the Java Platform, based on a very good library like Spring Batch, which is widely used and well tested. Programmers benefit because, in case Spring Batch is discontinued or cannot be used in their applications for any reason (compatibility, company policies, size restrictions…), they can choose the Java standard implementation for batch processing without many changes to their systems.
For more information about how Spring Batch has been adapted to JSR 352, please visit the link http://docs.spring.io/spring-batch/reference/html/jsr-352.html.
14. Summary
So that’s it. I hope you have enjoyed it and that you are now able to configure and implement batch applications using Spring Batch. I am going to summarize here the most important points explained in this article:
- Spring Batch is a batch processing framework built upon the Spring Framework.
- Mainly (simplifying!) it is composed of Jobs, containing Steps, where Readers, Processors and Writers are configured and concatenated to execute the desired actions.
- Spring Batch contains mechanisms that allow programmers to work with the main providers like MySQL and MongoDB and with formats like SQL, CSV or XML out of the box.
- Spring Batch contains features for error handling and for repeating and retrying Jobs and Steps.
- It also offers possibilities for parallel processing.
- It contains classes and interfaces for batch applications unit testing.
In this tutorial I used no XML files (apart from some examples) for configuring the Spring context; everything was done via annotations. I did it this way for clarity, but I do not recommend doing this in real-life applications, since XML configuration files may be useful in specific scenarios. As I said, this was a tutorial about Spring Batch and not about Spring in general.
15. Resources
The following links contain a lot of information and theoretical examples where you can learn all the features of the Spring Batch module:
- http://docs.spring.io/spring-batch/reference/html/index.html
- https://jcp.org/en/jsr/detail?id=352
- https://spring.io/guides/gs/batch-processing/
- https://kb.iu.edu/d/afrx
16. Download
http://www.javacodegeeks.com/2015/03/spring-batch-tutorial.html?utm_content=bufferfdac5&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer