Introduction
This article explores the integration of Spring Batch and MyBatis within a data processing pipeline that interacts with Denodo. The goal is to demonstrate how to build a sample batch job that reads data from Denodo, processes it, and stores the result in a target system (which, for demonstration purposes, will also be Denodo).
To achieve this, we combine two main components with Denodo:
- Spring Batch: A lightweight framework designed for processing large volumes of data. It handles job execution, step management, chunk-based processing, and error handling.
- MyBatis: A persistence framework that maps SQL queries to Java objects. It provides precise control over SQL, which is essential for interacting with Denodo and handling complex VQL queries.
Spring Batch coordinates the job execution and chunk processing logic. It delegates data access responsibilities to MyBatis, which executes VQL queries against Denodo. The data is then mapped into Java objects (entities) during read and write operations.
Throughout the article, we will build a simple batch job, walk through its configuration, and test it with simple VQL queries to demonstrate the end-to-end integration.
Prerequisites
This article focuses on the necessary steps to build a working example where Spring Batch interacts with Virtual DataPort using MyBatis. Basic knowledge of Spring Batch jobs, dependency injection, and annotations is assumed.
For more information, refer to the Spring Batch Reference Documentation.
Initial Setup
We have created the basic structure of a Maven project that was initially generated using Spring Initializr.
An export of the project is linked at the bottom of this document (see the Maven Project section).
SSL/TLS certificates
If your Virtual DataPort SSL certificate is not signed by a trusted CA, you will need to add the certificate to the truststore used by your IDE, or set the following JVM options to reference an existing truststore:
-Djavax.net.ssl.trustStore="path/to/your/cacerts" -Djavax.net.ssl.trustStorePassword="changeit"
Dependencies
Denodo JDBC Driver
The Denodo JDBC driver is currently not distributed through Maven Central. It can be added to the local Maven repository using the following command:
mvn install:install-file ^
  -Dfile="/path/to/your/denodo-vdp-jdbcdriver.jar" ^
  -DgroupId=com.denodo ^
  -DartifactId=denodo-vdp-jdbcdriver ^
  -Dversion=9.2.0 ^
  -Dpackaging=jar
Once installed, the driver can be referenced as a regular Maven dependency in the project’s pom.xml file.
Pom file
The following dependencies must be added to your pom.xml file inside the <dependencies></dependencies> section:
<!-- Spring -->
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
</dependency>
<!-- MyBatis -->
<dependency>
    <groupId>org.mybatis.spring.boot</groupId>
    <artifactId>mybatis-spring-boot-starter</artifactId>
    <version>3.0.4</version>
</dependency>
<!-- JDBC drivers -->
<dependency>
    <groupId>com.denodo</groupId>
    <artifactId>denodo-vdp-jdbcdriver</artifactId>
    <version>9.2.0</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.33</version>
</dependency>
The MySQL driver is used to store Spring Batch metadata, as explained later in this article; any other relational database could be used instead.
Entity classes and MyBatis Mappers
An entity class serves as a data model representing a single row of a relational database table. It consists of fields corresponding to the table columns and provides accessor methods (constructors, getters, setters) to allow data binding during read and write operations.
One entity class is required for every read/write operation defined by each query in a Mapper. For instance, the following entity class encapsulates a table (view) with simple numeric and string fields.
public class MyReadEntityPOJO {
    private int id;
    private String string_col_1;
    private BigDecimal numeric_col_1;
    …

    // Getters and Setters
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    …
}
The entity class includes a constructor that will be used to create a copy of the read entity in the processing method.
public class MyWriteEntityPOJO {
    private int id;
    private String string_col_1;
    private BigDecimal total;
    …

    public MyWriteEntityPOJO(MyReadEntityPOJO readEntity, BigDecimal total) {
        this.id = readEntity.getId();
        this.string_col_1 = readEntity.getString_col_1();
        this.total = total;
        …
    }

    // Getters and Setters
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    …
}
MyBatis XML Mapper
An XML Mapper in MyBatis defines SQL queries (VQL in our context) and their mappings to Java methods.
The query tags, named after the standard SQL operations (insert, select, …), provide three main pieces of information:
- databaseId: the database identifier, which tells MyBatis that the query is written for Denodo.
- id: the method of the Mapper interface that represents the query.
- resultType: the entity class in charge of handling the result of the query.
For instance, the following are two VQL queries: a SELECT with a filter and an INSERT, operating on Denodo Virtual DataPort Base Views:
<select databaseId="denodo" id="selectWithFilter"
        resultType="com.denodo.springbatchkb.entity.MyReadEntityPOJO">
    SELECT id, string_col_1, numeric_col_1, numeric_col_2, timestamp_column
    FROM bv_springbatch_read
    WHERE timestamp_column > #{timestampColumn}
    ORDER BY id
</select>

<insert databaseId="denodo" id="insertIntoWrite">
    INSERT INTO bv_springbatch_write (id, string_col_1, numeric_col_1, numeric_col_2, total, …)
    VALUES (#{id}, #{string_col_1}, #{numeric_col_1}, …)
</insert>
Each of the #{parameter_name} placeholders corresponds to an attribute in their corresponding entity class. They will be dynamically replaced with the respective value during execution.
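To picture this binding, the following framework-free sketch looks up each #{name} token in a map of field names to values. This is only an illustration: MyBatis actually binds values through JDBC prepared-statement parameters, and the class name PlaceholderSketch is hypothetical, not part of the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: shows the name -> value lookup implied by the
// #{...} placeholders. MyBatis performs the real binding internally with
// prepared statements, which also protects against SQL injection.
public class PlaceholderSketch {

    // Replace each #{name} token with the corresponding field value.
    static String bind(String query, Map<String, Object> entityFields) {
        Matcher m = Pattern.compile("#\\{(\\w+)\\}").matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = entityFields.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("timestampColumn", "2025-03-24 00:00:00");
        String vql = "SELECT id FROM bv_springbatch_read"
                + " WHERE timestamp_column > #{timestampColumn}";
        // Prints the query with the placeholder resolved from the map
        System.out.println(bind(vql, fields));
    }
}
```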
MyBatis Mapper Interface
MyBatis Mapper interfaces define the available methods for executing SQL operations on data, automatically mapping query results to entity classes. They are identified by the @Mapper annotation. For instance:
@Mapper
@Primary
public interface BvSpringBatchMapper {

    List<MyReadEntityPOJO> selectAllFromRead();

    List<MyReadEntityPOJO> selectWithFilter(Timestamp timestampColumn);

    void insertIntoWrite(MyWriteEntityPOJO writeItem);

    …
}
Besides accepting entity classes, the Mapper interface methods also declare any additional parameters required by their corresponding queries.
Data Sources
Declaration
The DataSources should be declared in an application.yaml or application.properties file. In addition to the DataSources (in our scenario: Denodo) required for read and write operations, the Spring Batch framework requires storing metadata persistently in a database.
While configuring the metadata database to align with a read/write database is a common approach, this is not directly applicable to Denodo due to the lack of native dialect support within Spring Batch. For this reason, we are providing a dedicated Data Source to store Spring Batch metadata.
In our example, we have configured a MySQL data source to store metadata. The data source is placed in our YAML file at spring.batch.datasource. The second data source of our example is Denodo, which we configured with the required JDBC parameters:
spring:
  batch:
    datasource:
      jdbcUrl: jdbc:mysql://localhost:3306/test_schema
      username: denodo
      password: denodo
      driver-class-name: com.mysql.cj.jdbc.Driver
      hikari:
        maximumPoolSize: 2
    initialize-schema: never
  datasource:
    isolationLevel: ISOLATION_READ_COMMITTED
    jdbcUrl: jdbc:denodo://<vdp_host>:9999/<virtual_database_name>
    username: admin
    password: admin
    driver-class-name: com.denodo.vdp.jdbc.Driver
    type: com.zaxxer.hikari.HikariDataSource
    hikari:
      maximumPoolSize: 3
      minimumIdle: 2
      idleTimeout: 30000
Further information:
- The hikari: section allows the definition of a connection pool, managed by Spring Batch.
- Spring Batch can create the required metadata tables; this feature is part of its auto-configuration. However, we are providing a custom JobRepository bean to explicitly assign the two DataSources to their correct usage (read/write data or persist metadata), which disables the auto-configuration but allows more control. For this reason, we declared the initialize-schema: never parameter, which tells Spring Batch that we are manually configuring the metadata DataSource and are also taking care of initializing the metadata tables.
Annotation
Upon defining the two data sources, we have to create two beans, one for each data source, that allow MyBatis to select among them depending on the annotations. The two beans are associated with the data sources in our configuration files using the @ConfigurationProperties(prefix = <yaml_file_prefix>) annotation.
For our metadata persistency data source:
@Bean(name = "batchDataSource")
@Primary
@ConfigurationProperties(prefix = "spring.batch.datasource")
public DataSource batchDataSource() {
    return DataSourceBuilder.create().build();
}
For our Denodo data source:
@Bean(name = "dataSource")
@ConfigurationProperties(prefix = "spring.datasource")
public DataSource dataSource() {
    return DataSourceBuilder.create().build();
}
Metadata tables initialization
As already introduced, we are explicitly assigning data sources to the right components using a custom JobRepository. This disables the auto-configuration and requires us to manually initialize the metadata tables using SQL scripts that are provided by the framework.
Depending on your metadata database type, you can find the corresponding init scripts by exploring the content of the spring-batch-core artifact (under the org.springframework.batch.core package) in your project dependencies.
Alternatively, the same scripts are also available at the following GitHub link: spring-projects / spring-batch / core.
Configuration classes
The @Configuration annotation marks a class as a source of Spring bean definitions.
A method annotated with @Bean inside a class marked with @Configuration is automatically executed during the initialization of the application context, before the application starts, ensuring that all required beans are created and properly wired.
In our example, a class called AnnotateDataSources contains the Data Source annotations.
MyBatis configuration
The MyBatis configuration class has different purposes:
- Create a Bean that provides a SqlSession using a SqlSessionFactory.
- Initialize the SqlSession with a datasource and some configuration parameters.
- Scan MyBatis mappers.
SqlSession init
A dedicated SqlSession is required for every read/write operation. A single Bean is in charge of returning the SqlSession using multiple SqlSessionFactories:
@Bean
@Primary
public SqlSession persistenceSqlSession(SqlSessionFactory sqlSessionFactory) throws Exception {
    return new SqlSessionTemplate(sqlSessionFactory, ExecutorType.BATCH);
}
When this bean is called, an annotation is used to specify if the SqlSession must be built with a readerSqlSessionFactory or a writerSqlSessionFactory. The two SqlSessionFactories are configured as follows:
- A DataSource is assigned. In our demonstration, this will be Denodo for both read and write SqlSessionFactory.
- A mapper location, which is where the MyBatis XML mappers are located.
- A configuration file, which only sets some parameters required to work with Denodo.
Below is an example of our readerSqlSessionFactory, which is similar to the writerSqlSessionFactory with the exception that it is initialized with a different configuration file:
@Bean
public SqlSessionFactory readerSqlSessionFactory() throws Exception {
    SqlSessionFactoryBean sqlSessionFactory = new SqlSessionFactoryBean();
    sqlSessionFactory.setDataSource(this.dataSource);
    this.setMyBatisReaderConfigLocation(sqlSessionFactory);
    this.setMyBatisMapperLocation(sqlSessionFactory);
    sqlSessionFactory.setDatabaseIdProvider(denodoDatabaseIdProvider);
    sqlSessionFactory.afterPropertiesSet();
    return sqlSessionFactory.getObject();
}
The MyBatis mapper scanner is configured as follows:
protected void setMyBatisMapperLocation(SqlSessionFactoryBean sqlSessionFactory) throws IOException {
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] configLocation = resolver.getResources("classpath:mapper/*.xml");
    sqlSessionFactory.setMapperLocations(configLocation);
}
The MyBatis reader config file is configured as follows:
protected void setMyBatisReaderConfigLocation(SqlSessionFactoryBean sqlSessionFactory) throws Exception {
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource configLocation = resolver
            .getResource("classpath:mybatis_config/mybatis-config-batch.xml");
    sqlSessionFactory.setConfigLocation(configLocation);
}
In our demonstration, we are providing a dedicated MyBatis configuration file for every SqlSessionFactory (read and write).
DatabaseIdProvider
MyBatis leverages a DatabaseIdProvider to select the correct queries from the Mapper depending on the Data Source’s database type. Denodo is not among the database names known by MyBatis. For this reason, we have to override DatabaseIdProvider to make sure a proper mapper is selected.
The purpose of the overridden method is to return a database identifier, which is a simple string (“denodo”), when the database name is “Virtual DataPort”. The database name is retrieved from the Data Source properties, combined with the information from the JDBC driver.
@Configuration
public class DenodoDatabaseIdProviderConfig {

    @Bean
    public DatabaseIdProvider databaseIdProvider(DataSource dataSource) {
        return new DenodoDatabaseIdProvider(dataSource);
    }

    static class DenodoDatabaseIdProvider implements DatabaseIdProvider {
        …
        @Override
        public String getDatabaseId(DataSource dataSource) throws SQLException {
            try (Connection connection = dataSource.getConnection()) {
                DatabaseMetaData metaData = connection.getMetaData();
                String databaseProductName = metaData.getDatabaseProductName();
                if ("Virtual DataPort".equals(databaseProductName)) {
                    return "denodo";
                }
                …
            }
        }
        …
    }
}
BatchConfiguration
The batch configuration class is the core of the Spring Batch implementation. In this configuration class, the ItemStreamReader and ItemWriter are instantiated using a SqlSessionFactory, a query and additional parameters, depending on the query design.
Additionally:
- The @EnableBatchProcessing annotation tells Spring Batch to automatically configure the essential infrastructure components required to run batch jobs.
- The @ComponentScan annotation tells Spring where additional components are located. These components are the Processor and a customized ItemStreamReader, and they will be automatically registered as beans.
@Configuration
@EnableBatchProcessing
@ComponentScan("com.denodo.springbatchkb.component")
public class BatchConfig {
    …
}
A JobRepository is a Spring Batch component that handles the persistence of job and step execution metadata. We already have a dedicated Data Source, but we are also providing a customized implementation of the JobRepository bean to set the correct Data Source, identified by the @Qualifier("batchDataSource") annotation.
@Bean
public JobRepository jobRepository(@Qualifier("batchDataSource") DataSource batchDataSource) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(batchDataSource);
    // A transaction manager bound to the metadata database
    factory.setTransactionManager(new DataSourceTransactionManager(batchDataSource));
    factory.setDatabaseType("MYSQL");
    factory.afterPropertiesSet();
    return factory.getObject();
}
The core of the Spring Batch & MyBatis integration consists of the ItemWriter and ItemStreamReader objects. These Spring Batch components are composed of a query specified in the mapper, the SqlSessionFactory from the MyBatis configuration, and input parameters, depending on the query design. Finally, they are parametrized with entity classes to handle data during read, process and write operations.
This is an example of an ItemStreamReader with a parameter as a filter:
@Bean
@StepScope
@Qualifier("filteredItemStreamReader")
public ItemStreamReader<MyReadEntityPOJO> filteredItemStreamReader() {
    Map<String, Object> queryParams = new HashMap<>();
    queryParams.put("timestampColumn", Timestamp.valueOf("2025-03-24 00:00:00"));
    DenodoMyBatisCursorItemReader<MyReadEntityPOJO> reader = new DenodoMyBatisCursorItemReader<>();
    reader.setSqlSessionFactory(this.readSqlSessionFactory);
    reader.setParameterValues(queryParams);
    reader.setName("DenodoFilteredStreamReader");
    reader.setQueryId("selectWithFilter");
    return reader;
}
Additionally, a simple ItemWriter with no parameters:
@Bean
@StepScope
@Qualifier("itemWriter")
public ItemWriter<MyWriteEntityPOJO> itemWriter() {
    MyBatisBatchItemWriterBuilder<MyWriteEntityPOJO> writer = new MyBatisBatchItemWriterBuilder<>();
    writer.sqlSessionFactory(this.writeSqlSessionFactory);
    writer.assertUpdates(false);
    writer.statementId("insertIntoWrite");
    return writer.build();
}
The setQueryId (reader) and statementId (writer) methods reference the query by name from the MyBatis Mapper. Finally, the reader's setParameterValues method supplies the parameters required by the corresponding query in the Mapper.
Reader, writer and processor implementation
In this section, we will explore how the read, process, and write operations are implemented. The ItemWriter requires no special customization, as Spring Batch already provides an efficient batch writing mechanism for data sources. On the other hand, we implement a streaming reader to minimize memory usage during execution.
Ultimately, we will provide a simple processor implementation that we are using to connect read and write operations.
MyBatisCustomCursorItemReader
Spring Batch does not natively support MyBatis cursor-based streaming for read operations. For this reason, we have to implement a custom MyBatisCursorItemReader by extending an AbstractItemCountingItemStreamItemReader and providing an implementation of its abstract methods.
The full implementation is included in the attached project files. We are just posting a sample of the code:
public class DenodoMyBatisCursorItemReader<T> extends AbstractItemCountingItemStreamItemReader<T>
        implements InitializingBean {

    private String queryId;
    private Map<String, Object> parameterValues;
    …

    @Override
    protected void doOpen() throws Exception {
        Map<String, Object> parameters = new HashMap<>();
        if (this.parameterValues != null) {
            parameters.putAll(this.parameterValues);
        }
        this.sqlSession = this.sqlSessionFactory.openSession();
        this.cursor = this.sqlSession.selectCursor(this.queryId, parameters);
        this.cursorIterator = this.cursor.iterator();
    }

    …

    @Override
    protected T doRead() throws Exception {
        System.out.println("read@DenodoMyBatisCursorItemReader");
        T next = null;
        if (this.cursorIterator.hasNext()) {
            next = this.cursorIterator.next();
        }
        return next;
    }
}
The main tasks of the reader are to open a cursor on a session obtained from the SqlSessionFactory, passing filtering parameters (if required), stream the result item by item, and close the session.
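Stripped of the Spring Batch and MyBatis specifics, the doOpen / doRead / doClose lifecycle can be sketched with a plain iterator over an in-memory list standing in for the Denodo cursor. The class name CursorReaderSketch is illustrative, not part of the project:

```java
import java.util.Iterator;
import java.util.List;

// Minimal sketch of the reader lifecycle: open acquires the "cursor",
// read streams one item at a time (returning null at end of data, as
// Spring Batch expects), and close releases it.
public class CursorReaderSketch<T> {
    private final List<T> source;
    private Iterator<T> cursorIterator;

    public CursorReaderSketch(List<T> source) {
        this.source = source;
    }

    // doOpen: acquire the cursor (the real reader opens a SqlSession
    // and calls sqlSession.selectCursor(queryId, parameters))
    public void open() {
        this.cursorIterator = source.iterator();
    }

    // doRead: next item, or null when the stream is exhausted
    public T read() {
        return (cursorIterator != null && cursorIterator.hasNext())
                ? cursorIterator.next() : null;
    }

    // doClose: release resources (a no-op for an in-memory list)
    public void close() {
        this.cursorIterator = null;
    }

    public static void main(String[] args) {
        CursorReaderSketch<String> reader = new CursorReaderSketch<>(List.of("a", "b"));
        reader.open();
        System.out.println(reader.read()); // a
        System.out.println(reader.read()); // b
        System.out.println(reader.read()); // null -> end of stream
        reader.close();
    }
}
```

Because only one item is materialized at a time, memory usage stays flat regardless of how many rows the query returns, which is the point of the streaming reader.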
Processor
Below is a sample of a simple processor that takes a read entity, processes the content of a record using read entity attributes, and returns a new instance of the write entity:
@Component
public class SumProcessor implements ItemProcessor<MyReadEntityPOJO, MyWriteEntityPOJO> {
    …
    @Override
    public MyWriteEntityPOJO process(MyReadEntityPOJO item) {
        // sample transformation
        BigDecimal sum = BigDecimal.valueOf(-1);
        if (item.getNumeric_col_1() != null && item.getNumeric_col_2() != null) {
            sum = item.getNumeric_col_1().add(item.getNumeric_col_2());
        }
        MyWriteEntityPOJO out = new MyWriteEntityPOJO(item, sum);
        return out;
    }
    …
}
Execution
Assembling a Job
All the reader, writer and processor beans are now ready to be used. To build a runnable Job, we can now autowire these components in a single place. In a real project, we would have multiple implementations of these beans, each one associated with different queries or processor implementations. The Job is built with a JobBuilder by selecting among the available components declared in our BatchConfig class and the custom processors (such as SumProcessor) we declared.
@Autowired
@Qualifier("filteredItemStreamReader")
ItemStreamReader<MyReadEntityPOJO> filteredItemStreamReader;

@Autowired
@Qualifier("itemWriter")
ItemWriter<MyWriteEntityPOJO> itemWriter;

@Autowired
SumProcessor sumProcessor;
Since these class attributes are declared as @Autowired, Spring Batch will inject the required Beans into these variables at runtime. The @Qualifier(“<qualifier_name>”) helps the framework choose the right Beans.
Below is the job implementation using a simple StepBuilder. A reader, processor and writer are passed to the StepBuilder, alongside general configuration parameters.
@Bean
public Job myJob(JobRepository jobRepository, @Qualifier("stepFilter") Step myStep) {
    return new JobBuilder("myJob", jobRepository)
            .start(myStep)
            .build();
}

@Bean
@Qualifier("stepFilter")
public Step myStepFilter(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("myStep", jobRepository)
            .<MyReadEntityPOJO, MyWriteEntityPOJO>chunk(10, transactionManager)
            .reader(filteredItemStreamReader)
            .processor(sumProcessor)
            .writer(itemWriter)
            .allowStartIfComplete(true)
            …
            .build();
}
By default, a Job is not restartable after it has already been completed. Furthermore, since its status is stored persistently, we would not be able to execute it again even across different runs of our application. For this reason, the Job is configured with allowStartIfComplete=true, which allows the Job's steps to run again even after a successful completion.
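The chunk(10) setting above means Spring Batch repeats a read/process cycle until 10 items are buffered, then hands the whole chunk to the writer in a single call (with one transaction per chunk). The loop can be sketched without the framework as follows; ChunkLoopSketch and its names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Framework-free sketch of chunk-oriented processing: read and process
// items one by one, buffer them, and flush the buffer to the writer
// every `chunkSize` items (plus once more for the final partial chunk).
public class ChunkLoopSketch {

    static <I, O> List<List<O>> run(List<I> input, Function<I, O> processor, int chunkSize) {
        List<List<O>> writes = new ArrayList<>();   // each element = one writer call
        List<O> chunk = new ArrayList<>();
        for (I item : input) {
            chunk.add(processor.apply(item));       // read + process one item
            if (chunk.size() == chunkSize) {        // chunk full -> write it
                writes.add(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                     // flush the remainder
            writes.add(chunk);
        }
        return writes;
    }

    public static void main(String[] args) {
        List<List<Integer>> writes = run(List.of(1, 2, 3, 4, 5), i -> i * 10, 2);
        System.out.println(writes); // [[10, 20], [30, 40], [50]]
    }
}
```

This batching is why the writer receives lists of items rather than single entities, and why the chunk size trades off memory usage against the number of write round-trips to Denodo.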
Ultimately, the Job itself is a Bean that will be injected into our Spring Batch application when it is started.
Spring Boot Application
To complete our sample implementation, we just have to set up our application and let Spring Batch inject the Job we have just implemented. The following class contains a main method that calls a CommandLineRunner that executes the Job.
@SpringBootApplication
public class MyApplication {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }

    @Bean
    public CommandLineRunner runJob() {
        return args -> {
            jobLauncher.run(job, new JobParameters());
        };
    }
}
Testing
To start the project linked below, simply configure your IDE to run the MyApplication class, which contains the public static void main method.
The project includes the following test jobs that can be executed:
- Simple read, process, and write job. Calculate the sum of two fields.
- Read with a filter, process, and write the result.
- Read with an analytical query, write the result without any further processing.
Different jobs can be configured to run by editing the JobConfig configuration class.
Maven Project
References
MyBatis 3 | Introduction – mybatis
Spring Batch - Reference Documentation
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend contacting your Denodo Customer Success Manager.
