Introduction

This article explores the integration of Spring Batch and MyBatis within a data processing pipeline that interacts with Denodo. The goal is to demonstrate how to build a sample batch job that reads data from Denodo, processes it, and stores the result in a target system (which, for demonstration purposes, will also be Denodo).

To achieve this, we combine two main components with Denodo:

  • Spring Batch: A lightweight framework designed for processing large volumes of data. It handles job execution, step management, chunk-based processing, and error handling.
  • MyBatis: A persistence framework that maps SQL queries to Java objects. It provides precise control over SQL, which is essential for interacting with Denodo and handling complex VQL queries.

Spring Batch coordinates the job execution and chunk processing logic. It delegates data access responsibilities to MyBatis, which executes VQL queries against Denodo. The data is then mapped into Java objects (entities) during read and write operations.

Throughout the article, we will build a simple batch job, walk through its configuration, and test it with simple VQL queries to demonstrate the end-to-end integration.

Prerequisites

This article focuses on the necessary steps to build a working example where Spring Batch interacts with Virtual DataPort using MyBatis. Basic knowledge of Spring Batch jobs, dependency injection, and annotations is assumed.

For more information, refer to the Spring Batch Reference Documentation.

Initial Setup

We have created the basic structure of a Maven project that was initially generated using Spring Initializr.

An export of the project is linked at the bottom of this document (see the Maven Project section).

SSL/TLS certificates

If your Virtual DataPort SSL/TLS certificate is not signed by a trusted CA, you will need to add the certificate to the Truststore used by your IDE, or set the following JVM options to reference an existing Truststore:

-Djavax.net.ssl.trustStore="path/to/your/cacerts" -Djavax.net.ssl.trustStorePassword="changeit"

Dependencies

Denodo JDBC Driver

The Denodo JDBC driver is currently not distributed through Maven Central. It can be added to the local Maven repository using the following command:

mvn install:install-file ^
  -Dfile="/path/to/your/denodo-vdp-jdbcdriver.jar" ^
  -DgroupId=com.denodo ^
  -DartifactId=denodo-vdp-jdbcdriver ^
  -Dversion=9.2.0 ^
  -Dpackaging=jar

Note that the ^ line continuations are for the Windows command prompt; on Linux/macOS, use \ instead.

Once installed, the driver can be referenced as a regular Maven dependency in the project’s pom.xml file.

Pom file

The following dependencies must be added to your pom.xml file inside the <dependencies></dependencies> section:

<!-- Spring -->
<dependency>
   <groupId>org.springframework.batch</groupId>
   <artifactId>spring-batch-core</artifactId>
</dependency>

<!-- MyBatis -->
<dependency>
   <groupId>org.mybatis.spring.boot</groupId>
   <artifactId>mybatis-spring-boot-starter</artifactId>
   <version>3.0.4</version>
</dependency>

<!-- JDBC drivers -->
<dependency>
    <groupId>com.denodo</groupId>
    <artifactId>denodo-vdp-jdbcdriver</artifactId>
    <version>9.2.0</version>
</dependency>

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.33</version>
</dependency>

The MySQL driver is used to store Spring Batch metadata, as explained further on. Any other relational database supported by Spring Batch can be used instead.

Entity classes and MyBatis Mappers

Each entity class serves as a data model representing a single row of a relational database table. It consists of fields corresponding to the table columns and provides accessor methods (constructors, getters, setters) to allow data binding during read and write operations.

One entity class is required for every read/write operation defined by each query in a Mapper. For instance, the following entity class encapsulates a table (view) with simple numeric and string fields.

public class MyReadEntityPOJO {

   private int id;
   private String string_col_1;
   private BigDecimal numeric_col_1;
   …

   // Getters and Setters
   public int getId() {
       return id;
   }

   public void setId(int id) {
       this.id = id;
   }

   …
}

The write entity class includes a constructor that builds a write entity from a read entity; it will be used in the processing method.

public class MyWriteEntityPOJO {

   private int id;
   private String string_col_1;
   private BigDecimal total;
   …

   public MyWriteEntityPOJO(MyReadEntityPOJO readEntity, BigDecimal total) {
        this.id = readEntity.getId();
        this.string_col_1 = readEntity.getString_col_1();
        this.total = total;
    }

   // Getters and Setters
   public int getId() {
       return id;
   }

   public void setId(int id) {
       this.id = id;
   }

   …
}

MyBatis XML Mapper

An XML Mapper in MyBatis defines SQL queries (VQL in our context) and their mappings to Java methods.

The query tags, named after standard SQL operations (insert, select, …), provide three main pieces of information:

  • databaseId: the database identifier, which tells MyBatis that the query is written for Denodo.
  • id: the method of the mapper interface that represents the query.
  • resultType (or parameterType): the entity class in charge of handling the result (or input) of the query.

For instance, the following are two VQL queries: a SELECT with a filter and an INSERT, operating on Denodo Virtual DataPort Base Views:

<select databaseId="denodo" id="selectWithFilter" resultType="com.denodo.springbatchkb.entity.MyReadEntityPOJO">
   SELECT id, string_col_1, numeric_col_1, numeric_col_2, timestamp_column
   FROM bv_springbatch_read
   WHERE timestamp_column > #{timestampColumn}
   ORDER BY id
</select>

<insert databaseId="denodo" id="insertIntoWrite"
   parameterType="com.denodo.springbatchkb.entity.MyWriteEntityPOJO">
    INSERT INTO bv_springbatch_write
        (id, string_col_1, numeric_col_1, numeric_col_2, total, timestamp_column)
    VALUES (#{id}, #{string_col_1}, #{numeric_col_1}, #{numeric_col_2}, #{total}, #{timestamp_column})
</insert>

Each #{parameter_name} placeholder corresponds to an attribute of the corresponding entity class (or to an explicit query parameter) and is dynamically replaced with the respective value at execution time.
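Under the hood, MyBatis translates each #{…} placeholder into a JDBC PreparedStatement parameter rather than concatenating the value into the SQL string, which also protects against SQL injection. The following self-contained sketch (purely illustrative, not part of the project) shows the idea: each placeholder becomes a ? and its name is recorded so the value can be bound in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    // Replaces every #{name} with a JDBC '?' and records the parameter
    // names in the order they must be bound on the PreparedStatement.
    static String toJdbcSql(String sql, List<String> paramNames) {
        Matcher m = Pattern.compile("#\\{(\\w+)\\}").matcher(sql);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            paramNames.add(m.group(1));
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        String jdbcSql = toJdbcSql(
            "SELECT id FROM bv_springbatch_read WHERE timestamp_column > #{timestampColumn}",
            names);
        System.out.println(jdbcSql); // SELECT id FROM bv_springbatch_read WHERE timestamp_column > ?
        System.out.println(names);   // [timestampColumn]
    }
}
```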

MyBatis Mapper Interface

MyBatis Mapper interfaces define the available methods for executing SQL operations on data, automatically mapping query results to entity classes. They are identified by the @Mapper annotation. For instance:

@Mapper
@Primary
public interface BvSpringBatchMapper {

    List<MyReadEntityPOJO> selectAllFromRead();

    List<MyReadEntityPOJO> selectWithFilter(Timestamp timestampColumn);

    void insertIntoWrite(MyWriteEntityPOJO writeItem);

    …
}

In addition to entity classes, Mapper interface methods can take the additional parameters referenced by their corresponding queries. Note that MyBatis resolves a single method parameter automatically; when a method takes several parameters, each one is typically annotated with @Param so the #{…} placeholders can reference it by name.

Data Sources

Declaration

The DataSources should be declared in an application.yaml or application.properties file. In addition to the DataSource required for read and write operations (in our scenario: Denodo), the Spring Batch framework needs a database in which to persist its metadata.

While pointing the metadata database at the same database used for reads and writes is a common approach, this is not directly applicable to Denodo because Spring Batch has no native dialect support for it. For this reason, we provide a dedicated Data Source for Spring Batch metadata.

In our example, we have configured a MySQL data source to store metadata. The data source is placed in our YAML file at spring.batch.datasource. The second data source of our example is Denodo, which we configured with the required JDBC parameters:

spring:
 batch:
   datasource:
     jdbcUrl: jdbc:mysql://localhost:3306/test_schema
     username: denodo
     password: denodo
     driver-class-name: com.mysql.cj.jdbc.Driver
     hikari:
       maximumPoolSize: 2
   initialize-schema: never
 datasource:
   isolationLevel: ISOLATION_READ_COMMITTED
   jdbcUrl: jdbc:denodo://<vdp_host>:9999/<virtual_database_name>
   username: admin
   password: admin
   driver-class-name: com.denodo.vdp.jdbc.Driver
   type: com.zaxxer.hikari.HikariDataSource
   hikari:
     maximumPoolSize: 3
     minimumIdle: 2
     idleTimeout: 30000

Further information:

  • The hikari: section defines a connection pool managed by HikariCP, Spring Boot’s default pool implementation.
  • Spring Batch can create the required metadata tables automatically (auto-configuration). However, we are providing a custom JobRepository bean to explicitly assign each of the two DataSources to its correct role (read/write data vs. metadata persistence); this disables the auto-configuration but gives us more control. For this reason, we declare initialize-schema: never, which tells Spring Batch that we are configuring the metadata DataSource manually and also taking care of initializing the metadata tables ourselves.

Annotation

Having defined the two data sources, we create two beans, one per data source, that allow MyBatis to select between them based on annotations. The beans are associated with the data source definitions in our configuration file via the @ConfigurationProperties(prefix = "<yaml_prefix>") annotation.

For our metadata persistency data source:

@Bean(name = "batchDataSource")
@Primary
@ConfigurationProperties(prefix = "spring.batch.datasource")
public DataSource batchDataSource() {
   return DataSourceBuilder.create().build();
}

For our Denodo data source:

@Bean(name = "dataSource")
@ConfigurationProperties(prefix = "spring.datasource")
public DataSource dataSource() {
   return DataSourceBuilder.create().build();
}

Metadata tables initialization

As already introduced, we are explicitly assigning data sources to the right components using a custom JobRepository. This disables the auto-configuration and requires us to manually initialize the metadata tables using SQL scripts that are provided by the framework.

Depending on your metadata database type, you can find the corresponding init script (e.g. schema-mysql.sql) inside the spring-batch-core JAR of your project dependencies, under org/springframework/batch/core.

Alternatively, the same scripts are available in the spring-projects/spring-batch repository on GitHub.

Configuration classes

The @Configuration annotation marks a class as a source of Spring bean definitions.

A method annotated with @Bean inside a class marked with @Configuration is automatically executed during the initialization of the application context, before the application starts, ensuring that all required beans are created and properly wired.

In our example, a class called AnnotateDataSources contains the Data Source annotations.

MyBatis configuration

The MyBatis configuration class has different purposes:

  • Create a Bean that provides a SqlSession using a SqlSessionFactory.
  • Initialize the SqlSession with a datasource and some configuration parameters.
  • Scan MyBatis mappers.

SqlSession init

A dedicated SqlSession is required for every read/write operation. A single bean is in charge of returning the SqlSession, built from one of several SqlSessionFactory instances:

@Bean
@Primary
public SqlSession persistenceSqlSession(SqlSessionFactory sqlSessionFactory) throws Exception {
   return new SqlSessionTemplate(sqlSessionFactory, ExecutorType.BATCH);
}

When this bean is called, an annotation is used to specify if the SqlSession must be built with a readerSqlSessionFactory or a writerSqlSessionFactory. The two SqlSessionFactories are configured as follows:

  • A DataSource is assigned. In our demonstration, this will be Denodo for both read and write SqlSessionFactory.
  • A mapper location, which is where the MyBatis XML mappers are located.
  • A configuration file, which only sets some parameters required to work with Denodo.

Below is an example of our readerSqlSessionFactory, which is similar to the writerSqlSessionFactory with the exception that it is initialized with a different configuration file:

@Bean
public SqlSessionFactory readerSqlSessionFactory() throws Exception {
   SqlSessionFactoryBean sqlSessionFactory = new SqlSessionFactoryBean();
   sqlSessionFactory.setDataSource(this.dataSource);
   this.setMyBatisReaderConfigLocation(sqlSessionFactory);
   this.setMyBatisMapperLocation(sqlSessionFactory);
   sqlSessionFactory.setDatabaseIdProvider(denodoDatabaseIdProvider);
   sqlSessionFactory.afterPropertiesSet();
   return sqlSessionFactory.getObject();
}

The MyBatis mapper scanner is configured as follows:

protected void setMyBatisMapperLocation(SqlSessionFactoryBean sqlSessionFactory) throws IOException {
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] mapperLocations = resolver.getResources("classpath:mapper/*.xml");
    sqlSessionFactory.setMapperLocations(mapperLocations);
}

The MyBatis reader config file is configured as follows:

protected void setMyBatisReaderConfigLocation(SqlSessionFactoryBean sqlSessionFactory) throws Exception {
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource configLocation = resolver.getResource("classpath:mybatis_config/mybatis-config-batch.xml");
    sqlSessionFactory.setConfigLocation(configLocation);
}

In our demonstration, we are providing a dedicated MyBatis configuration file for every SqlSessionFactory (read and write).
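For illustration, a minimal mybatis-config-batch.xml could look like the following. The concrete settings shown here are assumptions for the sketch, not taken from the attached project; tune them to your Denodo environment:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration
  PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
  "https://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
  <settings>
    <!-- hypothetical values; adjust for your queries -->
    <setting name="defaultFetchSize" value="1000"/>
    <setting name="defaultStatementTimeout" value="300"/>
  </settings>
</configuration>
```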

DatabaseIdProvider

MyBatis uses a DatabaseIdProvider to select the correct queries from the Mapper depending on the Data Source’s database type. Denodo is not among the database products known by MyBatis, so we have to override the DatabaseIdProvider to make sure the proper statements are selected.

The purpose of the overridden method is to return the database identifier, a simple string (“denodo”), when the database product name reported by the Data Source’s JDBC driver is “Virtual DataPort”.

@Configuration
public class DenodoDatabaseIdProviderConfig {

    @Bean
    public DatabaseIdProvider databaseIdProvider(DataSource dataSource) {
        return new DenodoDatabaseIdProvider(dataSource);
    }

    static class DenodoDatabaseIdProvider implements DatabaseIdProvider {

        …

        @Override
        public String getDatabaseId(DataSource dataSource) throws SQLException {
            // try-with-resources: do not leak the connection used for metadata
            try (Connection connection = dataSource.getConnection()) {
                String databaseProductName =
                    connection.getMetaData().getDatabaseProductName();
                if ("Virtual DataPort".equals(databaseProductName)) {
                    return "denodo";
                }
                return null; // fall back to the MyBatis defaults
            }
        }
    }
}

BatchConfiguration

The batch configuration class is the core of the Spring Batch implementation. In this configuration class, the ItemStreamReader and ItemWriter are instantiated using a SqlSessionFactory, a query and additional parameters, depending on the query design.

Additionally:

  • The @EnableBatchProcessing annotation tells Spring Batch to automatically configure the essential infrastructure components required to run batch jobs.
  • The @ComponentScan annotation tells Spring Batch where additional components are located. These components are the Processor and a customized ItemStreamReader, and they will be autoregistered as beans.

@Configuration
@EnableBatchProcessing
@ComponentScan("com.denodo.springbatchkb.component")
public class BatchConfig {
   …
}

A JobRepository is the Spring Batch component that handles the persistence of job and step execution metadata. We already have a dedicated Data Source for this purpose; we also provide a customized JobRepository bean that assigns it explicitly, identified by the @Qualifier("batchDataSource") annotation.

@Bean
public JobRepository jobRepository(@Qualifier("batchDataSource") DataSource batchDataSource) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(batchDataSource);
    factory.setTransactionManager(new DataSourceTransactionManager(batchDataSource));
    factory.setDatabaseType("MYSQL");
    factory.afterPropertiesSet();
    return factory.getObject();
}

The core of the Spring Batch and MyBatis integration are the ItemWriter and ItemStreamReader objects. These Spring Batch components are built from a query declared in the mapper, the SqlSessionFactory from the MyBatis configuration, and input parameters, depending on the query design. Finally, they are parameterized with entity classes to handle data during read, process and write operations.

This is an example of an ItemStreamReader with a parameter as a filter:

@Bean
@StepScope
@Qualifier("filteredItemStreamReader")
public ItemStreamReader<MyReadEntityPOJO> filteredItemStreamReader() {
    Map<String, Object> queryParams = new HashMap<>();
    queryParams.put("timestampColumn", Timestamp.valueOf("2025-03-24 00:00:00"));
    DenodoMyBatisCursorItemReader<MyReadEntityPOJO> reader =
       new DenodoMyBatisCursorItemReader<>();
    reader.setSqlSessionFactory(this.readSqlSessionFactory);
    reader.setParameterValues(queryParams);
    reader.setName("DenodoFilteredStreamReader");
    reader.setQueryId("selectWithFilter");
    return reader;
}

Additionally, a simple ItemWriter with no parameters:

@Bean
@StepScope
@Qualifier("itemWriter")
public ItemWriter<MyWriteEntityPOJO> itemWriter() {
    MyBatisBatchItemWriterBuilder<MyWriteEntityPOJO> writer =
       new MyBatisBatchItemWriterBuilder<>();
    writer.sqlSessionFactory(this.writeSqlSessionFactory);
    writer.assertUpdates(false);
    writer.statementId("insertIntoWrite");
    return writer.build();
}

The setQueryId (stream reader) and statementId (item writer) methods set the query name from the MyBatis Mapper. Finally, setParameterValues on the stream reader supplies the parameters requested by the mapped query.

Reader, writer and processor implementation

In this section, we will explore how the read, process, and write operations are implemented. The ItemWriter requires no special customization, as Spring Batch already provides an efficient batch writing mechanism for data sources. On the other hand, we implement a streaming reader to minimize memory usage during execution.

Finally, we will provide a simple processor implementation that connects the read and write operations.

MyBatisCustomCursorItemReader

Spring Batch does not natively support MyBatis cursor-based streaming for read operations. For this reason, we implement a custom cursor item reader by extending AbstractItemCountingItemStreamItemReader and providing implementations of its abstract methods.

The full implementation is included in the attached project files. We are just posting a sample of the code:

public class DenodoMyBatisCursorItemReader<T> extends AbstractItemCountingItemStreamItemReader<T> implements InitializingBean {

    private String queryId;
    private Map<String, Object> parameterValues;

    …

    @Override
    protected void doOpen() throws Exception {
        Map<String, Object> parameters = new HashMap<>();
        if (this.parameterValues != null) {
            parameters.putAll(this.parameterValues);
        }
        this.sqlSession = this.sqlSessionFactory.openSession();
        this.cursor = this.sqlSession.selectCursor(this.queryId, parameters);
        this.cursorIterator = this.cursor.iterator();
    }

    …

    @Override
    protected T doRead() throws Exception {
        T next = null;
        if (this.cursorIterator.hasNext()) {
            next = this.cursorIterator.next();
        }
        return next;
    }
}

The main tasks of the DenodoMyBatisCursorItemReader are to open a cursor through the SqlSession, passing filtering parameters if required, stream the results one item at a time, and close the cursor and session when the step finishes.
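The open/read/close contract can be illustrated outside Spring with a plain-Java sketch (the class below is purely illustrative, not part of the project): the reader opens an iterator over a source, returns items one by one, and signals the end of input by returning null, which is how Spring Batch readers terminate a step's read loop.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative stand-in for a cursor-backed reader: doRead() returns
// null when the underlying iterator is exhausted, which is exactly how
// Spring Batch item readers signal the end of input.
public class CursorReaderSketch<T> {

    private Iterator<T> cursorIterator;

    public void doOpen(List<T> source) {
        // in the real reader this is sqlSession.selectCursor(...).iterator()
        this.cursorIterator = source.iterator();
    }

    public T doRead() {
        return cursorIterator.hasNext() ? cursorIterator.next() : null;
    }

    public void doClose() {
        // in the real reader: cursor.close(); sqlSession.close();
        this.cursorIterator = null;
    }

    public static void main(String[] args) {
        CursorReaderSketch<String> reader = new CursorReaderSketch<>();
        reader.doOpen(List.of("row1", "row2"));
        System.out.println(reader.doRead()); // row1
        System.out.println(reader.doRead()); // row2
        System.out.println(reader.doRead()); // null
        reader.doClose();
    }
}
```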

Processor

Below is a sample of a simple processor that takes a read entity, processes the content of a record using read entity attributes, and returns a new instance of the write entity:

@Component
public class SumProcessor implements ItemProcessor<MyReadEntityPOJO, MyWriteEntityPOJO> {

    @Override
    public MyWriteEntityPOJO process(MyReadEntityPOJO item) {
        // sample transformation
        BigDecimal sum = BigDecimal.valueOf(-1);
        if (item.getNumeric_col_1() != null
            && item.getNumeric_col_2() != null) {
            sum = item.getNumeric_col_1().add(item.getNumeric_col_2());
        }
        return new MyWriteEntityPOJO(item, sum);
    }

    …
}
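The null handling above can be exercised in isolation. The sketch below reproduces the same rule with plain BigDecimal values (the sentinel -1 marks rows where one of the operands was missing); the class and method names are ours, not from the project:

```java
import java.math.BigDecimal;

public class SumLogicSketch {

    // Same rule as the SumProcessor sample: if either operand is null,
    // fall back to the sentinel value -1 instead of failing the record.
    static BigDecimal sumOrSentinel(BigDecimal a, BigDecimal b) {
        if (a != null && b != null) {
            return a.add(b);
        }
        return BigDecimal.valueOf(-1);
    }

    public static void main(String[] args) {
        System.out.println(sumOrSentinel(BigDecimal.valueOf(2), BigDecimal.valueOf(3))); // 5
        System.out.println(sumOrSentinel(null, BigDecimal.valueOf(3)));                  // -1
    }
}
```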

Execution

Assembling a Job

All the reader, writer and processor beans are now ready to be used. To build a runnable Job, we can now autowire these components in a single place. In a real project, we would have multiple implementations of these beans, each one associated with different queries or processor implementations. The Job is built with a JobBuilder by selecting among the available components declared in our BatchConfig class and the custom processors (annotated with @Component) that we declared.

@Autowired
@Qualifier("filteredItemStreamReader")
ItemStreamReader<MyReadEntityPOJO> filteredItemStreamReader;

@Autowired
@Qualifier("itemWriter")
ItemWriter<MyWriteEntityPOJO> itemWriter;

@Autowired
SumProcessor sumProcessor;

Since these class attributes are declared as @Autowired, Spring will inject the required beans into these variables at runtime. The @Qualifier("<qualifier_name>") annotation helps the framework choose the right bean.

Below is the job implementation using a simple StepBuilder. A reader, processor and writer are passed to the StepBuilder, alongside general configuration parameters.

@Bean
public Job myJob(JobRepository jobRepository, @Qualifier("stepFilter") Step myStep) {
    return new JobBuilder("myJob", jobRepository)
        .start(myStep)
        .build();
}

@Bean
@Qualifier("stepFilter")
public Step myStepFilter(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("myStep", jobRepository)
        .<MyReadEntityPOJO, MyWriteEntityPOJO>chunk(10, transactionManager)
        .reader(filteredItemStreamReader)
        .processor(sumProcessor)
        .writer(itemWriter)
        .allowStartIfComplete(true)
        …
        .build();
}

By default, a Job is not restartable after it has completed. Since its status is stored persistently, we would not be able to execute it again even across different runs of our application. For this reason, the step is configured with allowStartIfComplete(true), which makes it runnable again even after a successful completion.

Finally, the Job itself is a bean that will be injected into our Spring Boot application at startup.

Spring Boot Application

To complete our sample implementation, we just have to set up our application and let Spring Batch inject the Job we have just implemented. The following class contains a main method that calls a CommandLineRunner that executes the Job.

@SpringBootApplication
public class MyApplication {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    public static void main(String[] args) {
         SpringApplication.run(MyApplication.class, args);
    }

    @Bean
    public CommandLineRunner runJob() {
        return args -> {
            jobLauncher.run(job, new JobParameters());
        };
    }
}

Testing

To start the project linked below, simply configure your IDE to run the MyApplication class, which contains the public static void main method.

The project includes the following test jobs that can be executed:

  • A simple read, process, and write job that calculates the sum of two fields.
  • A job that reads with a filter, processes, and writes the result.
  • A job that reads with an analytical query and writes the result without further processing.

Different jobs can be configured to run by editing the JobConfig configuration class.

Maven Project

Spring Batch Samples

References

MyBatis 3 | Introduction – mybatis

Spring Batch - Reference Documentation

Spring Initializr

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.