Denodo includes different parameters for specifying how many rows are fetched in one transfer when accessing or inserting data. These parameters allow to configure:
- How many rows are returned in one transfer from the Denodo Virtual DataPort Server to the applications consuming the data (northbound). These can be any external application accessing the Denodo Server such as JDBC/ODBC/Ado.Net/Web Services programs but also the Denodo clients such as the VDP Administration Tool or the Denodo Scheduler for example.
- How many rows are returned in one transfer from the data sources to the Virtual DataPort Server (southbound).
- How many rows are inserted/updated at the same time in the caching or the data movement processes.
This document describes the different parameters with examples of use in each case.
Northbound / Southbound
The amount of data (or, more specifically, the number of rows) read by one call to the Denodo Virtual DataPort Server (or from VDP to the sources) is configurable using the parameters Fetch Size (southbound) / Chunk Size (northbound). Transferring data efficiently is important because network communication time affects performance more than any other factor. For example, to read 10 rows from a Denodo VDP view or data source if each row requires a separate round trip to the source, it takes 10 times longer to access the data than if the 10 rows are read in one network round trip. In practice, rather than being critical for performance, for most applications, adjusting these access batch size parameters is more like fine-tuning performance.
If consumer applications regularly access more than the default fetch size, then the number of network trips can be reduced by increasing the fetch size. This can make a big difference, depending on how much data is retrieved. If an application regularly gets 33 rows and the fetch size is 32, an extra network call is needed for the 33rd row. If an application gets 10,000 rows, then a 10-row fetch size requires 1,000 network trips. A 32-row fetch size reduces that amount by a third, but still requires 313 network trips. A fetch size of 512 requires just 20 network trips. Depending on how the data is processed in the application, this change could alleviate a significant bottleneck. The tradeoff to increasing the fetch size is increased memory use. All fetched data has to be held in the client layer (e.g. JDBC, ODBC, VDP Admin Tool…) and in the server, and this memory can add up excessively if a large default batch size that applies to every request is used. Another consideration to take into account when increasing the fetch size is that the client application will take longer to obtain the first results. In relational databases this may not be important but Denodo can access multiple and diverse sources such as web pages. In that case we may have a fetch size of 200 but the web page may be slow returning the data. That means that the time that the client application will be waiting until the first 200 results are obtained will be longer than if we had configured a smaller fetch size.
The fetched data is held in the ResultSet object generated by executing a query. If the fetch size is 10, then accessing the first 10 records simply iterates internally through the ResultSet data held on the client. The 11th access causes a call to the Virtual DataPort Server (or the data source) for another 10 records, and so on for each group of 10 records.
The Chunk Size/Fetch Size parameters can be set in several ways, depending on how widely the changes should apply:
- It can be set at the client application level, e.g. for JDBC/ODBC/Ado.NET applications
- JDBC, in the connection url
- In an application that uses this driver, this parameter can be added to the connection URL as above and/or before executing the query, invoking the method setFetchSize of the class Statement. The value set with the setFetchSize method overrides the value set in the URL.
- ODBC, in the DSN dialog it is possible to set the Chunk Size. The Chunk Size of the DSN is equivalent to the Fetch Size of the JDBC connections.
- VDP Administration Tool ( via the Admin Tool Preferences option of the Tools menu)
- Denodo Scheduler on the VDP Data Source Details page:
- At the data source level, JDBC/ODBC data sources can be configured with a fetch size. This setting gives the JDBC/ODBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed. This parameter is however, just a hint and some drivers ignore it, so the fetch size will depend on the underlying driver supporting this functionality. This parameter can be changed in the “Read & Write” tab of the data source configuration:
There is another parameter that can be configured in combination with the chunk size. This parameter is the Chunk Timeout and it establishes the maximum time (in milliseconds) the Server waits before returning a new block of data. When this time is exceeded, the Server sends the current block to the client application, even if it does not contain the number of results specified in the Chunk Size parameter. It is possible to set the Chunk Timeout to 0 so the chunk size behavior will be the one explained above.
Note: If Chunk Size and Chunk Timeout are both 0, the Server returns all the results in a single block. If both values are different from 0, the Server returns a chunk for whichever one of these conditions is satisfied first:
- The chunk is filled (Chunk Size)
- Or, after a certain time of not sending any chunk to the client (Chunk Timeout)
The Chunk Timeout can be changed using the same methods as the Chunk Size explained above but using Chunk Timeout as parameter name instead.
Batch inserts simply means sending multiple update statements in one transaction and one call to the database. Denodo supports this capability when caching data and also when using the Data Movement functionality.
Note that for some data sources the use of batch inserts will require some additional configuration, see, for instance, MySQL configuration for batch inserts.
The parameter “Batch insert size” determines the number of rows per batch. The default value is the Batch size defined for the database in the “Cache configuration” dialog of the database (Administration – Database Management dialog). If the database does not define the Batch size, the cache module uses the value defined for the Server (Administration – Server Configuration – Cache - Read & Write menu).
This value can be changed for a particular view too (View Options tab):
Increasing this value may improve the performance of the cache insertion but there should be a balance between the number of rows and the amount of memory required by the server for that batch insert size. If the batch insert size is too big and the server requires a lot of memory to perform the insert a performance degradation may occur.
The parameter “Batch insert size” can be configured for the “Data Movement” functionality when defining the data source configuration in the “Read & Write” tab.
This number will be used to define the number of INSERT requests per batch during a Data Movement process. When performing Data Movement the Server is inserting data obtained from one data source into another. If the value of the target data source’s configuration property “Supports batch inserts” is “yes”, Virtual DataPort inserts the rows into the database in batches.
More information about Data Movement can be found in the Virtual DataPort Administration Guide. This property does not affect INSERT statements sent to this data source because they are not executed in batches.