Denodo includes several parameters that specify how many rows are transferred in a single operation when accessing or inserting data. This document describes these parameters, with examples of use in each case.
The amount of data (more specifically, the number of rows) read by one call to the Denodo Virtual DataPort Server (or from VDP to the sources) is configurable using the parameters Fetch Size (southbound) and Chunk Size (northbound). Transferring data efficiently is important because network communication time often dominates overall performance. For example, when reading 10 rows from a Denodo VDP view or data source, if each row requires a separate round trip to the source, the access takes 10 times longer than if the 10 rows are read in one network round trip. In practice, for most applications, adjusting these access batch size parameters is more like fine-tuning than a critical performance factor.
If consumer applications regularly access more than the default fetch size, the number of network trips can be reduced by increasing the fetch size. This can make a big difference, depending on how much data is retrieved. If an application regularly retrieves 33 rows and the fetch size is 32, an extra network call is needed just for the 33rd row. If an application retrieves 10,000 rows, then a 10-row fetch size requires 1,000 network trips. A 32-row fetch size cuts that to roughly a third, but still requires 313 network trips. A fetch size of 512 requires just 20 network trips. Depending on how the data is processed in the application, this change could alleviate a significant bottleneck.

The tradeoff of increasing the fetch size is increased memory use. All fetched data has to be held in the client layer (e.g. JDBC, ODBC, VDP Admin Tool…) and in the server, and this memory can add up excessively if a large default batch size is applied to every request. Another consideration when increasing the fetch size is that the client application will take longer to obtain the first results. In relational databases this may not matter, but Denodo can access multiple and diverse sources such as web pages. With a fetch size of 200 and a web page that is slow returning its data, the client application will wait longer for the first 200 results than it would with a smaller fetch size.
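The round-trip arithmetic above can be checked with a short calculation (an illustrative sketch; the figures match the examples in this document):

```python
import math

def network_trips(total_rows: int, fetch_size: int) -> int:
    """Number of round trips needed to fetch total_rows in batches of fetch_size."""
    return math.ceil(total_rows / fetch_size)

print(network_trips(33, 32))       # 33 rows with a 32-row fetch size -> 2 trips
print(network_trips(10_000, 10))   # -> 1000 trips
print(network_trips(10_000, 32))   # -> 313 trips
print(network_trips(10_000, 512))  # -> 20 trips
```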
The fetched data is held in the ResultSet object generated by executing a query. If the fetch size is 10, then accessing the first 10 records simply iterates internally through the ResultSet data held on the client. The 11th access causes a call to the Virtual DataPort Server (or the data source) for another 10 records, and so on for each group of 10 records.
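The client-side iteration pattern can be sketched with Python's DB-API `fetchmany`. Here sqlite3 is only a stand-in (it is an in-process database, so there is no real network trip); with Denodo the same pattern applies through a JDBC or ODBC connection, where the driver refills its buffer from the Virtual DataPort Server every `fetch_size` rows:

```python
import sqlite3

# Stand-in database with 25 rows; with Denodo this would be a
# JDBC/ODBC connection to the Virtual DataPort Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])

cur = conn.execute("SELECT id FROM t ORDER BY id")
fetch_size = 10
batch_sizes = []
while True:
    # Each fetchmany() drains up to fetch_size rows from the client-side
    # buffer; with a remote server, exhausting a batch triggers the next
    # round trip for another fetch_size rows.
    batch = cur.fetchmany(fetch_size)
    if not batch:
        break
    batch_sizes.append(len(batch))
print(batch_sizes)  # [10, 10, 5]
```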
The Chunk Size/Fetch Size parameters can be set in several ways, depending on how widely the changes should apply.
There is another parameter that can be configured in combination with the chunk size: the Chunk Timeout. It establishes the maximum time (in milliseconds) the Server waits before returning a new block of data. When this time is exceeded, the Server sends the current block to the client application, even if it does not contain the number of results specified in the Chunk Size parameter. Setting the Chunk Timeout to 0 disables the timeout, so chunks are returned based only on the chunk size behavior explained above.
Note: If Chunk Size and Chunk Timeout are both 0, the Server returns all the results in a single block. If both values are different from 0, the Server returns a chunk as soon as whichever of these conditions is satisfied first: the chunk contains the number of results specified in Chunk Size, or the Chunk Timeout has elapsed since the previous chunk was sent.
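The combined behavior can be sketched as a simple flush rule (an illustrative model of the documented behavior, not Denodo's internal implementation):

```python
def should_send_chunk(rows_buffered: int, ms_since_last_chunk: int,
                      chunk_size: int, chunk_timeout: int) -> bool:
    """Return True when the server would ship the current block to the client.

    Illustrative model: with both parameters non-zero, a chunk is sent as
    soon as either threshold is reached, whichever happens first.
    """
    if chunk_size == 0 and chunk_timeout == 0:
        return False  # all results are returned in a single block at the end
    size_reached = chunk_size > 0 and rows_buffered >= chunk_size
    timeout_reached = chunk_timeout > 0 and ms_since_last_chunk >= chunk_timeout
    return size_reached or timeout_reached

print(should_send_chunk(500, 40, 500, 90))  # chunk is full -> True
print(should_send_chunk(120, 95, 500, 90))  # timeout exceeded -> True
print(should_send_chunk(120, 40, 500, 90))  # neither condition met -> False
```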
The Chunk Timeout can be changed using the same methods as the Chunk Size, explained above, but using Chunk Timeout as the parameter name instead.
Batch inserting simply means sending multiple insert statements to the database in one transaction and one call. Denodo supports this capability when caching data and also when using the Data Movement functionality.
Note that some data sources require additional configuration to use batch inserts; see, for instance, the MySQL configuration for batch inserts.
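The idea behind batch inserts can be illustrated as follows. Again sqlite3 is just a stand-in (Denodo issues the batches through the target database's driver); the point is that rows are grouped so that each batch requires a single call rather than one call per row:

```python
import sqlite3

rows = [(i, f"name-{i}") for i in range(1000)]
batch_size = 200  # analogous to Denodo's "Batch insert size"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache_t (id INTEGER, name TEXT)")

# Insert in batches: one executemany call per group of batch_size rows
# instead of one call per row.
for start in range(0, len(rows), batch_size):
    conn.executemany("INSERT INTO cache_t VALUES (?, ?)",
                     rows[start:start + batch_size])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM cache_t").fetchone()[0])  # 1000
```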
The parameter “Batch insert size” determines the number of rows per batch. The default value is the Batch size defined for the database in its “Cache configuration” dialog (Administration – Database Management). If the database does not define a Batch size, the cache module uses the value defined for the Server (Administration – Server Configuration – Cache – Read & Write menu).
This value can also be changed for a particular view (View Options tab).
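The precedence described above (view setting, then database setting, then server-wide default) can be sketched as follows; the function and parameter names are illustrative, not Denodo APIs:

```python
from typing import Optional

def effective_batch_insert_size(view_value: Optional[int],
                                database_value: Optional[int],
                                server_value: int) -> int:
    """Resolve the batch insert size: a view-level setting overrides the
    database-level Batch size, which overrides the server-wide default.
    Illustrative model of the documented precedence."""
    if view_value is not None:
        return view_value
    if database_value is not None:
        return database_value
    return server_value

print(effective_batch_insert_size(None, 500, 200))   # database setting -> 500
print(effective_batch_insert_size(100, 500, 200))    # view setting wins -> 100
print(effective_batch_insert_size(None, None, 200))  # server default -> 200
```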
Increasing this value may improve the performance of cache insertion, but there should be a balance between the number of rows per batch and the amount of memory the server requires for that batch insert size. If the batch insert size is too big and the server needs a lot of memory to perform the insert, performance may degrade.
The parameter “Batch insert size” can be configured for the “Data Movement” functionality when defining the data source configuration in the “Read & Write” tab.
This number defines the number of INSERT requests per batch during a Data Movement process, in which the Server inserts data obtained from one data source into another. If the value of the target data source’s configuration property “Supports batch inserts” is “yes”, Virtual DataPort inserts the rows into the target database in batches.
More information about Data Movement can be found in the Virtual DataPort Administration Guide. Note that this property does not affect regular INSERT statements sent to this data source, because those are not executed in batches.