Troubleshooting Usage
This section provides information on how to resolve the most common problems while using the Denodo Lakehouse Accelerator (formerly known as Denodo Embedded MPP).
To identify and troubleshoot common errors during the deployment of the Denodo Lakehouse Accelerator and its use from Denodo Virtual DataPort, see Denodo Lakehouse Accelerator Troubleshooting.
PKIX path building failed: unable to find valid certification path to requested target
- Cause
The certificate of the server you are trying to connect to is missing from the truststore of the client’s JVM. This happens when the server certificate is self-signed or is signed by a private authority that does not exist in the client’s truststore.
- Solution
Make sure you have imported the certificate of the Denodo Lakehouse Accelerator into the Denodo server’s truststore. See the instructions in the SSL/TLS section.
Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;)
- Cause
The most common cause is that the IAM Role used by the Denodo Lakehouse Accelerator does not have the required permissions to access the S3 data.
- Solution
To troubleshoot 403 issues from the Denodo Lakehouse Accelerator to S3 data check the following documentation:
Query exceeded per-node user memory limit of xGB, Query exceeded per-node total memory limit of xGB and Query exceeded distributed user memory limit of xGB
- Cause
Query exceeded per-node user memory limit of xGB:
The maximum amount of user memory a query can use on an individual Lakehouse Accelerator Worker node, query.max-memory-per-node, defaults to JVM max memory * 0.1 (10% of the JVM’s assigned memory on the node).
Depending on the configuration, the maximum JVM heap size for Denodo Lakehouse Accelerator processes (Coordinator and Workers) is automatically set to 80% of the Kubernetes memory limit or 80% of the total physical RAM available on the underlying node. This error means that your Lakehouse Accelerator needs more memory than JVM max memory * 0.1 to handle queries.
Query exceeded per-node total memory limit of xGB:
The maximum amount of user and system memory a query can use on an individual Lakehouse Accelerator Worker node, query.max-total-memory-per-node, defaults to JVM max memory * 0.2.
Depending on the configuration, the maximum JVM heap size for Denodo Lakehouse Accelerator processes (Coordinator and Workers) is automatically set to 80% of the Kubernetes memory limit or 80% of the total physical RAM available on the underlying node. This error means that your Lakehouse Accelerator needs more memory than JVM max memory * 0.2 to handle queries.
Query exceeded distributed user memory limit of xGB:
The maximum amount of user memory that a query can use across all Lakehouse Accelerator Workers in the entire cluster, query.max-memory, defaults to 20GB. This error means that your Lakehouse Accelerator needs more memory than 20GB to handle queries.
- Solution
Analyze query plans with EXPLAIN (TYPE DISTRIBUTED) on the problematic queries, using an external JDBC client like DBeaver. The query plan will provide clues for optimization by:
Gathering statistics for each view involved in the queries. Accurate statistics allow the query optimizer to create more efficient execution plans.
Adding filter conditions (WHERE clauses) as early as possible in the queries to reduce the amount of data processed by the Lakehouse Accelerator engine.
If these recommendations do not work well in your scenario, you need to increase the available memory by configuring the memory settings query.max-memory-per-node, query.max-total-memory-per-node and query.max-memory, and apply the configuration change to the cluster by executing helm upgrade lakehouseaccelerator lakehouseaccelerator/.
For more information on how to configure the memory settings, see the Memory Settings for Query Performance section.
In addition to adjusting the memory settings, sometimes the only way to handle large queries is to use instances with more memory per node or to add more nodes to the cluster (increasing numWorkers).
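The default limits described in the cause above follow from simple arithmetic, which can help when comparing the xGB value in the error against your node sizing. The following is a minimal sketch, assuming a hypothetical helper name (default_memory_limits) and assuming the heap is derived from the Kubernetes memory limit; it only reproduces the percentages stated above and is not part of the product.

```python
def default_memory_limits(memory_limit_gb: float) -> dict:
    """Estimate the default per-query limits from a memory limit in GB.

    Assumption: the JVM heap is 80% of the Kubernetes memory limit
    (or of the physical RAM), as described in this section.
    """
    jvm_max_heap_gb = memory_limit_gb * 0.8
    return {
        "jvm_max_heap_gb": round(jvm_max_heap_gb, 2),
        # query.max-memory-per-node defaults to JVM max memory * 0.1
        "query.max-memory-per-node": round(jvm_max_heap_gb * 0.1, 2),
        # query.max-total-memory-per-node defaults to JVM max memory * 0.2
        "query.max-total-memory-per-node": round(jvm_max_heap_gb * 0.2, 2),
        # query.max-memory defaults to a fixed 20GB across the cluster
        "query.max-memory": 20.0,
    }


if __name__ == "__main__":
    # Example: a worker with a 64 GB memory limit (illustrative value)
    for name, value in default_memory_limits(64).items():
        print(f"{name}: {value} GB")
```

With a 64 GB limit, for instance, the per-node user memory default works out to roughly 5 GB, so a query failing with a limit of that size is hitting the default rather than a custom setting.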
Abandoned queries: Query … has not been accessed since…
- Cause
This means that the client of the Denodo Lakehouse Accelerator, that is, Denodo, is not processing the query results or is processing them slowly, so the Lakehouse Accelerator assumes that the client has left.
- Solution
You can increase query.client.timeout for the Denodo Lakehouse Accelerator coordinator (the default value is 5 minutes, 5.00m) in the values.yaml, and apply the configuration change to the cluster by executing helm upgrade lakehouseaccelerator lakehouseaccelerator/.
Additional properties in values.yaml
presto:
  coordinator:
    additionalConfig: [
      query.client.timeout=10.00m
    ]
But, in most cases, this is an indication that you need to review your query to identify where the bottleneck is and take actions to improve your query performance as explained in Detecting Bottlenecks in a Query.
hive-metastore:9083: java.net.SocketTimeoutException: Read timed out
- Cause
A read timeout occurs when querying the Metastore, probably because the query involves very large tables or tables with too many partitions.
- Solution
If this only happens with some queries, you can increase the Metastore request timeout in the values.yaml file:
metastore request timeout in values.yaml
presto
  hive
    hiveMetastoreTimeout=60s
  delta
    hiveMetastoreTimeout=60s
  iceberg
    hiveMetastoreTimeout=69s
Otherwise, if the timeout error occurs on every query, check the connection from the Presto pod to the Hive-Metastore pod.
Error fetching next … returned an invalid response: JsonResponse{statusCode=500, statusMessage=Server Error, headers={connection=[close]}, hasValue=false} [Error: ]’
- Cause
This means that the HTTP header size exceeds its limit. The default value is 4kB.
- Solution
You can increase the HTTP header limits for the Denodo Lakehouse Accelerator coordinator and the Lakehouse Accelerator workers to 64kB, or more if needed, in the values.yaml, and apply the configuration change to the cluster by executing helm upgrade lakehouseaccelerator lakehouseaccelerator/.
Additional properties in values.yaml
presto:
  coordinator:
    additionalConfig: [
      http-server.max-request-header-size=64kB,
      http-server.max-response-header-size=64kB
    ]
  worker:
    additionalConfig: [
      http-server.max-request-header-size=64kB,
      http-server.max-response-header-size=64kB
    ]
org.apache.parquet.io.PrimitiveColumnIO cannot be cast to class org.apache.parquet.io.GroupColumnIO
- Cause
The Denodo Lakehouse Accelerator is reading a Hive table with complex/compound structures and the Hive table schema is not compatible with the Parquet schema.
- Solution
Check the schema in the Parquet files and compare it with the schema declared in the Hive table in the Lakehouse Accelerator. There are multiple tools available to inspect the schema of a Parquet file. One of the most common is parquet-tools.
Unable to establish connection: Unrecognized connection property ‘protocols’
- Cause
Denodo is loading an older version of the Presto driver.
- Solution
Remove Presto driver backups from $DENODO_HOME/lib/extensions/jdbc-drivers/presto-0.1x, leaving only the presto-jdbc.jar.
Registering Iceberg tables: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.facebook.presto.hive.s3.PrestoS3FileSystem not found
- Cause
The Iceberg table is being registered using s3a://, but the location field of the Iceberg table metadata uses the s3:// URI protocol, or vice versa.
- Solution
Change the call to the register_table procedure for the Iceberg table so that it uses the same protocol as the one in the location field of the Iceberg table metadata.
Register iceberg table sentence s3a
presto: CALL iceberg.system.register_table( 'default', 'denodo_iceberg_table', 's3a://bucket/path/to/iceberg/table/')
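Before calling register_table, it can be useful to compare the URI protocol you intend to use with the one stored in the location field of the table metadata. The following is a minimal sketch using only the Python standard library; the function name align_scheme and the sample URIs are hypothetical, and reading the location value out of the Iceberg metadata file is assumed to have been done separately.

```python
from urllib.parse import urlparse


def align_scheme(register_uri: str, metadata_location: str) -> str:
    """Return register_uri rewritten to use the protocol of metadata_location.

    Both arguments are plain URI strings such as 's3://bucket/path' or
    's3a://bucket/path'. Only the protocol is changed.
    """
    wanted = urlparse(metadata_location).scheme
    parts = urlparse(register_uri)
    if parts.scheme == wanted:
        return register_uri  # already consistent, nothing to change
    return f"{wanted}://{parts.netloc}{parts.path}"


if __name__ == "__main__":
    # Hypothetical values: the metadata says s3://, the call used s3a://
    print(align_scheme("s3a://bucket/path/to/iceberg/table/",
                       "s3://bucket/path/to/iceberg/table"))
```

The returned string is the location to pass as the third argument of register_table.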
Solve problems with an OIDC provider and IRSA in Amazon EKS
- Cause
Problems can be caused by many different reasons, some of which are detailed in the AWS documentation.
- Solution
For troubleshooting issues related to an OIDC provider and IRSA in Amazon EKS, please refer to the following documentation:
system-memory-gb is greater than the node memory
Line: /prestissimo/presto_cpp/main/LinuxMemoryChecker.cpp:102, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
- Cause
The configured system-memory-gb exceeds the physical memory available on the deployment node. Presto on Velox performs a check at startup to ensure the allocation is safe.
- Solution
Adjust the system-memory-gb value to be approximately 90% of the node’s total RAM. Review the error log for the exact comparison (in bytes).
query-memory-gb is greater than system-memory-gb
query-memory-gb is greater than system-memory-gb crash
Reason: (38 vs. 23) Query memory capacity must not be larger than system memory capacity
Retriable: False
Expression: queryMemoryGb <= memoryGb
- Cause
The memory allocated for all queries, query-memory-gb, cannot exceed the total memory limit managed by the system, system-memory-gb.
- Solution
Update the configuration to ensure query-memory-gb is less than or equal to system-memory-gb. We recommend setting query-memory-gb to 60% of the machine memory and system-memory-gb to 90%.
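The 60% and 90% recommendations can be turned into concrete values once the machine memory is known. The following is a minimal sketch, assuming a hypothetical helper name (recommended_settings), assuming that the settings are expressed in units of 1024^3 bytes (inferred from the startup check log in this section, where 64424509440 bytes equals 60 * 1024^3), and assuming that rounding down to a whole number is acceptable.

```python
GIB = 1024 ** 3  # assumption: one "gb" in these settings is 1024^3 bytes


def recommended_settings(machine_memory_bytes: int) -> dict:
    """Apply the recommended percentages to the machine memory in bytes."""
    return {
        # 90% of the machine memory, rounded down to a whole number
        "system-memory-gb": machine_memory_bytes * 90 // 100 // GIB,
        # 60% of the machine memory, rounded down to a whole number
        "query-memory-gb": machine_memory_bytes * 60 // 100 // GIB,
    }


if __name__ == "__main__":
    # 33367973888 is the available machine memory reported by the
    # startup check log shown in this section
    print(recommended_settings(33367973888))
```

For the machine in that log, this yields 27 for system-memory-gb and 18 for query-memory-gb, which also satisfies the rule that query-memory-gb must not exceed system-memory-gb.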
system-mem-limit-gb is higher than the node memory
system-mem-limit-gb is higher than available machine memory crash
Line: /prestissimo/presto_cpp/main/LinuxMemoryChecker.cpp:102, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
what(): Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes.
Retriable: False
Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment
- Cause
The threshold for memory pushbacks or heap dumps, system-mem-limit-gb, is set higher than the actual RAM available on the worker node.
- Solution
Reduce system-mem-limit-gb so it is equal to or slightly lower than the total machine memory. It should be greater than or equal to system-memory-gb.
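The ordering constraints between the three memory settings can be checked before deploying a configuration change. The following is a minimal sketch of such a check; the function name check_velox_memory_settings is hypothetical, and the rules encoded here are only the ones stated in this and the two preceding entries.

```python
def check_velox_memory_settings(query_memory_gb: int,
                                system_memory_gb: int,
                                system_mem_limit_gb: int,
                                machine_memory_gb: float) -> list:
    """Return a list of violated constraints (empty when all hold)."""
    problems = []
    if query_memory_gb > system_memory_gb:
        problems.append("query-memory-gb is greater than system-memory-gb")
    if system_memory_gb > machine_memory_gb:
        problems.append("system-memory-gb is greater than the node memory")
    if system_mem_limit_gb > machine_memory_gb:
        problems.append("system-mem-limit-gb is higher than the node memory")
    if system_mem_limit_gb < system_memory_gb:
        problems.append("system-mem-limit-gb is lower than system-memory-gb")
    return problems


if __name__ == "__main__":
    # Illustrative values: 38 vs. 23 and a limit of 60 on a ~31 GB machine
    for problem in check_velox_memory_settings(38, 23, 60, 31):
        print(problem)
```

An empty result means the values respect the order query-memory-gb <= system-memory-gb <= system-mem-limit-gb <= machine memory.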
The query fails because it exceeds the local memory limit
VeloxRuntimeError: Local arbitration failure. Reclaimable used capacity 0B is less than min reclaim bytes 128.00MB
ARBITRATOR[SHARED CAPACITY[6.00GB] STATS[numRequests 36 numRunning 24 numSucceded 0 numAborted 0 numFailures 1 numNonReclaimableAttempts 0 reclaimedFreeCapacity 0B reclaimedUsedCapacity 0B maxCapacity 6.00GB freeCapacity 4.00GB freeReservedCapacity 4.00GB] CONFIG[kind=SHARED;capacity=6.00GB;arbitrationStateCheckCb=(set);global-arbitration-abort-time-ratio=0.5;global-arbitration-memory-reclaim-pct=10;memory-reclaim-threads-hw-multiplier=0.5;memory-pool-min-reclaim-bytes=128MB;check-usage-leak=1;fast-exponential-growth-capacity-limit=512MB;slow-capacity-grow-pct=0.25;global-arbitration-enabled=false;memory-pool-min-free-capacity-pct=0.25;max-memory-arbitration-time=5m;memory-pool-reserved-capacity=64MB;global-arbitration-without-spill=false;memory-pool-min-free-capacity=128MB;memory-pool-initial-capacity=128MB;memory-pool-abort-capacity-limit=8GB;reserved-capacity=4GB;]]
Memory Pool[20260209_115728_00001_2nr29_0 AGGREGATE root[20260209_115728_00001_2nr29_0] parent[null] MMAP track-usage thread-safe]<max capacity 4.00GB capacity 2.00GB used 1.84GB available 0B reservation [used 0B, reserved 2.00GB, min 0B] counters [allocs 0, frees 0, reserves 0, releases 0, collisions 0])>
- Cause
The query has requested more memory than allowed by the local node limit or the total query memory pool. This usually occurs with complex joins or large aggregations.
- Solution
Review the query plan and the configured limits. You may need to increase query.max-memory-per-node or query-memory-gb if the hardware allows, or optimize the query to reduce its memory footprint.
S3 Query fails with HTTP 301 error
S3 Region Mismatch Error (HTTP 301)
VeloxRuntimeError: Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://your-bucket-name/data.parquet', SDK Error Type:100, HTTP Status Code:301, S3 Service:'AmazonS3', Message:'No response body.', Source: RUNTIME, ErrorCode: INVALID_STATE
- Cause
The Presto on Velox engine is attempting to access an S3 bucket located in a different region than the default endpoint, s3.us-east-1.amazonaws.com. While table creation and metadata exploration may work, the native Velox S3 client fails with an HTTP 301 (Moved Permanently) error when the endpoint does not point to the specific region where the bucket resides.
- Solution
Update the s3Endpoint property in your values.yaml to include the correct region matching your S3 bucket. For example, use the structure s3.<region>.amazonaws.com.
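As an illustration of the endpoint structure, the following minimal sketch builds the value for a given region. The function name s3_endpoint and the region used in the example are illustrative assumptions; use the region where your bucket actually resides.

```python
def s3_endpoint(region: str) -> str:
    """Build a regional S3 endpoint following s3.<region>.amazonaws.com."""
    region = region.strip()
    if not region:
        raise ValueError("region must not be empty")
    return f"s3.{region}.amazonaws.com"


if __name__ == "__main__":
    # Example region only; replace it with the region of your bucket
    print(s3_endpoint("eu-west-1"))
```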
