Overview
Since update 8.0u20230301, Denodo includes embedded Massively Parallel Processing (MPP) capabilities to improve performance in environments containing data in object storage. For this purpose, Denodo now embeds a customized version of Presto, an open source parallel SQL query engine that excels at accessing data lake content.
The Denodo Embedded MPP cluster can be deployed following the instructions in the Denodo Embedded MPP user manual. Once the cluster is available, Virtual DataPort can use the embedded MPP to access data in an object storage system like HDFS, S3 or ADLS Gen2, and also leverage its massive parallel capabilities to accelerate queries (see section Denodo Embedded MPP of the Virtual DataPort Administration Guide).
The goal of this document is to provide information and tools to Denodo Administrators to troubleshoot the most common issues during the deployment of the cluster and its usage from Virtual DataPort.
Embedded MPP deployment
The Denodo Embedded MPP deployment is based on Kubernetes and Helm. Therefore, in order to manage and troubleshoot the deployment, the reference tools are the Kubernetes command line tool Kubectl and the Helm CLI. You can take a look at the kubectl Quick Reference guide to become familiar with the most common commands.
Summary of the most useful commands for troubleshooting
- Review the services deployed using kubectl get (basic) or describe (verbose):
$ kubectl get services                      # List all services in the namespace
$ kubectl get svc <service name>            # Get service details
$ kubectl describe service <service name>   # Describe a service (verbose output)
- Review the configuration used for all the Kubernetes resources defined in the values.yaml file. If the values.yaml file is not available, use helm get manifest.
$ helm get manifest prestocluster
- Review and debug the cluster pods:
$ kubectl get pods                       # List all pods in the namespace
$ kubectl describe pod <pod id>          # Describe a pod
$ kubectl logs <pod name>                # Dump pod logs (stdout)
$ kubectl exec -it <pod name> -- bash    # Interactive shell access to the pod
- Review the log of a pod that has already crashed (useful in case you want to figure out why the pod crashed in the first place):
$ kubectl logs <pod name> --previous     # Dump pod logs (stdout) for a previous instantiation of a container
Troubleshooting problems during the deployment
Common issues
- The deployment requires Helm 3. Running the process with Helm 2 can fail with permission errors like: <resource> is forbidden: User "username" cannot list resource "<resource>" in API group "" in the namespace "<namespace>".
- kubectl get svc shows Pending for the EXTERNAL-IP of the presto service for a long time:
- Check if there are other Presto clusters running. It can happen that there are no public IPs available in the selected subnets because they are all used by the other clusters. If that is the case:
- Try deleting one of the other clusters.
- In the current cluster, execute cluster delete and then cluster deploy again.
- In addition, check the load balancers list. If there are load balancers that are not being used, remove them so the IPs become available again (see the sketch after this list).
- Execute kubectl describe svc presto. If it shows "service Failed build model due to unable to resolve at least one subnet (0 match VPC and tags)" and you are using the AWS Load Balancer Controller with a load balancer of type network (the type is shown in the Load Balancers section of the AWS console), check the AWS article “How do I automatically discover the subnets that my Application Load Balancer uses in Amazon EKS?”
- Otherwise see section “How to debug a service?”
- kubectl get pods shows Postgres pod in Pending status: See section “Postgres pod in Pending status”
- kubectl get pods shows a pod with status ImagePullBackOff or ErrImagePull: See section “Pod with status ImagePullBackOff or ErrImagePull”
- Hive metastore gets stuck trying to connect to Postgres. Check the section “Hive-metastore cannot connect to PostgreSQL”.
- Presto cannot connect to the Hive-metastore: See section “Presto cannot connect to the Hive-Metastore”.
- For any other issue, follow the steps in section “Steps for troubleshooting”.
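For the pending EXTERNAL-IP case above, a quick way to spot other clusters or leftover load balancers that are holding the available IPs is to list the LoadBalancer services deployed in the Kubernetes cluster. This is a minimal sketch using standard kubectl commands; the presto service name is the chart default and may differ in your deployment.
$ kubectl get svc --all-namespaces | grep LoadBalancer    # Spot other deployments consuming public IPs
$ kubectl describe svc presto | grep -A 5 Events          # Check why the presto service is still waiting for an IP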
Postgres pod in Pending status
- If the cluster is deployed in AWS, this can be caused by the Kubernetes version being 1.23 or later. For clusters with Kubernetes version >= 1.23 you must install the Amazon EBS CSI driver Amazon EKS add-on and attach the “AmazonEBSCSIDriverPolicy” policy to: 1) the worker nodes’ roles for the cluster and 2) the cluster’s ServiceRole. See section AWS Environment Requirements of the Denodo Embedded MPP AWS Checklist for more information (a hedged command sketch follows this list).
- This can also happen if you have deleted the cluster and then deployed it again very quickly. Try deleting the cluster again, wait a little, and deploy it one more time.
- Otherwise follow the instructions in section “How to debug a pod?”.
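For the AWS case above, the following is a minimal sketch, assuming the AWS CLI is configured; <cluster-name>, <node-role-name> and <service-role-name> are placeholders for your environment, and the authoritative procedure is the one in the Denodo Embedded MPP AWS Checklist.
# Install the Amazon EBS CSI driver as an EKS add-on
$ aws eks create-addon --cluster-name <cluster-name> --addon-name aws-ebs-csi-driver
# Attach the AmazonEBSCSIDriverPolicy to the worker node role and to the cluster service role
$ aws iam attach-role-policy --role-name <node-role-name> --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
$ aws iam attach-role-policy --role-name <service-role-name> --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy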
Pod with status ImagePullBackOff or ErrImagePull
These errors appear when it was not possible to pull the Docker image from the registry:
- Use the kubectl describe pod command and look at the Events section.
- If that does not provide valuable information, review the values.yaml file to check the repository settings. If the values.yaml file is not available, use helm get manifest.
- Verify that you have configured either pullSecret or pullCredentials. If you are using pullCredentials, make sure enabled is set to true (see the sketch after this list).
- If the registry is Denodo's Harbor Registry, log in to Harbor using the same user configured in the image.repository section of the values.yaml file. This ensures there is an active session for that user, which is required to verify the configured Client ID.
- For more information follow the instructions in section “How to debug a pod?”.
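A minimal sketch of how these checks can be run from the command line; the registry URL, image name and tag are placeholders, not the actual values of your deployment.
# Show the values actually used by the release, including the image repository settings
$ helm get values prestocluster
# Check the pull error reported by Kubernetes for the failing pod
$ kubectl describe pod <pod name> | grep -A 10 Events
# Optionally, verify the credentials by pulling the image manually (placeholders)
$ docker login <registry url>
$ docker pull <registry url>/<image>:<tag>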
Hive-metastore cannot connect to PostgreSQL
Check the log of the hive-metastore pod:
- Find the name of the hive-metastore pod executing: $ kubectl get pods
- Get the logs: $ kubectl logs <hive-metastore pod name>
If the log contains an error “java.security.KeyStoreException: jceks not found” or “MetastoreSchemaTool: Error getting metastore password”, it most likely means the cluster is an OpenShift cluster running in FIPS mode (Federal Information Processing Standard). Contact the support team for assistance with this issue.
For more information follow the instructions in section “How to debug a pod?”.
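If the log points to a plain connectivity problem instead, a quick check is to test the PostgreSQL port from inside the hive-metastore pod. This is a sketch under the assumption that the PostgreSQL service is named postgresql and listens on the default port 5432; check the actual service name with kubectl get services, and if nc is not available in the image, use the busybox approach described in “How to debug a connectivity issue between services”.
# Open a shell in the hive-metastore pod
$ kubectl exec -it <hive-metastore pod name> -- bash
# From inside the pod, test the PostgreSQL service (service name and port are assumptions)
$ nc -vz postgresql 5432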
Presto cannot connect to the Hive-Metastore
- Verify the hive-metastore is up and running executing $ kubectl get pods
- Check the Presto logs (see section “Manage Presto logs”)
- If the connection error only happens when creating a very big table, it is probably due to a timeout. You can increase the default timeout in the values.yaml file (see the sketch after this list):
- presto.hive.hiveMetastoreTimeout
- presto.delta.hiveMetastoreTimeout
- presto.iceberg.hiveMetastoreTimeout
- If the connection error happens for every query, check the connection from the Presto node to the hive-metastore node following the steps in section “How to debug a connectivity issue between services”.
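For instance, a hedged sketch of raising the hive connector timeout; the 120s value is only an illustrative assumption, adjust it to your environment.
# Option 1: edit presto.hive.hiveMetastoreTimeout in prestocluster/values.yaml and apply the change
$ helm upgrade prestocluster prestocluster/
# Option 2: override the property directly on the command line
$ helm upgrade prestocluster prestocluster/ --set presto.hive.hiveMetastoreTimeout=120s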
Steps for troubleshooting
- Review the status of the cluster: Use kubectl get and kubectl describe to review the status of the cluster services and pods.
- If one of the services or pods shows an incorrect state: Follow the instructions under “How to debug a service?” or “How to debug a pod?”.
- Edit the configuration to fix the issue and update the deployment if necessary. Take into account that not all changes require redeploying the cluster (see section “Edit the cluster configuration without re-deploying”).
The following subsections provide more details for each of these steps.
Review the status of the cluster
- Review the services deployed using kubectl get:
$ kubectl get services               # List all services in the namespace
$ kubectl get svc <service name>     # Get service details
In case one of the services shows something incorrect, for example the presto service does not obtain an external IP address, follow the instructions in section “How to debug a service?”.
- Review the cluster pods:
$ kubectl get pods                   # List all pods in the namespace
$ kubectl describe pod <pod id>      # Describe a pod
- If the pod shows the status ImagePullBackOff or ErrImagePull follow instructions in section “Pod with status ImagePullBackOff or ErrImagePull”.
- If the status of the Postgres pod is Pending follow the steps in section “Postgres pod in Pending status”.
- Otherwise follow the instructions in section “How to debug a pod?”.
How to debug a service?
- Use kubectl describe service to show the service details. For example, when the presto service does not obtain an external IP address:
$ kubectl describe service presto
- Review the latest events in case they provide some insight into the state of the cluster (keep in mind that the event retention period is short, approximately one hour):
$ kubectl get events --sort-by='.lastTimestamp' [--namespace mynamespace]
How to debug a pod?
- Use the kubectl describe pod command to describe a pod with verbose output. This command is useful if kubectl get pods shows that a pod is not running and has an incorrect status like Pending, ImagePullBackOff, ErrImagePull, etc.:
$ kubectl describe pod <pod name>
Check whether the exit code is one of the Docker exit status codes; otherwise, you can also review this extended list.
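A sketch for extracting the last exit code of a crashed container directly, using standard kubectl output formatting:
# Print the exit code of the last terminated container in the pod
$ kubectl get pod <pod name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'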
- Use kubectl logs to review the log of a pod; use the --previous option if the pod has already crashed:
$ kubectl logs <pod name>                # Dump pod logs (stdout)
$ kubectl logs <pod name> --previous     # Dump pod logs (stdout) for a previous instantiation of a container
- Review the latest events in case they provide some insight into the state of the cluster:
$ kubectl get events --sort-by='.lastTimestamp' [--namespace mynamespace]
- Verify that any manual modifications to the yaml files are correct. If any of the yaml files were edited manually, take into account that it is very easy to make a formatting mistake, especially when editing lists (e.g. the memory limits under worker.additionalConfig). Try restoring the version prior to the manual changes to verify whether they are the cause of the error, and consider editing the yaml files with an editor that includes validation, for example Visual Studio Code with the YAML extension.
- Use kubectl exec -it to access the pod using an interactive shell:
$ kubectl exec -it <pod name> -- bash    # Interactive shell access to the pod
- If the pod crashes and the log does not provide enough information you can do the following in order to access the pod before it crashes:
- Edit the yaml template
- Replace the start command with a sleep command to prevent the pod from crashing.
- Redeploy the cluster
- Access the pod using kubectl exec -it <pod name> -- bash
For example, in order to debug the hive-metastore pod:
- Disable the liveness/readiness probes (the livenessreadiness property) in the values.yaml for the specific service (metastore in this case); otherwise the livenessProbe will kill the pod after a few seconds.
- Edit prestocluster/templates/hive-metastore-template.yaml
- Replace the start command command: ["/opt/run-hive-metastore.sh"] with one of the following alternatives:
command: [ "sleep" ] |
or
command: ["/bin/sh", "-ec", "sleep 1000"] |
This way it is possible to access the hive metastore container and do additional checks. For instance, check the keystore file, creds.jceks, which is located in /opt/creds. You can also run the hive metastore process with /opt/run-hive-metastore.sh, since it is not running due to the change in the yaml file.
How to debug a connectivity issue between services
This could be due to different causes, for instance:
- Network issue:
- DNS resolution
- Firewall rule
- Wrong credentials
- Failed SSL/TLS handshake
First, check the pod log as described in “How to debug a pod?”. If that does not provide enough information:
- Review the configuration used for all the Kubernetes resources defined in the values.yaml file. If the values.yaml file is not available, use helm get manifest.
- Check if there is a connectivity issue between the pods.
- Try a ping from one pod to the other. To do that, you can run a pod using the busybox image, which provides some useful tools like ping (see the combined sketch after this list):
- kubectl run busybox --rm -it --image=busybox
- From the busybox shell you can use commands like ping and telnet.
- You can run kubectl get pods -o wide to see the IP assigned to each pod and verify that the ping works.
- Check if it is a DNS issue following this DNS resolution guide from the Kubernetes documentation.
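A combined sketch of these checks, run from a temporary busybox pod; the hive-metastore service name and the 9083 port are assumptions, use the names and ports reported by kubectl get services in your deployment.
# Start a temporary pod with basic networking tools (removed automatically on exit)
$ kubectl run busybox --rm -it --image=busybox -- sh
# Inside the busybox shell:
# 1) Check pod-to-pod connectivity (get the pod IPs with: kubectl get pods -o wide)
/ # ping <pod IP>
# 2) Check TCP connectivity to a service (service name and port are assumptions)
/ # telnet hive-metastore 9083
# 3) Check DNS resolution of the service name
/ # nslookup hive-metastore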
Edit the cluster configuration without re-deploying
To modify a Kubernetes resource without redeploying, just update the necessary files and execute:
$ helm upgrade prestocluster prestocluster/
For example, to change the Presto log level from ERROR to DEBUG, just edit the presto.log section of the values.yaml file and execute the above helm command. We recommend editing yaml files with an editor that includes validation to avoid problems, for example Visual Studio Code with the YAML extension.
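After running helm upgrade, a quick way to confirm that the change was applied (a minimal sketch using standard commands):
$ helm list               # The REVISION of the prestocluster release should have increased
$ kubectl get pods -w     # Watch whether the affected pods restart with the new configuration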
NOTE: For versions of the “Denodo Embedded MPP” prior to 20240307, instead of helm upgrade you can use kubectl edit and kubectl rollout commands to modify a Kubernetes resource without redeploying. Take into account that kubectl edit opens an editor.
Troubleshooting errors in Presto
Manage Presto logs
- Check the Presto logs:
- Find the name of the Presto coordinator pod executing: $ kubectl get pods
- Get the logs: $ kubectl logs <presto coordinator pod name>
- Change Presto log level to debug following the example in section “Edit the cluster configuration without re-deploying”.
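For example, a minimal sketch to follow the coordinator log while reproducing a query, filtering for errors (the grep pattern is just an illustrative choice):
$ kubectl get pods | grep coordinator                              # Find the coordinator pod name
$ kubectl logs -f <presto coordinator pod name> | grep -i error    # Follow the log, keeping only error lines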
Troubleshoot failing queries
- Denodo 9:
- Execute the query from the Design Studio and check the execution trace. For each access to Presto, the trace contains an “MPP Execution Plan” node and several “MPP Stage Plan” nodes with information about the execution in Presto. If there is an error, the trace should include the error message. If this does not provide enough information, try the steps described for Denodo 8.
- Denodo 8:
- Execute the following query to identify the id of the Presto query: SELECT * FROM admin_denodo_mpp.denodo_mpp_queries ORDER BY started DESC;
- Query the view admin_denodo_mpp.denodo_mpp_query_details using the id obtained in the previous query and check the information provided, especially the value of the failure info column.
- If the previous information is not enough, check the “Presto logs” and try to access the Presto web UI at https://presto-denodo:8443/ui/. Find the query Denodo is sending in the list and click on its id:
- If there is an exception, its trace is available in the ‘Overview’ tab.
- Otherwise, click on the ‘Live Plan’ tab and check whether any of the stages has errors. If there is an error in a ScanFilter stage accessing a table in the “denodo” catalog: 1) if the error shows Presto could not connect to the Virtual DataPort servers, see section “Troubleshoot connection error from Presto to Denodo”; 2) otherwise:
- Review the errors in log <DENODO_HOME>/logs/vdp/vdp.log.
- Execute the query again from the Design Studio using the option CONTEXT('data_movement_clean_resources'='false')
- Identify again the ScanFilter stage that is failing and copy the name of the schema and the table Presto is trying to access.
- In the Design Studio, open the VQL Shell and execute: SELECT * FROM <schemaName>.<tableName> TRACE
Troubleshooting performance problems
Presto resource management
In general, the recommended approach to make the best use of the cluster resources is to:
- Follow the sizing recommendations for the MPP cluster.
- Configure auto scaling to control CPU and memory management. For this purpose, visit the KB articles regarding MPP auto scaling in AWS and Azure.
- Adjust the memory limits using global properties in the values.yaml file (presto -> worker | coordinator -> additionalConfig), like query.max-memory-per-node. Take into account that Presto kills a query if it reaches these limits. We recommend editing yaml files with an editor that includes validation to avoid problems, for example Visual Studio Code with the YAML extension.
- Use the Denodo resource manager for further control. This allows setting restrictions on:
- Concurrency.
- Execution time:
- Stop queries if they have been running for more than X minutes.
- Stop query after it has returned X rows.
- Both concurrency and execution time rules will also have an indirect effect on the CPU and memory management.
Query exceeding Presto memory limits
Presto includes default limits for the maximum memory a query can use. If a query fails because it reaches one of these limits, review whether the default limits are too low for the machines used as worker nodes and for the concurrency level of the workload, and follow the recommendations in section “Presto resource management” (a hedged example follows).
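For reference, a sketch of raising one of these limits; the property path follows the values.yaml layout mentioned in “Presto resource management”, and the 10GB value is only an illustrative assumption that must be sized for your worker nodes.
# In prestocluster/values.yaml, under presto -> worker -> additionalConfig, set for example:
#   - query.max-memory-per-node=10GB
# Then apply the change without redeploying:
$ helm upgrade prestocluster prestocluster/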
Check the resources used by the MPP cluster
Using Denodo, execute the view admin_denodo_mpp.denodo_mpp_workers_status. For each worker it provides useful information about the current status and the heap memory used.
Using kubectl, the following commands show the nodes that compose the Kubernetes cluster and their available resources:
- kubectl get nodes
- kubectl describe node <node-name>
- kubectl get pods -o wide # command to know how the pods are distributed
- kubectl get pods -o wide -n <namespace> # Same as above for a particular <namespace>
- kubectl describe ns <namespace>
- kubectl top pod -n <namespace> # top displays CPU and memory usage
NOTE: The top option requires an additional tool installed in the cluster to measure consumption, for example the Kubernetes Metrics Server.
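If no metrics tool is installed, the Kubernetes Metrics Server can be deployed with its standard manifest (a sketch; managed Kubernetes services may already include it or offer it as an add-on):
# Install the Kubernetes Metrics Server so that kubectl top works
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify that metrics are being collected
$ kubectl top pod -n <namespace>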
Check latency between Presto and Denodo
In order to check the latency between the Denodo MPP and the Virtual DataPort server:
From the Virtual DataPort server, execute: telnet <Presto address> 443
Using kubectl, access Denodo from the MPP cluster from a busybox container:
- kubectl run busybox --rm -it --image=busybox
- Execute: telnet <Virtual DataPort address> 9999
If it is not possible to run the tests above, please contact your administration team to gather this information.
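A rough sketch of measuring the latency from the MPP cluster back to Virtual DataPort from a temporary busybox pod; 9999 is the default Virtual DataPort port, adjust it if your installation differs.
# Start a temporary pod with basic network tools
$ kubectl run busybox --rm -it --image=busybox -- sh
# Inside the busybox shell: ping gives the round-trip time, telnet verifies the port is reachable
/ # ping -c 5 <Virtual DataPort address>
/ # telnet <Virtual DataPort address> 9999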
Embedded MPP and Virtual DataPort connectivity
Troubleshoot connection error from Denodo to Presto
- Review Virtual DataPort logs (<DENODO_HOME>/logs/vdp/vdp.log)
- Verify that the password for Presto is correct, that is, that it matches the password specified in the presto.coordinator.prestoPassword property of the values.yaml file. Take into account that if you redeploy the cluster with a Presto password different from the one used when the embedded_mpp data source was created, you need to update the password in Virtual DataPort by executing the following from the VQL Shell of the Design Studio:
SET 'com.denodo.embeddedmpp.password.secret' = '<clearPassword>';
- Review the SSL/TLS configuration:
- In case you are using the certificate in certs/certificate.crt (which is distributed for testing purposes ONLY), make sure that the /etc/hosts of the Virtual DataPort server contains an entry for presto-denodo, as described in section SSL/TLS of the MPP cluster manual.
- In case you are using a different certificate, but it is signed by a private authority, or it is self-signed, make sure that it is included in the truststore of the Virtual DataPort servers (see the keytool sketch after this list).
- In order to debug problems during an SSL connection:
- Include the following JVM parameter in the Virtual DataPort servers:
-Djavax.net.debug=all
- Test the connection.
- Remove the JVM parameter as it is very verbose.
- Review log <DENODO_HOME>/logs/vdp/vdp.log.
- Review that the host configured in the VQL of the data sources embedded_mpp and denodo_mpp_api in the admin_denodo_mpp database is correct.
- Review the network security rules to verify there is an inbound rule to allow connections from the Virtual DataPort servers to the Presto cluster.
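For the truststore case above, a hedged sketch of importing the Presto certificate with keytool; the truststore path and the default changeit password are assumptions based on the JRE bundled with Denodo and may differ in your installation.
# Import the Presto certificate into the truststore used by the Virtual DataPort server
$ keytool -importcert -alias presto-denodo -file certificate.crt -keystore <DENODO_HOME>/jre/lib/security/cacerts -storepass changeit
# Restart the Virtual DataPort server afterwards so the change takes effect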
Troubleshoot connection error from Presto to Denodo
Presto includes a connector to Denodo. If there is a connection error from Presto to Denodo:
- Review the “Presto logs”.
- Review the configuration in the denodoConnector section of the values.yaml file. In case the values.yaml file is not available, execute helm get manifest, look for the ConfigMap presto-catalog and review the denodo.properties configuration (see the sketch after this list).
- If Denodo is configured to use SSL/TLS and its certificate is signed by a private authority, or it is self-signed, verify that the connector has a trustStore configured that contains the certificate.
- If the previous steps have not solved the issue see section “How to debug a connectivity issue between services”.
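If the values.yaml file is not at hand, the denodoConnector configuration actually deployed can be inspected directly from the release manifest (a minimal sketch; the grep context size is arbitrary):
# Locate the denodo.properties configuration inside the presto-catalog ConfigMap
$ helm get manifest prestocluster | grep -B 2 -A 15 'denodo.properties'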
Virtual DataPort
Problems creating base views from the embedded MPP data source
License error
If there is an error saying it is not possible to validate the license for the embedded MPP, click the ‘Validate MPP License’ button next to the Test connection button to get more information (available since Denodo 9), or use the VALIDATE_MPP_LICENSE stored procedure.
Error accessing the object storage
- Review the network security rules for the storage to verify that Denodo Virtual DataPort can access it.
- If you are using SSL/TLS to access the object storage and the certificate is signed by a private authority, or it is self-signed, make sure that it is included in the truststore of the Virtual DataPort servers.
- Review the Virtual DataPort log (<DENODO_HOME>/logs/vdp/vdp.log).
- In case it is accessing an Azure storage account, take into account that:
- TLS 1.0 and 1.1 support will be removed for new & existing Azure storage accounts starting Nov 2024.
- Recent TLS 1.3 support in Azure storage accounts may cause connections using SSL/TLS to fail, and Denodo could return a timeout as the connection never goes through. In that case, include the following JVM parameters to specify the TLS versions Virtual DataPort should allow, excluding version 1.3. For instance:
-Dhttps.protocols="TLSv1,TLSv1.1,TLSv1.2" -Djdk.tls.client.protocols="TLSv1,TLSv1.1,TLSv1.2"
- In any other case, if the log does not provide enough information, execute the following from a VQL Shell of the Design Studio to log more information:
CALL LOGCONTROLLER('com.denodo.vdb.util.instrospectionservice.actions.hadoopbased.DescHadoopBasedSourceAction', 'TRACE');
CALL LOGCONTROLLER('com.denodo.vdb.util.hdfs', 'TRACE');
CALL LOGCONTROLLER('org.apache.hadoop.fs.FileSystem', 'DEBUG');
- Test the connection to the storage route again.
- Restore the log levels to error.
- Review log <DENODO_HOME>/logs/vdp/vdp.log.
- Finally, in order to debug problems with the SSL connection, if none of the previous steps have clarified the issue:
- Include the following JVM parameter in the Virtual DataPort servers:
-Djavax.net.debug=all
- Test the connection.
- Remove the JVM parameter as it is very verbose.
- Review log <DENODO_HOME>/logs/vdp/vdp.log
After creating a base view over data in an object storage, the execution fails because the table does not exist
- Review the test connection to the embedded MPP data source. If the test connection does not work, follow instructions in section “Troubleshoot connection error from Denodo to Presto”.
- Review the Presto log. See section “Manage Presto logs”.
- Review the Virtual DataPort log (<DENODO_HOME>/logs/vdp/vdp.log)
- If this log does not provide useful information:
- Execute the following from a VQL Shell of the Design Studio:
CALL LOGCONTROLLER('com.denodo.vdb.catalog.view.BaseView', 'TRACE');
CALL LOGCONTROLLER('com.denodo.vdb.util.tablemanagement', 'DEBUG');
- Create the base view again.
- Restore the log levels to error.
- Review log <DENODO_HOME>/logs/vdp/vdp.log
Embedded MPP acceleration is not working
- Review the Virtual DataPort log (<DENODO_HOME>/logs/vdp/vdp.log)
- If the log does not provide enough information:
- Execute the following from a VQL Shell of the Design Studio:
CALL LOGCONTROLLER('com.denodo.vdb.catalog.view.transformation', 'debug');
CALL LOGCONTROLLER('com.denodo.vdb.engine.costoptimizer', 'debug');
CALL LOGCONTROLLER('com.denodo.vdb.interpreter.execution.util.DataMovementExecutor', 'debug');
- Run the query
- Restore the log levels to error
- Review log <DENODO_HOME>/logs/vdp/vdp.log
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.