Introduction
This document summarizes the steps from the Embedded MPP Documentation in a checklist format so it is easy to follow.
This document assumes that you are on the latest version of the component.
NOTE: For an automated deployment, check the Deploying Denodo MPP in Azure Using ARM guide.
Azure Architecture
Elements Needed
Configuration Checklist
Step |
Done |
Task |
Section 0 - Planning |
||
0A |
⬜ |
Architecture Decisions By default, the recommendation is to use N + 2 nodes:
Depending on the purpose of the installation (production, POC, etc.), these are the minimum recommended numbers of nodes:
Each node should have at least 128 GB of memory and 16-32 cores. For example, in Azure you can start with Standard_E16s_v5 or Standard_D32s_v5 nodes. Decide the number of nodes and whether autoscaling will be enabled, as shown in the sketch below. For more information, check the Administration guide |
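As an illustration only (the resource group, cluster name, and location are placeholder values, not taken from this guide), an AKS cluster with N + 2 = 5 nodes of the suggested size could be created with a command along these lines:
az aks create \
  --resource-group <resource_group> \
  --name <aks_cluster_name> \
  --location <location> \
  --node-count 5 \
  --node-vm-size Standard_E16s_v5 \
  --generate-ssh-keys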
0B |
⬜ |
Select How Denodo MPP will be exposed to Denodo VDP You need to select one of these options to determine how Denodo MPP will communicate with the Denodo VDP instance:
These values are configured in values.yaml in section 5. |
0C |
⬜ |
Denodo’s Embedded MPP Data In the case of Azure, this data will commonly be stored in Azure Blob Storage or Azure Data Lake Storage Gen2, but other HDFS-compatible storages can be used. Decide what your data storage will be. Remember that the storage data files won’t be deleted automatically if the cluster is deleted. |
0D |
⬜ |
Denodo’s Embedded MPP Metadata Denodo’s MPP Metastore has the following options available:
Depending on the option selected, you can reduce the number of nodes to N + 1 (instead of N + 2), as no node will be required for the Metastore. Decide if you will use the default Metastore or a different one. More information can be found in the Denodo Embedded MPP User Manual. |
0E |
⬜ |
Denodo VDP Considerations If you have a cluster of Denodo servers, it needs to be configured to store its metadata in an external database to take full advantage of the Denodo Embedded MPP functionalities. If you only have one node (during a PoC, for example), you need to set this property to false: SET 'queryOptimization.parallelProcessing.denodoConnector.enableUsingSharedMetadataOnly'='false' In order to use the Embedded MPP features you need Enterprise Plus; otherwise you may get errors similar to: “Error: EmbeddedMPPMaxProcessors are limited to X but current number is unknown” Before configuring this feature, please validate that your license allows it by executing CALL VALIDATE_MPP_LICENSE() |
|
Section 1 - Configure AKS (Azure) |
||
1A |
⬜ |
Create Azure RBAC Roles The following roles need to be created in the Azure RBAC configuration. Create a custom role and name it “Denodo MPP Cluster Role,” adding the following permissions:
Create a custom role and name it “Denodo MPP Nodes Role,” adding the following permissions:
OPTIONAL: If you use Azure Container Registry, add this role too:
|
1B |
⬜ |
Create the nodes and assign the roles to the AKS Cluster Using the information from Section 0 - Planning:
|
1C |
⬜ |
Network Configuration
|
1D |
⬜ |
Autoscaling If autoscaling is required, this guide explains the configuration steps for AKS |
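As a minimal sketch (the cluster and resource group names, and the scaling bounds, are illustrative assumptions), the AKS cluster autoscaler can be enabled on an existing cluster with:
az aks update \
  --resource-group <resource_group> \
  --name <aks_cluster_name> \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 6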
Section 2 - Azure Environment Requirements |
||
2A |
⬜ |
HDFS Storage and credentials Using the information from Section 0 - Planning:
<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.tenant</name>
  <value>ADD_MSI_TENANT_ID</value>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.endpoint</name>
  <value>http://169.254.169.254/metadata/identity/oauth2/token</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>ADD_CLIENT_ID</value>
</property>
<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.endpoint</name>
  <value>https://login.microsoftonline.com/<ADD_DIRECTORY_ID>/oauth2/token</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>ADD_CLIENT_ID</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.secret</name>
  <value>ADD_SECRET</value>
</property>
cluster.sh deploy --abfs-storage-account xxx --abfs-storage-key yyy --credstore-password zzz
Obtain your details from your preferred option as they will be used later. |
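If the storage account does not exist yet, the following sketch shows how an ADLS Gen2 account and container could be provisioned (all names are placeholders; adapt to your environment):
az storage account create \
  --name <storage_account_name> \
  --resource-group <resource_group> \
  --enable-hierarchical-namespace true
az storage container create \
  --account-name <storage_account_name> \
  --name <container_name>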
2B |
⬜ |
Denodo Platform running and connectivity Obtain your Denodo VDP URL and credentials. The Denodo VDP server needs access to the AKS Denodo MPP load balancer. The following ports represent the default values: Network security group of the Denodo Cluster / Instance
|
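As a quick, hedged connectivity check from the Denodo VDP host (assuming 8443 is the HTTPS port your MPP load balancer exposes; replace the hostname with your load balancer address):
nc -vz <denodo_mpp_lb_hostname> 8443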
Section 3.1 - Container Registry - (Option 1) - Denodo Harbor |
||
3.1A |
⬜ |
Considerations Denodo Harbor credentials expire every 6 months. While this option is suitable for testing and proof-of-concept (POC) purposes, please consider a private registry (see section 3.2) for production scenarios as a best practice. |
3.1B |
⬜ |
Denodo Container Registry (Harbor) Credentials and firewall access Provide firewall access to Denodo’s Registry in Harbor https://harbor.open.denodo.com/ Obtain your denodo_account_username and registry profile secret, which you can find at https://harbor.open.denodo.com/ Open the “User Profile” menu and click on “Generate Secret”. Copy and store the CLI secret as it will be used later. |
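To confirm the credentials work before deploying, you can log in to Harbor from any Docker-enabled machine (the password prompt expects the CLI secret generated above):
docker login harbor.open.denodo.com -u <denodo_account_username>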
Section 3.2 - Container Registry - (Option 2) - ACR (skip this section if using Harbor) |
||
3.2A |
⬜ |
Configure Azure CLI Configure Azure CLI using:
az login
az aks get-credentials --resource-group <Resource_Group_Name> --name <AKS_Cluster_Name>
Then download the MPP images from support.denodo.com on the machine where the CLI is configured |
3.2B |
⬜ |
Login to Azure Container Registry with your Docker installation az acr login --name <acr_name> |
3.2C |
⬜ |
Download Denodo Embedded MPP You can find the Denodo Embedded MPP in the Denodo Connects section of the Denodo Support Site. Unzip the file once it is downloaded. It contains the images that will be uploaded to ACR. |
3.2D |
⬜ |
Upload Denodo MPP images to ACR Follow these steps to upload all the images needed to ACR:
# Option 1: If the images were downloaded locally. Load postgres, hive metastore, and presto depending on what is defined in section 0D.
docker load < prestocluster-presto-<version>.tar.gz
docker load < prestocluster-postgresql-<version>.tar.gz
docker load < prestocluster-hive-metastore-<version>.tar.gz
# Option 2: If using Harbor, use docker pull instead of docker load. Pull postgres, hive metastore, and presto depending on what is defined in section 0D.
docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-presto:<version>
docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-postgresql:<version>
docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-hive-metastore:<version>
# Tag the postgres image and push it to ACR
docker tag prestocluster-postgresql:<version> <acr_name>.azurecr.io/prestocluster-postgresql:<version>
docker push <acr_name>.azurecr.io/prestocluster-postgresql:<version>
# Tag the hive metastore image and push it to ACR
docker tag prestocluster-hive-metastore:<version> <acr_name>.azurecr.io/prestocluster-hive-metastore:<version>
docker push <acr_name>.azurecr.io/prestocluster-hive-metastore:<version>
# Tag the presto image and push it to ACR
docker tag prestocluster-presto:<version> <acr_name>.azurecr.io/prestocluster-presto:<version>
docker push <acr_name>.azurecr.io/prestocluster-presto:<version> |
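After pushing, you can verify that the three repositories are available in ACR, for example:
az acr repository list --name <acr_name> --output table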
Section 4 - Cluster.sh Requirements |
||
4A |
⬜ |
Download Denodo Embedded MPP You can find the Denodo Embedded MPP in the Denodo Connects section of the Denodo Support Site. Unzip the file once it is downloaded. It contains the cluster.sh script and the other dependent artifacts that will be used in the next section |
4B |
⬜ |
Use Linux or a compatible shell if in Windows To run cluster.sh on Windows you need to have a Bash compatible shell such as Cygwin or Git Bash installed or use Windows Subsystem for Linux (WSL). |
4C |
⬜ |
Configured and authenticated kubectl command The cluster.sh script calls kubectl, so it needs to be properly configured with the correct context and the right credentials. You can check that kubectl is correctly configured using:
kubectl get nodes
Installation instructions: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ |
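A quick sanity check that kubectl points at the intended AKS cluster before running cluster.sh:
kubectl config current-context
kubectl get nodes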
4C.1 |
⬜ |
Configure authentication when using Azure RBAC roles When using Azure RBAC roles for authentication or using the ARM template, kubelogin must be installed. You can install kubelogin with the Azure CLI using the command “az aks install-cli”. Note: If you get an ssl_certificate error you can retrieve the latest release information using the curl command “curl https://api.github.com/repos/Azure/kubelogin/releases/latest” After adding the cluster context to kubectl you must configure the kubeconfig to use kubelogin with the command “kubelogin convert-kubeconfig -l azurecli” For more installation options you can visit: https://azure.github.io/kubelogin/install.html |
4D |
⬜ |
Install Helm V3 for Kubernetes Installation steps can be found here https://helm.sh/docs/intro/install/ |
4E |
⬜ |
Configure environment variable HADOOP_HOME (only if running cluster.sh from Windows) If you are running cluster.sh on Windows you need to apply extra configuration. Check if the environment variable HADOOP_HOME is set on this computer, since Hadoop is required by cluster.sh to transparently manage the encryption of all user-provided credentials. If HADOOP_HOME is not set:
|
4F |
⬜ |
Install Java and configure JAVA_HOME and PATH An installation of Java (11 recommended) is required. The JAVA_HOME and PATH environment variables should be properly configured. E.g.:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$PATH:$JAVA_HOME/bin" |
4G |
⬜ |
Import the Denodo Embedded MPP certificate You need to import a certificate into the Denodo VDP server trust store. For Windows:
.\jre\bin\keytool -importcert -alias presto-denodo -file denodo-presto-k8scluster-x.x-xxxxxx/certs/certificate.crt -cacerts -storepass changeit
Or if using Linux:
sudo ./jre/bin/keytool -importcert -alias presto-denodo -file /denodo-presto-k8scluster-x.x-xxxxxx/certs/certificate.crt -cacerts -storepass changeit
We provide a testing certificate inside /certs/certificate.crt that is meant FOR TESTING PURPOSES ONLY. This certificate accepts presto-denodo as the Denodo Embedded MPP hostname. We recommend that you use a certificate issued by a CA in production. Follow the documentation for these steps. |
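To confirm that the certificate was imported into the VDP JRE trust store, you can list it by its alias (changeit is the default JRE trust store password):
./jre/bin/keytool -list -alias presto-denodo -cacerts -storepass changeit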
Section 5 - Configure prestocluster/values.yaml for the Embedded MPP configuration |
||
5A |
⬜ |
Add the repository credentials (Denodo’s Harbor or Azure registry) If using Denodo Harbor, add the username and password:
image.repository: "harbor.open.denodo.com/denodo-connects-8.0/images"
pullcredentials.enabled=true
pullcredentials.name="denodo-mpp-registry-secret"
pullcredentials.registry="harbor.open.denodo.com"
pullcredentials.username=
pullcredentials.pwd=
If using Azure Container Registry (leave pullSecret empty):
image.repository: "<acr_name>.azurecr.io"
image.pullSecret: "" |
5B |
⬜ |
Connection Details to the Denodo Server Using the information from Step 2B, configure your Denodo Server parameters in the denodoConnector section:
This step will create a denodoConnector.user and denodoConnector.password. This user will be used by the Denodo MPP to connect to Denodo. The credentials used to create that user are prompted when running cluster.sh deploy / register |
5C |
⬜ |
Configure the number of workers and the CPU and memory that each Denodo MPP worker will use The following parameters allow you to configure the number of Presto workers and the CPU and memory resources used by each one in your AKS cluster:
# -- Number of Presto workers in the cluster
numWorkers: 4
# -- Number of cores assigned to each worker
cpusPerNode: 16
# -- Total memory, in GB, assigned to each worker
memoryPerNode: 128
As N + 2 is the recommended number of nodes, presto.numWorkers should be your total number of AKS nodes minus 2. E.g. if you have 3 nodes in total, presto.numWorkers should be 1 |
5D |
⬜ |
Configure the Service Section (if not using the default Load Balancer) If you want to configure a Service option other than LoadBalancer, you can use the service.type value of values.yaml:
service:
  # -- Service type: ClusterIP, NodePort or LoadBalancer
  type: LoadBalancer |
5E |
⬜ |
Configure an external Metastore (if not using the default Hive Metastore - Postgresql) If using the default Hive - Postgres option check section 5.1 for recommendations and skip section 5.2. If using an external Metastore, skip section 5.1 and check “5.2 Configure an external Metastore” instead. |
5F |
⬜ |
Adjust your components’ memory properties It is important to adjust memory settings for query performance, finding a balance between the maximum memory per query and the maximum number of concurrent queries that can run in the Denodo Embedded MPP. You can configure the memory settings in the files prestocluster/presto/conf/config.properties.coordinator and prestocluster/presto/conf/config.properties.worker. Find additional details in section “Memory” here |
5G |
⬜ |
Configure an Internal Load Balancer If you want to configure an internal load balancer for your AKS cluster you will have to add a specific annotation to your values.yaml. Remember that you will have to ensure connectivity from the internal load balancer network, using NAT Gateways for private networks or Private Links for isolated networks.
…
presto:
  service:
    …
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    …
… |
Section 5.1 (Optional) - Default Metastore |
||
5.1A |
⬜ |
How to maintain the metadata after redeploys Follow these steps if you want to maintain the metadata available in the Metastore after a redeploy:
pvClaim:
  annotations:
    "helm.sh/resource-policy": keep
...
securityContext:
  # -- Force to run as a non-root user to ensure the least privilege
  runAsNonRoot: true
  # -- User ID for the container. Ignored on OpenShift.
  runAsUser: 1001
  # -- Group ID for the pod volumes. Ignored on OpenShift
  fsGroup: 1001
  fsGroupChangePolicy: OnRootMismatch
Note: For deployments across multiple Availability Zones (AZ), it's crucial to ensure that the PostgreSQL pod is always rescheduled within the same AZ as its associated Persistent Volume (PV). To achieve this, you should verify the allowedTopologies for the StorageClass and the AZ affinity settings for PostgreSQL (found in the prestocluster values.yaml file). In the example below you can see how the pod is always assigned to the same AZ:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/example
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - eastus2-1 |
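After a redeploy you can confirm that the PostgreSQL volume was kept, and inspect the StorageClass topology, with commands such as these (the StorageClass name is environment-specific):
kubectl get pvc,pv
kubectl get storageclass <storage_class_name> -o yaml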
Section 5.2 (Optional) - Configure an external Metastore |
||
5.2A |
⬜ |
Configuring an existing Hive Metastore With this option, Denodo won’t provision Postgresql or Hive Metastore and will connect to an existing external Metastore.
If Kerberos is required you can follow these steps or configure the steps in section “5.2B Configuring Hive Metastore with an existing metastore database” |
5.2B |
⬜ |
Configuring Hive Metastore with a non-existing database different from Postgresql In this scenario, Denodo will provision the Hive Metastore but will use a database different from the default Postgresql. This option is used when there are policies restricting the type of RDBMS that can be installed, or if there are issues with step 2B.
|
5.2C |
⬜ |
Configuring AWS Glue as Metastore If you already have an AWS Glue Data Catalog containing table definitions you want to access from the Denodo Embedded MPP, you can use the AWS Glue Data Catalog as an external Metastore, either as a replacement for the default Metastore or in addition to it, even when deploying the MPP in Azure. E.g. the most common scenario would be having one Metastore in Azure and one in AWS, allowing a multi-cloud architecture.
|
5.2D |
⬜ |
Configuring multiple Metastores With this option you will be able to connect to different Metastores. This option is common in migration scenarios, for example when planning a migration from Athena to Denodo MPP. More information can be found in the Denodo Embedded MPP User Manual. |
Section 6 - Deploying the server |
||
6A |
⬜ |
Execute cluster.sh deploy This command will deploy the Denodo Embedded MPP cluster. The syntax for this command will depend on the HDFS storage used, and additional details can be found in the Deployment section here. If you are following the steps in this checklist, you just need to provide a value for --credstore-password. The credstore-password will be used to protect the keystore:
cluster.sh deploy --credstore-password replace_with_a_password
During the execution of the script, you will be able to specify the passwords for the Metastore and Denodo MPP. You can also keep the defaults by pressing “return”:
|
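Once cluster.sh deploy finishes, you can confirm that the Helm release was created; the release name prestocluster matches the helm upgrade command shown at the end of this document:
helm list
helm status prestocluster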
6B |
⬜ |
Check that the pods are in “Running” status Execute the following command to check the status of each pod:
kubectl get pods
The following pods should have STATUS=Running:
The pods use a Readiness Probe and a Liveness Probe; both are used to control the health of the cluster. If the Liveness probe fails, the container is restarted, while if the Readiness probe fails, the container stops serving traffic. If some of the pods did not start correctly, please check the “Kubernetes debugging commands” section at the end of this document. |
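Instead of polling manually, you can also wait for all pods to become Ready with a single command (the timeout value is illustrative):
kubectl wait --for=condition=Ready pod --all --timeout=300s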
6C |
⬜ |
Denodo VDP Hosts File (When using the Self Service Certificate) When using the Self Service Certificate as described previously, the name of the Denodo MPP server needs to be presto-denodo. In order to do that, we need to edit the hosts file (e.g. /etc/hosts in Linux, or C:\Windows\System32\drivers\etc\hosts in Windows) of the server where Denodo is installed. Add the following line:
Denodo_MPP_IP presto-denodo
To obtain the value for Denodo_MPP_IP you can execute kubectl get svc and obtain the external URL of the LoadBalancer (this depends on step 0B)
|
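As a sketch for extracting the load balancer IP directly (the service name presto is an assumption; check the kubectl get svc output for the actual name in your deployment):
kubectl get svc presto -o jsonpath='{.status.loadBalancer.ingress[0].ip}'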
6D |
⬜ |
Register the MPP Server in the Denodo VDP Server Once all the pods are running, we need to create a datasource in our Denodo VDP Server pointing to Denodo MPP. To do that execute: cluster.sh register To run cluster.sh on Windows you need to have a Bash compatible shell such as Cygwin or Git Bash installed or use Windows Subsystem for Linux (WSL). |
6E |
⬜ |
Check that there is connectivity between Denodo VDP and Denodo MPP In Denodo, there should be a new VDB (by default admin_denodo_mpp) and an MPP data source (embedded_mpp). There are three things that can be tested:
|
Section 7 - First steps after deployment |
||
7A |
⬜ |
Configure section “Use bulk data load APIs” The information in this section will be used when inserting data into your Denodo MPP cluster. Once the section is configured, click on “Test Bulk Load” and validate that there are no errors |
7B |
⬜ |
Configure Denodo Hadoop properties If the instructions in step 4C.1 were followed, then you have to add the following Hadoop properties on the embedded_mpp data source and in the bulk data load APIs:
Name: fs.azure.account.auth.type
Value: OAuth
Name: fs.azure.account.oauth.provider.type
Value: org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
Name: fs.azure.account.oauth2.msi.tenant
Value: <MANAGED_IDENTITY_TENANT_ID>
Name: fs.azure.account.oauth2.msi.endpoint
Value: http://169.254.169.254/metadata/identity/oauth2/token
Name: fs.azure.account.oauth2.client.id
Value: <MANAGED_IDENTITY_CLIENT_ID> |
7C |
⬜ |
Create a base view for a Parquet file Click on “Create Base View” to validate that you can introspect the Parquet files (if any) available in your data storage paths |
Kubernetes debugging commands
The following commands are useful in case something fails:
- kubectl describe pod <pod name> (pod status information)
- kubectl logs <pod name> (pod logs)
- kubectl logs <pod name> --previous (logs of the last run of the pod before it crashed. Useful in case you want to figure out why the pod crashed in the first place.)
- kubectl exec -it <pod name> -- bash (interactive shell access to the pods)
- Interactive shell access to crashing pods, e.g. hive metastore:
- Edit prestocluster/templates/hive-metastore-template.yaml
- Replace:
command: ["/opt/run-hive-metastore.sh"]
by
command: [ "sleep" ]
args: [ "infinity" ]
- Redeploy the MPP
- Enter the hive metastore container to continue your debugging:
- kubectl exec -it <pod name> -- bash
- Since 8.0.20240306, you have to disable the livenessProbe in the values.yaml file; otherwise the livenessProbe will kill the pod after a few seconds
- E.g. for the metastore: metastore.livenessreadiness.enabled: false
- Change Denodo MPP configuration without redeploying (since 8.0.20240306):
execute helm upgrade prestocluster prestocluster/
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend that you contact your Denodo Customer Success Manager.