
Introduction

This document summarizes the steps from the Embedded MPP Documentation in a checklist format so it is easy to follow.

This document assumes that you are on the latest version of the component.

NOTE: For an automated deployment, check the Deploying Denodo MPP in Azure Using ARM guide.

Azure Architecture

Elements Needed

Configuration Checklist

Step | Done | Task

Section 0 - Planning

0A

Architecture Decisions

By default the recommendation is to use N + 2 nodes:

  1. One node for Denodo MPP’s coordinator
  2. One node for Denodo MPP’s Metastore (If using the default Hive Metastore and Postgresql option)
  3. N for the Denodo MPP workers

Depending on the purpose of the installation (production, POC, etc.), these are the minimum recommended numbers of nodes:

  • If you only need basic functionality in a POC (for example, showing the cache), start with 5 to 7 nodes (3-5 workers)
  • If the main focus is the Embedded MPP, start with 10 to 18 nodes (8-16 workers)

Each node should have at least 128GB of memory and 16-32 cores. For example, in Azure you can start with Standard_E16s_v5 or Standard_D32s_v5 nodes.

Decide the number of nodes and if autoscaling will be enabled.

For more information, check the Administration guide
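As a quick sanity check, the N + 2 sizing rule above can be expressed as a one-liner (the worker count of 8 is just an assumed example for an MPP-focused setup):

```shell
# N + 2 sizing rule: N workers, plus one coordinator node and one Metastore node
workers=8                     # assumed target number of workers
nodes=$((workers + 2))
echo "Provision $nodes AKS nodes for $workers workers"
```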

0B

Select How Denodo MPP will be exposed to Denodo VDP

You need to select one of these options to determine how Denodo MPP will be exposed to the Denodo VDP instance:

  • LoadBalancer (used by default)
  • Private ClusterIP + Ingress (common for customers with existing infrastructure)
  • NodePort (not commonly used)

These values are configured in values.yaml in section 5.

0C

Denodo’s Embedded MPP Data

In the case of Azure, this data will commonly be stored in Azure Blob Storage or Azure Data Lake Storage Gen2, but other HDFS-compatible storages can be used. Decide which one will be your data storage.

Remember that the storage data files won’t be deleted automatically if the cluster is deleted.

0D

Denodo’s Embedded MPP Metadata

Denodo’s MPP Metastore has the following options available:

  1. Default Metastore: Postgresql and Hive Metastore will be installed automatically. This is the default option.

  2. Using an existing Metastore: With this option, Denodo won’t provision Postgresql or Hive Metastore and will connect to an existing external Metastore. AWS Glue is also supported as an external Metastore, although it is less common when deploying in Azure.

  3. Using Hive Metastore with a non-existing database different from Postgresql: In this scenario, Denodo will provision Hive Metastore but will use a database different from the default Postgresql. This option is used when there are policies restricting the type of RDBMS that can be installed. It is also useful when the DBAs want to keep this database under their control (backups, maintenance, etc.), in addition to keeping the metadata outside the cluster lifecycle.

  4. Using AWS Glue as Metastore: With this option, Denodo can use AWS Glue either as a replacement for the default Metastore or in addition to it. While Glue is part of AWS, this configuration is still possible when deploying MPP in Azure. However, the most common scenario involves having one Metastore in Azure and another in AWS Glue, enabling a multi-cloud architecture.

  5. Using multiple Metastores: With this option you can connect to several different Metastores. This is common in migration scenarios, for example when planning a migration from Athena to Denodo MPP.

Depending on the option selected, you can reduce the number of nodes to N + 1 (instead of N + 2), as no node will be required for the Metastore.

Decide if you will use the default metastore or a different one. More information can be found in the Denodo Embedded MPP User Manual.

0E

Denodo VDP Considerations

If you have a cluster of Denodo servers, it needs to be configured to store its metadata in an external database to take full advantage of the Denodo Embedded MPP functionalities. If you only have one node (during a PoC, for example), you need to set this property to false:

SET 'queryOptimization.parallelProcessing.denodoConnector.enableUsingSharedMetadataOnly'='false' 

In order to use the Embedded MPP features you need an Enterprise Plus license; otherwise you may get errors similar to:

“Error: EmbeddedMPPMaxProcessors are limited to X but current number is unknown”

Before configuring this feature, please validate that your license allows it by executing CALL VALIDATE_MPP_LICENSE()

Section 1 - Configure AKS (Azure)

1A

Create Azure RBAC Roles

The following roles need to be created in the Azure RBAC configuration.

Create a custom role and name it “Denodo MPP Cluster Role,” adding the following permissions:

  • Microsoft.ContainerService/managedClusters/read
  • Microsoft.ContainerService/managedClusters/write
  • Microsoft.ContainerService/managedClusters/agentPools/read
  • Microsoft.ContainerService/managedClusters/agentPools/write
  • Microsoft.ContainerService/managedClusters/agentPools/delete

Create a custom role and name it “Denodo MPP Nodes Role,” adding the following permissions:

  • Microsoft.Compute/virtualMachines/read
  • Microsoft.Compute/virtualMachines/write
  • Microsoft.Compute/virtualMachines/delete
  • Microsoft.Network/networkInterfaces/read
  • Microsoft.Network/networkInterfaces/write
  • Microsoft.Network/networkInterfaces/delete
  • Microsoft.Network/networkInterfaces/join/action
  • Microsoft.Network/publicIPAddresses/read
  • Microsoft.Network/publicIPAddresses/write
  • Microsoft.Network/publicIPAddresses/delete

OPTIONAL: If you use Azure Container Registry, add this role too:

  • Microsoft.ContainerRegistry/registries/read
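As a sketch, the “Denodo MPP Cluster Role” above could be created from a custom role definition JSON like the following (the subscription ID is a placeholder you must replace), for example with az role definition create --role-definition role.json:

```json
{
  "Name": "Denodo MPP Cluster Role",
  "IsCustom": true,
  "Description": "Permissions required by the Denodo MPP AKS cluster",
  "Actions": [
    "Microsoft.ContainerService/managedClusters/read",
    "Microsoft.ContainerService/managedClusters/write",
    "Microsoft.ContainerService/managedClusters/agentPools/read",
    "Microsoft.ContainerService/managedClusters/agentPools/write",
    "Microsoft.ContainerService/managedClusters/agentPools/delete"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/<subscription_id>"]
}
```

The “Denodo MPP Nodes Role” can be defined the same way with its own Actions list.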

1B

Create the nodes and assign the roles to the AKS Cluster

Using the information from Section 0 - Planning:

  • Create N+2 AKS nodes (or N+1, depending on the decision in section 0D)
  • Assign the “Denodo MPP Cluster Role” and “Denodo MPP Nodes Role” to the AKS cluster and nodes, respectively

1C

Network Configuration

  • Create a Virtual Network and assign it to the AKS Cluster
  • Create / add rules to the AKS Network Security Group:
    • Inbound rules:
      • Allow TCP traffic on port 8080 (Denodo MPP HTTP)
      • Allow TCP traffic on port 8443 (Denodo MPP HTTPS)
      • Allow TCP traffic on port 5432 (Denodo MPP Postgresql)
      • Allow TCP traffic on port 9083 (Denodo MPP Hive Metastore)
    • Outbound rules:
      • Allow TCP traffic on port 9999 to Denodo VDP

1D

Autoscaling

If autoscaling is required, this guide explains the configuration steps for AKS.

Section 2 - Azure Environment Requirements

2A

HDFS Storage and credentials

Using the information from section 0 - Planning:

  • Create an Azure Data Lake Storage account where Denodo MPP will store/read data.
  • These are the methods that allow your AKS cluster to access ADLS Gen2, in order of preference:

  1. Use Azure Managed Identities (the Denodo deployment must be in Azure). For this, add the following properties to presto/conf/catalog/core-site.xml and hive-metastore/conf/core-site.xml before the Embedded MPP is deployed. The IP 169.254.169.254 points to Azure's Instance Metadata Service:

<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.tenant</name>
  <value>ADD_MSI_TENANT_ID</value>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.endpoint</name>
  <value>http://169.254.169.254/metadata/identity/oauth2/token</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>ADD_CLIENT_ID</value>
</property>

  2. Provide the Azure OAuth2 client credentials. For this, add the following properties to the presto/conf/catalog/core-site.xml and hive-metastore/conf/core-site.xml files before the Embedded MPP is deployed:

<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.endpoint</name>
  <value>https://login.microsoftonline.com/<ADD_DIRECTORY_ID>/oauth2/token</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>ADD_CLIENT_ID</value>
</property>
<property>
  <name>fs.azure.account.oauth2.client.secret</name>
  <value>ADD_SECRET</value>
</property>

  3. Provide the Azure credentials for the Shared Key authentication method to the cluster.sh script, e.g.:

cluster.sh deploy --abfs-storage-account xxx --abfs-storage-key yyy --credstore-password zzz

 

Keep the details of your preferred option at hand, as they will be used later.

2B

Denodo Platform running and connectivity

Obtain your Denodo VDP URL and credentials. The Denodo VDP server needs access to the AKS Denodo MPP load balancer. The following ports are the default values:

Network security group of the Denodo Cluster / Instance

  • Inbound rules (if the Denodo instance already existed, these rules are probably in place already):
    • 19090 Denodo SM Web
    • 18080 Denodo SM Keycloak
    • 9999 Denodo Server
    • 3389 RDP or 22 SSH
  • Outbound rules:
    • Source: IP/application group of the Denodo VM -> Destination: IP/application group of the MPP - Custom - 8443 - TCP - Allow

Section 3.1 - Container Registry - (Option 1) - Denodo Harbor

3.1A

Considerations

Denodo Harbor credentials expire every 6 months. While this option is suitable for testing and proof-of-concept (POC) purposes, please consider a private registry (see section 3.2) for production scenarios as a best practice.

3.1B

Denodo Container Registry (Harbor) Credentials and firewall access

Provide firewall access to Denodo’s Registry in Harbor https://harbor.open.denodo.com/ 

Obtain your denodo_account_username and registry profile secret that you can find at https://harbor.open.denodo.com/ 

Open the “User Profile” menu and click on “Generate Secret”. Copy and store the CLI secret as it will be used later.

Section 3.2 - Container Registry - (Option 2) - ACR (skip this section if using Harbor)

3.2A

Configure Azure CLI

Configure Azure CLI using:

az login

az aks get-credentials --resource-group <Resource_Group_Name> --name <AKS_Cluster_Name>

Then download the MPP images from support.denodo.com on the machine where the CLI is configured.

3.2B

Login to Azure Container Registry with your Docker installation

az acr login --name <acr_name>

3.2C

Download Denodo Embedded MPP

You can find the Denodo Embedded MPP in the Denodo Connects section of the Denodo Support Site. Unzip the file once it is downloaded. It contains the images that will be uploaded to ACR.

3.2D

Upload Denodo MPP images to ACR

Follow these steps to upload all images needed to ACR:

# Option 1: if the images were downloaded locally. Load postgresql, hive-metastore and presto depending on what is defined in section 0D.

docker load < prestocluster-presto-<version>.tar.gz
docker load < prestocluster-postgresql-<version>.tar.gz
docker load < prestocluster-hive-metastore-<version>.tar.gz

# Option 2: if using Harbor, use docker pull instead of docker load. Pull postgresql, hive-metastore and presto depending on what is defined in section 0D.

docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-presto:<version>
docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-postgresql:<version>
docker pull harbor.open.denodo.com/denodo-connects-8.0/images/prestocluster-hive-metastore:<version>

# Tag the postgresql image and push it to ACR

docker tag prestocluster-postgresql:<version> <acr_name>.azurecr.io/prestocluster-postgresql:<version>
docker push <acr_name>.azurecr.io/prestocluster-postgresql:<version>

# Tag the hive-metastore image and push it to ACR

docker tag prestocluster-hive-metastore:<version> <acr_name>.azurecr.io/prestocluster-hive-metastore:<version>
docker push <acr_name>.azurecr.io/prestocluster-hive-metastore:<version>

# Tag the presto image and push it to ACR

docker tag prestocluster-presto:<version> <acr_name>.azurecr.io/prestocluster-presto:<version>
docker push <acr_name>.azurecr.io/prestocluster-presto:<version>

Section 4 - Cluster.sh Requirements

4A

Download Denodo Embedded MPP

You can find the Denodo Embedded MPP in the Denodo Connects section of the Denodo Support Site. Unzip the file once it is downloaded. It contains the cluster.sh script and the other dependent artifacts that will be used in the next sections.

4B

Use Linux or a compatible shell if in Windows

To run cluster.sh on Windows you need to have a Bash compatible shell such as Cygwin or Git Bash installed or use Windows Subsystem for Linux (WSL).

4C

Configured and authenticated kubectl command

The cluster.sh script calls kubectl, so it needs to be properly configured with the correct context and the right credentials.

You can check that kubectl is correctly configured using: kubectl get nodes

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ 

4C.1

Configure authentication when using Azure RBAC roles

When using Azure RBAC roles for authentication or using the ARM template, kubelogin must be installed.

You can install kubelogin using the Azure CLI with the command “az aks install-cli”.

Note: If you get an ssl_certificate error you can add it using the curl command “curl https://api.github.com/repos/Azure/kubelogin/releases/latest”

After adding the cluster context to kubectl, you must configure the kubeconfig to use kubelogin with the command “kubelogin convert-kubeconfig -l azurecli”

For more installation options you can visit: https://azure.github.io/kubelogin/install.html 

4D

Install Helm V3 for Kubernetes

Installation steps can be found here: https://helm.sh/docs/intro/install/ 

4E

Configure environment variable HADOOP_HOME (only if running cluster.sh from Windows)

If you are running cluster.sh on Windows you need to apply extra configuration.

Check whether the environment variable HADOOP_HOME is set on this computer, since Hadoop is required by cluster.sh to transparently manage the encryption of all user-provided credentials.

If HADOOP_HOME is not set:

  • Create a directory. For example, <DENODO_HOME>\hadoop_win_utils.
  • Create a directory named bin inside the new directory. For example, <DENODO_HOME>\hadoop_win_utils\bin.
  • Set the environment variable HADOOP_HOME to point to <DENODO_HOME>\hadoop_win_utils
  • Copy the content of the <DENODO_HOME>\dll\vdp\winutils directory to %HADOOP_HOME%\bin.

4F

Install Java and configure JAVA_HOME and PATH

An installation of Java (11 recommended) is required. The JAVA_HOME and PATH environment variables should be properly configured.

E.g.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

export PATH="$PATH:$JAVA_HOME/bin"

4G

Import the Denodo Embedded MPP certificate

You need to import a certificate into the Denodo VDP server trust store using, for Windows:

.\jre\bin\keytool -importcert -alias presto-denodo -file denodo-presto-k8scluster-x.x-xxxxxx/certs/certificate.crt -cacerts -storepass changeit

Or if using Linux:

sudo ./jre/bin/keytool -importcert -alias presto-denodo -file /denodo-presto-k8scluster-x.x-xxxxxx/certs/certificate.crt -cacerts -storepass changeit

We provide a testing certificate inside /certs/certificate.crt that is meant FOR TESTING PURPOSES ONLY. This certificate accepts presto-denodo as the Denodo Embedded MPP hostname.

We recommend that you use a certificate issued by a CA in production. Follow the documentation for these steps.

Section 5 - Configure prestocluster/values.yaml for the Embedded MPP configuration

5A

Add the repository credentials (Denodo’s Harbor or Azure registry)

If using Denodo Harbor, add the username and password:

image.repository: "harbor.open.denodo.com/denodo-connects-8.0/images"

pullcredentials.enabled=true

pullcredentials.name="denodo-mpp-registry-secret"

pullcredentials.registry="harbor.open.denodo.com"

pullcredentials.username=

pullcredentials.pwd=

If using Azure Container Registry (leave pullSecret empty):

image.repository: "<acr_name>.azurecr.io"

image.pullSecret: ""

5B

Connection Details to the Denodo Server

Using the information from section 2B, configure your Denodo Server parameters in the denodoConnector section:

  • denodoConnector.Server: URL of the Denodo Server
  • denodoConnector SSL parameters (if SSL is enabled in your Denodo Server)

This step will create a denodoConnector.user and denodoConnector.password. This user will be used by Denodo MPP to connect to Denodo.

The credentials used to create that user are prompted when running cluster.sh (deploy) register

5C

Configure the number of workers and the CPU and memory that each Denodo MPP Worker will use

The following parameters allow you to configure the number of Presto workers in your AKS cluster and the CPU and memory resources used by each one.

# -- Number of Presto workers in the cluster
numWorkers: 4

# -- Number of cores assigned to each worker
cpusPerNode: 16

# -- Total memory, in GB, assigned to each worker
memoryPerNode: 128

As N + 2 is the recommended number of nodes, presto.numWorkers should be your total number of AKS nodes minus 2. E.g. if you have 3 nodes in total, your presto.numWorkers should be 1.
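The relationship between the total AKS node count and presto.numWorkers can be sketched as:

```shell
# numWorkers = total AKS nodes - 2 (one coordinator + one Metastore node)
total_nodes=3
num_workers=$((total_nodes - 2))
echo "numWorkers: $num_workers"
```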

5D

Configure the Service Section (if not using the default Load Balancer)

If you want to configure a Service option other than LoadBalancer, you can use the service.type value of values.yaml:

service:
  # -- Service type: ClusterIP, NodePort or LoadBalancer
  type: LoadBalancer

5E

Configure an external Metastore (if not using the default Hive Metastore - Postgresql)

If using the default Hive - Postgres option check section 5.1 for recommendations and skip section 5.2.

If using an external Metastore, skip section 5.1 and check “5.2 Configure an external Metastore” instead.

5F

Adjust your components memory properties

It is important to adjust memory settings for query performance, finding a balance between maximum memory per query and the maximum number of concurrent queries that can be run in the Denodo Embedded MPP.

You can configure the memory settings in the files prestocluster/presto/conf/config.properties.coordinator and prestocluster/presto/conf/config.properties.worker.

Find additional details in the “Memory” section here.
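As an illustration only, a worker configuration for the 128 GB nodes recommended above might use memory settings like the following. The property names are standard Presto memory settings, but the values here are assumptions; tune them against your node size, concurrency target, and the Denodo MPP documentation:

```properties
# Maximum user memory a single query may use across the whole cluster
query.max-memory=200GB
# Maximum user memory a single query may use on one worker
query.max-memory-per-node=40GB
# Heap memory kept free on each node for untracked allocations
memory.heap-headroom-per-node=12GB
```

Lower per-query limits allow more concurrent queries; higher limits favor a few large queries.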

5G

Configure an Internal Load Balancer

If you want to configure an internal load balancer for your AKS cluster, you will have to add a specific annotation to your values.yaml.

Remember that you will have to ensure connectivity from the internal load balancer network, using NAT Gateways for private networks or Private Links for isolated networks.

presto:
  service:
    …
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    …

Section 5.1 (Optional) - Default Metastore

5.1A

How to maintain the metadata after redeploys

Follow these steps if you want to maintain the metadata available in the Metastore after a redeploy:

  1. Edit the values.yaml file
  2. Uncomment postgresql.pvClaim.annotations "helm.sh/resource-policy": keep
  3. Add fsGroupChangePolicy: OnRootMismatch to the postgresql.securityContext element. It should look like this:

pvClaim:
  annotations:
    "helm.sh/resource-policy": keep

...

securityContext:
  # -- Force to run as a non-root user to ensure the least privilege
  runAsNonRoot: true
  # -- User ID for the container. Ignored on OpenShift.
  runAsUser: 1001
  # -- Group ID for the pod volumes. Ignored on OpenShift
  fsGroup: 1001
  fsGroupChangePolicy: OnRootMismatch

Note: For deployments across multiple Availability Zones (AZ), it's crucial to ensure that the PostgreSQL pod is always rescheduled within the same AZ as its associated Persistent Volume (PV). To achieve this, you should verify the allowedTopologies for the StorageClass and the AZ affinity settings for PostgreSQL (found in the prestocluster values.yaml file). In the example below you can see how the pod is always assigned to the same AZ:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/example
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - eastus2-1

Section 5.2 (Optional) - Configure an external Metastore

5.2A

Configuring an existing Hive Metastore

With this option, Denodo won’t provision Postgresql or Hive Metastore and will connect to an existing external Metastore.

  • Change these parameters in prestocluster/values.yaml to false:
    • postgresql.enabled: false
    • metastore.enabled: false
  • If you are using Glue as a replacement for the default Metastore, move to section 5.2C
  • Edit the metastore.service.name and metastore.service.port values with the connection parameters of the external Hive Metastore
  • Create a new catalog, e.g. hivelegacy.properties, by copying the right file depending on the files/tables that you want to read:
    • copy the hive.properties file to read Parquet files
    • copy the iceberg.properties file to read Iceberg tables
    • copy the delta.properties file to read Delta tables
  • Add the external Metastore files hdfs-site.xml and core-site.xml to prestocluster/presto/conf/catalog
  • Add this property to the hivelegacy.properties file just created: hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml,/opt/presto-server/etc/catalog/hdfs-site.xml
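As a sketch, a hivelegacy.properties catalog derived from hive.properties could end up looking like this; the Metastore host is a placeholder, and any other settings copied from the original file would remain:

```properties
# Hive connector pointing to the external Hive Metastore
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<external-metastore-host>:9083
# Hadoop configuration files of the external Metastore
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml,/opt/presto-server/etc/catalog/hdfs-site.xml
```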

If Kerberos is required you can follow these steps or the steps in section “5.2B Configuring Hive Metastore with a non-existing database different from Postgresql”

5.2B

Configuring Hive Metastore with a non-existing database different from Postgresql

In this scenario, Denodo will provision Hive Metastore but will use a database different from the default Postgresql. This option is used when there are policies restricting the type of RDBMS that can be installed or if there are issues with step 2.B.

  • Change this parameter in prestocluster/values.yaml to false: postgresql.enabled=false
  • Edit the following values, also in prestocluster/values.yaml, with the existing database connection parameters:
    • metastore.connectionUrl
    • metastore.connectionDriverName
    • metastore.ConnectionUser
    • metastore.ConnectionPassword
  • Load the initialization script located in prestocluster/hive-metastore/scripts for your DBMS: MySQL, Oracle, PostgreSQL or SQL Server
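For example, for an existing MySQL database the metastore section of values.yaml could look like the following sketch; host, database name and credentials are placeholders, and the exact key capitalization should be taken from your values.yaml:

```yaml
metastore:
  connectionUrl: "jdbc:mysql://<db-host>:3306/hive_metastore"
  connectionDriverName: "com.mysql.cj.jdbc.Driver"
  ConnectionUser: "<db-user>"
  ConnectionPassword: "<db-password>"
```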

5.2C

Configuring AWS Glue as Metastore

If you already have an AWS Glue Data Catalog containing table definitions you want to access from the Denodo Embedded MPP, you can use it as an external Metastore, either as a replacement for the default Metastore or in addition to it, even when deploying MPP in Azure. The most common scenario is having one Metastore in Azure and one in AWS, allowing a multi-cloud architecture.

  • Follow the steps in section “AWS Glue Data Catalog” here
  • Note: if there is an error with the introspection of base views (“java.lang.NullPointerException: StorageDescriptor SerDeInfo is null”), you should upgrade to MPP version 8.0.20231122 or later

5.2D

Configuring multiple Metastores

With this option you will be able to connect to different Metastores. This option is common in migration scenarios. For example, if planning on migrating from Athena to Denodo MPP.

More information can be found in the Denodo Embedded MPP User Manual.

Section 6 - Deploying the server

6A

Execute cluster.sh deploy

This command will deploy the Denodo Embedded MPP cluster. The syntax for this command will depend on the HDFS storage used and additional details can be found in the Deployment section here.

If you are following the steps in this checklist, you just need to provide a value for --credstore-password, which will be used to protect the keystore:

cluster.sh deploy --credstore-password replace_with_a_password 

During the execution of the script, you will be able to specify the passwords for the Metastore and Denodo MPP. You can also keep the defaults by pressing Enter:

  • Default Password for the Metastore: hive
  • Default Password for Presto: pr3st%

6B

Check that the pods are in “Running” status

Execute the following command to check the status of each pod:

kubectl get pods

The following pods should have STATUS=Running:

  1. hive-metastore-xxxx
  2. postgresql-xxxx
  3. presto-coordinator-xxxxx
  4. As many presto-worker pods as configured in values.yaml

The pods use a Readiness Probe and Liveness Probe. Both probes are used to control the health of the cluster. If the Liveness probe fails, the container will be restarted, while if the Readiness probe fails, the container will stop serving traffic.

If some of the pods didn’t start correctly, please check the “Kubernetes debugging commands” section at the end of this document.

6C

Denodo VDP Hosts File (When using the Self Service Certificate)

When using the Self Service Certificate as described previously, the name of the Denodo MPP server needs to be presto-denodo. In order to do that, we need to edit the hosts file (e.g. /etc/hosts in Linux or C:\Windows\System32\drivers\etc\hosts in Windows) of the server where Denodo is installed. Add the following line:

Denodo_MPP_IP presto-denodo

To obtain the value for Denodo_MPP_IP you can execute

kubectl get svc 

and obtain the external URL of the LoadBalancer (this depends on step 0B).
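For instance, assuming the LoadBalancer reported an external IP of 20.55.10.7 (a made-up value), the line to append to the hosts file would be built like this:

```shell
# Hypothetical external IP taken from the kubectl get svc output
MPP_IP="20.55.10.7"
# This is the line to add to /etc/hosts (or the Windows hosts file)
echo "$MPP_IP presto-denodo"
```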

 

6D

Register the MPP Server in the Denodo VDP Server

Once all the pods are running, we need to create a datasource in our Denodo VDP Server pointing to Denodo MPP. To do that execute:

cluster.sh register

To run cluster.sh on Windows you need to have a Bash compatible shell such as Cygwin or Git Bash installed or use Windows Subsystem for Linux (WSL).

6E

Check that there is connectivity between Denodo VDP and Denodo MPP

In Denodo, there should be a new VDB (by default admin_denodo_mpp) and an MPP data source (embedded_mpp). You can run the following checks:

  • Open the embedded_mpp data source and click on “Test Connection” to validate the connectivity
  • Validate that your license allows this feature by executing CALL VALIDATE_MPP_LICENSE()

Section 7 - First steps after deployment

7A

Configure section “Use bulk data load APIs”

The information in this section will be used when inserting data into your Denodo MPP cluster. Once the section is configured, click on “Test Bulk Load” and validate that there are no errors.

7B

Configure Denodo Hadoop properties

If you followed the instructions in step 4C.1, you have to add the following Hadoop properties to the embedded_mpp data source and the bulk data load APIs:

  • Name: fs.azure.account.auth.type

Value: OAuth

  • Name: fs.azure.account.oauth.provider.type

Value: org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider

  • Name: fs.azure.account.oauth2.msi.tenant

Value: <MANAGED_IDENTITY_TENANT_ID>

  • Name: fs.azure.account.oauth2.msi.endpoint

Value: http://169.254.169.254/metadata/identity/oauth2/token

  • Name: fs.azure.account.oauth2.client.id

Value: <MANAGED_IDENTITY_CLIENT_ID>

7C

Create a base view for a parquet file

Click on “Create Base View” to validate that you can introspect the Parquet files (if any) available in your data storage paths.

 

Kubernetes debugging commands

The following commands are useful in case something fails:

  • kubectl describe pod <pod name> (pod status information)

  • kubectl logs <pod name> (pod logs)

  • kubectl logs <pod name> --previous (logs of the last run of the pod before it crashed. Useful in case you want to figure out why the pod crashed in the first place.)

  • kubectl exec -it <pod name> -- bash (interactive shell access to the pods)

  • Interactive shell access to crashing pods, e.g. hive metastore:
  • Edit prestocluster/templates/hive-metastore-template.yaml
  • Replace:

             command: ["/opt/run-hive-metastore.sh"]

             by

             command: [ "sleep" ]
             args: [ "infinity" ]

  • Redeploy the MPP
  • Enter the hive metastore container to continue your debugging:
  • kubectl exec -it <pod name> -- bash
  • Since 8.0.20240306 you have to disable the livenessProbe in the values.yaml file, otherwise the livenessProbe will kill the pod after a few seconds
  • E.g. for the metastore: metastore.livenessreadiness.enabled: false

  • Change Denodo MPP configuration without redeploying (since 8.0.20240306):

Execute: helm upgrade prestocluster prestocluster/

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.
