Overview

The Denodo Embedded MPP is a customized version of Presto optimized to behave as Denodo’s Massive Parallel Processing (MPP) engine. Denodo can leverage the processing capabilities of the engine by sending queries to it in situations where doing so can accelerate performance. Therefore, it’s important to have an Embedded MPP that is appropriately sized to effectively support times of peak load for your Denodo environment.

The Embedded MPP includes configuration options that allow it to be deployed in autoscaling mode. This can be useful as it can allow the Embedded MPP to handle peak load while also automatically downsizing and saving costs during off-peak hours.

This article discusses the steps to deploy the Denodo Embedded MPP in autoscaling mode in Amazon EKS. It covers both the steps for the Embedded MPP to automatically upsize and downsize the number of worker pods based on load, and the steps for EKS to automatically adjust the number of EC2 instances in response to the number of worker pods. In this way you can ensure the cluster can support high load while minimizing cost.

Before following the steps in this article, we recommend reviewing the Sizing Recommendations for the Embedded MPP section of the Virtual DataPort Administration Guide to determine the ideal minimum and maximum sizes for your autoscaling cluster.

Prerequisites

If running on Windows:

  • Kubectl.
  • Docker.
  • Cygwin or Git bash to run the cluster.sh script.
  • Install the Windows Subsystem for Linux with the command "wsl --install" in PowerShell.
  • Rancher Desktop with dockerd (moby) as the container engine. Rancher Desktop is a desktop container management distribution for Windows. It already includes Kubectl, so there is no need to install it separately.
  • Check if the environment variable HADOOP_HOME is defined on the computer. To see the list of environment variables, open a command line and execute SET. If HADOOP_HOME is already defined, copy the content of the directory <DENODO_HOME>\dll\vdp\winutils to %HADOOP_HOME%\bin.

            If HADOOP_HOME is undefined:

  1. Create a directory. For example, <DENODO_HOME>\hadoop_win_utils.
  2. Create a directory called bin within the new directory. For example, <DENODO_HOME>\hadoop_win_utils\bin.
  3. Define the environment variable HADOOP_HOME to point to <DENODO_HOME>\hadoop_win_utils.
  4. Copy the content of the directory <DENODO_HOME>\dll\vdp\winutils to %HADOOP_HOME%\bin. An example is shown below.
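For example, from a Windows command prompt (a minimal sketch, assuming the DENODO_HOME environment variable already points to your Denodo installation; note that setx only takes effect in newly opened command prompts):

mkdir "%DENODO_HOME%\hadoop_win_utils\bin"
xcopy /E /I "%DENODO_HOME%\dll\vdp\winutils" "%DENODO_HOME%\hadoop_win_utils\bin"
setx HADOOP_HOME "%DENODO_HOME%\hadoop_win_utils"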

Steps

Configure your access to AWS

  • If not already present, configure a config file in your .aws folder:

[profile mpp]
region = us-east-1
output = json
role_arn = <your_role_arn>
source_profile = mpp

  • If not already present, configure a credentials file in your .aws folder:

[mpp]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>

Your role ARN is obtainable from the AWS console under “IAM > Roles > your_role”. Your AWS access key ID and secret access key can be obtained from “IAM > Users > your_user > Security Credentials > Access Keys > Create Access Key”.
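Once both files are in place, you can verify that the profile resolves correctly with:

# aws sts get-caller-identity --profile mpp

This should return the account, user ID, and ARN associated with the assumed role.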

Create the cluster and node group

  1. Create a cluster.yaml file to define your EKS cluster in AWS. Besides the cluster name, you can configure the instanceType, minSize and maxSize according to your needs. You will be able to change the min and max size later, but the instance type cannot be changed once created:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: <autoscaling-example>
  region: us-east-1
  version: '1.27'
managedNodeGroups:
  - name: m58x
    iam:
      withAddonPolicies:
        autoScaler: true
    instanceType: m5.8xlarge
    minSize: 3
    maxSize: 10
    privateNetworking: true
    propagateASGTags: true
    ssh:
      allow: false

  2. Create the cluster with the following command:

# eksctl create cluster -f cluster.yaml

  3. Retrieve your cluster's OIDC provider ID and store it in a variable:

# oidc_id=$(aws eks describe-cluster --name <autoscaling-example> --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)

  4. Determine whether an IAM OIDC provider with your cluster's ID is already in your account:

# aws iam list-open-id-connect-providers | grep $oidc_id | cut -d "/" -f4

If output is returned, then you already have an IAM OIDC provider for your cluster and you can skip the next step.

If no output is returned, then you must create an IAM OIDC provider for your cluster with the following command. Replace <autoscaling-example> with your own value:

# eksctl utils associate-iam-oidc-provider --cluster <autoscaling-example> --approve

Create an IAM policy and role

  1. Save the following contents to a file named cluster-autoscaler-policy.json and replace the <autoscaling-example> value:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/k8s.io/cluster-autoscaler/<autoscaling-example>": "owned"
                }
            }
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeAutoScalingGroups",
                "ec2:DescribeLaunchTemplateVersions",
                "autoscaling:DescribeTags",
                "autoscaling:DescribeLaunchConfigurations",
                "ec2:DescribeInstanceTypes"
            ],
            "Resource": "*"
        }
    ]
}

  2. Create the policy with the following command. You can change the value for policy-name.

# aws iam create-policy \
    --policy-name AmazonEKSClusterAutoscalerPolicy \
    --policy-document file://cluster-autoscaler-policy.json

Take note of the Amazon Resource Name (ARN) that's returned in the output. You need to use it in a later step.
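Alternatively, you can capture the ARN into a shell variable at creation time (a convenience sketch; it assumes the policy does not already exist):

# policy_arn=$(aws iam create-policy \
    --policy-name AmazonEKSClusterAutoscalerPolicy \
    --policy-document file://cluster-autoscaler-policy.json \
    --query 'Policy.Arn' --output text)
# echo $policy_arn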

  3. Create an IAM role and attach an IAM policy to it. Change the <autoscaling-example> value and use the previously obtained policy ARN:

# eksctl create iamserviceaccount \
    --cluster=<autoscaling-example> \
    --namespace=kube-system \
    --name=cluster-autoscaler \
    --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/AmazonEKSClusterAutoscalerPolicy \
    --override-existing-serviceaccounts \
    --approve

  4. Detach the autoscaling IAM policy that eksctl created and attached to the Amazon EKS node IAM role of your node group. The role name has the format eksctl-<autoscaling-example>-nodegr-XX; from that role, detach the policy whose name has the format eksctl-<autoscaling-example>-nodegroup-XXX-PolicyAutoScaling.
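This can be done from the IAM console, or with the AWS CLI (a sketch; substitute the exact role and policy names from your account):

# aws iam list-attached-role-policies --role-name eksctl-<autoscaling-example>-nodegr-XX
# aws iam detach-role-policy \
    --role-name eksctl-<autoscaling-example>-nodegr-XX \
    --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/eksctl-<autoscaling-example>-nodegroup-XXX-PolicyAutoScaling

If your eksctl version attached the autoscaling permissions as an inline policy instead, remove it with aws iam delete-role-policy.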

Deploy the Cluster Autoscaler

  1. Download the Cluster Autoscaler YAML file.

# curl -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

  2. Modify the YAML file and replace <YOUR CLUSTER NAME> with your cluster name. Also consider adjusting the cpu and memory values as required by your environment.

  3. Apply the YAML file to your cluster.

# kubectl apply -f cluster-autoscaler-autodiscover.yaml

  4. Annotate the cluster-autoscaler service account with the ARN of the IAM role that you created previously. Replace the example values with your own values. The role name has the format eksctl-<autoscaling-example>-addon-iamservicea-Role1-XXX.

# kubectl annotate serviceaccount cluster-autoscaler \
    -n kube-system \
    eks.amazonaws.com/role-arn=arn:aws:iam::<ACCOUNT_ID>:role/YYYYYY
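You can confirm the annotation was applied with:

# kubectl -n kube-system describe serviceaccount cluster-autoscaler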

  5. Patch the deployment to add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to the Cluster Autoscaler pods with the following command.

# kubectl patch deployment cluster-autoscaler \
    -n kube-system \
    -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'

  6. Edit the Cluster Autoscaler deployment and add the following options:

  • --balance-similar-node-groups
  • --skip-nodes-with-system-pods=false

To do that, execute the following command:

# kubectl -n kube-system edit deployment.apps/cluster-autoscaler

This will open the deployed cluster-autoscaler-autodiscover.yaml manifest locally so you can edit it. Once you save and close the file, kubectl will apply it back to the cluster. You can make the edits in the following section:

    spec:
      containers:
      - command:
        . . .
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<autoscaling-example>
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
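After saving your changes, you can confirm that the Cluster Autoscaler restarted with the new options by checking its logs:

# kubectl -n kube-system logs -f deployment/cluster-autoscaler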

  7. Apply the Kubernetes Metrics Server to your cluster, which the Horizontal Pod Autoscaler requires in order to read CPU metrics:

# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
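Once the Metrics Server pod is running, resource metrics should become available. You can confirm this with:

# kubectl top nodes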

Install the Amazon EBS CSI driver Amazon EKS add-on

  1. If you have pods using Amazon EBS volumes in your cluster, you must install the Amazon EBS CSI driver to use your cluster with Kubernetes versions 1.23 and later. First, create an IAM role for the driver's service account. Replace <autoscaling-example> with your own value:

# eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster <autoscaling-example> \
    --role-name AmazonEKS_EBS_CSI_DriverRole \
    --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve

  2. Use the role you just created to deploy the driver to your cluster. Once again, replace <autoscaling-example> and <ACCOUNT_ID> with your own values:

# eksctl create addon --name aws-ebs-csi-driver --cluster <autoscaling-example> --service-account-role-arn arn:aws:iam::<ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole
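You can check the status of the add-on with:

# eksctl get addon --name aws-ebs-csi-driver --cluster <autoscaling-example>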

Enable the autoscaler in the Embedded MPP configuration

  1. Edit your <denodo_mpp_home>/prestocluster/values.yaml file and make the following adjustments to enable the Kubernetes autoscaler:

presto.autoscaling.enabled: true

Make sure that presto.autoscaling.maxReplicas does not exceed the maximum node group size you configured for the EKS autoscaler; otherwise, multiple worker pods can end up deployed on the same EC2 machine, which can lead to crashes during execution. By default, targetCPUUtilizationPercentage is set to 80%, meaning a new worker pod will be created when average CPU utilization reaches 80%. This percentage typically works well but can be tuned to be more or less aggressive.
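As an illustration, the autoscaling section of values.yaml could look like the following sketch (the values are examples only, and the exact keys under presto.autoscaling may vary between Embedded MPP versions, so check your values.yaml):

presto:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80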

  2. Configure the presto.worker.resources section as shown in the sketch below. Consider the value you want to set for presto.worker.resources.requests.cpu, which represents the amount of CPU reserved for the pod. When deploying in EKS, we recommend setting the value to 80% of the total number of CPUs of an EC2 instance as a starting point. This helps ensure enough room is left over on the EC2 instance for other tasks it may need to perform. So for a 32-core instance you can set the value to 25.6 or 25600m, each of which represents 25.6 cores:
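For instance, for a 32-core instance type such as m5.8xlarge, the worker resources section could look like this (a sketch; adjust the value to your instance type):

presto:
  worker:
    resources:
      requests:
        cpu: 25600m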

This can then be fine-tuned as needed. The autoscaler will use this value along with the targetCPUUtilizationPercentage described above to determine if a new worker is needed.

  3. You can now start your cluster as normal with:

# ./cluster.sh deploy

Validate your deployment

After deploying your cluster, you can check to ensure your Horizontal Pod Autoscaler (HPA) is correctly tracking the CPU usage of your cluster with:

# kubectl get hpa
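The output will look similar to the following (illustrative values; the HPA name and reference depend on your deployment):

NAME     REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
presto   Deployment/presto-worker   3%/80%    2         32        2          5m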

In the output above we can see the HPA is configured for a minimum of 2 pods and a maximum of 32, will scale up once the average CPU usage reaches the 80% target, and is currently tracking an average CPU usage of 3%. It may take a few minutes after your cluster is deployed before the HPA begins tracking.

As load is applied and CPU usage on the cluster reaches the threshold, the HPA will automatically add more worker pods. These will initially appear as Pending while they await sufficient hardware.

The Cluster Autoscaler will then add additional EC2 instances to the cluster, allowing the pending pods to deploy.
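You can follow the scale-up as it happens with:

# kubectl get pods --watch
# kubectl get nodes --watch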

Now as Denodo routes queries to your MPP engine, your cluster will be able to upsize and downsize automatically in order to adjust to the level of demand.

References

Embedded Parallel Processing — Virtual DataPort Administration Guide

Denodo Presto Cluster on Kubernetes - User Manual

Sizing recommendations for the Embedded MPP — Virtual DataPort Administration Guide

Autoscaling - Amazon EKS

Creating an Amazon EKS cluster

Horizontal Pod Autoscaling | Kubernetes

Kubernetes Metrics Server

Amazon EBS CSI driver - Amazon EKS
