Configuring a Denodo Cluster with HAProxy Load Balancer

Applies to: Denodo 7.0
Last modified on: 05 Apr 2019
Tags: Administration Cluster configuration


Introduction

This document explains how to set up an active-active Denodo cluster with two nodes behind the HAProxy load balancer. While the configuration is specific to HAProxy, the concepts are general enough to serve as guidelines for configuring other load balancers.

Software Versions

Denodo Platform (Administration Tool, VDP Server, Solution Manager): 7.0 20190312

HAProxy: HA-Proxy version 1.8.8-1ubuntu0.4 2019/01/24

Cluster Architecture

The architecture is shown in the following diagram:

Client applications (on the left) access the Denodo servers (node01.ubuntu1804-denodo7 and node02.ubuntu1804-denodo7) through the load balancer machine (ubuntu1804-loadbalancer). The two Denodo nodes are registered in the Solution Manager (ubuntu1804-den7-sol-man, on the right), with which they have the usual interactions (license check, monitoring, promotion, …). Traffic between the Solution Manager and the nodes flows in both directions and bypasses the HAProxy load balancer.

All the machines run Ubuntu 18.04.

Introduction to HAProxy

From Wikipedia:

HAProxy is free, open source software that provides a high availability load balancer and proxy server for TCP and HTTP-based applications that spreads requests across multiple servers. It is written in C and has a reputation for being fast and efficient (in terms of processor and memory usage).

Setup of haproxy in the Load Balancer Machine

Install the package, for example via apt (you may want to build it from source to get the latest version):

ubuntu@ubuntu1804-loadbalancer:~$ sudo apt install haproxy

The main configuration file is /etc/haproxy/haproxy.cfg and the installation creates a systemd service:

ubuntu@ubuntu1804-loadbalancer:~$ systemctl status haproxy

● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-27 07:58:53 UTC; 3h 34min ago
     Docs: man:haproxy(1)
           file:/usr/share/doc/haproxy/configuration.txt.gz
  Process: 926 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
 Main PID: 941 (haproxy)
    Tasks: 2 (limit: 1110)
   CGroup: /system.slice/haproxy.service
           ├─941 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           └─945 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Mar 27 07:58:53 ubuntu1804-loadbalancer systemd[1]: Starting HAProxy Load Balancer...
Mar 27 07:58:53 ubuntu1804-loadbalancer haproxy[941]: [WARNING] 085/075853 (941) : parsing [/etc/haproxy/haproxy.cfg:14] : 'option httplog' not usable with frontend 'tcp-in-scheduler' (needs 'm
Mar 27 07:58:53 ubuntu1804-loadbalancer haproxy[941]: [WARNING] 085/075853 (941) : config : 'option forwardfor' ignored for frontend 'tcp-in-scheduler' as it requires HTTP mode.
Mar 27 07:58:53 ubuntu1804-loadbalancer systemd[1]: Started HAProxy Load Balancer.

Every time you modify the configuration, you have to restart the service for the change to take effect. Before restarting, you may want to check that the configuration file is valid:

ubuntu@ubuntu1804-loadbalancer:~$ haproxy -f /etc/haproxy/haproxy.cfg -c

Configuration file is valid
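The two steps above can be combined so that the service is only restarted when the check succeeds. The following is a minimal sketch, assuming the Ubuntu package layout used in this article; the function name restart_haproxy_if_valid is hypothetical and not part of HAProxy.

```shell
# Validate the configuration and restart HAProxy only if the check passes.
# Assumption: default config path and systemd unit name from the Ubuntu package.
restart_haproxy_if_valid() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    # -c checks the configuration and exits, -q suppresses the normal output
    if haproxy -f "$cfg" -c -q; then
        sudo systemctl restart haproxy
    else
        echo "Configuration check failed for $cfg; service left untouched" >&2
        return 1
    fi
}

# Usage on the load balancer (not run here):
#   restart_haproxy_if_valid
#   restart_haproxy_if_valid /etc/haproxy/haproxy.cfg
```

With this approach a typo in the configuration file leaves the running instance in place instead of stopping the load balancer.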

Configuration File

Let's have a look at the configuration file:

global
    log 127.0.0.1 local2 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 256
    user haproxy
    group haproxy
    daemon

defaults
    mode                http
    log                 global
    option              httplog
    timeout             connect 10s
    timeout             client 30s
    timeout             server 30s

## http/9090 => Proxy for Tomcat Based Web Applications (Data Catalog, RESTful Web Services, Scheduler Web Admin Panel, ...)
frontend http-in
    bind                *:9090
    default_backend     backend_servers_tomcat_http
    option              forwardfor

backend backend_servers_tomcat_http
    balance             roundrobin
    cookie              SRVNAME insert
    server              node01 node01.ubuntu1804-denodo7:9090 cookie ck_nd01 check
    server              node02 node02.ubuntu1804-denodo7:9090 cookie ck_nd02 check

## tcp/8000 => Proxy for Scheduler Server
frontend tcp-in-scheduler
    mode                tcp
    bind                *:8000
    default_backend     backend_servers_tcp_in_sched
    option              tcplog

backend backend_servers_tcp_in_sched
    mode                tcp
    balance             roundrobin
    server              node01 node01.ubuntu1804-denodo7:8000 check
    server              node02 node02.ubuntu1804-denodo7:8000 check

## tcp/9999 => Proxy for JDBC traffic
frontend jdbc-in
    mode                tcp
    bind                *:9999
    default_backend     backend_jdbc_servers
    option              tcplog

backend backend_jdbc_servers
    mode                tcp
    balance             roundrobin
    server              node01 node01.ubuntu1804-denodo7:9999 check
    server              node02 node02.ubuntu1804-denodo7:9999 check

  • The global section sets process-wide parameters. For example, we tell haproxy to send its logs to the local syslog daemon (rsyslogd) and we specify the user and group of the process.
  • The defaults section lists default parameter values that apply to the proxies unless they are overridden.
  • Finally come the proxy sections, one for each port we want to make available through the load balancer. Each proxy consists of a frontend section and a backend section.
  • The frontend section has to specify:
      • the mode (tcp, http, …)
      • the port to bind (e.g. *:9999)
      • the default_backend
      • any additional options
  • The backend section has to specify the list of servers in the backend, one per line, with the following syntax:

server <name> <address>[:[port]] [param*]

Every backend section name should match one and only one default_backend name in the configuration. The address can be a hostname (for example one defined in /etc/hosts), but note that it is resolved only once, at start time.
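The one-to-one rule between backend sections and default_backend references can be checked mechanically. The following is a rough sketch, not an HAProxy feature: the function name check_backends is hypothetical, and the check only looks at backend and default_backend keywords (it ignores use_backend rules and listen sections, which this article does not use).

```shell
# Report default_backend references without a matching backend section,
# backends referenced more than once, and backends never referenced.
# Returns a non-zero status if any issue is found.
check_backends() {
    awk '
        $1 == "backend"         { defined[$2] = 1 }
        $1 == "default_backend" { used[$2]++ }
        END {
            status = 0
            for (b in used) {
                if (!(b in defined)) { print "missing backend: " b; status = 1 }
                if (used[b] > 1)     { print "backend referenced " used[b] " times: " b; status = 1 }
            }
            for (b in defined)
                if (!(b in used)) { print "unreferenced backend: " b; status = 1 }
            exit status
        }
    ' "$1"
}

# Usage (not run here):
#   check_backends /etc/haproxy/haproxy.cfg
```

Since hostnames are resolved at start time, it can also be worth running getent hosts node01.ubuntu1804-denodo7 on the load balancer before restarting the service.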

HTTP Session Stickiness

When the web applications behind the load balancer serve different content depending on the logged-in user, we must prevent requests of the same session from switching between nodes. One way to deal with this is the cookie insert method: HAProxy adds a Set-Cookie header to the response, and the client sends that cookie back in every subsequent request.

You enable this with the cookie SRVNAME insert directive in the backend section and by adding the cookie keyword, followed by a value that is unique to the server, to each server line, as in the following example:

server node01 node01.ubuntu1804-denodo7:9090 cookie ck_nd01 check

In this case, every new session routed to node01 receives the cookie value ck_nd01. The next time the same client sends a request to the load balancer, it includes the cookie and the load balancer routes the request to node01.
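One way to observe the stickiness from a client machine is to inspect the response headers. The following is a small sketch, assuming curl is available and that the load balancer answers on port 9090 as configured above; the helper name sticky_cookie_value is hypothetical.

```shell
# Print the value of the SRVNAME cookie found in HTTP response headers on stdin.
# SRVNAME is the cookie name declared in the backend section of this article.
sticky_cookie_value() {
    tr -d '\r' |
    sed -n 's/^[Ss]et-[Cc]ookie:[[:space:]]*SRVNAME=\([^;]*\).*/\1/p' |
    head -n 1
}

# Usage against the load balancer (not run here):
#   # First request: which node did HAProxy pick?
#   curl -s -D - -o /dev/null http://ubuntu1804-loadbalancer:9090/ | sticky_cookie_value
#   # Later requests: send the cookie back to stay on node01
#   curl -s -D - -o /dev/null --cookie "SRVNAME=ck_nd01" http://ubuntu1804-loadbalancer:9090/
```

Without the cookie, successive requests should alternate between ck_nd01 and ck_nd02 because of the roundrobin policy.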

Configuration of Runtime API

In a later section we will see how to set up Deployment Scripts in the Solution Manager. For those to work, we must allow runtime changes to HAProxy, for example to cut or enable access to a given backend server.

We can do that by using the HAProxy Runtime API (see the link in the References section).

For that to work, we just need to add the following lines to the global section of the configuration file:

stats socket ipv4@127.0.0.1:9999 level admin

stats socket /var/run/haproxy.sock mode 666 level admin

stats timeout 2m
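Once the socket is declared, commands can be sent to it from the load balancer itself. The following is a minimal sketch, assuming socat is installed on the load balancer; the function names set_server_cmd and haproxy_runtime are hypothetical helpers, while the set server and show servers state commands belong to the Runtime API.

```shell
# Build a Runtime API command that changes the state of one server.
# Only ready, drain and maint are accepted; anything else is rejected.
set_server_cmd() {
    backend=$1 server=$2 state=$3
    case "$state" in
        ready|drain|maint) echo "set server ${backend}/${server} state ${state}" ;;
        *) echo "unsupported state: $state" >&2; return 1 ;;
    esac
}

# Send one command to the local admin socket declared in the global section.
haproxy_runtime() {
    echo "$1" | socat stdio /var/run/haproxy.sock
}

# Usage on the load balancer (not run here):
#   haproxy_runtime "show servers state backend_jdbc_servers"
#   haproxy_runtime "$(set_server_cmd backend_jdbc_servers node01 maint)"
#   haproxy_runtime "$(set_server_cmd backend_jdbc_servers node01 ready)"
```

Validating the state before sending it avoids pushing a mistyped command to a socket that runs at admin level.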

Setup in the Solution Manager

In the Solution Manager you should define a cluster (in my case cluster01) in an environment (in my case env01). Then define in the cluster the Scheduler and VDP servers of each node.

As an example, here is the definition of node01-vdpserver

Deployment Scripts

In the Solution Manager we can define scripts that disable the servers in a cluster during a promotion and bring them back to life once the promotion has finished. The idea of these scripts is to tell the load balancer to temporarily cut communication to the backend node being updated.

To do that we select the environment, then Deployment Scripts. Access can be cut either at the cluster level or at the node level; in our example we chose the latter.

NOTE: you must install the sshpass package on the machine hosting the Solution Manager Server for the script to work.

Here is the script to be uploaded:

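#!/bin/bash
#####################
## This script enables or disables a node on an HAProxy-based load balancer
## Arguments: nodename status username password loadbalancer_host
#####################

nodename=$1
status=$2              # HAProxy server state: ready or maint
username=$3
password=$4
loadbalancer_host=$5

# Runtime API command, wrapped in literal double quotes for the remote shell
set_instruction="\"set server backend_jdbc_servers/${nodename} state ${status}\""

# Send the command line over SSH; the remote bash pipes it into the HAProxy admin socket
echo "echo ${set_instruction} | socat stdio /var/run/haproxy.sock" | sshpass -p "${password}" ssh "${username}@${loadbalancer_host}" 'bash -'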

In this script we are telling HAProxy via a remote SSH command (sshpass) that we want to change the status of backend_jdbc_servers/${nodename} to ${status}. As such this script can be used both for enabling (state: ready) and disabling (state: maint) the relevant node.
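Because the script nests one echo inside another, it can help to see the exact command line that reaches the remote bash. The following is a dry-run sketch that prints that line without opening an SSH connection; the function name build_remote_command is hypothetical.

```shell
# Print the command line that the deployment script pipes to the remote bash.
# No SSH connection is opened; this only illustrates the quoting.
build_remote_command() {
    nodename=$1
    status=$2
    printf 'echo "set server backend_jdbc_servers/%s state %s" | socat stdio /var/run/haproxy.sock\n' \
        "$nodename" "$status"
}

# Usage:
#   build_remote_command node01 maint
#   build_remote_command node01 ready
```

The inner double quotes keep the whole Runtime API command as a single argument to echo on the load balancer side.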

Parameter passing is done by graphically defining parameters. In our example there are five of them:

param_number   name                type                      value
------------   -----------------   -----------------------   ------------------
0              nodename            Load balancing variable   nodename
1              state               literal                   ready
2              username            literal                   ubuntu
3              password            literal                   **********
4              loadbalancer_host   literal                   ubuntu1804-denodo7

Parameters can be of type literal, where you specify the value at creation time, or of type load balancing variable. The latter are defined at the cluster level and are handy when a parameter value depends on the node (e.g. nodename and username).

Testing of Deployment Scripts

You can test that your deployment scripts work as expected by deploying a revision and then looking at the load balancer logs (/var/log/haproxy.log):

Apr  3 14:07:20 localhost.localdomain haproxy[4503]: Server backend_jdbc_servers/node01 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

Apr  3 14:07:21 localhost.localdomain haproxy[4503]: Server backend_jdbc_servers/node01 is UP/READY (leaving forced maintenance).

Apr  3 14:07:22 localhost.localdomain haproxy[4503]: Server backend_jdbc_servers/node02 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

Apr  3 14:07:24 localhost.localdomain haproxy[4503]: Server backend_jdbc_servers/node02 is UP/READY (leaving forced maintenance).

From these lines we can clearly see that access to each node was cut off and re-established, one node after the other.

We can also look at the deployment summary in the Solution Manager Administration Tool, which confirms that the VQL deployments are surrounded by SCRIPT_SERVER tasks corresponding to the script executions on the load balancer.
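On a busy load balancer the relevant lines can be buried among connection logs. The following is a small sketch that condenses the state changes into one short line each; it assumes the syslog line layout shown above, and the function name state_changes is hypothetical.

```shell
# Summarise server state changes from HAProxy log lines read on stdin.
# Output format: "<time> <backend>/<server> <DOWN|UP>"
state_changes() {
    awk '
        {
            # Locate the word "Server"; the next field is <backend>/<server>
            for (i = 1; i <= NF; i++) if ($i == "Server") break
            if (i > NF) next
            time = $3; name = $(i + 1)
        }
        /is going DOWN for maintenance/ { print time, name, "DOWN" }
        /is UP\/READY/                  { print time, name, "UP" }
    '
}

# Usage on the load balancer (not run here):
#   state_changes < /var/log/haproxy.log
```

During a correct node-level deployment each DOWN line should be followed by an UP line for the same server before the next server goes down.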

Setup in Denodo nodes

For each node in your cluster, apply the following modifications.

Change the hostname:

ubuntu@node01:~$ sudo hostnamectl set-hostname node01.ubuntu1804-denodo7

ubuntu@node01:~$ hostname

node01.ubuntu1804-denodo7

Change the RMI hostname in the Denodo configuration files:

ubuntu@node01:/opt/denodo/7.0$ grep -R node01 $DENODO_HOME/conf

/opt/denodo/7.0/conf/scheduler/ConfigurationParameters.properties:Launcher/registryURL=node01.ubuntu1804-denodo7

/opt/denodo/7.0/conf/vdp/VDBConfiguration.properties:com.denodo.vdb.vdbinterface.server.VDBManagerImpl.registryURL=node01.ubuntu1804-denodo7

Verify that the license is obtained from the Solution Manager:

conf/SolutionManager.properties:com.denodo.license.host=ubuntu1804-den7-sol-man
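These properties can also be checked with a small helper instead of reading the grep output by eye. The following is a sketch that assumes plain key=value lines in the properties files; the function name prop_value is hypothetical.

```shell
# Print the value of a key from a Java-style .properties file (key=value lines).
# Only the first matching key is reported; the value may itself contain "=".
prop_value() {
    file=$1 key=$2
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Usage on a Denodo node (not run here):
#   prop_value "$DENODO_HOME/conf/scheduler/ConfigurationParameters.properties" Launcher/registryURL
#   prop_value "$DENODO_HOME/conf/SolutionManager.properties" com.denodo.license.host
#   # The registry URL is expected to equal the output of: hostname
```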

Regenerate the shell files and restart the servers (note that if the node is not defined in the Solution Manager, it will refuse to start):

ubuntu@node01:/opt/denodo/7.0$ $DENODO_HOME/bin/regenerateFiles.sh

ubuntu@node01:/opt/denodo/7.0$ $DENODO_HOME/bin/vqlserver_shutdown.sh

ubuntu@node01:/opt/denodo/7.0$ $DENODO_HOME/bin/vqlserver_startup.sh

ubuntu@node01:/opt/denodo/7.0$ $DENODO_HOME/bin/scheduler_shutdown.sh

ubuntu@node01:/opt/denodo/7.0$ $DENODO_HOME/bin/scheduler_startup.sh

Testing

Web-based applications:

  • Log in and perform some typical actions (search, query, categorize, administer, …), then log out
  • Log in and perform some typical actions (explore jobs, create a new one, delete one job, edit one job, …), then log out
  • Log in and perform some typical actions (navigate views, search with URL parameters, …), then log out

JDBC client (e.g. DBeaver 6.0.0)
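Before running the functional tests it can be useful to confirm that the three proxied ports answer through the load balancer. The following is a rough sketch, assuming bash and the coreutils timeout command are available on the client; the function names probe and report_ports are hypothetical.

```shell
# Return success if a TCP connection to host:port can be opened within 2 seconds.
# Relies on the /dev/tcp pseudo-device provided by bash.
probe() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print one line per port with its status.
report_ports() {
    host=$1; shift
    for port in "$@"; do
        if probe "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed"
        fi
    done
}

# Usage from a client machine (not run here):
#   report_ports ubuntu1804-loadbalancer 9090 8000 9999
```

An open port only shows that HAProxy is listening; the logins described above are still needed to confirm that the backends respond correctly.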

References

HAProxy in Wikipedia

HAProxy Home page

Dynamic Configuration of HAProxy via API

HTTP Session stickiness

HAProxy tutorial

Configuring Deployment Scripts in the Solution Manager
