Applies to: Denodo 8.0, Denodo 7.0
Last modified on: 18 Aug 2020
Tags: Administration, Cluster configuration
This document explains how to set up an active-active Denodo cluster with two nodes behind an HAProxy load balancer. While the configuration is specific to the chosen load balancer, the concepts should be general enough to serve as guidelines for configuring other load balancers.
The environments used in this article are the following:
Denodo 7 setup:
Denodo Platform | 7.0 20200205
HAProxy | HA-Proxy version 1.8.8-1ubuntu0.9 2019/12/02
Operating System | Ubuntu Server 18.04 on all the machines

Denodo 8 setup:
Denodo Platform | 8.0
HAProxy | HA-Proxy version 2.0.13-2 2020/04/01 - https://haproxy.org/
Operating System | Ubuntu Server 20.04 on all the machines
The architecture discussed in this article is shown in the following diagram:
Figure 1: General Architecture with 2 Denodo Platform active nodes and a load balancer
Client applications (on the left) access the Denodo Platform nodes (node01.denodo8 and node02.denodo8) through the load balancer machine (loadbalancer). The two Denodo nodes are registered in the Solution Manager (sm.denodo8, on the right), with which they have the usual interactions (license check, monitoring, promotions, …). This bidirectional traffic between the Solution Manager and the nodes bypasses the HAProxy load balancer.
External (from client applications) and internal (between Denodo servers) communication is secured via SSL/TLS encryption. For Denodo 8, this is not just a typical recommendation for production deployments, but a requirement to be able to perform promotions in the Solution Manager. Indeed, starting from this version, to enable promotions the Denodo Security Token must be enabled in the involved servers and the Denodo Security Token in turn requires SSL/TLS.
In the following table, we can see all the components of the Denodo Platform that can be load balanced:
Component | Protocol | Default Port | Comments
Virtual DataPort - JDBC | TCP | 9999 |
Virtual DataPort - JDBC (RMI Factory Port) | TCP | 9997 | Denodo 7 only
Virtual DataPort - ODBC | TCP | 9996 |
Scheduler | TCP | 8000 |
Scheduler Index | TCP | 9000 | Not treated in this article
Data Catalog | HTTP | 9090 (SSL: 9443) |
Scheduler Web Admin Tool | HTTP | 9090 (SSL: 9443) |
RESTful Web Service | HTTP | 9090 (SSL: 9443) |
REST Web Services | HTTP | 9090 (SSL: 9443) |
Diagnostic & Monitoring Tool | HTTP | 9090 (SSL: 9443) |
Components of the Denodo Platform that can be load balanced
This article does not address load balancing for the ITPilot modules (ITPilot Browser Pool, ITPilot Verification Server, ITPilot PDF Conversion Server) and the Scheduler Index Server.
The load balancer is based on HAProxy, free and open-source software that provides a high-availability load balancer and proxy server for TCP- and HTTP-based applications, spreading requests across multiple servers. It is written in C and has a reputation for being fast and efficient in terms of processor and memory usage.
Install the package via apt (you may want to install it from source for the latest version):
$ sudo apt install haproxy
The main configuration file is /etc/haproxy/haproxy.cfg and the installation creates a systemd service:
$ systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-08-10 12:36:00 CEST; 5h 55min ago
     Docs: man:haproxy(1)
           file:/usr/share/doc/haproxy/configuration.txt.gz
  Process: 48519 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
 Main PID: 48530 (haproxy)
    Tasks: 2 (limit: 1075)
   Memory: 3.9M
   CGroup: /system.slice/haproxy.service
           ├─48530 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock
           └─48531 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock
Every time you modify the configuration, you must restart the service for the changes to take effect.
$ sudo systemctl restart haproxy
Before restarting, you may want to check that the configuration file is valid:
$ haproxy -f /etc/haproxy/haproxy.cfg -c
Configuration file is valid
Let’s have a look at the configuration file:
global
    log 127.0.0.1 local2 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 256
    user haproxy
    group haproxy
    daemon
    ## enabling the HAProxy Runtime API
    stats socket ipv4@127.0.0.1:7777 level admin
    stats socket /var/run/haproxy.sock mode 666 level admin
    stats timeout 2m

defaults
    mode http
    log global
    option httplog
    timeout connect 10s
    timeout client 30s
    timeout server 30s

## http/9090 => Proxy for Apache Tomcat Based Web Applications
## - Data Catalog
## - RESTful Web Service
## - REST Services
## - Scheduler Web Admin Tool
## - Design Studio
## - Diagnostic & Monitoring Tool
frontend http-in
    bind *:9090
    default_backend backend_servers_tomcat_http
    option forwardfor

backend backend_servers_tomcat_http
    balance roundrobin
    cookie SERVERID insert
    server node01 192.168.141.121:9090 cookie ck_node01 check
    server node02 192.168.141.122:9090 cookie ck_node02 check

## https/9443 => Proxy for Apache Tomcat-based Web Applications using
## a secure (SSL/TLS) channel
frontend https-in
    bind *:9443 ssl crt /etc/haproxy/denodo_server_key_store.pem
    default_backend backend_servers_tomcat_https

backend backend_servers_tomcat_https
    balance roundrobin
    cookie SERVERID insert
    server node01 192.168.141.121:9090 cookie ck_node01 check
    server node02 192.168.141.122:9090 cookie ck_node02 check

## tcp/8000 => Proxy for Scheduler Server traffic
frontend tcp-in-scheduler
    mode tcp
    bind *:8000
    default_backend backend_servers_tcp_in_sched
    option tcplog

backend backend_servers_tcp_in_sched
    mode tcp
    balance roundrobin
    server node01 192.168.141.121:8000 check
    server node02 192.168.141.122:8000 check

## tcp/9999 => Proxy for JDBC traffic
frontend jdbc-in
    mode tcp
    bind *:9999
    default_backend backend_jdbc_servers
    option tcplog

backend backend_jdbc_servers
    mode tcp
    balance roundrobin
    # These names must match the names defined in the Solution Manager
    server vdp.node01 192.168.141.121:9999 check
    server vdp.node02 192.168.141.122:9999 check

## tcp/9997 => Proxy for JDBC traffic (RMI Factory Port)
## The RMI Factory Port (by default 9997) is not needed anymore in Denodo 8.
## If you are running Denodo 7, this must be configured only if the clients
## cannot access directly the cluster nodes. The RMI factory port must
## not be balanced and it must be different for every node.
## Comment this out if running Denodo 8
frontend jdbc-in-factory-node01
    mode tcp
    bind *:9997
    default_backend backend_jdbc_srv_factory_node01
    option tcplog

## Comment this out if running Denodo 8
backend backend_jdbc_srv_factory_node01
    mode tcp
    server node01 192.168.141.121:9997 check

## Comment this out if running Denodo 8
## The RMI factory port must be different for every node
frontend jdbc-in-factory-node02
    mode tcp
    bind *:9995
    default_backend backend_jdbc_srv_factory_node02
    option tcplog

## Comment this out if running Denodo 8
backend backend_jdbc_srv_factory_node02
    mode tcp
    server node02 192.168.141.122:9995 check

## tcp/9996 => Proxy for ODBC traffic
frontend odbc-in
    mode tcp
    bind *:9996
    default_backend backend_odbc_servers
    option tcplog

backend backend_odbc_servers
    mode tcp
    balance roundrobin
    server node01 192.168.141.121:9996 check
    server node02 192.168.141.122:9996 check
The general syntax of the server directive is:

server <name> <address>[:[port]] [param*]

Every backend section name should match one and only one default_backend name in the configuration. The address can be a hostname (for example, one defined in /etc/hosts), but note that it is resolved only when the haproxy service starts.
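For instance, if you prefer hostnames over IP addresses in haproxy.cfg, entries like the ones below (a sketch reusing the node names and addresses from this article) could be added to /etc/hosts on the load balancer machine:

# /etc/hosts on the load balancer machine (illustrative; addresses taken from the
# configuration above). HAProxy resolves these names only when the service starts.
192.168.141.121  node01.denodo8
192.168.141.122  node02.denodo8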
There are several techniques for handling SSL/TLS in a load balancer: SSL Pass-Through, SSL Termination (offloading) and SSL Bridging (re-encryption).
Each approach has benefits and drawbacks; see the article linked in the references section for an in-depth discussion and implementation tips for HAProxy. In our case, we chose SSL Termination.
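With SSL Termination, HAProxy needs the server certificate and private key in a single PEM file (the /etc/haproxy/denodo_server_key_store.pem referenced in the configuration above). As a minimal sketch, assuming the certificate and key live in a Java keystore named denodo_server_key_store.jks, the conversion could look like this:

# Assumption: the certificate and key are stored in a Java keystore (denodo_server_key_store.jks).
# Convert the keystore to PKCS#12 and then to a single PEM file (certificate + unencrypted key),
# which is the format expected by the "bind ... ssl crt" option in HAProxy.
$ keytool -importkeystore \
    -srckeystore denodo_server_key_store.jks -srcstoretype JKS \
    -destkeystore denodo_server_key_store.p12 -deststoretype PKCS12
$ openssl pkcs12 -in denodo_server_key_store.p12 -nodes \
    -out /etc/haproxy/denodo_server_key_store.pem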
When the web applications behind the load balancer serve different content depending on the logged-in user, we must avoid requests belonging to the same session being switched between nodes. One way to deal with this is to implement so-called application-layer persistence, in which the load balancer inserts a Set-Cookie header in the response; the cookie is then included in every subsequent request from the same client.
You enable the persistence by adding the cookie <NAME> insert instruction to the backend section, along with the cookie keyword followed by a unique cookie name in each server instruction.
For example:
cookie SERVERID insert
server node01 192.168.141.121:9090 cookie ck_node01 check
server node02 192.168.141.122:9090 cookie ck_node02 check
In this case, each new connection routed to node01 is associated with the cookie ck_node01. The next time the same client sends a request to the load balancer, the request will include the cookie and the load balancer will route it to node01, bypassing the load-balancing algorithm. The same behavior applies to node02.
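A quick way to observe this behavior (a sketch, assuming curl is available and using the HTTPS frontend defined above) is to dump the response headers of a request sent through the load balancer; the response should carry a Set-Cookie: SERVERID=ck_node01 or SERVERID=ck_node02 header:

# Dump only the response headers of a request sent through the load balancer.
# -k skips certificate validation, which may be needed with self-signed certificates.
$ curl -sk -D - -o /dev/null https://loadbalancer:9443/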
In a later section, we will see how to set up deployment scripts in the Solution Manager. To do that, we must allow runtime modifications to the HAProxy configuration, for example to cut or restore access to a given backend server.
We can do that using the HAProxy Runtime API (see the link in the references section).
For that to work we just need to add the following lines in the global section of the configuration file:
stats socket ipv4@127.0.0.1:7777 level admin
stats socket /var/run/haproxy.sock mode 666 level admin
stats timeout 2m
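Once the service has been restarted, the Runtime API can be exercised over the UNIX socket; for example (a sketch, assuming socat is installed on the load balancer machine):

# List the state of all backend servers through the Runtime API socket.
$ echo "show servers state" | socat stdio /var/run/haproxy.sock

# Manually disable and re-enable a backend server (the same commands the
# deployment scripts shown below will send remotely).
$ echo "set server backend_jdbc_servers/vdp.node01 state maint" | socat stdio /var/run/haproxy.sock
$ echo "set server backend_jdbc_servers/vdp.node01 state ready" | socat stdio /var/run/haproxy.sock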
As a preliminary step of the deployment scripts configuration, you should define the environment, the cluster and its servers in the Solution Manager:
Environment, cluster and servers definition in the Solution Manager
As an example, here is the definition of vdp.node01:
Definition of a VDP server in the Solution Manager
In the Solution Manager we can define scripts that disable the servers in a cluster before a promotion deployment and bring them back online once the deployment is finished. The idea of these scripts is to communicate to the load balancer to temporarily cut communication to the backend node being updated.
To do that, we need to apply two settings: select the appropriate deployment type in the environment, and configure the enable/disable deployment scripts for the cluster.
Setting the Deployment Type in the Environment
NOTE: for the script to work, sshpass must be available on the Solution Manager machine and socat on the load balancer, and the load balancer's host key must already be present in the known_hosts file of the user running the Solution Manager, for example by executing:
$ ssh-keyscan -H loadbalancer >> ~/.ssh/known_hosts
Here is the script to be uploaded:
#!/bin/bash
#####################
## This script allows enabling or disabling a node on a
## HAProxy based load balancer
#####################

nodename=$1
status=$2
loadbalancer_user=$3
loadbalancer_pwd=$4
loadbalancer_host=$5

set_instruction="set server backend_jdbc_servers/${nodename} state ${status}"

echo "echo ${set_instruction} | socat stdio /var/run/haproxy.sock" | sshpass -p${loadbalancer_pwd} ssh ${loadbalancer_user}@${loadbalancer_host} 'bash -'

echo "EXIT CODE=$?"
In this script we are telling HAProxy via a remote SSH command (sshpass) that we want to change the status of backend_jdbc_servers/${nodename} to ${status}. As such this script can be used both for enabling (state: ready) and disabling (state: maint) the relevant node. In this example we only deal with the Virtual DataPort JDBC traffic; in a more realistic scenario you would need to take into account all the components that should be disabled/enabled (e.g. Scheduler and Data Catalog).
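For reference, a manual invocation of this script (the script name and the values are illustrative; in practice the Solution Manager passes the parameters defined below) would look like:

# Disable vdp.node01 in the backend_jdbc_servers backend, then re-enable it.
$ ./haproxy_node_state.sh vdp.node01 maint lb_user lb_password loadbalancer
$ ./haproxy_node_state.sh vdp.node01 ready lb_user lb_password loadbalancer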
Parameters are passed by defining them graphically in the Solution Manager. In our example there are five of them:
param_number | name | type | value
0 | nodename | load balancing variable | name
1 | state | literal | ready
2 | loadbalancer_user | load balancing variable | loadbalancer_user
3 | loadbalancer_pwd | load balancing variable | **********
4 | loadbalancer_host | load balancing variable | loadbalancer_host
Parameters can be of type literal, where you specify the value at creation time, or load balancing variable, which is defined at the cluster or the server level and is handy when a parameter corresponds to a node (nodename) or to a cluster (such as loadbalancer_user, loadbalancer_pwd and loadbalancer_host). Sensitive parameters, such as passwords, can be stored encrypted.
Deployment Scripts configuration: enabling (top) and disabling the node (bottom)
Definition of Load Balancing variables at the server and the cluster level
We can test that the deployment scripts work as expected by deploying a revision in the Solution Manager.
Afterwards, we can inspect the load balancer logs (/var/log/haproxy.log):
Aug 12 16:10:45 localhost.localdomain haproxy[55119]: Server backend_jdbc_servers/vdp.node01 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug 12 16:10:45 loadbalancer haproxy[55119]: [WARNING] 224/161045 (55119) : Server backend_jdbc_servers/vdp.node01 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug 12 16:10:51 localhost.localdomain haproxy[55119]: Server backend_jdbc_servers/vdp.node01 is UP/READY (leaving forced maintenance).
Aug 12 16:10:51 loadbalancer haproxy[55119]: [WARNING] 224/161051 (55119) : Server backend_jdbc_servers/vdp.node01 is UP/READY (leaving forced maintenance).
Aug 12 16:10:51 localhost.localdomain haproxy[55119]: Server backend_jdbc_servers/vdp.node02 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug 12 16:10:51 loadbalancer haproxy[55119]: [WARNING] 224/161051 (55119) : Server backend_jdbc_servers/vdp.node02 is going DOWN for maintenance. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug 12 16:10:56 localhost.localdomain haproxy[55119]: Server backend_jdbc_servers/vdp.node02 is UP/READY (leaving forced maintenance).
Aug 12 16:10:56 loadbalancer haproxy[55119]: [WARNING] 224/161056 (55119) : Server backend_jdbc_servers/vdp.node02 is UP/READY (leaving forced maintenance).
From the logs, we can clearly see that access to each node has been cut off and then restored, one node after the other.
We can also have a look at the deployment summary in the Solution Manager Administration Tool, which confirms that the VQL deployments have been bracketed by SCRIPT_SERVER tasks corresponding to the script executions on the load balancer.
Deployment status showing that the enable and disable scripts have been run before and after the VQL deployment
In cluster mode, the Scheduler servers that constitute the cluster must share the Scheduler internal metadata database. You can consult the Scheduler Cluster Settings manual page to set up this scenario and to understand the technical implications of such a deployment.
In cluster mode, the Data Catalog servers that constitute the cluster must share the Data Catalog internal metadata database. You can consult the Data Catalog External Database Setup manual page for the configuration steps.
In cluster mode, the Virtual DataPort servers can either work each against its own local metadata catalog (Denodo 7 and Denodo 8) or, in Denodo 8, share a common metadata catalog stored in an external database.
So, in Denodo 7 and in Denodo 8, if each node interacts (read/write) only with its local metadata catalog, there must be an external agent that guarantees the synchronization of the local metadata catalogs, one for each node.
If, in Denodo 8, a shared metadata catalog is configured, the replication is handled automatically.
By default, the Virtual DataPort metadata catalog is not shared. To understand how the metadata catalog can be shared, and for some considerations and insight on this choice, you can refer to the Storing Catalog on External Database manual page.
Diagnostic & Monitoring Tool
The Diagnostic & Monitoring Tool does not support a shared metadata catalog so the configuration, environments and servers must be synchronized manually.
Summary
Denodo Platform Component | Support for shared DB for metadata | Compulsory shared DB in clusters
Virtual DataPort | yes | no
Scheduler | yes | yes
Data Catalog | yes | yes
Diagnostic & Monitoring Tool | no | no
Summary of support for shared metadata catalogs of the Denodo components
Note: You need to perform these steps only if using Denodo 7.
For each node in your cluster, apply the following modifications.
Set the hostname of the node (node01 in this example):
$ sudo hostnamectl set-hostname node01.ubuntu1804-denodo7
$ hostname node01.ubuntu1804-denodo7
Check that the new hostname is used as the RMI registry URL in the Denodo configuration:
$ grep -R node01 $DENODO_HOME/conf
conf/scheduler/ConfigurationParameters.properties:Launcher/registryURL=node01.ubuntu1804-denodo7
conf/vdp/VDBConfiguration.properties:com.denodo.vdb.vdbinterface.server.VDBManagerImpl.registryURL=node01.ubuntu1804-denodo7
Check the RMI factory port configured for Virtual DataPort:
$ grep factoryPort $DENODO_HOME/conf/vdp/VDBConfiguration.properties
com.denodo.vdb.vdbinterface.server.VDBManagerImpl.factoryPort=9997
This port must be different for each Virtual DataPort node behind the load balancer and its value must be used in the corresponding directive in haproxy.cfg.
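For example (a sketch, assuming you edit the properties file directly on node02 and then regenerate the files and restart as shown below), the factory port could be aligned with the 9995 value used for node02 in haproxy.cfg:

# On node02: change the RMI factory port from the default 9997 to 9995,
# matching the backend_jdbc_srv_factory_node02 definition in haproxy.cfg.
$ sed -i 's/factoryPort=9997/factoryPort=9995/' $DENODO_HOME/conf/vdp/VDBConfiguration.properties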
Check that the Solution Manager host is correctly referenced in the license configuration:
$ grep license.host $DENODO_HOME/conf/SolutionManager.properties
conf/SolutionManager.properties:com.denodo.license.host=ubuntu1804-den7-sol-man
Finally, regenerate the launch scripts and restart the servers:
$ $DENODO_HOME/bin/regenerateFiles.sh
$ $DENODO_HOME/bin/vqlserver_shutdown.sh
$ $DENODO_HOME/bin/vqlserver_startup.sh
$ $DENODO_HOME/bin/scheduler_shutdown.sh
$ $DENODO_HOME/bin/scheduler_startup.sh
$ $DENODO_HOME/bin/datacatalog_shutdown.sh
$ $DENODO_HOME/bin/datacatalog_startup.sh
You can test your Denodo clustered deployment by connecting to Virtual DataPort through the load balancer with a JDBC client (e.g. DBeaver 7.1.4).
Successful connection to Virtual DataPort using JDBC
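The JDBC client is simply pointed at the load balancer instead of an individual node; a sketch, assuming the default admin database and the host names used in this article:

# Denodo JDBC URL used in the client: the host is the load balancer, not a node.
jdbc:vdb://loadbalancer:9999/admin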
You can also create an ODBC data source that points to the load balancer (port 9996):
Windows ODBC Data Source Creation
You have several options to test the Scheduler configuration; for example, you can start a job through the load balancer using the Scheduler API sample client:
$DENODO_HOME/samples/scheduler/scheduler-api/scripts/test_schedulerclient.sh -start job_extract_iv_sales -h loadbalancer -p 8000
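Similarly (a sketch, assuming default admin/admin credentials and the standard /denodo-restfulws context path), the Tomcat-based components can be checked through the HTTPS frontend:

# Query the Denodo RESTful Web Service through the load balancer.
# -k skips certificate validation (useful with self-signed certificates).
$ curl -sk -u admin:admin https://loadbalancer:9443/denodo-restfulws/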
References
Dynamic Configuration of HAProxy via API
Using SSL Certificates with HAProxy
Configuring Deployment Scripts in the Solution Manager