Embedded Hive Metastore
Denodo Embedded MPP is shipped with an Embedded Hive Metastore that acts as a repository of metadata, mapping Object Storage files (from S3, ADLS, GCS or HDFS) to tables.
The Embedded Hive Metastore stores the metadata in an Embedded PostgreSQL.
The metastore section of the values.yaml configures the connection to the Embedded PostgreSQL.
metastore:
  enabled: true
  connectionUrl: "jdbc:postgresql://postgresql:5432/metastore"
  connectionDriverName: "org.postgresql.Driver"
  connectionDatabase: "metastore"
  connectionUser: "hive"
  connectionPassword: "hive"
postgresql:
  enabled: true
You can also use an external database (PostgreSQL, MySQL, SQL Server or Oracle) with the Embedded Hive Metastore. An externally managed database has the advantage of keeping the metadata outside the cluster lifecycle. In some cases it is the only option, for example when policies restrict the type of RDBMS that can be installed or how backups and maintenance are handled.
To configure an external database, fill in the metastore.connectionXXX parameters with the connection details.
Make sure that the external database can be accessed from the Denodo Embedded MPP Kubernetes cluster.
Also remember to set postgresql.enabled=false, so that the Embedded PostgreSQL is not deployed.
metastore:
  enabled: true
  connectionUrl: "jdbc:sqlserver://xxxx.database.windows.net:1433;..."
  connectionDriverName: "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  connectionDatabase: "metastore"
  connectionUser: "user@DOMAIN"
  connectionPassword: "mypassword"
postgresql:
  enabled: false
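For comparison, the same pattern applied to an external PostgreSQL could look like the sketch below. The host name pg.example.internal and the password are hypothetical placeholders, not values taken from the product; only the jdbc:postgresql://host:port/database URL shape and the driver class are standard.

```yaml
metastore:
  enabled: true
  # Hypothetical host: replace with the address of your external PostgreSQL
  connectionUrl: "jdbc:postgresql://pg.example.internal:5432/metastore"
  connectionDriverName: "org.postgresql.Driver"
  connectionDatabase: "metastore"
  connectionUser: "hive"
  # Placeholder credential, shown for illustration only
  connectionPassword: "change-me"
postgresql:
  # The Embedded PostgreSQL is not deployed when an external database is used
  enabled: false
```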
connectionUrl: JDBC connection string for the database of the Embedded Hive Metastore, which can be:
the Embedded PostgreSQL, jdbc:postgresql://postgresql:5432/metastore, the default one
an external PostgreSQL
an external MySQL
an external SQL Server
an external Oracle
connectionDriverName: JDBC Driver class name to connect to the database of the Embedded Hive Metastore, which can be:
org.postgresql.Driver for PostgreSQL, the default one
org.mariadb.jdbc.Driver for MySQL
com.microsoft.sqlserver.jdbc.SQLServerDriver for SQL Server
oracle.jdbc.OracleDriver for Oracle
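The URL and the driver class must belong to the same database type. As an illustration, a fragment for an external Oracle might pair them as follows. The host, port and service name are hypothetical, and the URL uses the standard Oracle thin-driver form; the remaining metastore.connectionXXX parameters are omitted here and would be filled in as for any other external database.

```yaml
metastore:
  # Hypothetical host and service name, using the Oracle thin driver URL form
  connectionUrl: "jdbc:oracle:thin:@//oracle.example.internal:1521/ORCLPDB1"
  # Driver class listed above for Oracle
  connectionDriverName: "oracle.jdbc.OracleDriver"
```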
The Hive Metastore heap size is set to 2048 MB by default, but you can change it in values.yaml according to your needs.
metastore:
  maxHeapSize: 2048
In addition, the hive-metastore/scripts directory includes an initialization script for each supported external database (PostgreSQL, MySQL, SQL Server and Oracle). The script must be run on the external database before deploying the Denodo Embedded MPP.
Database | Minimum supported version |
---|---|
PostgreSQL | 9.1.13 |
MySQL | 5.6.17 |
MS SQL Server | 2008 R2 |
Oracle | 11g |
CPU and Memory Management in Kubernetes
Kubernetes schedules pods across nodes based on the resource requests and limits for CPU and memory. If a container requests certain CPU and/or memory values, Kubernetes will only schedule its pod on a node that can guarantee those resources. Limits, on the other hand, ensure that a container never exceeds a certain value.
metastore:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
Note that resources are commented out, as we leave this setting as a choice for the Kubernetes cluster administrator.
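If you do enable resources, it is worth checking the memory limit against metastore.maxHeapSize: a JVM also uses memory outside the heap, and Kubernetes terminates a container that exceeds its memory limit. The sketch below assumes the default 2048 MB heap; the 3Gi figure is only an illustrative amount of headroom, not a recommendation from the product documentation.

```yaml
metastore:
  # JVM heap size in MB (the default value)
  maxHeapSize: 2048
  resources:
    requests:
      cpu: 1
      # Illustrative value: heap plus headroom for non-heap JVM memory
      memory: 3Gi
    limits:
      cpu: 1
      # Kept above the heap size so the container is not killed for exceeding its limit
      memory: 3Gi
```

Setting requests equal to limits, as in the original example, gives the pod a predictable resource allocation; the right amount of headroom depends on your workload.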