Embedded Hive Metastore

Denodo Embedded MPP ships with an Embedded Hive Metastore that acts as a metadata repository, mapping Object Storage files (from S3, ADLS, GCS or HDFS) to tables.

The Embedded Hive Metastore stores its metadata in an Embedded PostgreSQL database.

The metastore section of values.yaml configures the connection to the Embedded PostgreSQL.

metastore:
  enabled: true
  connectionUrl: "jdbc:postgresql://postgresql:5432/metastore"
  connectionDriverName: "org.postgresql.Driver"
  connectionDatabase: "metastore"
  connectionUser: "hive"
  connectionPassword: "hive"

postgresql:
  enabled: true

You can also use an external database (PostgreSQL, MySQL, SQL Server or Oracle) with the Embedded Hive Metastore. An externally managed database has the advantage of keeping the metadata outside the cluster lifecycle. In some cases it is the only option, for example when organizational policies restrict the type of RDBMS that can be installed or dictate how backups and maintenance are handled.

To configure an external database, fill in the metastore.connectionXXX parameters with its connection details. Make sure that the external database can be reached from the Denodo Embedded MPP Kubernetes cluster. Also disable the Embedded PostgreSQL with postgresql.enabled=false so that it is not deployed.

metastore:
  enabled: true
  connectionUrl: "jdbc:sqlserver://xxxx.database.windows.net:1433;..."
  connectionDriverName: "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  connectionDatabase: "metastore"
  connectionUser: "user@DOMAIN"
  connectionPassword: "mypassword"

postgresql:
  enabled: false

The main connection parameters are:

  • connectionUrl: JDBC connection string for the database of the Embedded Hive Metastore, which can point to:
    • the Embedded PostgreSQL, jdbc:postgresql://postgresql:5432/metastore (the default)
    • an external PostgreSQL
    • an external MySQL
    • an external SQL Server
    • an external Oracle

  • connectionDriverName: JDBC driver class name used to connect to the database of the Embedded Hive Metastore, which can be:
    • org.postgresql.Driver for PostgreSQL (the default)
    • org.mariadb.jdbc.Driver for MySQL
    • com.microsoft.sqlserver.jdbc.SQLServerDriver for SQL Server
    • oracle.jdbc.OracleDriver for Oracle
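
As a further illustration of the parameters above, the following is a minimal sketch of a values.yaml fragment for an external PostgreSQL. The host name, user and password are hypothetical placeholders, not values from the product; only the driver class name and the parameter names come from this section.

```yaml
metastore:
  enabled: true
  # Hypothetical host: replace with the endpoint of your external PostgreSQL
  connectionUrl: "jdbc:postgresql://pg.example.internal:5432/metastore"
  connectionDriverName: "org.postgresql.Driver"
  connectionDatabase: "metastore"
  connectionUser: "metastore_user"   # hypothetical user
  connectionPassword: "change-me"    # hypothetical password

postgresql:
  # The Embedded PostgreSQL is not deployed when an external database is used
  enabled: false
```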

The Hive Metastore heap size defaults to 2048 MB. You can change it in values.yaml according to your needs:

metastore:
  maxHeapSize: 2048

In addition, hive-metastore/scripts includes an initialization script for each supported external database (PostgreSQL, MySQL, SQL Server and Oracle). Run the script that matches your external database on it before deploying the Denodo Embedded MPP.

Supported Databases for Embedded Hive Metastore

Database         Minimum supported version
PostgreSQL       9.1.13
MySQL            5.6.17
MS SQL Server    2008 R2
Oracle           11g

CPU and Memory Management in Kubernetes

Kubernetes schedules pods across nodes based on the resource requests and limits for CPU and memory. If a container requests certain CPU and/or memory values, Kubernetes only schedules its pod on a node that can guarantee those resources. Limits, on the other hand, ensure that a container never exceeds a certain value.

metastore:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi

Note that the resources block is commented out, as this setting is left to the Kubernetes cluster administrator.
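
When enabling resources, it may help to size the memory values together with the metastore heap, since a container whose memory use grows past its limit is terminated by Kubernetes. The fragment below is only a sketch under that assumption: the CPU and memory figures are illustrative, not recommendations from the product, and should be adapted to your cluster.

```yaml
metastore:
  maxHeapSize: 2048      # heap size in MB (the default described above)
  resources:
    requests:
      cpu: 1             # illustrative value
      memory: 3Gi        # illustrative: heap plus headroom for non-heap JVM memory
    limits:
      cpu: 2             # illustrative value
      memory: 3Gi        # kept above maxHeapSize so the heap fits within the limit
```

Setting the memory request equal to the limit, as here, means the scheduler reserves on the node the full amount the container is allowed to use.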