Object Storage Routes vs External Catalogs
The Create Base View dialog for the Lakehouse accelerator data source offers two options:
Object Storage Routes: the default option. It lets you explore the routes defined on the Read & Write tab and create base views for the desired folders.
External Catalogs: an advanced option that gives access to external catalogs defined on the Denodo Lakehouse accelerator cluster. With it you can access tables registered in other catalogs such as AWS Glue Data Catalog, Unity Catalog or Snowflake Open Catalog. See section External Metastore of the Denodo Lakehouse Accelerator Guide for more information.
By default, the Lakehouse accelerator cluster includes the catalogs Hive, Iceberg and Delta, which allow access to object storage using an embedded Hive metastore as the metadata catalog. The general recommendation is to configure authentication in those catalogs so that the Lakehouse accelerator cluster can access all the storage data that will be consumed from Virtual DataPort, and then specify the different privileges in Virtual DataPort as described in the next section.
How to configure the Lakehouse accelerator to allow different Denodo developers to access different storage routes and catalogs
As mentioned above, the general recommendation is to configure authentication in the catalogs so that the Lakehouse accelerator cluster can access all the storage data that will be consumed from Virtual DataPort, and then specify the different privileges in Virtual DataPort. However, there are situations where it is necessary to provide different sets of credentials for different object storage routes, for instance when developers from different domains each have different access restrictions. In those situations, you can:
Make a copy of the global data source.
Specify credentials on the new data source to access the desired objects.
Configure privileges in Denodo to specify who can access and create views from the new data source.
In addition, you can specify which Lakehouse accelerator catalogs are available for table introspection in the External Catalogs option of each data source. To do so, execute the following command from a VQL Shell, substituting the parts in angle brackets with appropriate values:
-- You do not need to restart for this command to take effect
SET 'com.denodo.embeddedmpp.allowedCatalogsForJdbcIntrospection.<database>.<data_source>' = '<comma_separated_list_of_allowed_catalogs>';
When this property is not configured, every catalog not configured to be used in the Object Storage Routes option by any copy of the Lakehouse accelerator data source will be available in the External Catalogs option.
For example, let’s say we want to access data available in AWS Glue, Unity and Snowflake. To do that, we have included extra catalogs in the Lakehouse accelerator: glue, unity and snowflake (see section Catalogs from the Denodo Lakehouse Accelerator Guide). Virtual DataPort has two virtual databases, domain_A_db and domain_B_db, each containing a copy of the Lakehouse accelerator data source: accelerator_A and accelerator_B, respectively. accelerator_A can only use catalogs glue and unity, and accelerator_B can only use catalogs unity and snowflake. To achieve this, execute the following commands:
-- You do not need to restart for these commands to take effect
SET 'com.denodo.embeddedmpp.allowedCatalogsForJdbcIntrospection.domain_A_db.accelerator_A' = 'glue,unity';
SET 'com.denodo.embeddedmpp.allowedCatalogsForJdbcIntrospection.domain_B_db.accelerator_B' = 'unity,snowflake';
When different credentials must be configured for different storage routes at the Lakehouse accelerator level, you can create multiple Hive, Iceberg and Delta catalogs with different authentication credentials. For instance: hiveDomainA, hiveDomainB, icebergDomainA, icebergDomainB, etc. To configure the catalogs that a specific copy of the Lakehouse accelerator data source will use when creating base views with the Object Storage Routes option, execute the following commands from a VQL Shell, substituting the parts in angle brackets with appropriate values:
-- You do not need to restart for these commands to take effect
SET 'com.denodo.embeddedmpp.introspectionCatalog.<database>.<data_source>' = '<hive_catalog>';
SET 'com.denodo.embeddedmpp.introspectionCatalog.iceberg.<database>.<data_source>' = '<iceberg_catalog>';
SET 'com.denodo.embeddedmpp.introspectionCatalog.delta.<database>.<data_source>' = '<delta_catalog>';
For example, suppose Virtual DataPort has two virtual databases, domain_A_db and domain_B_db, each containing a copy of the Lakehouse accelerator data source with its own object storage routes and credentials: accelerator_A and accelerator_B, respectively. accelerator_A must use catalogs hiveDomainA and icebergDomainA. accelerator_B must use catalogs icebergDomainB and deltaDomainB. To achieve this, execute the following commands:
-- You do not need to restart for these commands to take effect
SET 'com.denodo.embeddedmpp.introspectionCatalog.domain_A_db.accelerator_A' = 'hiveDomainA';
SET 'com.denodo.embeddedmpp.introspectionCatalog.iceberg.domain_A_db.accelerator_A' = 'icebergDomainA';
SET 'com.denodo.embeddedmpp.introspectionCatalog.iceberg.domain_B_db.accelerator_B' = 'icebergDomainB';
SET 'com.denodo.embeddedmpp.introspectionCatalog.delta.domain_B_db.accelerator_B' = 'deltaDomainB';
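Assuming the two kinds of properties can be set together on the same copy of the data source, the following sketch combines the two examples from this section for accelerator_A in domain_A_db. The pairing is illustrative only: it reuses the database, data source and catalog names from the examples above and is not a documented recommendation.

```sql
-- Illustrative combination of the two examples above (assumed pairing).
-- Catalogs used by the Object Storage Routes option of accelerator_A:
SET 'com.denodo.embeddedmpp.introspectionCatalog.domain_A_db.accelerator_A' = 'hiveDomainA';
SET 'com.denodo.embeddedmpp.introspectionCatalog.iceberg.domain_A_db.accelerator_A' = 'icebergDomainA';
-- Catalogs offered by the External Catalogs option of accelerator_A:
SET 'com.denodo.embeddedmpp.allowedCatalogsForJdbcIntrospection.domain_A_db.accelerator_A' = 'glue,unity';
```

With this configuration, developers working with accelerator_A would create base views over object storage through hiveDomainA and icebergDomainA, and would see only glue and unity when introspecting tables in the External Catalogs option.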
