
Databricks as a Cache store for Denodo

Has anyone used Databricks as a cache store for Denodo? I am able to connect to the Databricks cluster successfully, but I am not able to configure it as a cache.
user
26-11-2019 23:13:59 -0500

3 Answers

Hi, I have used Databricks as a cache store for Denodo, but it has some caveats. Since you already have Databricks working successfully as a data source, I imagine you have already followed the Knowledge Base article [How to connect to Azure Databricks from Denodo](https://community.denodo.com/kb/view/document/How%20to%20connect%20to%20Azure%20Databricks%20from%20Denodo?category=Data+Sources). Also, I would upgrade the platform to the latest update, since the Databricks connector is supported starting with the 20190903 update and it simplifies the process.

What I did to make it work as a cache store was to create a folder for storage inside an Azure Blob container. This folder, at least in Azure, has to be inside the /mnt folder. After creating this folder in the blob container, I created a Scala notebook and [mounted the folder inside the Databricks File System (DBFS)](https://docs.databricks.com/data/data-sources/azure/azure-storage.html#mount-an-azure-blob-storage-container). Denodo will use a Bulk Data Load, so I also needed to configure the Hadoop client in order to provide it with the necessary OAuth credentials; I followed the Hadoop documentation article [Hadoop Azure Support: Azure Blob Storage](https://hadoop.apache.org/docs/stable/hadoop-azure/index.html) to do this.

Once the storage was configured, I downloaded the [Databricks CLI](https://docs.databricks.com/dev-tools/databricks-cli.html#set-up-the-cli) following the Databricks documentation; it serves as a client. I finished the process by reconfiguring the Denodo connection. You'll need the JDBC driver, which can be found on the [Connect BI Tools](https://docs.databricks.com/bi/jdbc-odbc-bi.html#step-1-download-and-install-a-jdbc-odbc-driver) documentation page on the Databricks site. Set up the JDBC source with the Spark adapter and make sure to add the following tweaks:

* For the user, use "token", and for the password, the actual [token](https://docs.databricks.com/dev-tools/api/latest/authentication.html#authentication).
* On the Read & Write tab, check the Bulk Data checkbox.
* In the new fields, enter the needed information. The HDFS URI is the URI to the folder you created first.

This should be enough to make it work. The caveats I was referring to come from limitations of Spark's external tables: matching rows invalidation cannot be used, and concurrent queries, as well as queries issued during the invalidation process, should not be attempted. Hope this helps!
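For illustration, a minimal sketch of the mount step described above, assuming the storage account key is kept in a Databricks secret scope; the container, storage account, secret scope, and mount-point names are placeholders, not values from this thread:

```scala
// Run in a Scala notebook on the Databricks cluster.
// Mounts an Azure Blob Storage container under /mnt so that the cache
// storage folder is reachable through DBFS. All names are placeholders.
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/",
  mountPoint = "/mnt/denodo-cache",
  extraConfigs = Map(
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net" ->
      dbutils.secrets.get(scope = "<scope-name>", key = "<storage-account-key-name>")
  )
)
```

Once mounted, the folder appears under dbfs:/mnt/ and can be listed from a notebook with dbutils.fs.ls.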
Denodo Team
27-11-2019 16:01:38 -0500
Thanks. I have already configured a /mnt folder in Azure Data Lake and mounted it to the Databricks File System. When I test the bulk load from the Denodo cache 'Read & Write' tab, it shows that the listing for this folder is successful, but it does not show the actual contents of the folder. It then fails when trying to create a directory inside this folder, saying <xxx> is not a folder. Is the HDFS URI configured in this tab of the format 'https://<databricks host name>/mnt/<some folder>'?
user
27-11-2019 18:08:50 -0500
Still no luck with this. I have configured the Databricks CLI on the same server as the Denodo server and am able to connect successfully to Databricks. I am also able to list and create folders using the CLI, but I am not able to test the bulk load from Denodo VDP. I think something is wrong with the Hadoop executable path and the HDFS URI. My Hadoop executable location is /apps01/denodo/lib-external/hadoop-3.0.0/bin/hadoop, and the HDFS URI is of the form https://<databricks host name>/mnt/<some folder>. Any help with this is highly appreciated.
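As a cross-check of the mount step from the answer above, a minimal sketch that can be run on the Databricks side before retrying the bulk load test; "/mnt/denodo-cache" is a hypothetical mount point standing in for the folder in question:

```scala
// Run in a Scala notebook on the cluster to confirm the mount exists
// and the folder contents are visible through DBFS.
dbutils.fs.mounts().foreach(m => println(s"${m.mountPoint} -> ${m.source}"))
dbutils.fs.ls("/mnt/denodo-cache").foreach(f => println(f.path))
```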
user
02-12-2019 19:01:20 -0500