Introduction
Google BigQuery is a cloud-based enterprise data warehouse that allows its users to store and query massive datasets. BigQuery provides a web UI and a command line tool, as well as different access methods such as a REST API and multiple client libraries (Java, .NET or Python).
In this article, we are going to configure the Denodo Platform to access BigQuery using a JDBC driver. We have created some sample data for this article from the public BigQuery catalog. The sample data contains only the table outpatient_charges_2014 from the medicare dataset.
Outpatient_charges_2014 table in Google BigQuery
We can execute queries from the web interface to get the data we need and compare the results to those retrieved by Denodo Virtual DataPort.
For instance, the execution of:
SELECT a.* FROM [bigquery-156210:test.outpatient_charges_2014] as a LIMIT 1000 |
will look like:
Connecting to BigQuery using the JDBC driver
First of all, we need to download the JDBC driver to connect to BigQuery. It can be downloaded from here: https://cloud.google.com/bigquery/providers/simba-drivers
Once the JDBC driver is downloaded, please follow these steps to configure a data source in Denodo:
- Unzip the file ‘SimbaJDBCDriverforGoogleBigQueryxx_x.x.x.xxxx’, you will get a folder full of JAR files along with release notes.
- Go to the wizard "File"> "Extension management" of the Administration Tool.
- Go to the tab "Libraries" in the Extension management window.
- Select ‘JDBC Drivers’ in the Type drop down and click on ‘Import’.
- select the ‘jdbc’ in resource type drop down and then select "bigquery" in the Version drop down.
- Click ‘Add’ and then navigate to the unzipped directory from the 1st point, now you will see all the content in the directory listed in the Import resources window.
- Click Ok to start uploading resources.
Now, we also need to generate the credentials to access our BigQuery project.
Go to the Google Cloud Platform Dashboard and click on “Credentials”:
Once in “Credentials”, click on “Create Credentials” > “Service Account”.
Fill the form providing the name of the service account to create your credentials.
Next, select the role or roles for the new account according to your needs.
The following step is to generate a key to access the data.
For this particular case, we will generate a JSON key that will download a file with the key and the proper credentials.
Copy that file to your desired location. Now we can go back to Denodo to finish the configuration.
- Open Denodo Design Studio
- Create a JDBC Data Source (File > New > Data Source)
- You will see a ‘Select data source’ window in which you can find ‘Google BigQuery’ in the ‘Analytical & Big Data’ section.
- Database URI: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<PROJECT_ID>;OAuthType=0;OAuthServiceAcctEmail=<EMAIL>;OAuthPvtKeyPath=<PATH_TO_CREDENTIALS_FILE>;Timeout=45
The values for PROJECT_ID and EMAIL can be obtained from the credentials JSON file created above.
- Click on “Save” to create the data source.
- Create a new base view .
- Denodo will automatically search for the available schemas and tables in BigQuery.
- In this case, the base view could be our “outpatient_charges_2014”.
Note: if you want to access Google BigQuery public Datasets, you will need to use the option “Create from Query” and provide the desired sentence.
- Click “Create selected” to save the view and the base view that we created looks like this:
Finally, the results of our execution will look like...
Conclusion
Using a JDBC data source is the best option to integrate Google BigQuery data into the Denodo Platform.
It is also possible to use the Google REST API and create a JSON data source.
The main inconvenience of using the REST API is that an additional flatten view be needed to flatten the array returned by the API, and then the fields in the base view will have to be projected manually as the field output of a derived view. However, once this flatten view is created, it can be queried normally and it will access Google BigQuery directly without any third party software in the middle.
The JDBC driver is a third-party driver that may not support all the features included in the REST API. When working with this driver in the JDBC data source the tables in BigQuery cannot be introspected and the “Create from query” option must be used to develop base views.
References
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.