Notebooks can be saved as files, checked into revision control just like code, and freely shared. They run anywhere, thanks to their browser-based user interface. Though the most influential notebook, Jupyter, has its origins in the Python programming language, it now supports many other programming languages, including R, Scala, and Julia. The popular Apache Spark analytics platform has its own notebook project, Apache Zeppelin, which includes Scala, Python, and SparkSQL capabilities, as well as providing visualization tools.
Querying Denodo with Zeppelin Notebook
Zeppelin is an open source notebook that is primarily used with Spark. Its backend is written in Java and front end in Angular.
Configuring Denodo as an Interpreter in Zeppelin
- Download Zeppelin and run it. It’s also shipped with standard distros like HDP and AWS EMR.
- Open Zeppelin’s home page (Default url: http://localhost:8080) and click on the user name drop down at the top and click on Interpreter.
- Click create and select JDBC for Interpreter group. Use denodo as Interpreter Name. Fill the values for JDBC driver class name, URL, user and password:
- Add the location of Denodo’s JDBC driver in the Dependencies > artifact field:
- In the notebook make sure denodo is selected as an interpreter and then you can use %denodo to run queries against the denodo interpreter.
- Run your queries.
Denodo as a source in Spark using DataFrames
A common way to use Denodo data from Spark is by moving the results of a Denodo query to a remote table in Spark.
However, in many cases it will be also interesting to ingest data directly from Denodo, without the need for replication. You can do so with Spark’s DataFrames API via JDBC
To connect to Denodo from a Spark interpreter, you have to first place Denodo’s JDBC driver jar in $SPARK_HOME/jars folder and restart Zeppelin. Once it is done, you can query a view in denodo with a code like below:
url = "jdbc:vdb://localhost:9999/machine_learning"
user = "admin"
password = "your_password"
dbtable = "text_prediction"
df = sqlContext.read.format("jdbc").option("url", url).option("user",user).option("password",password).option("dbtable", dbtable).load()
Querying Denodo with Jupyter Notebook
- The easiest way to start working with Jupyter is to Install Anaconda. Anaconda is a widely used python distribution with many libraries that are used by Data Analysts.
- Start Jupyter: jupyter notebook
Creating a Denodo DSN
Creating a notebook
A browser window will be automatically opened which will list the current working directory contents. Select New > Python
The recommended way to connect to a database using Jupyter is through ODBC with the pyodbc library. Once you have a DSN created, we can use a standard pyodbc program to query Denodo. For example:
# pyodbc library for the odbc connection
# pandas for table display
# The DenodoODBC DSN is defined in Windows DSN Manager
connection = pyodbc.connect("DSN=DenodoODBC")
# Execute the query and display using Pandas
df = pandas.read_sql("SELECT * 1 from dual()", connection)
You can also use pandas easily to draw charts: