This document explains how to create an execution environment for the Data Science and Machine Learning Tutorial that you are about to start.
It uses Vagrant as the provisioning platform and VirtualBox as the hypervisor.
Once the environment is provisioned, you'll have a running virtual machine (VM) with the Denodo Platform installed, along with the data sources needed for the tutorial exercises.
The virtual machine can be managed independently of Vagrant, like any other VM. This means that you can stop and reboot it as you like, and your modifications persist.
The VM should be assigned 12 GB of memory and about 25 GB of storage.
You can use the latest versions of Vagrant and VirtualBox. The VM has been tested in the following environment:
Once you start the provisioning, Vagrant downloads to your machine a canonical virtual machine image to base the provisioning on. It then starts that virtual machine as a VirtualBox guest and runs the script vagrant/install-files/setup.sh, which performs all the installation and configuration operations needed to get the environment ready for use.
Download dstutorial-release-20210428.zip and denodo-systemd-services-release-20210408.tar.
Unzip dstutorial-release-20210428.zip and navigate to the folder dstutorial-release-20210428.
Copy the following files to vagrant/install-files/artifacts:
Name of file | Description |
---|---|
Apache Zeppelin for Denodo - Standalone.zip | Apache Zeppelin for Denodo installer, version must be 20210113 |
denodo.lic | Denodo Standalone license |
denodo-install-8.0-ga-linux64.zip | Denodo Platform installer for Linux |
denodo-v80-update-20210209.zip | Denodo Platform update 8.0-20210209 |
denodo-systemd-services-release-20210408.tar | Tar archive of setup for Denodo systemd services (already downloaded in step 3) |
dstutorial-release-20210428.zip | Zip archive of repository dstutorial (already downloaded in step 3) |
mysql-connector-java-8.0.20.zip | The MySQL Connector for Java to be downloaded from here |
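Before starting the provisioning, it may help to confirm that every file from the table is present. The following is a minimal sketch, assuming a POSIX shell such as Git Bash; `check_artifacts` is a hypothetical helper name, and the file names are copied from the table.

```shell
# Report every required artifact that is missing from the given folder.
# check_artifacts is an illustrative helper, not part of the tutorial files.
check_artifacts() {
    dir="$1"
    missing=0
    while IFS= read -r name; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done <<'EOF'
Apache Zeppelin for Denodo - Standalone.zip
denodo.lic
denodo-install-8.0-ga-linux64.zip
denodo-v80-update-20210209.zip
denodo-systemd-services-release-20210408.tar
dstutorial-release-20210428.zip
mysql-connector-java-8.0.20.zip
EOF
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

From the folder dstutorial-release-20210428, it could be called as `check_artifacts vagrant/install-files/artifacts`. The check only looks at file names, not at file versions or contents.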
Adapter
- Name: VirtualBox Host-Only Ethernet Adapter
- IPv4 Address: 192.168.140.1
- IPv4 Network Mask: 255.255.255.0

DHCP Server
- Enable Server: no
If you want to use an existing host-only network adapter, you will need to add the property name and change the property ip for config.vm.network in the file vagrant/Vagrantfile accordingly. Networking configuration is covered in the Vagrant documentation.
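To find the name and address of an existing host-only adapter, one option is to summarise the output of `VBoxManage list hostonlyifs`. The sketch below is illustrative only: `summarise_hostonlyifs` is a hypothetical helper, and it assumes that the command labels its fields `Name:` and `IPAddress:`, which may differ between VirtualBox versions.

```shell
# Print "adapter name -> IPv4 address" for each host-only adapter.
# Reads the output of `VBoxManage list hostonlyifs` on standard input.
# Assumption: fields are labelled "Name:" and "IPAddress:".
summarise_hostonlyifs() {
    # Remove carriage returns in case the output comes from Windows.
    tr -d '\r' | awk -F': +' '
        /^Name:/      { name = $2 }
        /^IPAddress:/ { print name " -> " $2 }
    '
}
```

With VBoxManage on the PATH, usage would be `VBoxManage list hostonlyifs | summarise_hostonlyifs`. The adapter name printed is the value to use for the name property, and the ip property should be an address in the same network as the adapter.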
Open a terminal in the folder vagrant (under dstutorial-release-20210428). You'll see a file called Vagrantfile, which is the configuration file for the provisioning.
vagrant up > provisioning.log
This command stores the provisioning log in a file, provisioning.log, so that you can monitor that everything is going as expected. The command is valid if you are using the Git Bash terminal or the Windows Command Prompt (CMD). The provisioning process lasts between 20 and 35 minutes, depending on your hardware and Internet bandwidth. If you open the VirtualBox Manager during the provisioning, you'll find a machine called denodo.dstutorial.com being created and booted.
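If you are using Git Bash and would rather watch the output on screen while it is also written to the log file, `tee` is one alternative to the plain redirection. This is a sketch under that assumption: `run_logged` is a hypothetical helper, and `tee` is not available in the Windows Command Prompt.

```shell
# Run a command, showing its output on screen and saving it to a log file.
# Usage: run_logged <logfile> <command> [arguments...]
# Note: the exit status is that of tee, not of the command itself.
run_logged() {
    logfile="$1"
    shift
    # Merge stderr into stdout so that errors also reach the log.
    "$@" 2>&1 | tee "$logfile"
}
```

For the provisioning, this would be invoked as `run_logged provisioning.log vagrant up`.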
When the vagrant up command ends, it returns the cursor. Check that the tail of provisioning.log contains the following lines:
default: check Meter Readings db (postgresql): Product Name : PostgreSQL
default: Product Version : 12.6 (Ubuntu 12.6-0ubuntu0.20.04.1)
default: check Weather db (mysql): Product Name : MySQL
default: Product Version : 8.0.23-0ubuntu0.20.04.1
default: check Building location (xlsx over sftp): OK
default: check Holidays data sources (web services): OK
default: check file GHCN file (csv over sftp) site0_daily: OK
default: check file GHCN file (csv over sftp) site2_daily: OK
default: check file GHCN file (csv over sftp) site4_daily: OK
default: check file GHCN file (csv over sftp) site13_daily: OK
default: check file GHCN file (csv over sftp) site15_daily: OK
default: check file GHCN file (csv over sftp) site0_monthly: OK
default: check file GHCN file (csv over sftp) site2_monthly: OK
default: check file GHCN file (csv over sftp) site4_monthly: OK
default: check file GHCN file (csv over sftp) site13_monthly: OK
default: check file GHCN file (csv over sftp) site15_monthly: OK
default: check final prediction web service on dstutorial_sample_completed (v1): 200
default: check final prediction web service on dstutorial_sample_completed (v2): 200
default: This machine has IP: 192.168.140.100
default: You may want to add it, to your local hosts file with name denodo.dstutorial.com
default: provisioning started: Thu Apr 8 11:09:54 UTC 2021
default: provisioning ended: Thu Apr 8 11:43:14 UTC 2021
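Rather than reading the tail of the log by eye, the checks can be counted with `grep`. The following is a rough sketch, assuming the log lines follow the sample output shown in this section; `summarise_provisioning_log` is a hypothetical helper name.

```shell
# Summarise the checks printed at the end of the provisioning log.
# The patterns mirror the sample log lines; adjust them if your log differs.
summarise_provisioning_log() {
    log="$1"
    # Strip carriage returns in case the log was written on Windows.
    clean=$(tr -d '\r' < "$log")
    ok=$(printf '%s\n' "$clean" | grep -c 'check .*: OK *$' || true)
    http=$(printf '%s\n' "$clean" | grep -c 'check final prediction web service.*: 200 *$' || true)
    echo "OK checks: $ok, web service checks returning 200: $http"
    # Succeed only if the closing "provisioning ended:" line is present.
    case "$clean" in
        *"provisioning ended:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Calling `summarise_provisioning_log provisioning.log` prints the two counts and returns a non-zero status if the provisioning has not reached its final line. It does not validate the database version lines.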
Add the following lines to your hosts file, C:\Windows\System32\drivers\etc\hosts:
## Machine hosting the Data Science Tutorial Environment
192.168.140.100 denodo.dstutorial.com
The IP must match the one specified in config.vm.network in the Vagrantfile.
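The match between the two files can be checked from a shell. This is a minimal sketch, assuming that the address appears literally on the config.vm.network line of the Vagrantfile (for example as ip: "192.168.140.100"); both function names are hypothetical.

```shell
# Print the first IPv4 address found on an uncommented
# config.vm.network line of the given Vagrantfile.
vagrantfile_ip() {
    awk '!/^[ \t]*#/ && /config\.vm\.network/ {
        if (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            print substr($0, RSTART, RLENGTH)
            exit
        }
    }' "$1"
}

# Succeed if the hosts file maps denodo.dstutorial.com to that address.
# Usage: hosts_entry_matches <Vagrantfile> <hosts file>
hosts_entry_matches() {
    ip=$(vagrantfile_ip "$1")
    [ -n "$ip" ] || return 1
    tr -d '\r' < "$2" | awk -v ip="$ip" '
        $1 == ip && $2 == "denodo.dstutorial.com" { found = 1 }
        END { exit !found }
    '
}
```

In Git Bash, the Windows hosts file is normally reachable as /c/Windows/System32/drivers/etc/hosts, so a call could look like `hosts_entry_matches Vagrantfile /c/Windows/System32/drivers/etc/hosts && echo "hosts entry matches"`.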
At this point you can already connect to the Denodo applications deployed in the VM:
If you want to start again from a clean virtual machine, you should:
1. Open a terminal in the folder vagrant.
2. Run vagrant destroy -f. This command erases your VirtualBox guest.
3. Run vagrant up > provisioning.log.
denusr with password denusr. This is an administrator (sudo) user.
/opt/denodo/8.0.
denusr.
Service Name | Systemd unit file |
---|---|
Virtual DataPort Server | denodo_vdp |
Web Design Studio | denodo_design_studio |
Data Catalog | denodo_data_catalog |
Apache Zeppelin for Denodo | zeppelin |
ML Web Service | flask_mlpred_rest |
Holiday Web Service | flask_holiday_rest |
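Since the services are registered as systemd units, their state can be inspected with `systemctl` from a shell inside the VM. The following is a small sketch using the unit names from the table; `show_service_status` and `DENODO_UNITS` are hypothetical names introduced here.

```shell
# Unit names taken from the table of services.
DENODO_UNITS="denodo_vdp denodo_design_studio denodo_data_catalog zeppelin flask_mlpred_rest flask_holiday_rest"

# Print the state of each unit. Intended to be run inside the VM.
show_service_status() {
    for unit in $DENODO_UNITS; do
        # systemctl is-active prints the state (active, inactive, failed, ...)
        # and returns non-zero when the unit is not active.
        state=$(systemctl is-active "$unit" 2>/dev/null || true)
        echo "$unit: ${state:-unknown}"
    done
}
```

A single service can then be restarted with, for example, `sudo systemctl restart denodo_vdp`.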