In today's information technology landscape, valuable data lives in many different places. Traditional enterprise sources include relational databases, data warehouses and web services, among others. Data consolidation helps reduce the number of data silos, whereas Data Virtualization can bridge the gaps where consolidation is not possible or desirable.
In some cases, data is siloed in external sources that we have no control over. In these scenarios our only choice is to consume the data in the format in which it is presented to us. Normally this external information is provided through APIs, but sometimes the only way to reach it is through a web browser, because the data is meant to be consumed by humans, not machines. A large amount of information is available only in this form, and it can provide a lot of value to most companies, either by itself or when used to enrich the data within the datacenter.
With traditional web scraping techniques, the cost of getting this data is usually higher than the value obtained from it, because those techniques are difficult to apply and maintain. As a result, most companies never manage to implement a successful solution. Denodo ITPilot changes the balance by lowering the cost of entry, so that integrating internal company data with externally maintained information becomes not only feasible but easy.
This tutorial introduces Denodo ITPilot and shows how to use it to perform several common tasks found in real-world Web Integration projects.
Denodo ITPilot (ITP for short) is the Web Integration component of the Denodo Platform. Using ITPilot we can create a web wrapper that accesses a specific website, usually to extract information (although ITPilot can do anything a user would do in a web browser), and then import that wrapper into the Denodo Platform as a data source. Once this is done, we can combine the data from the web with other views that we may have created within Denodo.
These web wrappers are created graphically using the integrated development environment that ITPilot provides. When deployed in the Denodo Platform, they retrieve and return information from the target website in real time.
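To make the idea of a web wrapper more concrete, the following Python sketch does by hand what a wrapper does conceptually: it extracts structured rows from a page written for human readers and combines them with internal data. It is only an illustration. It is not ITPilot code and does not use any Denodo API, since ITPilot wrappers are built graphically. The sample page and the names SAMPLE_PAGE, TableRowExtractor, extract_prices, INTERNAL_STOCK and combine are all hypothetical, and a real wrapper would load the page from a live website instead of a string.

```python
from html.parser import HTMLParser

# Hypothetical page content standing in for a live website.
SAMPLE_PAGE = """
<table id="prices">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$10.50</td></tr>
  <tr><td>Gadget</td><td>$7.25</td></tr>
</table>
"""


class TableRowExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        if self._in_cell and self._current_row is not None:
            self._current_row.append(data.strip())


def extract_prices(html):
    """Extraction plus a simple transformation: price text becomes a number."""
    parser = TableRowExtractor()
    parser.feed(html)
    return [
        {"product": name, "price": float(price.lstrip("$"))}
        for name, price in parser.rows
    ]


# Hypothetical internal company data to be enriched with the web data.
INTERNAL_STOCK = [
    {"product": "Widget", "units": 40},
    {"product": "Gadget", "units": 12},
]


def combine(web_rows, internal_rows):
    """Joins internal rows with web rows on the product name."""
    prices = {row["product"]: row["price"] for row in web_rows}
    return [
        dict(row, price=prices[row["product"]])
        for row in internal_rows
        if row["product"] in prices
    ]


combined = combine(extract_prices(SAMPLE_PAGE), INTERNAL_STOCK)
print(combined)
```

Hand-written extraction code of this kind is tied to the page layout and breaks when the layout changes, which is the maintenance cost of traditional web scraping mentioned earlier.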
ITPilot usage is split into two scenarios: development and execution. We first use the development tools to tell Denodo which website we want to access, what steps need to be performed, how the information is laid out in the page and how to extract it, what transformations we want to apply, etc.
ITPilot makes two tools available for these wrapper creation tasks:
Once the development and testing of a wrapper are finished, the wrapper is ready to be deployed in the Denodo server. Alongside it we will also need to start the Browser Pool server, which manages the browser instances used to retrieve the data in real time during the normal operation of the web integration solution. ITPilot also has an Administration tool that we will not use in these tutorials. Please check the ITPilot documentation for more details.
In this tutorial you will learn: