Wrapper development first steps
Let's review the initial steps and concepts that we need to create web automation solutions.
Starting the WGT
This is the simplest step: open the Denodo Control Center as you did in the other data integration tutorials, but this time click on the ITP tab on the left side. That will display all the components of ITPilot. Ignore them for the moment: just click on the start button of the first component in the list, then launch the Wrapper Generation Tool (WGT).
Create a new wrapper
Projects and wrappers
The first Denodo Platform tutorial showed how to organize our work in databases and folders. The WGT provides a similar concept of projects that we will use to separate wrappers into different sets.
Let's start by creating a new project named “webautomationtutorial”:
- Right-click over “Projects”, then select New ... > Project.
- Type the name of the project.
- We are also going to create our first wrapper, so leave “Create a process” selected and type a name for the wrapper in the “Initial process name” field (for example, "firstwrapper").
- Unselect the checkbox “Create process from template” so our new wrapper is blank, and click “Ok”.
After these steps are completed, the WGT will open the newly created wrapper and display it in the main workspace.
The anatomy of a wrapper
A flow of components
Now that we have opened a blank wrapper, we can start building a web integration solution. A wrapper is defined as a flow of components; each component is responsible for executing a specific action. Common components include:
- Init: the starting point of our wrapper, which defines the input parameters of the wrapper.
- Sequence: to browse to web pages and execute actions such as clicks or text typing.
- Extractor: used to retrieve information from a webpage in a structured format.
- Iterator: used to go through a list of results and execute actions for each one of them.
- Record constructor: used to create new data records from other values - useful for transforming the data that we extract from web pages.
- Output: returns data as the result value of the execution of the wrapper.
- End: specifies the finishing point of the execution of the wrapper.
To add a component to the wrapper, drag it from the left menu onto the main workspace. Each component is configured through a graphical wizard, which you open by double-clicking the component.
Components are linked to each other like so:
Drag the link from the little square (outbound knob) of the first component onto the second component (either the component itself or its circular inbound knob). The two components will then be executed sequentially, the first one before the second.
Some components have two nodes because they represent loops:
A collection of components can be linked between these two nodes, and these inner components will then be executed on each iteration. For example, in this situation:
- Component A will be executed before our Iterator.
- Components B and C will be executed in that order on each iteration of the loop.
- Component D will be executed after all the iterations of the loop are completed.
You may have noticed that the link between the two nodes of the Iterator component is automatically created for you when you drop the Iterator on the workspace. This link cannot be deleted, and represents the control flow from the end of an iteration to the beginning of the next.
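The execution order described above can be sketched in a few lines of plain Python. This is only a conceptual model, not ITPilot's own API: the `Step`, `Loop` and `run_flow` names, and the `row1`/`row2` items, are hypothetical and exist just to make the ordering of components A, B, C and D visible.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Step:
    """Hypothetical stand-in for a component that performs one action."""
    name: str

    def run(self, trace: List[str], item: Optional[str] = None) -> None:
        # Record the execution, tagging it with the current loop item if any.
        trace.append(self.name if item is None else f"{self.name}({item})")


@dataclass
class Loop:
    """Hypothetical stand-in for an Iterator: runs its body once per item."""
    name: str
    items: List[str]
    body: List[Step]

    def run(self, trace: List[str], item: Optional[str] = None) -> None:
        for current in self.items:
            # Inner components run in order on every iteration.
            for component in self.body:
                component.run(trace, current)


def run_flow(components: List[Union[Step, Loop]]) -> List[str]:
    """Execute linked components sequentially and return the order of execution."""
    trace: List[str] = []
    for component in components:
        component.run(trace)
    return trace


# A runs first, B and C run on each iteration, D runs after the loop finishes.
flow = [
    Step("A"),
    Loop("Iterator", ["row1", "row2"], [Step("B"), Step("C")]),
    Step("D"),
]
print(run_flow(flow))
# ['A', 'B(row1)', 'C(row1)', 'B(row2)', 'C(row2)', 'D']
```

The inner `for` over `items` plays the role of the link between the two nodes of the Iterator: control goes back to the start of the body until no items are left, and only then moves on to D.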
Input and output values
Apart from the order of execution, we can define a set of input parameters and an output result for each component. The input parameters are configured in the “Inputs” tab of the panel at the bottom of the workspace.
Inputs are split into two types, mandatory and optional. To tell them apart, look at the right side of each input: mandatory inputs do not have the “-” symbol there, so they cannot be removed.
A component also has three states, one each for its configuration, its inputs and its output. Each state is marked white if everything is OK, red if there is a problem, or yellow if the configuration should be reviewed and confirmed.
Wrapper inputs and return values
The last concept that we need to review before starting to create wrappers is the inputs and outputs of the wrapper itself.
Wrappers can receive input parameters so the actual execution of the wrapper depends on the query we send to Denodo (for example, if we want a wrapper to execute a search on a web site, we can specify the search term as an input parameter of the wrapper).
The output result of a wrapper is specified as a list of records. Each record has a set of named fields, each one with a type and a value. The whole list of records can be seen as a relational table, and Denodo Virtual DataPort will display the wrapper as a table when deployed in the Denodo server.
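As a rough illustration of these two ideas, the sketch below models a wrapper as a plain Python function that takes a search term and returns a list of records, then lays the records out as columns and rows. It is not how ITPilot or Virtual DataPort implement wrappers: `search_wrapper`, `as_table` and the sample data are all hypothetical, and the hard-coded list stands in for a page that a real wrapper would browse and extract.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, float]
Record = Dict[str, Value]  # named fields, each holding a typed value

# Hypothetical sample data standing in for the content of a web page.
_SAMPLE_PAGE: List[Record] = [
    {"title": "Blue kettle", "price": 24.5},
    {"title": "Red kettle", "price": 19.0},
    {"title": "Blue teapot", "price": 31.0},
]


def search_wrapper(search_term: str) -> List[Record]:
    """Return the records whose title contains the input parameter."""
    term = search_term.lower()
    return [dict(r) for r in _SAMPLE_PAGE if term in str(r["title"]).lower()]


def as_table(records: List[Record]) -> Tuple[List[str], List[Tuple[Value, ...]]]:
    """View a list of records as column names plus rows, like a relational table."""
    columns = list(records[0].keys()) if records else []
    rows = [tuple(r[c] for c in columns) for r in records]
    return columns, rows


columns, rows = as_table(search_wrapper("blue"))
print(columns)  # ['title', 'price']
for row in rows:
    print(row)
# ('Blue kettle', 24.5)
# ('Blue teapot', 31.0)
```

The input parameter changes which records come back, while the shape of the result (the field names and their types) stays the same. That fixed shape is what allows the result to be presented as a table.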