Retrieving data from a single page
The previous wrapper introduced automated browsing through pages. That is the first half of what a web automation wrapper usually does. The other major task in common wrappers is extracting information from a page and giving it structure. In this example we will build a wrapper that browses to the JFK airport website at https://www.airport-jfk.com/arrivals.php and pulls real-time information about flight arrivals at JFK airport.
Create a new wrapper named "jkfarrivalinfo".
Add a Sequence component that navigates to https://www.airport-jfk.com/arrivals.php.
The component used to retrieve information from a web page is called Extractor. Drag an Extractor component onto the workspace and link it after the Sequence component.
The Extractor component will extract data from the page that we navigated to, so set the output of the Sequence component (Sequence_1_output) as the "Input page" field of the inputs of the Extractor_1 component.
Let's focus now on the MSIE browser. On this page we can see the list of flight arrival details, along with their current status, in table format. We are going to create a field for each of these values and extract the data from the web page.
ITPilot learns how and what to extract from a page from examples given by the user. To provide them, click the toolbar button labeled "Assign Examples" to enter extractor mode.
Pick a flight arrival that has all the fields we want to extract, so that we set a baseline with all the information present.
Highlight the text below the "Origin" heading for the flight arrival that you selected, then right-click the selection. This brings up a contextual menu for assigning examples; in our case, select "New example" and, in the submenu that appears, select the option for creating a new field.
The dialog for new field creation will be displayed. Type the name of the field ("origin"), leave the rest of the options at their default values and click "Ok".
Do the same for the other five fields. Highlight the text for the "airline", then right-click and select "Example 1 > new field" and type "airline"; do the same for the "flight", "arrival", "terminal" and "status" fields.
Now we have assigned an example of the data we want to extract. If you right-click and select the "Example 1" field you will see the values for all assigned fields.
As we saw in our first inspection of the page, some flight arrivals lack part of the data. For example, the "terminal" field has no value for some arrivals. We need to provide ITPilot with examples of this kind of record. If other flight arrivals lacked different fields, we would need to provide examples of those as well, so that ITPilot can generate a wrapper that extracts all the different combinations of optional fields.
Select a flight arrival whose "terminal" field has no data and repeat steps 8, 9 and 10 for this new flight arrival. Notice that you need to select "New example" for the first field and then "Example 2" for the rest of them.
Do not assign these values to the first example ("Example 1")!
Example 2 will not have any value assigned to the "terminal" field. Do not use the "Example 2 > new field" option for this example; you are assigning the same fields that were created for the previous example, but for a different record, so select "Example 2 > origin", "Example 2 > flight", etc.
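To make the role of the two examples concrete, the following Python sketch models them as plain records. It is only an illustration of the idea, not ITPilot's internal representation: the class name `ArrivalExample`, the helper `assigned_fields` and all the values are hypothetical placeholders. The point is that both examples share the same six fields, and that Example 2 simply leaves "terminal" unassigned.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArrivalExample:
    """Hypothetical model of one example record assigned in the browser."""
    origin: str
    airline: str
    flight: str
    arrival: str
    status: str
    # Optional: some rows on the page leave the terminal blank.
    terminal: Optional[str] = None


# Placeholder values only; these are not real flight data.
example_1 = ArrivalExample(
    origin="ORIGIN-A", airline="AIRLINE-A", flight="FL001",
    arrival="10:00", status="STATUS-A", terminal="1",
)
# Example 2 reuses the same fields but has no terminal value.
example_2 = ArrivalExample(
    origin="ORIGIN-B", airline="AIRLINE-B", flight="FL002",
    arrival="10:30", status="STATUS-B",
)


def assigned_fields(example: ArrivalExample) -> list:
    """Return the names of the fields that have a value in this example."""
    return [f.name for f in fields(example) if getattr(example, f.name) is not None]


print(assigned_fields(example_1))
print(assigned_fields(example_2))
```

Seeing one record with every field and one record without "terminal" is what tells the generator that the field is optional rather than always present.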
Now that we have all the examples we need, go back to the WGT. Double-click on the Extractor component to bring up its configuration wizard. At the top left corner of the dialog there is a button labeled "Import from browser". Click it to transfer the examples from the browser into the Extractor's wizard. When you do so, the "Generation" pane shows a wait animation while it generates a pattern to extract all the data from the page with the specified structure.
After a while, the Extractor wizard should have generated the specification and be ready for testing. Click on the "Specification test" button and then "Refresh" on the new pane that appears. Doing so tests the extraction process on the current page, and the table below the button should display all the flight arrival information from the page.
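As a rough analogy for what the specification test displays, the Python sketch below turns table rows into structured records with the six field names used in this example. It is not the specification that ITPilot generates, and it makes several assumptions: the markup in `SAMPLE_HTML` is hypothetical (the real page layout may differ), the values are placeholders, and `RowExtractor` and `extract_records` are names invented for this illustration.

```python
from html.parser import HTMLParser

# Field names in the column order assumed for this sketch.
FIELDS = ["origin", "airline", "flight", "arrival", "terminal", "status"]

# Hypothetical markup with placeholder values; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr><td>ORIGIN-A</td><td>AIRLINE-A</td><td>FL001</td><td>10:00</td><td>1</td><td>STATUS-A</td></tr>
  <tr><td>ORIGIN-B</td><td>AIRLINE-B</td><td>FL002</td><td>10:30</td><td></td><td>STATUS-B</td></tr>
</table>
"""


class RowExtractor(HTMLParser):
    """Collects one record per table row, one field per table cell."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._cells = None      # cells of the row being read, None outside a row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Empty cells become None so a missing optional field stays visible.
            record = {name: (cell or None) for name, cell in zip(FIELDS, self._cells)}
            self.records.append(record)
            self._cells = None


def extract_records(html: str) -> list:
    """Parse the given HTML and return the list of extracted records."""
    parser = RowExtractor()
    parser.feed(html)
    return parser.records


records = extract_records(SAMPLE_HTML)
for record in records:
    print(record)
```

The second record keeps the "terminal" key with no value, which mirrors how a record lacking an optional field should still appear in the test results with the remaining fields filled in.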
Click "Ok" to save the extractor component's configuration.
Now we need to return the output of the Extractor component as the result of the wrapper. The easiest way of doing this is to add an Output component to the wrapper. Drag it onto the workspace now.
Link the Output component after the Extractor component, and link the End component to the Output component.
Select the Extractor component output ("Extractor_1_output") in the "Input records" input field of the Output component. Note that you may have to click the "+" icon to the right of the "Input records" field to enable the selector.
Save the wrapper and test it. You will see that a new browser window appears, navigates to the flight arrival information page and after a while all the data present in the page should appear in the results tab of the test wrapper dialog.
In this section we have created a web wrapper that navigates to a page and retrieves information from it in real time, which allows us to use live web data in our web integration solutions. In the next sections we will review more advanced uses of ITPilot for dealing with more complex scenarios.