Generating the Data Extraction Specifications Manually

A DEXTL program for data extraction can also be written manually, instead of relying on the automated generation from examples assigned in the browser. This approach requires a thorough understanding of the DEXTL language (see ITPilot DEXTL Guide) and is recommended only for advanced users.

The process is outlined below:

  1. Once you have decided what information to extract, create a structure for that data in the Structure pane.
  2. Highlight some of the data to be extracted in the browser and use the Token Viewer pane to check which tags are present in that fragment.
  3. Based on the tags in that fragment, type the DEXTL program that extracts the desired data in the Generation pane.
  4. Test the specification in the Specification Test pane, and repeat steps 3 and 4 until all the data is extracted correctly.
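
The inspect, write, and test loop in steps 2 to 4 can be illustrated with a small sketch. The following Python code is only an analogy and is not DEXTL syntax: the `tokenize` and `extract` helpers, the pattern notation (tag names as literals, `:field` for a value to capture), and the sample fragment are all hypothetical names introduced for illustration. It lists the tag tokens of an HTML fragment, applies a hand-written tag pattern, and checks the extracted records.

```python
from html.parser import HTMLParser


class TokenLister(HTMLParser):
    """Flattens an HTML fragment into (kind, value) tokens.

    Hypothetical helper: it plays a role loosely similar to looking at
    the tags of a fragment before writing an extraction pattern.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("TAG", tag.upper()))

    def handle_endtag(self, tag):
        self.tokens.append(("TAG", "/" + tag.upper()))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only text between tags
            self.tokens.append(("TEXT", text))


def tokenize(fragment):
    """Return the flat token list for an HTML fragment."""
    parser = TokenLister()
    parser.feed(fragment)
    parser.close()
    return parser.tokens


def extract(tokens, pattern):
    """Slide the pattern over the tokens and collect one record per match.

    Pattern items are tag names to match literally, or ':field' entries
    that capture a text token under that field name (assumed notation).
    """
    records = []
    i = 0
    width = len(pattern)
    while i + width <= len(tokens):
        record = {}
        for (kind, value), expected in zip(tokens[i:i + width], pattern):
            if expected.startswith(":"):
                if kind != "TEXT":
                    break
                record[expected[1:]] = value
            elif kind != "TAG" or value != expected:
                break
        else:
            # Every pattern item matched: keep the record and skip past it.
            records.append(record)
            i += width
            continue
        i += 1
    return records


# Step 2: inspect the tags of a sample fragment.
fragment = (
    "<table>"
    "<tr><td>Widget</td><td>9.99</td></tr>"
    "<tr><td>Gadget</td><td>19.50</td></tr>"
    "</table>"
)
tokens = tokenize(fragment)

# Step 3: write the pattern by hand from the tags seen above.
pattern = ["TR", "TD", ":name", "/TD", "TD", ":price", "/TD", "/TR"]

# Step 4: test the pattern and refine it until all rows are extracted.
records = extract(tokens, pattern)
```

If `records` comes back with missing rows, the pattern is adjusted and the test repeated, which mirrors repeating steps 3 and 4 above.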