Tagsets and Scanners

Tagsets and scanners are key elements in the operation of the Extractor component (see section Configuration of the Extractor Component). Users usually do not need to create their own tagsets and scanners, for two reasons: 1) the Extractor component includes an option (enabled by default) that auto-generates the tagsets and scanners an Extractor requires from the examples provided by the user (see section Configuration of the Extractor Component), and 2) ITPilot includes a set of pre-generated tagsets and scanners that are sufficient in most situations (see section Tagsets and Scanners Included in the Distribution).

However, advanced users may want to create their own tagsets when the Extractor specification has been built manually instead of by providing examples, or when the generated and/or included tagsets are not sufficient for a specific situation.

This section begins with the basic information needed to understand tagsets (section Understanding Tagsets). It then explains how to create them with the ITPilot generation tool (section Graphical Creation of New Tagsets). The section Tagsets and Scanners Included in the Distribution lists and describes the scanners and tagsets included in ITPilot. Finally, section Lexer Types defines the concept of “lexer type”.
