Most Relevant Terms of a Document

Denodo Aracne can automatically extract the most relevant terms of a document field, using the TF-IDF (Term Frequency-Inverse Document Frequency) relevance measure. These terms can be returned as part of the results of a search carried out on an Aracne index.
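To make the relevance measure concrete, the following is a minimal sketch of one common TF-IDF formulation. It is not Aracne's implementation: TF-IDF has several variants and this section does not state which weighting Aracne applies. The class name TfIdfSketch and the method tfIdf are hypothetical, and the text is assumed to be already tokenized into terms.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a textbook TF-IDF variant, not Aracne's actual scoring.
final class TfIdfSketch {

    // tf  = occurrences of the term in the field / number of terms in the field
    // idf = ln(N / df), where N is the number of documents and df is the
    //       number of documents whose field contains the term
    static double tfIdf(String term, List<String> fieldTerms, List<List<String>> corpus) {
        long occurrences = fieldTerms.stream().filter(term::equals).count();
        if (occurrences == 0) {
            return 0.0; // the term does not appear in this field
        }
        double tf = (double) occurrences / fieldTerms.size();

        long df = corpus.stream().filter(doc -> doc.contains(term)).count();
        if (df == 0) {
            return 0.0; // the field is not part of the corpus; avoid dividing by zero
        }
        double idf = Math.log((double) corpus.size() / df);
        return tf * idf;
    }

    public static void main(String[] args) {
        List<String> d0 = Arrays.asList("denodo", "aracne", "index");
        List<String> d1 = Arrays.asList("aracne", "search");
        List<String> d2 = Arrays.asList("aracne", "crawler");
        List<List<String>> corpus = Arrays.asList(d0, d1, d2);

        // "aracne" occurs in every document, so its idf (and its score) is zero;
        // "index" occurs in only one document, so it scores higher.
        System.out.println("aracne: " + tfIdf("aracne", d0, corpus));
        System.out.println("index:  " + tfIdf("index", d0, corpus));
    }
}
```

In this formulation, a term that appears in every document gets a score of zero, which is why terms that are frequent in a field but rare across the index rise to the top.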

During the search process, the fields from which the most relevant terms are to be obtained can be specified using an instance of the class com.denodo.arn.index.client.MainTermsConfig. This object contains one instance of the class MainTermsFieldConfig for each field whose relevant terms are required. The following must be specified for each field:

  • Maximum number of relevant terms. The maximum number of relevant terms of the field to be included for each document returned by the search.
  • List of relevant terms to be rejected (optional). A comma-separated list of “usual words” that must not appear among the most relevant terms of this field. If any term that Aracne generates for the field content appears in this list, it is removed from the list of relevant terms. Only the usual words specific to the application need to be listed here. The usual words of the language, such as articles, pronouns, etc. (commonly referred to as stop words), are already removed by Denodo Aracne.

In addition, the MainTermsConfig class allows specifying a list of usual words common to all the fields for which the most relevant terms are to be obtained. Once again, there is no need to include the usual words of the language, such as articles, pronouns, etc. (commonly referred to as stop words), since Denodo Aracne already removes them.
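The effect of these settings (a per-field maximum, a per-field rejection list and a common rejection list) can be sketched as follows. This is a self-contained illustration and does not use the Aracne API: the class MainTermsSelectionSketch and its methods are hypothetical, the scores are made-up sample values, and two behaviors are assumptions not confirmed by this section: rejected terms are removed before the maximum is applied, and matching is exact (case-sensitive).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: mimics the described configuration, not Aracne's code.
final class MainTermsSelectionSketch {

    // Parses a comma-separated list of "usual words" into a set,
    // trimming whitespace and ignoring empty entries.
    static Set<String> parseRejected(String commaSeparated) {
        if (commaSeparated == null) {
            return new java.util.HashSet<>();
        }
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toSet());
    }

    // Returns at most maxTerms terms, highest score first, skipping any term
    // found in the field-specific or the common rejection list.
    // Assumption: rejection happens before the maximum is applied.
    static List<String> selectMainTerms(Map<String, Double> scores, int maxTerms,
                                        Set<String> fieldRejected, Set<String> commonRejected) {
        return scores.entrySet().stream()
                .filter(e -> !fieldRejected.contains(e.getKey()))
                .filter(e -> !commonRejected.contains(e.getKey()))
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(maxTerms)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>(); // sample relevance scores
        scores.put("invoice", 0.9);
        scores.put("denodo", 0.8);
        scores.put("customer", 0.7);
        scores.put("report", 0.5);

        Set<String> fieldRejected = parseRejected("denodo, acme"); // per-field list
        Set<String> commonRejected = parseRejected("report");      // list common to all fields

        // prints [invoice, customer]
        System.out.println(selectMainTerms(scores, 2, fieldRejected, commonRejected));
    }
}
```

Language stop words are deliberately absent from both lists in this sketch, mirroring the note above that Aracne already removes them.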

Each result of the search is represented by a com.denodo.commons.Document object, which provides methods for obtaining the list of relevant terms of each field of the document as MainTerms objects. The MainTerms objects are stored in the MAINTERMS field of the Document.