Most Relevant Terms of a Document
Denodo Aracne can automatically generate the most relevant terms of each field of a document, using the TF-IDF (Term Frequency-Inverse Document Frequency) relevance measure. These terms can be returned as part of the results of a search on an Aracne index.
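As a rough illustration of the idea behind TF-IDF, the following sketch ranks the terms of one field value against a small corpus. A term scores high when it is frequent in the document but rare across the corpus. This is a generic textbook formulation, not Aracne's implementation: the class name TfIdfSketch, the method topTerms, the naive tokenizer and the smoothed IDF formula are all assumptions made for the example, and Aracne's exact weighting is not described in this section.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical, generic TF-IDF sketch: ranks the terms of one field value against a corpus.
 * Not Aracne's implementation; the weighting Aracne actually uses is not documented here.
 */
public class TfIdfSketch {

    // Naive tokenizer: lower-case, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns up to maxTerms terms of doc, highest TF-IDF first (ties broken alphabetically). */
    static List<String> topTerms(String doc, List<String> corpus, int maxTerms) {
        List<String> tokens = tokenize(doc);
        Map<String, Long> tf = tokens.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));

        // Pre-compute the distinct terms of every corpus document.
        List<Set<String>> docSets = new ArrayList<>();
        for (String d : corpus) {
            docSets.add(new HashSet<>(tokenize(d)));
        }
        int n = corpus.size();

        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Long> e : tf.entrySet()) {
            String term = e.getKey();
            // Document frequency: how many corpus documents contain the term.
            long df = docSets.stream().filter(s -> s.contains(term)).count();
            // Smoothed IDF (an assumption), so a term absent from the corpus never divides by zero.
            double idf = Math.log((1.0 + n) / (1.0 + df)) + 1.0;
            score.put(term, (e.getValue() / (double) tokens.size()) * idf);
        }
        return score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(maxTerms)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "the index stores documents",
                "the search returns documents",
                "aracne crawls the web");
        // "aracne" appears twice in the document, so it outranks "index" and "search".
        System.out.println(topTerms("aracne index aracne search", corpus, 2)); // prints [aracne, index]
    }
}
```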
When performing a search, the fields from which the most relevant terms are to be obtained can be specified using an instance of the class com.denodo.arn.index.client.MainTermsConfig. This object contains one instance of the class MainTermsFieldConfig for each field whose relevant terms are required. For each field, the following must be specified:
Maximum number of relevant terms: the maximum number of relevant terms of the field to be included for each document returned by the search.
List of relevant terms to be rejected (optional): a comma-separated list of “usual words” that should not appear among the most relevant terms of this field. If Aracne generates any of these words among the most relevant terms of the field content, that word is removed from the list of relevant terms. Only usual words specific to the application need to be listed here: the usual words of the language itself, such as articles and pronouns (commonly referred to as stop words), are already removed by Denodo Aracne.
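The two per-field settings can be pictured with the following sketch, which applies a maximum term count and a comma-separated list of rejected words to an already ranked list of candidate terms. The class FieldTermsConfig and its methods are hypothetical stand-ins, not the real MainTermsFieldConfig API. Two behaviours are assumptions: rejected words are matched case-insensitively, and they are removed before the list is cut to the maximum size, so rejecting a word does not shrink the result below the limit when more candidates exist.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical stand-in for a per-field relevant-terms setting.
 * Not the MainTermsFieldConfig API; it only mirrors the two settings described above.
 */
public class FieldTermsConfig {
    final String field;
    final int maxTerms;
    final Set<String> rejected;

    FieldTermsConfig(String field, int maxTerms, String rejectedCsv) {
        this.field = field;
        this.maxTerms = maxTerms;
        this.rejected = parseCsv(rejectedCsv);
    }

    // Splits "foo, Bar ,baz" into {"foo", "bar", "baz"}; a null or blank list rejects nothing.
    static Set<String> parseCsv(String csv) {
        if (csv == null || csv.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(csv.split(","))
                .map(w -> w.trim().toLowerCase(Locale.ROOT))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Drops rejected words from the ranked candidates, then keeps at most maxTerms of them. */
    List<String> apply(List<String> rankedTerms) {
        return rankedTerms.stream()
                .filter(t -> !rejected.contains(t.toLowerCase(Locale.ROOT)))
                .limit(maxTerms)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FieldTermsConfig body = new FieldTermsConfig("body", 3, "product, price");
        List<String> ranked = List.of("aracne", "price", "crawler", "product", "index", "search");
        System.out.println(body.apply(ranked)); // prints [aracne, crawler, index]
    }
}
```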
The MainTermsConfig class also allows specifying a list of usual words common to all the fields for which the most relevant terms are to be obtained. Again, there is no need to include the usual words of the language itself, such as articles and pronouns (commonly referred to as stop words).
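The following sketch shows one way a search-level list of usual words shared by every field could combine with per-field limits and lists: a candidate term is dropped if it appears in either list. TermsConfigSketch, FieldSetting, addField and select are hypothetical names chosen for the example and are not part of the MainTermsConfig API; treating the effective rejection list as the union of the global and field-specific lists is an assumption based on the description above.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical sketch of a search-level relevant-terms setting: one list of usual words
 * shared by every field, plus a limit and a list per field. Not the MainTermsConfig API.
 */
public class TermsConfigSketch {
    record FieldSetting(int maxTerms, Set<String> rejected) {}

    final Set<String> globalRejected;
    final Map<String, FieldSetting> fields = new LinkedHashMap<>();

    TermsConfigSketch(Set<String> globalRejected) {
        this.globalRejected = globalRejected;
    }

    TermsConfigSketch addField(String name, int maxTerms, Set<String> rejected) {
        fields.put(name, new FieldSetting(maxTerms, rejected));
        return this;
    }

    /** For each configured field, removes global and field-specific usual words, then caps the list. */
    Map<String, List<String>> select(Map<String, List<String>> rankedByField) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        fields.forEach((name, setting) -> {
            List<String> ranked = rankedByField.getOrDefault(name, List.of());
            out.put(name, ranked.stream()
                    .filter(t -> !globalRejected.contains(t) && !setting.rejected().contains(t))
                    .limit(setting.maxTerms())
                    .collect(Collectors.toList()));
        });
        return out;
    }

    public static void main(String[] args) {
        TermsConfigSketch cfg = new TermsConfigSketch(Set.of("denodo"))
                .addField("title", 2, Set.of())
                .addField("body", 2, Set.of("price"));
        Map<String, List<String>> ranked = Map.of(
                "title", List.of("denodo", "aracne", "release"),
                "body", List.of("price", "denodo", "crawler", "index"));
        System.out.println(cfg.select(ranked)); // prints {title=[aracne, release], body=[crawler, index]}
    }
}
```

Fields that are not configured are simply ignored, which mirrors the idea that relevant terms are only produced for the fields listed in the configuration.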
Each search result is represented by an object of the class com.denodo.commons.Document, which provides methods to obtain the list of relevant terms of each field of the document as MainTerms objects. These MainTerms objects are stored in the MAINTERMS field of the Document.