Virtual DataPort can create data sources that use search indexes built using Denodo Aracne.
To create a new Aracne data source, right-click on the Elements Tree and click New > Data source > Aracne.
The Tool will display the dialog to create the data source.
The following data are requested in this dialog:
- Name. Name of the new data source.
- Host name. Host where Aracne Index/Search Engine is running.
- Port. Port where the Aracne Index/Search Engine (not the Aracne Server) is listening for connections. The default port is 9000.
- Login and Password. Credentials to access Aracne Index.
In the Metadata tab, you can set the folder where the data source will be stored and provide a description.
When editing the data source, you can also change its owner with the button provided for this purpose. Click Save to create the data source.
Click Create base view to display the indexes contained in the Aracne server and their fields (see Description of an Aracne data source).
To search a schema, type its name or the name of one of its fields in the box located at the top of the dialog. The list will only show the elements whose name contains the text you entered.
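The behavior of the search box can be pictured with a minimal Python sketch. It is only an illustration of the "name contains the text" rule described above, not the Tool's code: the function name `filter_indexes`, the sample index names and the case-insensitive comparison are all assumptions.

```python
def filter_indexes(indexes, text):
    """Keep the indexes whose name, or any field name, contains `text`.

    `indexes` maps an index name to the list of its field names.
    Matching is case-insensitive here, which is an assumption.
    """
    needle = text.lower()
    return {
        name: fields
        for name, fields in indexes.items()
        if needle in name.lower() or any(needle in f.lower() for f in fields)
    }


# Hypothetical indexes, used only to exercise the function.
indexes = {
    "web_docs": ["title", "searchablecontent"],
    "products": ["name", "price"],
}
matches = filter_indexes(indexes, "content")
```

Here `matches` keeps only `web_docs`, because one of its fields contains the text typed in the box.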
Click Create Base View beside the index you want to extract data from (see Creating an Aracne base view). The Tool will display the schema that the new view will have. In this dialog, you can change the name of the base view, change the name and type of its fields and remove the fields you do not need.
The Aracne base views will have an attribute of the appropriate type for each field of the Aracne index. Usually, Aracne indexes are used to index the data extracted by Denodo Scheduler jobs, and their schema depends on the type of job executed (see the Denodo Scheduler Administration Guide for further details). The figure above shows the fields of an index created from the documents exported by a Scheduler job of type ARN.
The following subsection (Accessing the Most Relevant Terms of a Document) explains how to use the “Create main terms” button, beside each field of the view.
In the Metadata tab, you can set the folder where the base view will be stored and provide a description.
When editing the base view, you can also change its owner with the button provided for this purpose.
Click Save to create the base view.
The most common way of querying the base relations built from Aracne sources is using the CONTAINS operator, which runs complex Boolean searches on indexed textual data (see section Support for the Contains Operator of Each Source Type).
Accessing the Most Relevant Terms of a Document
Denodo Aracne is capable of automatically generating the most relevant words of a document or one of its fields, according to the TFIDF (Term Frequency Inverse Document Frequency) relevance measurement. This section explains how to access these terms.
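To give an intuition of what a TFIDF measurement rewards, the following Python sketch scores each term of a document with `tf * log(N / df)`, one common variant of the measure. It is not Aracne's implementation: the exact formula Aracne applies is not stated in this section, and the function name `tfidf_scores` and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: score} dict per document.

    `docs` is a list of token lists. The score is tf * log(N / df),
    a common TF-IDF variant chosen here only as an example.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


# Hypothetical, already tokenized documents.
docs = [
    ["data", "virtualization", "data", "source"],
    ["search", "index", "data"],
    ["index", "scheduler", "job"],
]
scores = tfidf_scores(docs)
```

In the first document, "virtualization" scores higher than "data" even though "data" occurs twice, because "data" also appears in another document and is therefore less distinctive.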
The most relevant terms of the document are accessed as new fields of the base view. To create a new field in the base view containing the most relevant terms of the value of a document field, click Create main terms next to the type of the field.
For example, to add a new attribute containing the most relevant terms of the searchablecontent field, click Create main terms alongside its type. The figure “Adding an attribute with the most relevant terms of the searchablecontent field” shows the new attribute.
The new attribute searchablecontent_MAIN_TERM will contain an array of registers. Each register of the array contains two subfields:
- The relevant term. The default name of this subfield is the name of the field with the suffix _TERM.
- Its position in the list of the most relevant terms. The default name of this subfield is the name of the field with the suffix _SCORE. The most relevant term takes position 1.
When creating these new attributes, you can specify two parameters:
- Number of main terms. Maximum number of relevant terms to be included for each document.
- Filter main terms words. This is the list of “usual words” (separated by commas) that must not appear among the most relevant terms for this field. If the list of relevant terms generated by Aracne includes any of those, they will be removed from the list. It is important to note that only usual words specific to the application must be specified. The usual words in the language used such as articles, pronouns, etc. (commonly known as “stopwords”) are already eliminated by Denodo Aracne. The list of usual words may be contained in a file. The file specified must be a text file, where the words will be separated by commas.
At the bottom of the dialog, in the General words to filter main terms box, you can enter a list of “usual words” common to all the fields of the base view. Once again, you do not have to worry about specifying usual words in the language such as articles, pronouns, etc. because they are already eliminated by Denodo Aracne.
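The combined effect of these parameters can be sketched in Python as follows. This is a hedged illustration, not Aracne's code: the names `parse_word_list` and `main_terms`, the keys `term` and `position`, and the sample scores are hypothetical. The sketch also assumes that the filter words are removed before the list is cut to the maximum number of terms; the text above does not state the order.

```python
def parse_word_list(text):
    """Split a comma-separated list of "usual words", ignoring blanks."""
    return [w.strip() for w in text.split(",") if w.strip()]


def main_terms(scores, max_terms, field_filter=(), general_filter=()):
    """Build the array of registers for one document field.

    `scores` maps each candidate term to its relevance score.
    Assumption: filter words are dropped first, then the list is
    truncated to `max_terms`. Positions start at 1.
    """
    excluded = {w.strip().lower() for w in (*field_filter, *general_filter)}
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [term for term, _ in ranked if term.lower() not in excluded]
    return [
        {"term": term, "position": pos}
        for pos, term in enumerate(kept[:max_terms], start=1)
    ]


# Hypothetical relevance scores for the terms of one field.
scores = {"denodo": 0.9, "index": 0.5, "search": 0.7, "page": 0.6}
result = main_terms(
    scores,
    max_terms=2,
    field_filter=parse_word_list("denodo, page"),
    general_filter=parse_word_list("home"),
)
```

With these inputs, `result` contains "search" at position 1 and "index" at position 2: "denodo" and "page" are removed by the field-level filter even though both score higher than "index".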
The value of these parameters can be modified later in the “Wrapper specific configuration” dialog, in the “Search Methods” tab of the view’s “Options” (see section Configuration Properties for Specific View Types).