Aracne Sources

Virtual DataPort can create data sources that use search indexes built using Denodo Aracne.

To create a new Aracne data source, right-click on the Server Explorer and click New > Data source > Aracne.

The Tool will display the dialog to create the data source.

Creating an Aracne data source

The following data are requested in this dialog:

  • Name. Name of the new data source.
  • Host name. Host where Aracne Index/Search Engine is running.
  • Port. Port where the Aracne Index/Search Engine (not the Aracne Server) is listening for connections. The default port is 9000.
  • Login and Password. Credentials to access Aracne Index.

In the Metadata tab, you can set the folder where the data source will be stored and provide a description.

When editing the data source, you can also change its owner by clicking the button image1. Click Save to create the data source.

Click Create base view to display the indexes contained in the Aracne server and their fields (see Description of an Aracne data source).

To search a schema, type its name or the name of one of its fields in the box located at the top of the dialog. The list will only show the elements whose name contains the text you entered.

Description of an Aracne data source

Click Create Base View beside the index you want to extract data from (see Creating an Aracne base view). The Tool will display the schema that the new view will have. In this dialog, you can change the name of the base view, change the name and type of its fields and remove the fields you do not need.

Each Aracne base view has one attribute, of the appropriate type, for each field of the Aracne index. Usually, Aracne indexes are used to index the data extracted by Denodo Scheduler jobs, and their schema depends on the type of job executed (see the Denodo Scheduler Administration Guide for further details). The figure above shows the fields in an index created from the documents exported by a Scheduler job of type ARN.

Creating an Aracne base view

The following subsection (Accessing the Most Relevant Terms of a Document) explains how to use the “Create main terms” button, beside each field of the view.

In the Metadata tab, you can set the folder where the base view will be stored and provide a description.

When editing the base view, you can also change its owner by clicking the button image4.

Click Save to create the base view.

The most common way of querying the base relations built from Aracne sources is using the CONTAINS operator, which runs complex Boolean searches on indexed textual data (see section Support for the Contains Operator of Each Source Type of the VQL Guide).

Accessing the Most Relevant Terms of a Document

Denodo Aracne is capable of automatically generating the most relevant words of a document or one of its fields, according to the TFIDF (Term Frequency Inverse Document Frequency) relevance measurement. This section explains how to access these terms.
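As background, the following is a minimal sketch of the textbook TF-IDF weighting: a term scores high when it is frequent in one document and rare in the others. It is illustrative only. The exact formula, tokenization and normalization that Aracne applies are not described here, and the function name `tfidf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Textbook variant (an assumption, not Aracne's documented formula):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical sample documents, already split into words.
docs = [
    "virtual dataport queries the index".split(),
    "the scheduler job fills the index".split(),
    "the crawler exports documents".split(),
]
weights = tfidf_scores(docs)
```

In this sample, "the" occurs in every document and therefore gets a weight of zero in each of them, whereas "virtual", which occurs in the first document only, outranks "index", which occurs in two.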

The most relevant terms of the document are accessed as new fields of the base view. To create a new field in the base view containing the most relevant terms of the value of a document field, click Create main terms next to the type of the field.

For example, to add a new attribute containing the most relevant terms of the searchablecontent field, click Create main terms alongside its type. The figure Adding an attribute with the most relevant terms of the searchablecontent field shows the new attribute: SEARCHABLECONTENT_MAIN_TERM.

Adding an attribute with the most relevant terms of the searchablecontent field

The new attribute SEARCHABLECONTENT_MAIN_TERM will contain an array of registers. Each register of the array contains two subfields:

  • The relevant term. The default name of this field is the name of the field with the suffix _TERM. In this case, the name will be SEARCHABLECONTENT_TERM.

  • Its position in the list of the most relevant terms. The default name of this field is the name of the field with the suffix _SCORE. In this example, the name will be SEARCHABLECONTENT_SCORE.

    The most relevant term takes position 1.
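To make the shape of this attribute concrete, the sketch below models it in Python as a list of dictionaries, one per register. It only illustrates the structure described above. The helper `to_registers` and the sample terms are hypothetical and are not part of Virtual DataPort or Aracne.

```python
def to_registers(ranked_terms, field="SEARCHABLECONTENT"):
    """Model the <field>_MAIN_TERM array: one register per relevant term.

    ranked_terms must already be ordered from most to least relevant;
    the _SCORE subfield holds the position in that list, starting at 1.
    """
    return [
        {f"{field}_TERM": term, f"{field}_SCORE": position}
        for position, term in enumerate(ranked_terms, start=1)
    ]


# Hypothetical terms, most relevant first.
registers = to_registers(["virtual", "dataport", "index"])
```

Here the first register pairs "virtual" with position 1 and the last one pairs "index" with position 3.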

When creating these new attributes, you can specify two parameters:

  • Number of main terms. Maximum number of relevant terms to be included for each document.
  • Filter main terms words. A comma-separated list of “usual words” that must not appear among the most relevant terms for this field. If the list of relevant terms generated by Aracne includes any of them, they are removed from the list. Specify only usual words that are specific to the application: the usual words of the language, such as articles and pronouns (commonly known as “stopwords”), are already eliminated by Denodo Aracne. The list of usual words can also be provided in a file, which must be a text file with the words separated by commas.

At the bottom of the dialog, in the General words to filter main terms box, you can enter a list of “usual words” common to all the fields of the base view. Once again, you do not have to worry about specifying usual words in the language such as articles, pronouns, etc. because they are already eliminated by Denodo Aracne.
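The combined effect of these settings can be sketched as follows. This is a rough model under stated assumptions, not the actual Aracne implementation: it assumes that word matching ignores case and that the maximum number of terms is applied after the usual words have been removed, neither of which is specified here. The names `parse_word_list` and `select_main_terms` are hypothetical.

```python
def parse_word_list(text):
    """Split a comma-separated list of usual words into a lowercase set."""
    return {word.strip().lower() for word in text.split(",") if word.strip()}


def select_main_terms(ranked_terms, field_words="", general_words="", max_terms=10):
    """Drop the usual words, then keep at most max_terms terms.

    field_words models "Filter main terms words" (one field) and
    general_words models "General words to filter main terms" (all fields).
    Assumption: filtering is case-insensitive and happens before truncation.
    """
    excluded = parse_word_list(field_words) | parse_word_list(general_words)
    kept = [term for term in ranked_terms if term.lower() not in excluded]
    return kept[:max_terms]


# Hypothetical ranked terms, most relevant first.
ranked = ["denodo", "index", "aracne", "scheduler", "crawler"]
result = select_main_terms(
    ranked, field_words="denodo, aracne", general_words="index", max_terms=2
)
```

With these inputs, "denodo", "aracne" and "index" are discarded and the two remaining terms, "scheduler" and "crawler", are returned.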

The value of these parameters can be modified later in the “Wrapper specific configuration” dialog, in the “Search Methods” tab of the view’s “Options” (see section Configuration Properties for Specific View Types).