Metadata Search
The search by metadata lets you find views or web services that contain the query terms in their metadata. The metadata of an element consists of:
Its name.
Its description.
The names of its fields.
The descriptions of its fields.
The values of any custom properties it has assigned.
To search by metadata:
Type your query in the search bar located at the upper left corner.
On the left side (in the Filter results section), select Metadata.
Select your filtering options.
Launch the query in one of the following ways:
Press ENTER right after typing your search terms.
Click the icon in the search bar.
Click the Apply filters button at the bottom of the Filter results section.
Note
The first time you search for some terms, the Data Catalog performs a search by metadata. In addition, if the search by content is not available, all your searches are by metadata.
The results list supports two display modes: list and grid. For more information about each result, click the icon to enable the list mode. If you prefer to see more results on the page, click the icon to enable the grid mode.
For each result, you can check how many fields or custom properties matched the search keywords and how many fields matched VDP tags. For more detail, click the link with the number of matches to see the fields and custom properties that matched.
Keep in mind that the search by metadata supports multiple search terms and is inclusive and case insensitive:
Multiple search terms. You can search for an exact phrase consisting of one or more words. You can also search for the elements that contain all the search terms entered, or for those that contain at least one of them.
Inclusive. For a view or web service to appear in the results list, its metadata must contain the search terms. For instance, the search term customer matches the views customer and dv_customers, since the search term is included in their names.
Case insensitive. Uppercase and lowercase letters are treated the same when the Data Catalog compares the terms in the query with the terms in the metadata. For instance, the search term CUSTOMER matches the views customer and Customer, since their names are composed of the same letters.
Note
If the query is empty, all the views and web services match it.
In addition, you can refine your results by applying some of the filtering options available under the Filter results section:
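The matching rules above can be pictured with a minimal Python sketch. It assumes plain substring comparison after lowercasing, which is only a model of the behavior described here and not the Data Catalog's actual implementation. The function name `matches_metadata` and the sample view names are hypothetical.

```python
def matches_metadata(query: str, metadata_values: list[str]) -> bool:
    """Return True if the query is contained in any metadata value.

    Illustrative model only: inclusive (substring) and case insensitive.
    """
    needle = query.strip().lower()
    # An empty query matches every element.
    if not needle:
        return True
    return any(needle in value.lower() for value in metadata_values)


# Hypothetical view names, used only for illustration.
views = ["customer", "dv_customers", "Customer", "orders"]

# Every view except "orders" contains "customer", whatever the letter case.
print([v for v in views if matches_metadata("CUSTOMER", [v])])
# An empty query returns all the views.
print([v for v in views if matches_metadata("", [v])])
```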
Element type. Select which type of elements you want in the results:
Views and web services.
Views only.
Web services only.
Search option. Select the desired search type:
Exact match. This is the default option. It searches for an exact phrase consisting of one or more words; only the elements that contain the terms in the exact order you entered are retrieved.
All the words. Searches for elements that include all the search terms. Only the elements that contain all the words within a single search field are returned (that is, all the words are contained in the element name, in the description, in a field name, in a field description, or in a property value).
Any of the words. Searches for elements that include at least one of the search terms. Only the elements that contain at least one of the words in at least one of the search fields are returned.
Note
The maximum number of words for All the words and Any of the words search options is 6. The seventh and any subsequent words will be ignored.
Search fields. By default, all the fields in the metadata participate in the search. You can narrow this set by selecting which ones should be taken into account:
Name.
Description.
Field name.
Field description.
Property value.
Collaboration. Only the views and web services with at least one of the collaboration options you choose appear in the results:
Endorsed.
Warned.
Deprecated.
Databases. By default, all the databases in the Virtual DataPort server participate in the search. You can refine your results list by selecting the list of databases to consider.
Categories. If you select a category, the results list only shows the views and web services that belong to that category.
Tags. If you select a tag, the results list shows the views and web services that have that tag assigned. When VDP tags are imported, the results list also shows the views that contain some field assigned to the tag.
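As a rough illustration of how the Search option and Search fields filters interact, the following Python sketch models the three search options over a dictionary of metadata values. It is a sketch under stated assumptions: the dictionary keys, the function names, and the substring-based matching are hypothetical and do not reflect the Data Catalog's internal implementation.

```python
MAX_WORDS = 6  # the seventh and any subsequent words are ignored


def field_matches(query: str, text: str, option: str = "exact") -> bool:
    """Check one metadata value against the query (case insensitive)."""
    text_lc = text.lower()
    if option == "exact":
        # The whole phrase must appear in the order it was typed.
        return query.strip().lower() in text_lc
    words = query.lower().split()[:MAX_WORDS]
    if option == "all":
        # Every word must appear in this same value.
        return all(word in text_lc for word in words)
    if option == "any":
        return any(word in text_lc for word in words)
    raise ValueError(f"unknown search option: {option}")


def element_matches(query, element, option="exact", search_fields=None):
    """Check an element given as {search field: [values]} (assumed layout)."""
    if not query.strip():
        return True  # an empty query matches everything
    fields = search_fields or list(element)
    return any(
        field_matches(query, text, option)
        for name in fields
        for text in element.get(name, [])
    )


# Hypothetical element used only for illustration.
element = {
    "name": ["dv_customers"],
    "description": ["Customers and their billing address"],
    "field_name": ["id", "billing_city"],
}

print(element_matches("address billing", element, "exact"))  # False
print(element_matches("address billing", element, "all"))    # True
print(element_matches("customers city", element, "all"))     # False
print(element_matches("customers city", element, "any"))     # True
```

In this model, "customers city" fails with All the words because no single value contains both words, even though each word appears somewhere in the element.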
Important
An administrator of the Data Catalog should consider that:
In order to work, the search by metadata requires that the Data Catalog has been synchronized with the Virtual DataPort server at least once. Otherwise, the results list will be empty.
For a view or web service to appear in the results list, the user must be granted the CONNECT privilege on its database and the METADATA privilege on it.
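The privilege rule above can be expressed as a small Python sketch. This is a simplified model, assuming privileges are stored as plain sets per database and per element; the `UserPrivileges` class and `is_visible` function are hypothetical, and real privilege resolution in Virtual DataPort (for example through roles) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class UserPrivileges:
    # database name -> privileges granted on the database
    databases: dict[str, set[str]] = field(default_factory=dict)
    # (database name, element name) -> privileges granted on the element
    elements: dict[tuple[str, str], set[str]] = field(default_factory=dict)


def is_visible(user: UserPrivileges, database: str, element: str) -> bool:
    """An element can appear in the results only with both privileges."""
    has_connect = "CONNECT" in user.databases.get(database, set())
    has_metadata = "METADATA" in user.elements.get((database, element), set())
    return has_connect and has_metadata


# Hypothetical grants used only for illustration.
user = UserPrivileges(
    databases={"sales": {"CONNECT"}},
    elements={("sales", "customer"): {"METADATA"}},
)

print(is_visible(user, "sales", "customer"))  # True
print(is_visible(user, "sales", "orders"))    # False: no METADATA privilege
```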
Metadata Search in Index
The search by metadata in index is only available if a metadata index has been properly configured for the current Virtual DataPort server. You can find more details in the Search Engines Configuration - Metadata search section.
To search by metadata in an index:
Type your query in the search bar located at the upper left corner.
On the left side (in the Filter results section), select Metadata.
Select your filtering options.
Launch the query in one of the following ways:
Press ENTER right after typing your search terms.
Click the icon in the search bar.
Click the Apply filters button at the bottom of the Filter results section.
The search by metadata in the index allows you to find views or web services that contain the query terms in their metadata. It also supports complex queries with advanced features such as:
Wildcards in the query terms.
Fuzzy searches, which find results that are similar to the search query even if the query terms do not exactly match the terms in the indexed metadata.
Proximity searches, which require that some terms appear within a certain distance of each other.
Boosting some terms in the query.
Boolean expressions.
And more…
Because the search engine supported for the metadata index is the Scheduler Index server, which uses Lucene internally, the supported query syntax is the one described in Apache Lucene Search Syntax.
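For orientation, the following Python snippet lists a few query strings written in the classic Lucene query parser syntax, one per feature mentioned above. It assumes the metadata index accepts this syntax unchanged; the search terms are illustrative. The `escape_lucene` helper is a hypothetical convenience for typing terms that contain characters the syntax treats as operators.

```python
# Characters with a special meaning in the classic Lucene query syntax.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term: str) -> str:
    """Backslash-escape special characters so a term is searched literally."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term
    )


# Illustrative queries; the terms themselves are made up.
EXAMPLE_QUERIES = {
    "wildcard (multiple characters)": "custom*",
    "wildcard (single character)": "te?t",
    "fuzzy": "customer~",
    "proximity (within 5 words)": '"customer address"~5',
    "boosting": "customer^4 address",
    "boolean": "customer AND (address OR phone)",
    "required / prohibited": "+customer -deprecated",
}

for feature, query in EXAMPLE_QUERIES.items():
    print(f"{feature}: {query}")

print(escape_lucene("1:1 (draft)"))
```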
Note that the filter options for searching are the same as those explained in the previous section for metadata search in the database, with the exception of the search option, which is not applicable for metadata search in the index.
In the results list of a search by metadata in index, the query terms found in the metadata will be highlighted to indicate in a simple way where the search match has occurred.
Take this into account:
The search results displayed depend on the analyzer that was selected when the metadata index was configured in the Search engines section. To understand how the index search works, it helps to know the processes an analyzer applies:
Tokenization. This process involves dividing a text or document into individual units called tokens. Lucene uses a tokenizer to split the text into words based on language-specific rules. Common delimiters like whitespace, punctuation, and special characters are typically used to identify word boundaries.
Lowercasing. After tokenization, the text is converted to lowercase. This step ensures that the search is case-insensitive, meaning queries can match words regardless of their capitalization.
Stopword removal. This step filters out common words, known as stopwords, from the text before indexing. Stopwords are words that occur frequently in a language but often carry little semantic meaning. Examples of stopwords in English include “and,” “the,” “is,” “in,” etc. The purpose of stopword removal is to reduce the size of the index and improve search efficiency by removing words that do not contribute to the relevance of the search results.
Stemming. This process reduces a word to its root form, ensuring that variants of a word match during a search. The algorithm applied for stemming also depends on the analyzer selected to index the metadata.
Consider this:
Lucene provides several built-in analyzers, and each of them has a corresponding stemmer (if applicable) for the specific language it supports.
The StandardAnalyzer does not perform stemming.
Data Catalog has custom implementations of analyzers for English and Spanish languages, using org.tartarus.snowball.ext.EnglishStemmer and org.tartarus.snowball.ext.SpanishStemmer for this purpose. EnglishStemmer implements a modified version of the original Porter algorithm.
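The four analysis steps described above can be sketched as a toy pipeline in Python. This is only an illustration of the idea: the regular-expression tokenizer, the short stopword list, and the `naive_stem` suffix stripper are assumptions made for the example. They are not Lucene's tokenizer, its stopword lists, or the Snowball stemmers.

```python
import re

# Tiny illustrative stopword list; real analyzers ship longer,
# language-specific lists.
STOPWORDS = {"and", "the", "is", "in", "of", "a", "to"}


def naive_stem(token: str) -> str:
    """Toy suffix stripper. NOT the Snowball or Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    tokens = re.findall(r"[A-Za-z0-9]+", text)            # tokenization
    tokens = [t.lower() for t in tokens]                  # lowercasing
    tokens = [t for t in tokens if t not in STOPWORDS]    # stopword removal
    return [naive_stem(t) for t in tokens]                # stemming


print(analyze("The Customers and their pending Orders"))
# ['customer', 'their', 'pend', 'order']
```

Because the same analysis is applied to the indexed text and to the query, `analyze("customer orders")` and `analyze("Customers ORDER")` produce the same tokens in this sketch, which is why such variants can match each other.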