Aracne Custom Crawlers¶
To create a new custom crawler, the interface
com.denodo.crawler.Crawler needs to be implemented. This interface
has the following methods:

execute. Method invoked by ARN to execute the crawler.
stop. Method invoked by the Scheduler to stop the execution of the crawler.
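As an illustration of this contract, the sketch below shows a crawler whose execute loop honors a stop request from the Scheduler. The Crawler interface is stubbed inline with assumed no-argument signatures; the real signatures are in the Denodo Aracne Javadoc documentation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MinimalCrawler {
    // Assumed shape of com.denodo.crawler.Crawler (execute/stop per the
    // text above); parameters are omitted here for illustration only.
    interface Crawler {
        void execute();
        void stop();
    }

    static class PollingCrawler implements Crawler {
        private final AtomicBoolean stopped = new AtomicBoolean(false);

        @Override
        public void execute() {
            // Crawl until the Scheduler asks us to stop.
            while (!stopped.get()) {
                // ... fetch pages and hand documents to the DataManager ...
                break; // single pass for this sketch
            }
        }

        @Override
        public void stop() {
            stopped.set(true); // invoked by the Scheduler
        }
    }

    public static void main(String[] args) {
        PollingCrawler c = new PollingCrawler();
        c.stop();
        c.execute(); // returns immediately because stop was requested
        System.out.println("stopped=" + c.stopped.get());
    }
}
```

Keeping the stop flag in an AtomicBoolean makes the Scheduler's stop request visible to the crawling thread without extra synchronization.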
The execution of the crawler must provide the results to Aracne in the
form of com.denodo.crawler.data.CrawlDocument objects, using the add
methods from com.denodo.crawler.data.DataManager.
package com.denodo.crawler.data;

public interface DataManager {
    public void add(CrawlDocument document);
    public void add(Collection documents);
    public void addEvent(CrawlEvent event);
    public void addEvents(Collection events);
    public void close();
    public void setMappingWriter(MappingRepository writer);
    public void setRepositoryWriter(FileRepository writer);
}
If, during the execution of the custom crawler, any event or error
occurs that Aracne needs to be informed about, the addEvent or
addEvents method from com.denodo.crawler.data.DataManager must be
invoked.
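A minimal sketch of this error-reporting pattern is shown below. CrawlEvent and DataManager are replaced by inline stand-ins so the example is self-contained; the real classes live in com.denodo.crawler.data, and the CrawlEvent constructor used here is an assumption, not the actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class EventReporting {
    // Stand-ins for the real Aracne classes, just so this sketch compiles
    // on its own. Check the Denodo Aracne Javadoc for the real signatures.
    static class CrawlEvent {
        final String message;
        CrawlEvent(String message) { this.message = message; } // assumed ctor
    }
    interface DataManager {
        void addEvent(CrawlEvent event);
    }

    // While crawling, report any error to Aracne through the DataManager
    // instead of letting it escape and silently kill the crawl.
    static void crawlOne(String url, DataManager dm, List<String> log) {
        try {
            if (url.startsWith("bad:")) {
                throw new RuntimeException("cannot fetch " + url);
            }
            log.add("fetched " + url);
        } catch (RuntimeException e) {
            dm.addEvent(new CrawlEvent(
                "error crawling " + url + ": " + e.getMessage()));
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        DataManager dm = e -> events.add(e.message); // collect reported events
        List<String> log = new ArrayList<>();
        crawlOne("http://example.com", dm, log);
        crawlOne("bad://example.com", dm, log);
        System.out.println(events);
    }
}
```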
The Aracne API for the creation of custom crawlers also allows a
repository to be built that stores copies of the data obtained by the
crawler. To do this, if the “binarydata” field from CrawlDocument
is
not empty, the contents of the document are stored in the repository.
The path for this repository would be that indicated by the “path”
field, if applicable; otherwise, that indicated by the encoded “url”
field.
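The path-selection rule described above can be sketched as follows. CrawlDocument is a hypothetical stand-in here, and the exact encoding Aracne applies to the "url" field is assumed to be standard URL-encoding for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RepositoryPath {
    // Hypothetical stand-in for com.denodo.crawler.data.CrawlDocument,
    // reduced to the fields the repository rule depends on.
    static class CrawlDocument {
        String url;
        String path;
        byte[] binarydata;
    }

    // Use the "path" field when present; otherwise derive the repository
    // path from the encoded "url" field (encoding scheme assumed).
    static String repositoryPath(CrawlDocument doc) {
        if (doc.path != null && !doc.path.isEmpty()) {
            return doc.path;
        }
        return URLEncoder.encode(doc.url, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CrawlDocument d = new CrawlDocument();
        d.url = "http://example.com/a?b=1";
        d.binarydata = new byte[] {1, 2}; // non-empty, so it would be stored
        System.out.println(repositoryPath(d)); // falls back to encoded url
        d.path = "docs/a.html";
        System.out.println(repositoryPath(d)); // explicit path wins
    }
}
```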
For more information, please refer to the Denodo Aracne Javadoc
documentation and the SalesforceCrawler example in
DENODO_HOME/samples/arn/crawler-api.