Splunk is software for searching, monitoring, and analyzing machine-generated data through a web-style interface.
Splunk's mission is to make machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems, and providing intelligence for business operations. You can use Splunk together with Denodo logs to take advantage of all these features.
Now you can add data to your Splunk deployment. The data is processed and transformed into a series of individual events that you can view, search, and analyze.
To add new data, click Add Data on the Splunk Home page or click Settings > Add Data on the Splunk bar. The Splunk platform works with both streaming and historical data.
As you can see in the image below, there are several options for getting data into your Splunk deployment with Splunk Web.
Regardless of the method used to add data, the process of transforming the data is called indexing. During indexing, the incoming data is processed to enable fast searching and analysis. The processed results are stored in the index as events. The index is a flat file repository for the data and, by default, all of your data is put into a single, preconfigured index.
Adding Denodo Logs
To add events from Denodo logs, use the Upload option if you work with historical data, or the Monitor option if you want to add events as they are written to the log file. Both options can handle Denodo Server Logs (<DENODO_HOME>/logs/vdp/) and Denodo Monitor Logs.
Denodo Monitor Logs are available under <DENODO_HOME>/tools/monitor/denodo-monitor/logs or, if you launched the Denodo Monitor from the Solution Manager, under <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work. In this second scenario, Denodo Monitor can collect the execution logs from a single Virtual DataPort server, or from all the servers of a cluster or environment, and each log file name starts with the name of the server that the data belongs to. See the Monitoring section of the Solution Manager Administration Guide for more information.
Upload Denodo Logs
The Upload option lets you upload a file or an archive of files for indexing. When you click Upload, Splunk Web goes to a page that starts the upload process.
After you select a file, the Set Source Type page lets you set the source type of your data and preview how it will look once it has been indexed. This lets you check that the data is formatted properly and make any necessary adjustments. Note that some events may be omitted from this preview because the max_preview_bytes property in the limits.conf file sets the maximum number of bytes to read from each file during preview. The file can be found at $SPLUNK_HOME/etc/system/default/ or $SPLUNK_HOME/etc/system/local/, and local takes precedence. By default, max_preview_bytes is set to 2000000. Events omitted from the preview are still added, and you can see them once the data addition process has finished.
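The precedence of local over default can be pictured with a short Python sketch. It only illustrates the idea of "local overrides default" and is not how Splunk itself loads its configuration; the [indexpreview] stanza name and the 5000000 override are assumptions made for the example.

```python
import configparser

# Hypothetical file contents; the stanza name is an assumption for this sketch.
DEFAULT_LIMITS = """
[indexpreview]
max_preview_bytes = 2000000
"""

LOCAL_LIMITS = """
[indexpreview]
max_preview_bytes = 5000000
"""


def effective_int(default_text: str, local_text: str, stanza: str, key: str) -> int:
    """Return the value that wins when local settings are layered over default."""
    parser = configparser.ConfigParser()
    parser.read_string(default_text)  # lowest precedence, read first
    parser.read_string(local_text)    # read last, so its values override
    return parser.getint(stanza, key)


preview_limit = effective_int(DEFAULT_LIMITS, LOCAL_LIMITS,
                              "indexpreview", "max_preview_bytes")
default_only = effective_int(DEFAULT_LIMITS, "", "indexpreview", "max_preview_bytes")
```

With no local override, the same helper falls back to the default value.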
By assigning the correct source type to your data, the indexed version of the data (the event data) will look the way you want it to, with correct timestamps and event breaks. Therefore, you may need to create a new source type with customized event processing settings. See the Creating a Source Type section for more information.
Once you have created a source type for your log, you can select it in the Source type drop-down list. If you want to discard some events, such as headers or CREATE JAR statements, you must have created the source type beforehand.
Monitor Denodo Server Logs and Denodo Monitor Logs
The Monitor option lets you monitor one or more files, directories, network streams, scripts, Event Logs (on Windows hosts only), performance metrics, or any other type of machine data that the Splunk Enterprise instance has access to. When you click Monitor, Splunk Web loads a page that starts the monitoring process.
Select Files & Directories and browse to the file.
You also need to choose how you want Splunk Enterprise to monitor the file. You may choose the Continuously Monitor option that sets up an ongoing input. Splunk Enterprise monitors the file continuously for new data.
In the next step, the Set Source Type page provides an easy way to preview the effects of applying a source type to your data and to adjust the source type settings as necessary. Note that some events may be omitted from this preview because the max_preview_bytes property in the limits.conf file sets the maximum number of bytes to read from each file during preview. The file can be found at $SPLUNK_HOME/etc/system/default/ or $SPLUNK_HOME/etc/system/local/, and local takes precedence. By default, max_preview_bytes is set to 2000000. Events omitted from the preview are still added, and you can see them once the data addition process has finished.
You will need to create a new source type with customized event processing settings (see Creating a Source Type section for more information). Once you have created a specific source type for your log, it can be selected in the Source type drop-down list.
Keep in mind that you can discard some events, such as headers or CREATE JAR statements, but only if you created the source type beforehand (see the Creating a Source Type and How to discard specific events sections for further information).
Monitoring directories and log file rotation
Select Files & Directories and browse to the directory.
Denodo Server Logs are under <DENODO_HOME>/logs/vdp, while Denodo Monitor Logs are under <DENODO_HOME>/tools/monitor/denodo-monitor/logs or, if the Denodo Monitor is launched by the Solution Manager, under <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work. Take into account that, when working with the Solution Manager, the Denodo Monitor generates log files in different folders:
- Log files of environments: <EnvironmentName>/<timestamp>/logs
- Log files of clusters: <EnvironmentName>/<ClusterName>/<timestamp>/logs
- Log files of servers: <EnvironmentName>/<ClusterName>/<ServerName>/<timestamp>/logs
Because a <timestamp> folder is created every time the Solution Manager starts the Denodo Monitor, you must use the asterisk wildcard in the directory configuration to monitor a directory with Splunk. The asterisk (*) matches anything in that specific folder path segment, so you can monitor the log files of an environment, cluster or server regardless of the name of the timestamp folder:
- Log files of environments: <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work/<EnvironmentName>/*/logs/*
- Log files of clusters: <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work/<EnvironmentName>/<ClusterName>/*/logs/*
- Log files of servers: <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work/<EnvironmentName>/<ClusterName>/<ServerName>/*/logs/*
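As a rough illustration of how a single-segment wildcard behaves, the following Python sketch translates each asterisk into "anything except a path separator" and checks a few paths. The installation path, the environment and cluster names and the timestamp folder format are hypothetical, and the translation is an approximation written for this example, not Splunk's own matching code.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' into 'any characters within one path segment'."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "[^/]*".join(literal_parts) + "$")


# Hypothetical Solution Manager work directory and environment name.
WORK = "/opt/solution-manager/resources/solution-manager-monitor/work"
environment_pattern = wildcard_to_regex(WORK + "/Production/*/logs/*")

# Two Denodo Monitor runs create two differently named timestamp folders.
first_run = WORK + "/Production/2023-01-10T08-00-00/logs/server01-queries.log"
second_run = WORK + "/Production/2023-02-01T09-30-00/logs/server01-queries.log"
# A cluster-level file sits one folder deeper, so this pattern skips it.
cluster_file = WORK + "/Production/ClusterA/2023-01-10T08-00-00/logs/server01-queries.log"

matches = {
    path: environment_pattern.match(path) is not None
    for path in (first_run, second_run, cluster_file)
}
```

Both timestamped runs match the environment pattern, while the cluster-level file would need the cluster pattern with its extra path segment.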
After selecting a directory to monitor, the Whitelist and Blacklist boxes are available to specify rules in order to define which files to consume or exclude.
When you define a whitelist, Splunk Enterprise only indexes the files you specify. When you define a blacklist, the software ignores the specified files and processes all other files. You can also define whitelists and blacklists in the input stanza in inputs.conf.
The whitelist allows you to add all the backup files when you start monitoring. From then on, whenever Denodo or Denodo Monitor creates a backup, file monitoring continues: Splunk recognizes that the file has been rolled and does not read the rolled file a second time.
Denodo whitelist examples:
This example covers the scenario where you want to collect the <DenodoServerName>-queries.log files generated by the Denodo Monitor launched by the Solution Manager for an environment. In this environment, servers are named server01, server02, etc.
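A candidate whitelist expression for this scenario can be tried out in Python before typing it into the Whitelist box. The sketch below is only an assumption of what such an expression might look like: the regular expression, the numeric suffix used for rolled files and the sample paths are all hypothetical, and the expression is treated as an unanchored match against the full path.

```python
import re

# Hypothetical whitelist: serverNN-queries.log plus rolled copies that carry
# a numeric suffix (the suffix format is an assumption for this sketch).
WHITELIST = r"server\d+-queries\.log(\.\d+)?$"


def is_whitelisted(path: str) -> bool:
    """Return True when the expression matches somewhere in the path."""
    return re.search(WHITELIST, path) is not None


candidates = [
    "/work/Production/20230110/logs/server01-queries.log",
    "/work/Production/20230110/logs/server02-queries.log.1",
    "/work/Production/20230110/logs/server01-resources.log",
]
selected = [path for path in candidates if is_whitelisted(path)]
```

Only the two queries files are selected; the resources log is left out.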
Keep in mind that you can monitor the backup files and then disable or delete this monitor input (Settings > Data inputs). Splunk Enterprise then simply stops checking those files; the events already indexed are kept, so you can set up a new monitor input for the current file without losing any data.
In the next step, you must select a source type. By assigning the correct source type to your data, the indexed version of the data will look the way you want it to. Therefore, you need to create a new source type with customized event processing settings in order to index Denodo Logs correctly. Once a source type has been created, it can be selected in the Source type drop-down list. If you want to discard some events, such as headers or CREATE JAR statements, you must have created the source type beforehand. See the Creating a Source Type section for more information.
Adding Denodo Monitor Logs from database
Denodo Monitor can store its log records in a database instead of in log files. To read this log information, Splunk has an add-on called Splunk DB Connect that bridges Splunk Enterprise with relational databases via Java Database Connectivity (JDBC). Keep in mind that inputs from databases using Splunk DB Connect are not a monitoring process: they run at the interval set by an execution frequency parameter.
You have to install Splunk DB Connect and configure it to access databases. Remember to install a suitable JDBC driver depending on the database selected to store the information generated by the Denodo Monitor, create an identity with your database credentials and create a new connection.
To create a new input, click Data Lab > Inputs and then New Input.
- On the Set SQL Query page:
- Choose the connection that you want to use for this input from the drop-down list under the Connection field and choose the Catalog, Schema, and Table (request_notification or cache_notification) that contain the data you want to pull into Splunk.
- After you choose the table, the corresponding SQL query is displayed in the SQL Editor, where you can preview the result of the query.
- Specify an input type for your query:
- Rising is the best option for the request_notification table because you can use the autoincrementid column to keep track of which rows are new from one input execution to the next. In this scenario you need to select the rising column (autoincrementid) and set a checkpoint value; the input selects only the rows whose value in this column is higher than the checkpoint. Therefore, in order to add all the data, this value must be set to zero.
Each time the input finishes running, DB Connect updates the input's checkpoint value with the value in the last row of the checkpoint column.
You also have to write your own SQL directly in the SQL Editor so that the query accepts the checkpoint value and works correctly. You need to filter on the rising column with a WHERE clause and sort the results with ORDER BY.
WHERE autoincrementid > ?
ORDER BY autoincrementid ASC
After these steps you have to click Execute SQL to preview the result.
- The cache_notification table does not have an auto-incremental column so, depending on the data that you want to add to Splunk you can use the rising option or select the batch option.
A rising input needs a rising column to keep track of which rows are new from one input execution to the next. Timestamps are not ideal for this purpose, because a high rate of event generation, or rows added to the database with a date earlier than one already read, can cause data loss. However, you might want to add only part of the events, for example those that have the endRefreshCacheProcess value in the notificationtype field. In this case, you can configure your SQL query in the editor and use the id column as the rising column:
WHERE id > ? AND
notificationtype = 'endRefreshCacheProcess'
ORDER BY id ASC
If you want to add all the events instead, use the batch option, but bear in mind that it invokes the same query each time the input runs and returns all the results. Therefore, before you refresh the data you need to delete the events already added to avoid duplicates.
- On the Set Properties page:
- Configure the Name. If you leave the Source field in the Metadata section blank, the input name is used as the source, and you can use it in the Search & Reporting application to create searches.
- Select Search & Reporting as Application.
- Set an Execution Frequency value (a number of seconds or a cron expression). This field is mandatory. When you use the batch option, do not set a small value, because you need to disable the input once the first load has finished in order to avoid duplicates.
- Select a Source Type value to assign to queried data as it is indexed or type a name to create a new one. See the Creating a Source Type section for more information.
- Enter an Index value for the index in which you want Splunk Enterprise to store indexed data. You can enter the index name or choose it from the typeahead menu.
- Click Finish
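The rising-input checkpoint described in the Set SQL Query step can be sketched with an in-memory SQLite database. This is a simplified model of the behaviour, not DB Connect code: the table is reduced to two columns, the query column and its contents are hypothetical, and only the WHERE and ORDER BY clauses come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the request_notification table.
conn.execute(
    "CREATE TABLE request_notification ("
    "autoincrementid INTEGER PRIMARY KEY, query TEXT)"
)
conn.executemany(
    "INSERT INTO request_notification (query) VALUES (?)",
    [("SELECT 1",), ("SELECT 2",)],
)


def run_rising_input(connection, checkpoint):
    """Fetch rows above the checkpoint and return them with the new checkpoint."""
    rows = connection.execute(
        "SELECT autoincrementid, query FROM request_notification "
        "WHERE autoincrementid > ? ORDER BY autoincrementid ASC",
        (checkpoint,),
    ).fetchall()
    # The checkpoint becomes the rising-column value of the last row read.
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


# First execution: a checkpoint of zero loads every existing row.
first_rows, checkpoint = run_rising_input(conn, 0)

# A new row arrives between executions; only that row is read next time.
conn.execute("INSERT INTO request_notification (query) VALUES ('SELECT 3')")
second_rows, checkpoint = run_rising_input(conn, checkpoint)
```

Sorting by the rising column matters here: without ORDER BY, the last row returned would not necessarily hold the highest value, and the stored checkpoint could skip or repeat rows.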
Creating a Source Type
You can create new source types in several ways. Note that after creating a source type, and before assigning it in an ‘Add Data’ process, you should follow the How to discard specific events section if you want to reject some events.
- Create a source type in the Source types management page selecting Source types from the Settings menu and clicking on New Source Type (in the upper right corner).
Then you can create, view and edit all field extractions and field transformations.
Extracting Fields for Denodo Monitor logs
Select Settings > Fields > Field transformations. Click on New Field Transformation and create a transformation setting the proper parameters for the file you want to read. Remember to use search as the destination app.
Monitor files (<DENODO_HOME>/tools/monitor/denodo-monitor/logs or <SOLUTION_MANAGER_HOME>/resources/solution-manager-monitor/work) are tab delimited so you can select the delimiter-based type, set "\t" as a delimiter and use the headers included in these logs to get the field list or use the lists included in the Appendix A: Denodo Monitor Logs. Field lists.
For example, the image below shows a field transformation for a <DENODO_HOME>/tools/monitor/denodo-monitor/logs/vdp-queries.log file:
You also have to create a Field Extraction. To do this, select Settings > Fields > Field extractions and click on New Field Extraction. You must specify a name, the source type to which it applies (MySourceType in our example), Uses transform as the type, and the transform name (REPORT-MyFieldTransformation in our example).
Extracting Fields for Denodo Server Logs
Denodo Server Logs (<DENODO_HOME>/logs/vdp/) are not tab delimited so you must use an inline extraction and omit field transformations.
Select Settings > Fields > Field extractions. Click on New Field Extraction and create an extraction with the Inline type.
Finally, set the appropriate regular expression. See Appendix B: Denodo Server Logs. Regular expressions for examples of regular expressions for Denodo Server log files.
- Use the Set Source Type page in Splunk Web during the data addition process. The Set Source Type page forces you to create a new source type if you do not select an existing one. Note that if you want to discard some events, such as headers or CREATE JAR statements, you must have created the source type beforehand.
You can create a source type by editing the props.conf file, as mentioned in the following section, or from the Source types management page (Settings > Source types), as explained in the previous section. You do not need to create field extractions and transformations in advance, because you can create them after the Review step of the Add Data process: click Extract Fields and select a sample event to extract fields using a regular expression or delimiters (space, comma, tab, pipe or another that you specify).
Extracting Fields for Denodo Monitor logs
You can extract fields using the delimiters option, with tab as delimiter.
Use the headers that you can find in these logs to rename the extracted fields. The field lists are also available in Appendix A: Denodo Monitor Logs. Field lists.
Extracting Fields for Denodo Server Logs
Denodo Server Logs are not tab delimited, so you should create a regular expression to extract the fields.
Click I prefer to write the regular expression myself to add and check your regular expression. See Appendix B: Denodo Server Logs. Regular expressions for examples of regular expressions that read events from Denodo Server log files.
Furthermore, you can create a regular expression in the Select Fields step which allows you to select one or more values in the sample event to create fields.
You can also use the Preview section and click an additional event to add it to the set of sample events and use it to improve the extraction.
Moreover, a Show Regular Expression link lets you check the generated regular expression and edit it to fit the events of the log file.
Finally, you should set a name and permissions for the extraction:
- Edit the props.conf configuration file directly. However, a best practice for creating source types is to use Splunk Web, which guarantees that source types are created consistently across your Splunk deployment. See the Edit props.conf section in the Splunk documentation for more information.
How to discard specific events
You can eliminate unwanted data by routing it to the nullQueue, the Splunk equivalent of the Unix /dev/null device. When you filter out data in this way the data is not forwarded and doesn't count toward your indexing volume.
To discard some events you must edit $SPLUNK_HOME\etc\apps\search\local\props.conf (this file may be in another path depending on the App Context configured during the ‘Add Data’ process) and add a TRANSFORMS-null setting, which determines queue routing based on event metadata. It must be added in the stanza of the source type created for the events that you want to filter:
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = true
TRANSFORMS-null = setnull_monitor_vdp-queries_source_type_01, setnull_monitor_vdp-queries_source_type_02
Edit $SPLUNK_HOME\etc\apps\search\default\transforms.conf (this file may be in another path depending on the App Context configured during the ‘Add Data’ process) to create the corresponding stanzas, one for each value given to the TRANSFORMS-null attribute. For each routing transform, set the routing rule with a regular expression, DEST_KEY to “queue” and FORMAT to “nullQueue”:
[setnull_monitor_vdp-queries_source_type_01]
REGEX = Query logging started at.*?
DEST_KEY = queue
FORMAT = nullQueue
[setnull_monitor_vdp-queries_source_type_02]
REGEX = .*?CREATE JAR.*
DEST_KEY = queue
FORMAT = nullQueue
After these changes, restart Splunk Enterprise.
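Before restarting, it can help to check which events the two regular expressions would actually discard. The Python sketch below imitates the routing decision in a simplified way; the sample events are hypothetical, "indexQueue" is used here as the name of the normal destination, and Splunk's own regular expression engine may differ from Python's in edge cases.

```python
import re

# The two REGEX values from the transforms above.
DISCARD_RULES = [
    re.compile(r"Query logging started at.*?"),
    re.compile(r".*?CREATE JAR.*"),
]


def route(event: str) -> str:
    """Return the queue an event would be sent to under the rules above."""
    if any(rule.search(event) for rule in DISCARD_RULES):
        return "nullQueue"   # discarded, never indexed
    return "indexQueue"      # indexed as usual


# Hypothetical sample events.
sample_events = [
    "Query logging started at 2023-01-10T08:00:00",
    "server01\tlocalhost\t9999\t42\tadmin\tCREATE JAR sample_jar",
    "server01\tlocalhost\t9999\t43\tadmin\tSELECT * FROM customer",
]
routing = [route(event) for event in sample_events]
```

Only the header line and the CREATE JAR statement are routed to the nullQueue; the regular query is kept.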
During the trial and configuration process you may need to delete events. Note that the monitoring processor picks up new files and reads the first 256 bytes of the file. The processor then hashes this data into a begin and end cyclic redundancy check (CRC), which functions as a fingerprint representing the file content. Therefore, if you want to add a file to the monitoring process after deleting it and its events, you have to change the file (adding blank rows at the beginning, for example) so that Splunk does not detect it as a file already added.
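The effect of fingerprinting only the start of a file can be illustrated in Python. This sketch uses zlib.crc32 over the first 256 bytes as a stand-in; it is an assumption for illustration and not Splunk's actual hashing routine, and the file contents are hypothetical.

```python
import zlib


def initial_fingerprint(content: bytes, length: int = 256) -> int:
    """Checksum of the first `length` bytes, used here as a file fingerprint."""
    return zlib.crc32(content[:length])


# Hypothetical log content whose header is longer than 256 bytes.
header = b"ServerName\tHost\tPort\tId\tDatabase\n" * 10

original = header + b"event A\n"
# Same beginning, different tail: the fingerprint cannot tell them apart.
recreated = header + b"event B\n"
# Adding a blank row at the beginning shifts the first 256 bytes.
modified = b"\n" + recreated

same_as_original = initial_fingerprint(recreated) == initial_fingerprint(original)
changed_by_blank_row = initial_fingerprint(modified) != initial_fingerprint(original)
```

This is why a deleted file that is added again with the same beginning is treated as already seen, while the same file with a blank first row is read as new.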
Keep in mind that deleting a data input (Settings > Data inputs) means that Splunk no longer indexes data from this source, but it does not delete the events associated with it. You can delete events using the delete command, which must be run by a user with the delete_by_keyword capability (granted by the can_delete role).
Creating reports, charts and dashboards
After adding your data, you can save and share your searches. For detailed information about the ‘Add Data’ process, see the Adding Data section.
Example: CPU % used
You can monitor the <DENODO_HOME>/tools/monitor/denodo-monitor/logs/vdp-resources.log file. From its events you can get information about the percentage of CPU used.
You must define the fields in a new source type using the headers of the file. You will then have a field that corresponds to the CPU% column of the file, which represents the percentage of CPU used (CPUpercentage in our example), and another field that corresponds to the Date column (Date in our example), which shows the moment at which this value was collected.
You can create a search in order to retrieve Date and CPU % in the last hour (setting the time range to Last 60 minutes):
If you save this search, you can create a report with the CPU % used in the last hour. Its content is the statistics table that you see as the search result:
A Save As Report dialog box will appear and you must add a title and, optionally, a description.
The Time Range Picker gives you the option of running the report with a different time range.
You can create a search in order to retrieve Date and CPU % in the last hour (setting the time range to Last 60 minutes) and then select the Visualization option:
The Content option lets you save the report as a line chart only (the default option), as a statistics table like the one in the statistics table section, or as both. You may change the Content option to store both the line chart and the statistics table:
Dashboards are views that are made up of panels. The panels can contain modules such as search boxes, fields, charts, tables, and lists. Dashboard panels are usually connected to reports.
After you create a search visualization or save a report, you can add it to a new or existing dashboard.
A new panel will be created in a new or existing dashboard following the dialog:
There is also a Dashboard Editor that you can use to create and edit dashboards. The Dashboard Editor is useful when you have a set of saved reports that you want to quickly add to a dashboard. You can click on Dashboards and then Create New Dashboard or click on Edit in an already created dashboard:
The editing options are also available in the upper right corner of the dashboards screen.
Editing options offer the possibility to change permissions, change the visualization type in the panel, and to specify how the visualization displays and behaves.
For example, you can use the Add Input option to add a time range picker:
The Time range picker input control appears on the dashboard:
You can also edit the XML configuration (edit source), which provides access to features not available from the Dashboard Editor, such as specifying a custom number of rows in a table.
Adding an alert
Splunk allows you to use alerts in order to monitor for and respond to specific events.
You can start with a search for the events you want to track and save the search as an alert. For example, you may want to track the CPU usage in the last hour and receive an alert when it exceeds 60%.
Alerts can be scheduled or real-time but the latter is more appropriate for the scenario of this example. The alert dialog will allow you to configure the settings, the trigger conditions and the trigger actions:
This alert will show a record on the Triggered Alerts page (Activity > Triggered Alerts) every time that the CPU% used (registered in the vdp-resources.log) exceeds 60%.
Appendix A: Denodo Monitor Logs. Field lists
Field list for vdp-connections.log or <DenodoServerName>-connections.log:
"ServerName", "Host", "Port", "NotificationType", "ConnectionId", "ConnectionStartTime", "ConnectionEndTime", "ClientIP", "UserAgent", "AccessInterface", "SessionId", "SessionStartTime", "SessionEndTime", "Login", "DatabaseName", "WebServiceName", "JMSQueueName", "IntermediateClientIP"
Field list for vdp-datasources.log or <DenodoServerName>-datasources.log:
"Date", "DatabaseName", "DataSourceType", "DataSourceName", "ActiveRequests", "NumRequests", "MaxActive", "NumActive", "NumIdle", "PingStatus", "PingExecutionTime", "PingDuration", "PingDownCause"
Field list for vdp-loadcacheprocesses.log or <DenodoServerName>-loadcacheprocesses.log:
"SessionId", "ServerName", "Host", "Port", "NotificationType", "NotificationTimestamp", "Id", "QueryPatternId", "DatabaseName", "ViewName", "SqlViewName", "ProjectedFields", "NumConditions", "VDPConditionList", "CacheStatus", "TtlStatusInCache", "TtlInCache", "QueryPatternState", "Exception", "NumOfInsertedRows", "NumOfReceivedRows", "StartQueryPatternStorageTime", "EndQueryPatternStorageTime", "QueryPatternStorageTime", "StartCachedResultMetadataStorageTime", "EndCachedResultMetadataStorageTime", "CachedResultMetadataStorageTime", "StartDataStorageTime", "EndDataStorageTime", "DataStorageTime"
Field list for vdp-queries.log or <DenodoServerName>-queries.log:
"ServerName", "Host", "Port", "Id", "Database", "UserName", "NotificationType", "SessionId", "StartTime", "EndTime", "Duration", "WaitingTime", "NumRows", "State", "Completed", "Cache", "Query", "RequestType", "Elements", "UserAgent", "AccessInterface", "ClientIP", "TransactionId", "WebServiceName"
Field list for vdp-resources.log or <DenodoServerName>-resources.log:
"ServerName", "Host", "Port", "Date", "Metaspace", "PS Survivor Space", "PS Old Gen", "PS Eden Space", "Code Cache", "HeapMemoryUsage", "NonHeapMemoryUsage", "LoadedClassCount", "TotalLoadedClassCount", "ThreadCount", "PeakThreadCount", "VDPTotalConn", "VDPActiveConn", "VDPActiveRequests", "VDPWaitingRequests", "VDPTotalMem", "VDPMaxMem", "CPU%", "GC_CC:PS MarkSweep", "GC_CC:PS Scavenge", "GC_CT:PS MarkSweep", "GC_CT:PS Scavenge", "GC%:PS MarkSweep", "GC%:PS Scavenge", "GC%"
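To see how a delimiter-based extraction pairs these field lists with a tab-delimited event, the following Python sketch splits one line on tabs and zips the values with the vdp-datasources.log field list above. The sample line and all of its values are hypothetical, and the sketch only mimics the idea of the extraction, not Splunk's implementation.

```python
# Field list for vdp-datasources.log, copied from this appendix.
DATASOURCES_FIELDS = [
    "Date", "DatabaseName", "DataSourceType", "DataSourceName",
    "ActiveRequests", "NumRequests", "MaxActive", "NumActive", "NumIdle",
    "PingStatus", "PingExecutionTime", "PingDuration", "PingDownCause",
]


def extract_fields(line: str) -> dict:
    """Split a tab-delimited event and name each value after its column."""
    values = line.rstrip("\r\n").split("\t")
    return dict(zip(DATASOURCES_FIELDS, values))


# Hypothetical event; every value is invented for the example.
sample_line = "\t".join([
    "2023-01-10T08:00:00", "admin", "jdbc", "ds_orders",
    "0", "12", "20", "1", "3",
    "UP", "2023-01-10T07:59:58", "15", "-",
]) + "\n"

event = extract_fields(sample_line)
```

The same approach applies to the other Denodo Monitor logs by swapping in the corresponding field list.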
Appendix B: Denodo Server Logs. Regular expressions
Regular expression for vdp.log:
^(?P<Id>\d+)\s+(?P<Thread>\[.*?\]+)[^ \n]* (?P<Level>\w+)\s+(?P<Date>[^ ]+)[^ \n]* (?P<Category>[^ ]+)[^ \n]* (?P<NDC>\[.*?\])\s+\-\s+(?P<Message>.*)
Regular expression for vdp-cache.log:
^(?P<Id>[^ ]+)\s+(?P<Thread>\[.*?\]+)\s+(?P<Date>[^ ]+)\s+(?P<NDC>\[.*?\])\s+\-\t(?P<Message>.*)
Regular expression for vdp-queries.log:
^(?P<Id>[^ ]+)\s+(?P<Thread>\[.*?\]+)[^ \n]* (?P<Date>[^ ]+)\s+(?P<NDC>\[.*?\])\s+\-\t(?P<QueryId>[^\t]+)\t(?P<Database>[^\t]+)\t(?P<UserName>[^\t]+)\t(?P<NotificationType>[^\t]+)\t(?P<SessionId>[^\t]+)\t(?P<StartTime>[^\t]+)\t(?P<EndTime>[^\t]+)\t(?P<Duration>[^\t]+)\t(?P<WaitingTime>[^\t]+)\t(?P<NumRows>[^\t]+)\t(?P<State>[^\t]+)\t(?P<Completed>[^\t]+)\t(?P<Cache>[^\t]+)\t(?P<Query>[^\t]+)\t(?P<TotalRequests>[^\t]+)\t(?P<TotalJDBCConnectionTime>[^\t]+)\t(?P<PartialJDBCConnectionTime>[^\t]+)\t(?P<TotalJDBCResponseTime>[^\t]+)\t(?P<PartialJDBCResponseTime>[^\t]+)\t(?P<FeedCache>[^\t]+)\t(?P<CachedResultsViewNames>[^\t]+)\t(?P<TotalCacheResultsCheckTime>[^\t]+)\t(?P<PartialCacheResultCheckTime>[^\t]+)\t(?P<JDBCConnectionIds>[^\t]+)\t(?P<RequestType>[^\t]+)\t(?P<Elements>[^\t]+)\t(?P<UserAgent>[^\t]+)\t(?P<AccessInterface>[^\t]+)\t(?P<ClientIP>[^\t]+)\t(?P<TransactionID>[^\t]+)
Regular expression for vdp-requests.log:
^(?P<Id>\d+)\s+(?P<Thread>\[.*?\]+)\s+(?P<Level>[^ ]+)\s+(?P<Date>[^ ]+)[^ \n]* (?P<Category>[^ ]+)\s+(?P<NDC>\[.*?\])\s+\-\s+(?P<Request>.*)
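A regular expression from this appendix can be tried out in Python before it is pasted into an inline field extraction, since Python's re module also accepts the (?P<name>...) named-group syntax. The sketch below applies the vdp.log expression to a single line; the sample line is hypothetical and only shaped to fit the expression, so real log lines may look different.

```python
import re

# Regular expression for vdp.log, copied from this appendix and split
# across lines only for readability.
VDP_LOG_REGEX = re.compile(
    r"^(?P<Id>\d+)\s+(?P<Thread>\[.*?\]+)[^ \n]* (?P<Level>\w+)\s+"
    r"(?P<Date>[^ ]+)[^ \n]* (?P<Category>[^ ]+)[^ \n]* "
    r"(?P<NDC>\[.*?\])\s+\-\s+(?P<Message>.*)"
)

# Hypothetical log line; the category name and message are invented.
sample_line = (
    "1234 [main] INFO  2023-01-10T08:00:00.123 "
    "com.example.Component [] - Server started"
)

match = VDP_LOG_REGEX.match(sample_line)
fields = match.groupdict() if match else {}
```

If the expression does not match, fields stays empty, which is a quick sign that the expression needs adjusting for the log format at hand.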