» Documentation

Overview

Solution installation & Download.... Check RSS Feed online demo.

OLAP Cube

The solution creates a Cube, containing: a metadata, the sources (DW facts) and multidimensional index. The system can host more than one cube, but we focus on a single cube with a number of sources. The picture below illustrates relations between cube's components:

OlapCube.png

Sources

In order to search and analyze application's log or other data, we need to create a mirror storage in OLAP cube. It's achieved by creating the database for each source. The source database must create 6 tables named: Log1 to Log6 (think of them as Facts tables). The fact tables are periodically synchronized with the source. Each Log table will store X days of historic data; by default 30 days. It means that the system can maintain up to 180 days of recent history, the older than 180 data (180 = 6 tables * each 30 days) will be overwritten. The age of historic data is customizable in a metadata.

Synchronization

The metadata manages synchronization of sources and multidimensional indexing. First, the new data will be stored in dedicated Facts' databases. Then, it will be indexed by multidimensional index to allow search and aggregate by the source's dimension(s).The granularity of the facts data is one to one with the source provider, meaning, all facts will be stored and indexed, allowing to drilldown to the specific row.

Sources use plugins to read application logs, event logs, windows event log, databases (through OLEDB), or, using custom .NET plugin (RSS feed example uses .NET plugin to read feeds; source code included).

Metadata & Multidimensional index

The METADATA is the "LogSearch" database with a set of SQL stored procedures dedicate to synchronize sources and populate multidimensional index. The index is global for all sources. The dimensions are search or measure attributes. Each dimension maps the field(s) in the source(s) or by SQL expression (e.g. regular expressions' output). A dimension can be used by multiple sources. For example, consider dimension ClientID that maps to regular expressions' output in event log source and business application record related to the ClientID.

Dimension.png

Metadata has built-in Created, EntityPath and LogText dimensions:

  • Created. Time dimension. When indexing, the Datetime value formatted as YYYYMMDDTHH. As we can see the granularity is by hour, meaning the search and aggregation calculated down to an hour. Examples, of how to format search criteria in search:
    • 20131001T10 - find all events at 10 o'clock in 1st of October 2013.
    • 20131001* - show all events at 1st of October 2013.
    • 201310* - show all events in October 2013.
    • [20131001T00 TO 20131003T00] - all event between 1st and 2nd October 2013.
  • LogText. When binding the facts column / expression with LogText dimension, the content will be freetext searchable.
  • EntityPath. EntityPath value represents the facts hierarchy in the source. The format: sub-categ/sourceName. The level could be deeper than 2, however, in such case the search UI visualization needs modification. By default(and always) EntityPath should be mapped to Source name as well. Optionally, EntityPath can be expanded to identify the fact's hierarchy. For example, Event Log source categorize facts by severity, and value of EntitytPath will be respectively: 'error/eventlog', 'warning/eventlog', 'information/eventlog'.

The solution exposes 2 UI interfaces. One for searching against all sources, other for drilling down a single source.

When searching all sources, the output will be categorized and represented by TreeMap (using "JavaScript InfoVis Toolkit" by Nicolas Garcia Belmonte). NOTE: it works with chrome and IE10.

TreeMap.png

Each box's title represents the source and sub-boxes the sub-category within (e.g. error, warning or information for EventLog source). To view sources' rows/facts click on the title or inner box to access the sub-category.

OLAP queries & reports

The OLAP cube can perform the basic aggregate & measure operations (sum, min, max, avg, count). For example, consider SQL:

SELECT avg(dim4), sum(dim5)
FROM Cube
WHERE dim5 in (v1,v2...)
GROUP BY dim1, dim2, dim3 

Can be expressed as:

exec LogSearch.groupby '+dim5:(v1 v2 ..)', 'dim1,dim2,dim3', 'dim4#avg,dim5#sum', null 

Where parameters:

  • '+dim5:(v1 v2 ..)' - search criteria.
  • 'dim1,dim2,dim3' - grouping key.
  • 'dim4#avg,dim5#sum' - measures, translates to avg(dim4), sum(dim5).
  • The last parameter is additional filter.

Is it tested?

The solution has been running in production for 2 years, indexing 1.5 billion of rows from 15 sources. Its running distributed scimoreDB database of 5 nodes on a virtual machine (4 core XEON cpu, 32 GB ram, 600 GB hdd/san).

Support & Documentation

The solution is free. We offer paid services to integrate and maintain solution (for marketing pinch read our brochure).

For documentation, details in the samples and SQL scripts. Less is more ...:)

With suggestions, ideas or questions, please send us a mail to s u p p o r t at s c i m o r e dot c o m.