The solution creates a Cube, containing: a metadata, the sources (DW facts) and multidimensional index. The system can host more than one cube, but we focus on a single cube with a number of sources. The picture below illustrates relations between cube's components:
In order to search and analyze application's log or other data, we need to create a mirror storage in OLAP cube. It's achieved by creating the database for each source. The source database must create 6 tables named: Log1 to Log6 (think of them as Facts tables). The fact tables are periodically synchronized with the source. Each Log table will store X days of historic data; by default 30 days. It means that the system can maintain up to 180 days of recent history, the older than 180 data (180 = 6 tables * each 30 days) will be overwritten. The age of historic data is customizable in a metadata.
The metadata manages synchronization of sources and multidimensional indexing. First, the new data will be stored in dedicated Facts' databases. Then, it will be indexed by multidimensional index to allow search and aggregate by the source's dimension(s).The granularity of the facts data is one to one with the source provider, meaning, all facts will be stored and indexed, allowing to drilldown to the specific row.
Sources use plugins to read application logs, event logs, windows event log, databases (through OLEDB), or, using custom .NET plugin (RSS feed example uses .NET plugin to read feeds; source code included).
The METADATA is the "LogSearch" database with a set of SQL stored procedures dedicate to synchronize sources and populate multidimensional index. The index is global for all sources. The dimensions are search or measure attributes. Each dimension maps the field(s) in the source(s) or by SQL expression (e.g. regular expressions' output). A dimension can be used by multiple sources. For example, consider dimension ClientID that maps to regular expressions' output in event log source and business application record related to the ClientID.
Metadata has built-in Created, EntityPath and LogText dimensions:
The solution exposes 2 UI interfaces. One for searching against all sources, other for drilling down a single source.
Each box's title represents the source and sub-boxes the sub-category within (e.g. error, warning or information for EventLog source). To view sources' rows/facts click on the title or inner box to access the sub-category.
The OLAP cube can perform the basic aggregate & measure operations (sum, min, max, avg, count). For example, consider SQL:
SELECT avg(dim4), sum(dim5) FROM Cube WHERE dim5 in (v1,v2...) GROUP BY dim1, dim2, dim3
Can be expressed as:
exec LogSearch.groupby '+dim5:(v1 v2 ..)', 'dim1,dim2,dim3', 'dim4#avg,dim5#sum', null
The solution has been running in production for 2 years, indexing 1.5 billion of rows from 15 sources. Its running distributed scimoreDB database of 5 nodes on a virtual machine (4 core XEON cpu, 32 GB ram, 600 GB hdd/san).
The solution is free. We offer paid services to integrate and maintain solution (for marketing pinch read our brochure).
For documentation, details in the samples and SQL scripts. Less is more ...:)
With suggestions, ideas or questions, please send us a mail to s u p p o r t at s c i m o r e dot c o m.