
Freetext Log Store example

Installing 7 nodes on a single machine

Before installing, make sure the machine has 8 GB of RAM and a 64-bit OS. The easiest way to install a multi-node cluster on a single machine is to use TPCCKit's install.cmd batch script. After unpacking the zip contents, run install.cmd. The script will ask for configuration parameters, install the nodes and join them into a cluster:

NOTE: cancel the batch script when it prompts:

TPCC database populate tool
Create TPCC data CSV files? (Running for the first time or altering number of warehouses - requires to create.) y/n 

We don't need to install the TPCC database.

Below is an example run of the install.cmd batch file:

D:\src\v4\test\trunk\tpcc\TPCCKit\tmp>install.cmd
=============================
= Installing ScimoreDB nodes
=============================
How many nodes to deploy? [1-128] (default: 7): 7
=============================
Set the root folder to deploy nodes (e.g.: c:\data): c:\data
A subdirectory or file c:\data already exists.
=============================
Set cache buffer size. The cache size measured in the number of DB pages. Each page use 16K. For example, 1.000.000 pages correspond to 16GB of RAM. 
Note: the configuration is per node (default: 25000):25000
=============================
Set TCP begin listen port. For example, if installing two nodes, and. begin port
=1000, then 1st node listens on 1000 and the second node on 1001 (default: 999): 999
=============================
Set number of CPU's (cores) utilized per node [2-64]. Note: the value must be pr
ime number: 1,3,5,7,11,... (Default: 7): 7
=============================
= Installing ScimoreDB nodes...
=============================
A subdirectory or file c:\data\db0 already exists.
Starting DB node instance:0
start chk: flush <47> pages
completed chk: flush <47> pages, time 16
Starting DB node instance:1
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 93
Starting DB node instance:2
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 94
Starting DB node instance:3
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 109
Starting DB node instance:4
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 109
Starting DB node instance:5
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 110
Starting DB node instance:6
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
expanding table:
start chk: flush <47> pages
completed chk: flush <47> pages, time 156
=============================
= Success installing ScimoreDB nodes, starting up nodes...
=============================
The scimoredb-0 service is starting.
The scimoredb-0 service was started successfully.

The scimoredb-1 service is starting.
The scimoredb-1 service was started successfully.

The scimoredb-2 service is starting.
The scimoredb-2 service was started successfully.

The scimoredb-3 service is starting.
The scimoredb-3 service was started successfully.

The scimoredb-4 service is starting.
The scimoredb-4 service was started successfully.

The scimoredb-5 service is starting.
The scimoredb-5 service was started successfully.

The scimoredb-6 service is starting.
The scimoredb-6 service was started successfully.

=============================
= Finally, joining nodes to the cluster...
=============================
Scimore Command line SQL utility

140 rows affected
Time elapsed 169ms
TPCC database populate tool
Create TPCC data CSV files? (Running for the first time or altering number of wa
rehouses - requires to create.) y/n:^CTerminate batch job (Y/N)? y

D:\src\v4\test\trunk\tpcc\TPCCKit\tmp> 

Installing Manager GUI

First, download the x64 MSI package and run it. In the Features tab, unselect the "Server" feature. When installed, start the manager, connect to the database (server: localhost, port: 999) and execute the SQL query:

select * from system.sysinstances 

The query returns a list of cluster nodes. It should return 7 rows, each row representing one node's details.
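
As a quick check of the node count, you can also run the one-liner below. This is a sketch: it assumes the count() aggregate used in the BI query at the end of this article also works without a GROUP BY.

select count() from system.sysinstances -- expect 7 rows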

Create Log database

We use two tables: TestLog and LogTypeTexts. TestLog stores the log records and is partitioned by the LogID column. LogTypeTexts does not define a PARTITION column, meaning the content of the table will be replicated to all nodes. We choose to replicate this table since it will not have many rows, nor be updated frequently. Replication benefits the distributed join TestLog => LogTypeTexts, as every node can access its local copy of LogTypeTexts for each TestLog row.

To create the database, copy/paste the SQL into the manager's query window and execute it:

CREATE DATABASE LogSearchDemo
go;
use LogSearchDemo
go;
--Create tables
CREATE TABLE [TestLog] -- partitioned table
(
[LogID] BIGINT NOT NULL  PARTITION, -- partition rows among scimoreDb nodes by hash value of LogID
[CreationDate] DATETIME NOT NULL ,
[LogTypeId] BIGINT NOT NULL,
[Param1] NCHAR NULL,
[Param2] NCHAR NULL,
[Param3] NCHAR NULL,
[Param4] NCHAR NULL,
[Param5] NCHAR NULL,
[Param6] NCHAR NULL,
[Param7] NCHAR NULL,
CONSTRAINT [PK_Log] PRIMARY KEY([LogId])
)
go;
CREATE TABLE [LogTypeTexts] -- replicated table (missing PARTITION keyword, i.e. the same content on all nodes)
(
[LogTypeId] BIGINT NOT NULL,
[Text] NCHAR NOT NULL,
CONSTRAINT [PK_Log] PRIMARY KEY([LogTypeId])
) 

In the next step, populate both tables with rows (copy/paste the SQL and execute).

Populate the log type texts:

NOTE: inserting is not fast, because this is not a bulk load (as a SQL Server import, XML import or CSV import would be). Each insert has to be optimized for distributed execution and sent to the right node.

use LogSearchDemo
go;
declare @i bigint = 0
begin tran
while @i < 1000
begin
insert into [LogTypeTexts] values(@i,concat('log type Id',@i))
set @i = @i +1
end
commit 

Populate the log records:

use LogSearchDemo
go;
declare @i bigint = 0
begin tran
while @i < 100000
begin
insert into [TestLog] values(@i,getdate(),mod(@i,1000),concat('p1V',@i),concat('p2V',@i),concat('p3V',@i),concat('p4V',@i),concat('p5V',@i),concat('p6V',@i),concat('p7V',@i))
set @i = @i +1
end
commit 
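
Optionally, verify the row counts (same count() assumption as the sanity check above):

use LogSearchDemo
go;
select count() from [LogTypeTexts] -- expect 1000
go;
select count() from [TestLog] -- expect 100000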

Create and populate freetext index

Now we are ready to define the search fields for the log search. The freetext index does not require a one-to-one definition match with the source table.

We define the freetext fields as:

use LogSearchDemo
go;
CREATE FREETEXT INDEX [TestLog_index] (
[LogID] BIGINT KEYWORD STORE UNINDEX,       -- store LogID, but do not index it
[CreationDate] DATETIME KEYWORD UNSTORE,    -- index creation date, but do not store it
[LogText] NCHAR UNSTORE,                -- LogText holds [LogTypeTexts].[Text], matched via [TestLog].[LogTypeId]
[Parameters] NCHAR KEYWORD UNSTORE,     -- index multiple fields: [Param1] to [Param7]
CONSTRAINT DocumentIdent UNIQUE([LogID])
) 

The DocumentIdent constraint makes the freetext index behave as a document store: when indexing, multiple rows containing the same LogID value are treated as a single document, and repeated inserts accumulate multiple values of a field within that document. To make this clearer: the Parameters field will index the TestLog fields Param1,..,Param7 - one field, multiple values per document.

To index multiple values of the same field, we combine multiple SQL selects with UNION ALL. Each select must be sorted by the LogID field. This is because the freetext index declares a unique constraint on LogID and, when inserting, expects the input stream to be sorted by LogID. The sort order lets the freetext index identify where each document starts and ends, since the selects in the union return multiple rows per document (all with the same LogID value).

ORDER BY 1 tells the ScimoreDB optimizer to sort the output by LogID. The sort order is preserved not per individual SELECT but across the entire UNION ALL (a ScimoreDB feature). In this example, the order operator will not physically sort the rows; instead, it uses a merge sort (since the TestLog table has a clustered index on the LogID field). A merge sort is very light and can be applied to vast amounts of data without performance penalties.
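
To make this concrete, here is a minimal sketch that indexes just two of the parameter fields of a single log record (assuming the populate script above has run, so LogID 42 exists): both values land under the one Parameters field of document 42. Treat it as illustration only; running it before the full indexing script below would make that script hit the UNIQUE constraint for LogID 42.

insert into [TestLog_index]
(
select * from 
(
select [LogID],null as [CreationDate], null as [LogText], [Param1] as [Parameters] 
from [TestLog] where [LogID] = 42
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param2] as [Parameters] 
from [TestLog] where [LogID] = 42
order by 1
)
)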

CreationDate will be indexed as text, formatting the date as yyyymmddThh. Note: minutes and seconds are not indexed! The field can be used to find an exact date down to the hour, or a time period. For example:

  • CreationDate:201202* - find all log entries created in February 2012.
  • CreationDate:20120228* - find all log entries created on 2012/02/28.
  • CreationDate:20120228T01 - find log entries created on 2012/02/28 during the 1 o'clock hour.

Indexing SQL script:

use LogSearchDemo
go;
insert into [TestLog_index]
(
select * from 
(
select [LogID],[CreationDate],[LogTypeTexts].[Text] as [LogText], cast(null as nchar) as [Parameters] 
from [TestLog] join [LogTypeTexts] on ([LogTypeTexts].[LogTypeId] = [TestLog].[LogTypeId])
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param1] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param2] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param3] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param4] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param5] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param6] as [Parameters] 
from [TestLog]
order by 1

union all

select [LogID],null as [CreationDate], null as [LogText], [Param7] as [Parameters] 
from [TestLog]
order by 1
)
) 

In addition, you may apply a WHERE clause to limit the amount of data to index. For example, to limit by CreationDate:

.
.
.
union all

select [LogID],null as [CreationDate], null as [LogText], [Param7] as [Parameters] 
from [TestLog]
where CreationDate BETWEEN '2012/02/01' and '2012/03/01'
order by 1 

Searching

First, we query the freetext index. The freetext index outputs a list of LogIDs matching the given search criteria. Next, we join the output stream of LogIDs with the TestLog table.

The search string syntax follows the Lucene standard. For example:

  • '+Parameters:(P5V0 P5V1)' - find all rows where the Parameters field is either P5V0 or P5V1.
  • '+Parameters:P5V0 -LogText:val3' - find all rows where Parameters = P5V0 and LogText is not val3.

NOTE: in Lucene, field names are case sensitive: '+Parameters:(P5V0 P5V1)' will find rows, while '+parameters:(P5V0 P5V1)' will not.

Example of joining freetext search results with the TestLog table. The ##query attribute is used to define the Lucene query string:

use LogSearchDemo
go;
declare @query nchar
set @query= '+Parameters:(P5V0) +LogText:"log type Id0"'

select  score,[testlog].* from 
execute freetext
(
select * from [TestLog_index] where ##query = @query
) [search] join [testlog] on ([testlog].[LogID] = [search].[LogID]) 

To find the log entries created on the 28th of February 2012, execute:

use LogSearchDemo
go;
declare @query nchar
set @query= '+CreationDate:20120228*'

select  score,[testlog].* from 
execute freetext
(
select * from [TestLog_index] where ##query = @query
) [search] join [testlog] on ([testlog].[LogID] = [search].[LogID]) 
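
Following the yyyymmddThh format described earlier, the same pattern narrows the search down to a single hour, here the 1 o'clock hour of that day:

use LogSearchDemo
go;
declare @query nchar
set @query= '+CreationDate:20120228T01'

select score,[testlog].* from 
execute freetext
(
select * from [TestLog_index] where ##query = @query
) [search] join [testlog] on ([testlog].[LogID] = [search].[LogID])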

OLAP/BI queries

An additional advantage of the freetext index is that it can be used for BI queries as well. Think of the freetext index as multiple dimension tables in one, and the TestLog table as the facts table.

For example, the query below counts log entries per log type Id for a given period (February 2012):

use LogSearchDemo
go;
declare @query nchar
set @query= '+CreationDate:201202*'

select  [testlog].[LogTypeId],count() from 
execute freetext
(
select [LogID] from [TestLog_index] where ##query = @query
) [search] join [testlog] on ([testlog].[LogID] = [search].[LogID])
group by [testlog].[LogTypeId]
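
As a simpler variation on the same pattern, the total number of matching log entries for the period can be counted without joining back to TestLog. This is a sketch with the same count() assumption as earlier, plus the assumption that an aggregate can be applied directly to the freetext result stream:

use LogSearchDemo
go;
declare @query nchar
set @query= '+CreationDate:201202*'

select count() from 
execute freetext
(
select [LogID] from [TestLog_index] where ##query = @query
) [search]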