
A step-by-step example of the v4.0 distributed/elastic database

2010-12-27 - Marius Slyzius

ScimoreDB is a shared-nothing distributed database system. It can run hundreds of nodes and is highly available, tolerating hardware failures. The database horizontally distributes data among partitions. Each partition has multiple data nodes (scimoredb.exe), and data is replicated across the nodes within the partition. A partition is also called a partition group, to indicate that it is a group of nodes. If one or more nodes become unavailable, the remaining partition nodes vote to take them offline in the cluster (as long as the voters form a majority).

The limitations: a cluster can have at most 1024 nodes, and a partition can have at most 32 nodes. The maximum number of partitions is limited only by the maximum number of nodes; so, if every partition has only one node, a cluster can have at most 1024 partitions.

How does partitioning work? The cluster horizontally partitions rows according to the hash value of the partition columns (the partition columns are set in the CREATE TABLE command). The cluster uses hash ranges to identify the partition that owns a row. The cluster's full hash range is [0, 1024[, so the hash value is additionally reduced modulo 1024. Each partition group is responsible for a particular hash range. For example, in a cluster with a single partition group, the partition's hash range is [0, 1024[; with 2 partition groups, the ranges are [0, 512[ and [512, 1024[. To see the partition hash ranges, execute the SQL: select * from system.sysinstances (columns v_lower, v_upper).
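
For illustration only (the table and column names below are made up for this sketch, and it assumes a database has already been created and selected), a partition column is declared directly in CREATE TABLE, and the hash ranges can then be read from system.sysinstances:

create table customers(email char not null primary key partition, balance bigint);
go;
select * from system.sysinstances;

With two partition groups, the v_lower and v_upper columns would show the ranges [0, 512[ and [512, 1024[.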

Cluster management

For a cluster to work, at least one partition must be created. To create a partition, install the scimoredb node(s) and then join them to a cluster using the "CREATE CLUSTER" SQL command.

Here is a step-by-step example:

Installing GUI and DB nodes:

Download the latest 4.* version MSI from http://www.scimore.com/download/release/4.0/latest and choose either the x86 or x64 platform MSI to install. During installation, in the server configuration dialog, set your machine name and port 999. After installation your machine will have a single node (scimoredb.exe running as a service), the administrator/manager GUI and the .NET provider. Next, install 5 more scimoredb nodes. To do so, create the folders "c:\data\db1", "c:\data\db2", "c:\data\db3", "c:\data\db4", "c:\data\db5". Then, in a command prompt (NOTE: on Windows 7/Vista run the command prompt as administrator), run the following commands:

scimoredb.exe -startup/instance=1 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db1" -db/systables="C:\data\db1" -db/data="C:\data\db1" -net/endpoint="localhost:1000" 

scimoredb.exe -startup/instance=2 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db2" -db/systables="C:\data\db2" -db/data="C:\data\db2" -net/endpoint="localhost:1001" 

scimoredb.exe -startup/instance=3 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db3" -db/systables="C:\data\db3" -db/data="C:\data\db3" -net/endpoint="localhost:1002" 

scimoredb.exe -startup/instance=4 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db4" -db/systables="C:\data\db4" -db/data="C:\data\db4" -net/endpoint="localhost:1003" 

scimoredb.exe -startup/instance=5 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db5" -db/systables="C:\data\db5" -db/data="C:\data\db5" -net/endpoint="localhost:1004" 

The commands create additional db nodes on the local machine. The -net/endpoint="localhost:xxx" parameter sets the db node's connection address. When installing a node, the connection parameter is usually "machineName:port", so that any node on any machine can reach it. In the current example we use "localhost:xxx", since all cluster nodes will run on a single machine (for demo purposes only). Additionally, the parameter -db/cachePage=x sets how many pages (page size is 8 KB) the database will cache; the default is 3000, which is low for big tables. For more on how to set up a cluster node and its startup parameters, see: http://www.scimore.com/blog/post/2010/02/Distributed-database-Installation.aspx

Scimoredb.exe requires "scimore.data.scimorenativeclient.dll" and "scimoreagent.exe", so before installing a node on another machine, copy the 3 binaries together. Next, start the new nodes by executing in the command prompt:

net start scimoredb-1

net start scimoredb-2

net start scimoredb-3

net start scimoredb-4

net start scimoredb-5 

Creating the cluster.

Open the manager's query window and connect to "localhost", port "999". The query window may return errors (".. node is in stand-by mode.."); ignore them, since most SQL commands will fail while a node is not joined to a partition group. In the query window, execute the SQL command:

create cluster endpoint('localhost:999')
go;
add partition group(      
add endpoint('localhost:999'),     
add endpoint('localhost:1000'));
go;

commit cluster; 

The command creates a cluster with a single partition containing 2 nodes. To verify the cluster state, execute:

select * from system.sysinstances; 

or

select * from system.syscluster; 

The syscluster table is a logging table that shows all cluster changes. It can also be used to track failover actions, by comparing the most recent cluster version against the previous one.

Altering cluster

Altering the cluster allows adding new partitions, adding or removing nodes, and splitting an existing partition.

a. For example, in a single SQL command we add 2 new nodes to the existing partition and add a new partition with another 2 nodes. Execute the following SQL:

alter cluster;
go;
add partition group repartition with 0
(     
add endpoint('localhost:1001')   
,add endpoint('localhost:1002'));

alter partition group 0(      
add endpoint('localhost:1003')    
,add endpoint('localhost:1004'));

commit cluster; 

"add partition group..." will create a new partition (new id 1). "...repartition with 0" indicates that partition (0) will split data with the new partition (1), so partitions' (0) hash range will change. b. Splitting partition.

In the current example, partition (0) manages 4 nodes (as can be seen with: select * from system.sysinstances):

localhost:999
localhost:1000
localhost:1003
localhost:1004 

We can split partition (0) into 2 partitions, where the nodes:

localhost:999
localhost:1000

remain on partition (0), and the nodes:

localhost:1003
localhost:1004

belong to the new partition (2). Note that the command is instant; no data movement is involved. The nodes simply no longer "see" rows that do not belong to the altered partition's hash range. The SQL command:

alter cluster;
split partition group 0
commit cluster; 

Dropping a node from a partition.

To remove a node from the cluster, execute the drop node SQL command:

alter cluster;
alter partition group 0 (     
drop endpoint('localhost:1000'));
commit cluster; 

This drops the node from partition (0). The dropped node goes into stand-by mode and shuts down shortly afterwards. If the node should be added to the cluster again, a full node reinstall is required.
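
If you do reinstall the node, a minimal sketch for re-adding it (assuming it comes back on the same localhost:1000 endpoint) reuses the alter syntax shown above:

alter cluster;
alter partition group 0 (
add endpoint('localhost:1000'));
commit cluster;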

Creating databases and tables.

To create databases and tables, read: [Data_Definition]

A new feature allows relation partitioning. It is used when a parent table is partitioned on one column and linked with child table(s) that are partitioned on the parent's auto identity column.

As an example, execute the SQL:

create database test_hierarchy_dist;
go;
use test_hierarchy_dist;
go;
create table t3(name char not null primary key partition,id autobigint not null unique );
go;
create table t3_1(id bigint not null partition,info char,uid autobigint not null primary key,constraint sk foreign key (id) references t3(id));
go;
create table t3_1_1(id bigint not null partition,info2 char,uid autobigint not null primary key,constraint sk foreign key (id) references t3_1(uid)) 

In the example above, the T3 table is partitioned by the column "name". The "AutoBigInt" column generates automatic unique values; each value is unique and, additionally, encodes the hash value of the "name" column. Therefore, the child table T3_1 is effectively partitioned by T3(name) as well, even though it is set to partition by its ID field, which is linked to the T3(ID) "AutoBigInt" field. For example, execute this script to populate rows:

use test_hierarchy_dist;
go;
declare @i int, @name char, @r1 bigint, @r2 bigint 
begin tran
set @i = 0
while @i < 10000
begin
    set @name = concat('m',@i)
    insert into t3(name) values(@name)
    set @r1 = scope_identity()
    insert into t3_1(id,info) values(@r1,concat('v',@r1))
    set @r2 = scope_identity()
    insert into t3_1_1(id,info2) values(@r2,concat('z',@r2))
    insert into t3_1_1(id,info2) values(@r2,concat('z',@r2))
    set @i = @i +1
end
commit 

Next, verify with SQL (each loop iteration inserts two rows into t3_1_1, so the join should return 20000 rows):

use test_hierarchy_dist;
go;
declare @total int
set @total =  select count(*) from t3 join t3_1 on (t3.id = t3_1.id) join t3_1_1 on (t3_1.uid = t3_1_1.id)
if @total <> 20000 THEN RAISE EXCEPTION 'test failed' 

Third, check that the following query is executed only on a single node (the partition is calculated from the name = value condition in the WHERE clause).

use test_hierarchy_dist;
go;
select t3.* from t3 join t3_1 on (t3.id = t3_1.id) join t3_1_1 on (t3_1.uid = t3_1_1.id) where t3.name = 'm1' 

.NET provider connection string.

A client may connect to any node in the cluster. Here is a link about connection strings: http://www.scimore.com/howto/getstarted/connection-strings/ (skip the embedded DB part). To make a proper and scalable connection, we recommend combining the top nodes from all partitions in the connection string. For example, for this cluster:

create cluster endpoint('localhost:999');
go;
add partition group(
add endpoint('localhost:999')
,add endpoint('localhost:1000'));
go;
add partition group(
add endpoint('localhost:1001')
,add endpoint('localhost:1002'));
go;
add partition group(
add endpoint('localhost:1003')
,add endpoint('localhost:1004'));
go;
commit cluster 

The connection string should be: "server=localhost:999&localhost:1001&localhost:1003; database=mydb"

The .NET client will load balance the connections over the 3 listed nodes of the 6-node cluster.