Introduction

Scimore behaves like a standard SQL server. It can be embedded in your application, run as a single server, or run as a distributed SQL cluster across many machines. Our main focus is making it simple to use and maintain. The database works out of the box, with no third-party tools or special OS requirements.

This page explains how the distributed cluster works.

Cluster design

A Scimore cluster works like a single SQL instance. You can connect to any node in the cluster and it will execute your SQL query. Scimore uses sharding (or consistent hashing) to distribute data into groups. Each group holds a subset of a table's data. A group can consist of multiple nodes to enable failover. A node is the "scimoredb.exe" executable running as an NT service.
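The relationship between keys, groups, and nodes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cluster layout, the node names, and the functions group_for_key, nodes_for_key, and pick_node are hypothetical, and plain modulo hashing stands in for whatever placement scheme ScimoreDB actually uses.

```python
import hashlib

def group_for_key(key, group_count):
    """Map a partition key to a group index using a stable hash (modulo placement)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count

# Hypothetical cluster layout: 3 groups, each with 2 nodes for failover.
CLUSTER = {
    0: ["node-a1", "node-a2"],
    1: ["node-b1", "node-b2"],
    2: ["node-c1", "node-c2"],
}

def nodes_for_key(key, cluster=CLUSTER):
    """Return the nodes of the group that owns the rows for the given key."""
    return cluster[group_for_key(key, len(cluster))]

def pick_node(key, down=(), cluster=CLUSTER):
    """Pick the first live node in the owning group, skipping failed nodes."""
    for node in nodes_for_key(key, cluster):
        if node not in down:
            return node
    raise RuntimeError("all nodes in the owning group are down")
```

Calling pick_node("customer-42") returns the first node of the owning group. Passing that node in the down argument returns the second node of the same group, which is the failover case.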

cluster.jpg

With this simple scheme, you can design clusters for high read loads, high update loads, large data sets, or high availability requirements. You can add machines to a cluster dynamically while the cluster is operational.

Cluster communication

The communication protocol between nodes uses a technique known as "divide and conquer". Suppose a single node had to communicate with 100 nodes to answer your query. That would not scale. Instead, each node communicates with only 2-4 neighboring nodes, and those nodes communicate with their own neighbors in parallel.
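A minimal sketch of this fan-out idea follows. It assumes a fixed fan-out of 3 and a simple array-based tree layout. The functions build_fanout_tree, depth, and gather are hypothetical names for illustration, not part of ScimoreDB, and the recursion here visits children one after another, whereas a real cluster would contact them in parallel.

```python
def build_fanout_tree(nodes, fanout=3):
    """Assign each node at most `fanout` children, like an array-backed heap."""
    tree = {}
    for i, node in enumerate(nodes):
        first = i * fanout + 1
        tree[node] = nodes[first:first + fanout]
    return tree

def depth(tree, root):
    """Number of forwarding hops from the root to the farthest node."""
    children = tree[root]
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)

def gather(tree, root, local_result):
    """Combine the local result of `root` with the results of its subtree."""
    total = local_result(root)
    for child in tree[root]:
        total += gather(tree, child, local_result)
    return total
```

With a coordinator plus 100 other nodes and a fan-out of 3, no node talks to more than 3 others, and depth reports 4 hops to reach the farthest node, instead of one node holding 100 connections.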

communication.jpg

Partitioning tables

When creating tables on a standard SQL server, you execute the CREATE TABLE statement. On Scimore you do the same. The only difference is that Scimore gives you the option of specifying how to distribute the table over the groups in the cluster.

Before executing a SQL query, the distributed query optimizer calculates the best way to process it. For instance, if a table is partitioned by the column "myfield" and a SELECT query has the clause "where myfield=val", the optimizer will most likely execute the query on a single node. When two tables are joined ON fields that are the partition fields of both tables, the join is independent on each node, i.e. no communication between nodes is needed. Instead, each node joins the tables locally and sends the results back to the client.
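The two routing decisions described above can be sketched as follows. This is a simplified illustration, not the actual optimizer: the functions hash_group, target_groups, and join_is_local are hypothetical, the hash is an arbitrary stable hash, and the join check assumes both tables use the same hash partitioning scheme over the same set of groups.

```python
import hashlib

def hash_group(value, group_count):
    """Map a partition column value to a group index with a stable hash."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count

def target_groups(partition_column, equality_filters, group_count):
    """Return the groups that must take part in a SELECT.

    `equality_filters` maps column names to the constant they are compared to,
    e.g. {"myfield": 42} for "where myfield=42".
    """
    if partition_column in equality_filters:
        # The filter pins the partition column, so one group holds all matches.
        return [hash_group(equality_filters[partition_column], group_count)]
    # Otherwise every group may hold matching rows.
    return list(range(group_count))

def join_is_local(left_partition_column, right_partition_column, join_pairs):
    """True when the join condition equates the partition columns of both tables.

    `join_pairs` is a list of (left_column, right_column) equality pairs from ON().
    """
    return (left_partition_column, right_partition_column) in join_pairs
```

For a table partitioned by "myfield" over 4 groups, target_groups("myfield", {"myfield": 42}, 4) returns a single group, while a filter on any other column returns all 4 groups.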

paritioning.jpg

ScimoreDB supports 4 ways to partition data among the groups. Partitioning by a hash of the column values is often the right choice. It is the most widely adopted partitioning method and is often referred to as sharding. Still, there are cases where other partitioning techniques improve scalability. For a detailed explanation, read the distributed schema page.

Query prioritization

A developer wrote a bad query, and the production database became slow. The DBA quickly started the database profiler and found that the new stored procedures were doing table scans. He added the missing indexes, and after a few hours operations were back to normal. Nice work, DBA! Or is it? The damage has already been done. This is an absolute no-go for systems with high availability requirements.

Now take another case. A developer wrote a bad query. Operations reported that something was wrong in the current release. The DBA took a look and discovered that the newly added procedures were missing an index. He added indexes to fix the slow queries, and things went back to normal.

The two cases are similar, except that in the second case only the recent changes were affected; the existing business logic was not compromised.

ScimoreDB has been designed to deal with the "one query kills all" problem. The database dynamically prioritizes queries, and when a query becomes too aggressive, the engine decreases ("throttles") its priority, preventing a single query from monopolizing the database's resources.
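The effect of throttling can be illustrated with a toy scheduler. This is a sketch under loud assumptions, not ScimoreDB's actual algorithm: the Query class, the run function, the work-unit model, and the rule that priority halves after every `budget` units of consumed work are all hypothetical.

```python
class Query:
    """A query that needs `work` units of processing to finish."""

    def __init__(self, name, work):
        self.name = name
        self.remaining = work
        self.consumed = 0

    def priority(self, budget):
        # Priority halves each time the query uses up another `budget` work units.
        return 1.0 / (2 ** (self.consumed // budget))

def run(queries, budget=5):
    """Run one work unit per tick for the highest priority query.

    Returns the query names in the order in which they finish.
    """
    finished = []
    active = list(queries)
    while active:
        # max() keeps the first of equally ranked queries, so ties go to the earlier one.
        current = max(active, key=lambda q: q.priority(budget))
        current.remaining -= 1
        current.consumed += 1
        if current.remaining == 0:
            active.remove(current)
            finished.append(current.name)
    return finished
```

If a 50-unit table scan is submitted ahead of two 3-unit lookups, the scan runs for its first 5 units, is then throttled, and both lookups finish before it. Without the throttle, the lookups would wait behind all 50 units of the scan.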