
DQL Reference

DQL Introduction

Every SQL query that executes is converted to an internal query execution plan, expressed in Scimore's DQL language. DQL (Distributed Query Language) is a near-intuitive language that provides experienced developers with a very powerful tool for maximizing the efficiency of database updates or queries. Less experienced or novice developers will also appreciate that it's easy to learn and quick to get started with. SQL (Structured Query Language) defines the methods used to create and manipulate relational databases so that queries can be performed, but DQL gives developers the power to decide how these queries are best performed. This can greatly improve the efficiency of distributed queries involving databases hosted on multiple machines or, for example, when tables are opened on one machine but sorted on others. Such scenarios are perfectly normal in today's world of web-oriented database-driven applications, and judicious use of DQL can dramatically improve their performance.

DQL is a pipeline-based query language. "Pipelining" means that multiple inputs (or processes) are transparently combined to produce one output or result. In this context, a process is a C function or class identified by a name, for example, "Join".

To execute DQL through the SQL interface, use the EXECUTE DQL command:

execute dql
begin

...DQL text goes here...

end 

DQL Process

DQL is essentially a set of processes, each performing a different function to produce results, which can be combined according to one or more query definitions.

A process represents a relatively complex communication between standard well-defined classes that control process behaviour. Almost all classes are hierarchically related to each other, but it is the top-level classes that really define the output. All classes communicate internally through a standard representation of a communication layer.

DQL Notation

DQL includes an industry-standard left-to-right, bottom-up parser. It takes full advantage of plug-in technology, which makes it easy to distribute newly developed functions/processes throughout an enterprise, and to develop more advanced functions. DQL syntax descriptions rely on the following notation:

Token         Production

batch         command * command batch ;
command       function * function to * function operators * function to operators ;
command       command on ;
function      text funcparams ;
funcvalue     handle * constant ;
funcparams    funcvalue * funcvalue ',' funcparams ;
on            'on' constant ;
to            'to' handle ;
handle        '@' text ;
operators     operator * operator operators ;
operator      filter * project * restrict * index * with * 'set' set ;
with          'with' conjunction * 'with' order * 'with' constant ;
set           condition * condition ',' set ;
filter        'filter' 'by' disjunction ;
project       'project' 'by' projectlist ;
restrict      'restrict' 'to' projectlist ;
index         'index' 'by' number ;
disjunction   conjunction * conjunction 'or' disjunction ;
conjunction   condition * condition 'and' conjunction * '(' conjunction ')' ;
condition     column comp column ;
projectlist   column * column ',' projectlist ;
column        handle '.' text * constant * parameter ;
constant      '\'' text '\'' ;
parameter     '?' text ;
order         column ordertype * column ordertype ',' order ;
ordertype     ASC * DESC ;
comp          '=' * '!=' * '<' * '<=' * '>' * '>=' ;

Process Operators

Process behaviour is controlled by one or more predefined operators, which in DQL are called Process Operators. These are provided in DQL process definitions and made available during the process initialisation phase.

[Figure: operators.jpg]

If you wish to extend the functionality of a process, you can add further operators, but this requires detailed knowledge of writing C++ functions. DQL provides the following operators for modifying and querying records:

Operator

Function

project by (projectlist)

The project by operator is used to list the attributes desired in the output stream of a query. The attributes can be fields of any input/output stream, functions, or constants. More simply: "project by the list of columns to return to output" (like select colx ... in SQL).

restrict to 'start', 'size'

Restrict allows you to limit output to a specific number of records. If the query performs sorting, the 'restrict to' operator can have a dramatic effect on query cost; judicious parameter selection can significantly reduce the time taken to process queries. The 'restrict to' operator is convenient for navigational queries that retrieve a window of records from a process's output stream. Consider a search results page: the first time the user executes the query, the page retrieves records [0,10] of the 1000 found, the second time ]10,20], and so on. The goal is to avoid transferring records not relevant to the user.

filter by (disjunction)

Filter rows according to conditions

with (conjunction)

Seek matching rows using an index; the index to use must be specified with the 'index by' operator. Example: with myfield > 0

index by number

Optimizer hint specifying which index to use, identified by index id (sysindexes(id)) + 100.

set (set)

Modify fields in a stream

ON 'partition group id'

Identifies the partition group (see sysinstances). The partition group is normally set by the distributed query optimizer; when executing DQL from the SQL interface, this declaration can be skipped.

Note: an asterisk in an operator signature refers to the corresponding production in the DQL notation above.

DML Statements

DQL Server covers the following Data Manipulation Language (DML) topics.

Modifying Data

DQL Server supports the following data modification statements.

The Insert Process

To insert data into a table, we specify either a single record to insert or an input stream whose result set of records is to be inserted. The syntax of the insert process, which inserts data values as a single row:

Statement

Function

insert 'path'

Specify the path of the table into which the single row is inserted.

set (set*)

Assign values to the collection of all columns of the table.

The following 'insert' process inserts multiple records into a table:

Statement

Function

insert_bulk @T1,'path'

More generally, we might want to insert records based on the result set of an input stream @T1.

set @this.FieldName1 = @T1.Field1, ..., @this.FieldNameN = @T1.FieldN

The value assigned to a column can be a field of the input stream, a constant, or a function expression.

Example 1: Inserting a single row into the CUSTOMER table in database MYDB:

execute dql
begin

insert 'mydb.customer'
set
@this.Name = 'Marius',
@this.LastName = 'Slyzius',
@this.Birthdate = '19/01/1972'

end 

Example 2: Inserting a set of rows (bulk insert).

execute dql
begin

open_table 'mydb.account' to @T1 

insert_bulk @T1, 'mydb.customer'
set
@this.Name = @T1.Name,
@this.FullName = concat(@T1.Name,'-', @T1.LastName),
@this.Birthdate = '19/01/1972'

end 

For each row of the output stream @T1, a new row is inserted into the mydb.customer table, with values assigned according to the set operator.

The Delete Process

The 'delete' process is expressed in much the same way as open_table, except that the matching records are deleted instead of being pushed to an output stream. In addition, the 'delete' process can accept input stream(s) that reference the table, together with a result set identifying the records to delete. In this case, we can delete records that match a join condition with another table, and perform multi-table deletion in a single operation. The simple 'delete' clause:

Required

Statement

Function

[yes]

delete 'table path'

Specify the internal table from which to delete row(s).

[no]

with (conjunction)

Delete rows using index search.

[no]

filter by (disjunction)

Delete rows matching the filter condition using a table scan.

[no]

restrict to 'start', 'size'

Limit the number of rows to delete.

An alternative form of the 'delete' process:

Statement

Function

delete_view @T1, ..., @TN oid @T1.[primarykeyfield1],..,@T1.[primarykeyfieldN],'mydb.mytable1' oid @TN.[primarykeyfield1],..,@TN.[primarykeyfieldK],'mydb.mytableN'

Deletes from one or more tables. An [OID] operator references each table being deleted from; its fields must supply that table's primary key value(s).

Example 1: Delete the first 100 customers born between 1940 and 1960 (exclusive):

delete 'mydb.customer' on 'marius2k'
with @this.Birthdate < '01/01/1960' and @this.Birthdate > '01/01/1940'
restrict to '0', '100' 

Example 2: Delete rows from two tables matching the criterion orders.product_id = products.product_id. SQL:

delete from products, orders
where orders.product_id = products.product_id 

DQL:

execute dql
begin

open_table 'mydb.orders(xlock_row)' to @TT1
index by '100'
project by @this.[order_id],@this.[product_id],@this.[##rid]

open_table 'mydb.products(xlock_row)' to @TT2
index by '100'

join_inner 'TT1','TT2' to @TT3
with  @TT1.[product_id]=@TT2.[product_id]
filter by  ( @TT1.[product_id]=@TT2.[product_id] )
project by @TT1.[order_id],@TT1.[product_id],@TT1.[##rid],@TT2.[product_id] as u1,
@TT2.[##rid] as u2

delete_view 'TT3'
oid @TT3.[u1],@TT3.[u2],'mydb.products'
oid @TT3.[order_id],@TT3.[##rid],'mydb.orders'

end 

You must specify the magic field [##rid], the row id (RID). Because of MVCC, row ids are used to identify the row version.

The Update Process

In certain situations, we may wish to change the value of a column in a record. For this purpose, the update process is used. The records to update are chosen using the conditional operators 'with' and/or 'filter by' applied to the process. The syntax of the update process is:

Required

Statement

Function

[yes]

update 'table path'

Specify internal table to update rows with new values set by 'set' operator.

[yes]

set @this.Field1='value1', ..., @this.FieldN='valueK'

Assign new values to the fields being updated. The value assigned to a column can be any field of the record being updated, a constant, or a function expression.

[no]

with (conjunction)

Search rows by index.

[no]

filter by (disjunction)

Search rows by table scan, or within the range returned by the 'with' operator.

[no]

restrict to 'start', 'size'

Limit the number of rows to update.
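Combining these operators, a minimal sketch of a simple update; the table, field names, and values are illustrative, following the customer examples used throughout this document:

execute dql
begin

update 'mydb.customers'
set @this.Occupation = 'retired'
with @this.Birthdate < '01/01/1940'
restrict to '0', '100'

end 

Here 'with' seeks candidate rows by index, 'set' assigns the new value, and 'restrict to' caps the update at 100 rows.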

The 'update_view' process can modify multiple pipelined input records in the same manner as the 'delete_view' process. The syntax of the 'update_view' process is:

Statement

Function

update_view @T1, ..., @TN oid @T1.[primarykeyfield1],..,@T1.[primarykeyfieldN],'mydb.mytable1' oid @TN.[primarykeyfield1],..,@TN.[primarykeyfieldK],'mydb.mytableN'

The rules for the input stream(s) are the same as for the delete_view process. [OID] operators define the tables being updated.

set @T1.Field1 = 'value1', ..., @TN.FieldN = @T1.Field1

Assign new values to the fields being updated on any input stream. The value assigned to a column can be any field of any input stream record, a constant, or a function expression.

Example

Update the [products] and [orders] tables matching the criterion [orders.product_id = products.product_id].

SQL:

update products, orders
set products.name = 'new name', orders.ammount = 0
where orders.product_id = products.product_id 

DQL:

execute dql
begin

open_table 'mydb.orders(xlock_row)' to @TT1
index by '100'
project by @this.[order_id],@this.[product_id],@this.[ammount],@this.[##rid]

open_table 'mydb.products(xlock_row)' to @TT2
index by '100'

join_inner 'TT1','TT2' to @TT3
with  @TT1.[product_id]=@TT2.[product_id]
project by @TT1.[order_id],@TT1.[product_id],@TT1.[ammount],@TT1.[##rid],@TT2.[product_id] as u1,@TT2.[name],@TT2.[##rid] as u2

update_view 'TT3'
set @TT2.[name]='new name', @TT1.[ammount]=0
oid @TT3.[u1],@TT3.[u2],'mydb.products','TT2'
oid @TT3.[order_id],@TT3.[##rid],'mydb.orders','TT1'

end 

Query Fundamentals

The Open Table Process

To retrieve one or more rows from a table, use the 'open_table' process:

Required

Statement

Function

[yes]

open_table 'path' to @myoutput

Opens the specified internal table to retrieve the selected output. The selected records are pushed to the output stream, named '@myoutput'. Thus @myoutput.FieldName references a column value used by operators defined within the process definition or by an adjacent pipelined process. When the output stream is named @Results, the result is sent to the client.

[no]

with (conjunction)

The 'with' operator retrieves matching records to the output stream using an index.

[no]

filter by (disjunction)

Apply a filter condition. When the 'with' operator is skipped, a full table scan is performed to push records satisfying the filter's selection criteria to the output stream, which can make query execution quite costly. In practice, combining the 'with' and 'filter by' operators performs much better than using a filter operator alone.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned.

[no]

project by (projectlist*)

Use this operator to build a result set containing only the fields you want. Field names can be repeated or omitted, and there are performance benefits to excluding unneeded fields, especially when only a few columns are needed from a table with a large number of fields. If no 'project by' operator is declared within the process, all fields of the table are included in the output stream.

Examples: Consider the following table and indexes:

table: {Id, Name, LastName, Birthdate, Occupation}, located at 'd:\memodb\data\customers.dat'
primary index1: {Id}
secondary index2: {Name}
secondary index3: {Birthdate}

Example 1: Similar to the SQL statement select * from mydb.customers, which displays all attributes and records of the table:

execute dql
begin

open_table 'mydb.customers' to @Results  

end 

Example 2: Specifies filter criteria that rows must meet to be included in the query results. A range search returns all records with a birthdate between 1972 and 1980. Note: a full table scan is performed even though an appropriate secondary index exists:

execute dql
begin

open_table 'mydb.customers' to @Results
filter by @this.Birthdate < '01/01/1980' and @this.Birthdate > '01/01/1972'
project by @this.Name, @this.LastName  

end 

Example 3: Demonstrates the same range query as Example 2, this time using the secondary index on {Birthdate}:

execute dql
begin

open_table 'mydb.customers' to @Results
with @this.Birthdate < '01/01/1980' and @this.Birthdate > '01/01/1972'
index by 102
project by @this.Name, @this.LastName  

end 

Example 4: Limit query output to 10 rows, starting from the 5th row:

execute dql
begin

open_table 'mydb.customers' to @Results
with @this.Birthdate < '01/01/1980' and @this.Birthdate > '01/01/1972'
index by 102
project by @this.Name, @this.LastName
restrict to '5', '10'  

end 

The Sort Process

DQL offers the user some control over the order in which records in a process's output are displayed. The 'sort' process causes the rows of an input stream to appear in sorted order on the process's output stream.

Required

Statement

Function

[yes]

sort @T1 to @output

Sort the input stream @T1 to temporary stream @output.

[yes]

with (order*)

The 'with' operator lists the input stream attributes to sort by. To specify the sort order, use DESC for descending or ASC for ascending. Ordering can be performed on multiple attributes.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned. Note: the 'restrict to' operator can have a dramatic effect on query optimisation; judicious parameter selection can significantly reduce the time taken to sort the input stream.

Sort Examples: Consider the table customer {Name, LastName, Birthdate}.

Example 1: Suppose we wish to list the entire customer table in ascending order of Name.

execute dql
begin

open_table 'mydb.customer' to @T1
project by @this.Name, @this.LastName, @this.Birthdate

sort @T1 to @Results
with @T1.Name ASC 

end 

Example 2: If several customers share the same Name, we order them in descending order of LastName.

execute dql
begin

open_table 'mydb.customer' to @T1
project by @this.Name, @this.LastName, @this.Birthdate

sort @T1 to @Results
with @T1.Name ASC, @T1.LastName DESC 

end 

The Group Process

The 'group_sorted' process places the rows of an input stream into the groups in which output rows are to appear, and calculates a summary value of an aggregation function for each group. The grouping attribute is the list of columns on which grouping is performed. The 'group_sorted' process expects the input stream to be sorted on the grouping attribute(s).

Required

Statement

Function

[yes]

group_sorted @T1 to @T2

Group the input stream @T1 using a simple sort-based approach: stream @T1 arrives sorted on the grouping attribute and is scanned to compute the result of each aggregate operation for each group, which is then pushed to the output stream @T2.

[yes]

project by (projectlist)

The 'project by' operator declares grouping attribute(s) and aggregate function(s) to be computed for each group. The grouping attribute may contain multiple columns, functions or constants. Once again, @T1 input stream MUST be sorted by grouping attribute(s). The sort order can be any combination of ascending or descending orders of grouping attributes.

[no]

filter by (disjunction)

At times, it is useful to state a condition that applies to groups rather than to records. To express such a query, we use the 'filter by' operator (like SQL's HAVING clause). Predicates in the 'filter by' operator are applied after the formation of groups, so the results of aggregation functions may be used to decide whether a group appears in the output stream.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned.

Aggregate Examples: Consider the table customer {Name, LastName, Birthdate} with a primary index on {Name}.

Example 1: At times, we wish to treat the entire table as a single group. In such cases, the group_sorted process is unnecessary. The following query retrieves the row count of the customer table:

execute dql
begin

open_table 'mydb.customer' to @Results
project by count()  

end 

Example 2: Consider the query (fetch all distinct customer names and the count of customers sharing each name):

execute dql
begin

open_table 'mydb.customer' to @T1

group_sorted @T1 to @Results
project by @T1.Name, count('*') 

end 

In the example above, the grouping attribute is the column Name. As the rows are already clustered by Name, we can skip the heavy sort process and immediately group the @T1 stream rows. When a table contains a large number of records and group queries are frequent, query performance can benefit from choosing a primary index on the grouping attribute(s), thereby skipping the temporary sort.

Example 3: Consider another query (fetch all distinct customer names shared more than once):

execute dql
begin

open_table 'mydb.customer' to @T1

group_sorted @T1 to @Results
project by @T1.Name, count('*') as total
filter by @this.total > 1

end 

The total attribute references the value of the count aggregate function for each group. The behaviour is similar to SQL's HAVING clause.

Example 4: Consider the query (fetch all distinct birth dates of customers and the number of customers sharing each):

execute dql
begin

open_table 'mydb.customer'  to @T1
project by @this.Birthdate

sort @T1 to @T2 
with @T1.Birthdate ASC 

group_sorted @T2 to @Results
project by @T2.Birthdate, count('*') 

end 

In the example above, the grouping attribute is the column Birthdate. The @T1 stream rows are sorted by Name, but the group process expects rows sorted by Birthdate; therefore we additionally sort the @T1 stream into the temporary stream @T2 by the column Birthdate, and then group it.

The Union Process

The 'union' process combines multiple input streams into a single output stream. It is the simplest union operation and does not eliminate duplicates. The number of records in the output stream equals the sum of the record counts of the input streams.

Required

Statement

Function

[yes]

union @T1, .., @T(N) to @output

The 'union' process accepts up to 8 input streams, combined by a simple approach: read all rows from the first input until its end is reached, then from the second input stream, and so on. Identical records are not removed by the 'union' process, so we must know in advance that all records are distinct in order to return a correct result set. Note: the 'project by' operator is absent, but one is constructed internally by inheriting all attributes from the first input's 'project by' operator. Therefore, all input streams must have an equal number of attributes, with matching data types by position in the stream.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned.

Union Examples

The SQL query:

select * from customer where Name in ('Marius', 'Peter') 

in the DQL semantic:

execute dql
begin

open_table 'mydb.customer'  to @T1
with @this.Name = 'Marius'
project by @this.Name, @this.LastName, @this.Birthdate

open_table 'mydb.customer' to @T2
with @this.Name = 'Peter'
project by @this.Name, @this.LastName, @this.Birthdate

union @T1, @T2 to @Results 

end 

Both @T1 and @T2 fetch matching rows using the primary B+Tree index on {Name}. As they deliver their result sets, we combine them into the single output @Results, knowing in advance that all records are distinct.

Join Fundamentals

The join process combines the contents of two tables and produces a result set that incorporates rows and columns from each table. Tables are typically joined on data they have in common, via a so-called join condition. The comparison operator in a join condition determines how the two sides of the condition are evaluated and which records are returned. The most common comparison operator is equality (=); however, range operators (<, >) can be part of a join condition too.

DQL supports a set of join processes, each suited to a specific type of query:

Indexed Nested-Loop Join

If an index is available on the inner loop's join condition, the 'join_inner' process can be used.

Required

Statement

Function

[yes]

  • join_inner @T1, @T2 to @output
  • join_outer @T1, @T2 to @output

The left-hand stream @T1 must be pipelined; for each record in @T1, the index is used to look up rows in the right-hand stream @T2 that satisfy the join condition. In the case of 'join_outer', if no matching rows are found in @T2, null values are projected for @T2.

[yes]

with (conjunction)

The 'with' operator defines the join condition of the 'join_inner' process. As each record is received from the left-hand side of the join, it is used to create the index key value for the right-hand stream @T2 and to fetch rows matching the join condition using the index on @T2. The right-hand side attributes of the join condition must satisfy the index selection rules and use only AND statements, exactly as in the open_table process.

[no]

filter by (disjunction)

For each matched pair of @T1 and @T2 records, the 'filter by' operator evaluates whether the attributes of both streams satisfy the filter condition. Any attribute of either stream can be part of the filter condition.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned.

[yes]

project by (projectlist*)

Use this operator to build a result set @output that combines any attributes of both inputs.

Inner join example

Indexed Nested-Loop Join (using index, the most common join algorithm): Consider two tables:
customer {Name, LastName, accId}, with two indexes: {accId} and {Name}
account {accId, balance}, with index on {accId}

For SQL query:

select c.Name, c.LastName, c.accId, a.balance
from customer c, account a 
where c.Name='Marius' and a.balance < '2500' and c.accId = a.accId 

DQL:

execute dql
begin

open_table 'mydb.customer' to @T1
with @this.Name = 'Marius'

open_table 'mydb.account' to @T2 
index by 100

join_inner @T1, @T2 to @Results
with @T2.accId = @T1.accId
filter by @T2.balance < '2500'
project by @T1.Name, @T1.LastName, @T1.accId, @T2.balance 

end 

Note: the (@T2.balance < '2500') condition cannot be declared within the 'with' operator, as it would violate the index selection rules for the primary key. Also, NEVER declare a filter operator within the right-hand stream process (@T2); always attach the filter condition to the join_inner process.
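By contrast, a sketch of 'join_outer' over the same tables: customers with no matching account row are also returned, with null projected for @T2.balance. The balance filter is omitted here, since it would discard the unmatched rows:

execute dql
begin

open_table 'mydb.customer' to @T1
with @this.Name = 'Marius'

open_table 'mydb.account' to @T2 
index by 100

join_outer @T1, @T2 to @Results
with @T2.accId = @T1.accId
project by @T1.Name, @T1.LastName, @T1.accId, @T2.balance 

end 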

Merge-Join

The merge join requires that both inputs be sorted on the merge columns, which are defined by the equality clauses of the join condition. Because each input is sorted, the merge join operator gets a row from each input and compares them. For inner join operations, the rows are returned if they are equal. If they are not equal, whichever row has the lower value is discarded and another row is obtained from that input. This process repeats until all rows have been processed. Merge join itself is very fast, but it can be an expensive choice if explicit sorting is required. However, if the inputs are large and already sorted on their join columns, merge join is often the fastest available join algorithm.

Required

Statement

Function

[yes]

join_merge @T1, @T2 to @output

Both @T1 and @T2 inputs must be sorted by join fields.

[yes]

with (conjunction)

Define the join condition. Only the (=) operator is permitted.

[no]

filter by (disjunction)

For each matched pair of @T1 and @T2 records, the 'filter by' operator evaluates whether the attributes of both streams satisfy the filter condition. Any attribute of either stream can be part of the filter condition.

[no]

restrict to 'start', 'size'

Restrict output to the desired window of records of the process's output stream. If this operator is skipped, all records are returned.

[yes]

project by (projectlist*)

Use this operator to build a result set @output that combines any attributes of both inputs.

Merge join example

Since both inputs are sorted on the join column {accId}, we can rewrite the DQL to use the merge join process instead.

execute dql
begin

open_table 'mydb.customer' to @T1
with @this.Name = 'Marius'

open_table 'mydb.account' to @T2 

join_merge @T1, @T2  to @Results
with @T2.accId = @T1.accId
filter by @T2.balance < '2500'
project by @T1.Name, @T1.LastName, @T1.accId, @T2.balance 

end