» Documentation

Freetext DVD

Free text search queries in SQL

With ScimoreDB it is possible to do free text searches. Merging the unstructured search queries of free text search, and the structured relational database queries gives interesting solution options.

In this brief introduction to free text search, we will show how to make a DVD database searchable.

First we restore a backup of a sample DVD database.

1 restore database dvd from share 'c:\\backups\\'

restore database dvd from share 'c:\\backups\\' 

The restored data looks like:

freetext1.jpg

From the original 37Mb CSV file, we get 34236 rows.

Next step is to create the freetext index. This is very similar to creating a table or an index.

drop table if exists dvds_text 
create freetext index dvds_text 
(     
SKU int unindex,
Title varchar store,
Price FLOAT store,
TheFormat varchar unstore,
Category varchar store,
Language varchar store keyword,
Rating varchar store keyword,
Actors varchar store,
Directors varchar store,
ShortDescription text unstore,
LongDescription text unstore 
) 

In the above text, three special keywords are used. unindex, store, unstore, and keyword.

  • unindex The freetext index should not index the column
  • store The column data is indexed and stored in the freetext index. This the data will be searchable, and available when selecting from the freetext index.
  • unstore The column data is indexed, but not stored in the freetext index. Thus is searchable, but when selecting from the freetext the orginal data will not be retreived. To retreive the data, a join with the original dvds table is required. This option uses less space.
  • keyword The column data is considered one long string, and will only be found if searching for the full text of the column.

The options are thoroughly described in the documentation. A reference to the documentation is found at the bottom of this introduction.

Now its time to populate the free text index. This is done by linking the sql "data" table and the freetext index. After have linked the tables, the freetext is populated, and future changes to the SQL table DVDs automatically pushed to the freetext index.

alter table dvds (
add constraint dvds_text_constraint freetext( sku, title, price, theformat, category, language, rating, actors, directors, shortdescription, longdescription)     references dvds_text( sku, title, price, theformat, category, language, rating, actors, directors, shortdescription, longdescription) 
) 

We can now query the freetext. First search, we search for movies in which "client eastwood" appears as an actor:

execute freetext( select * from dvds_text where actors = 'clint eastwood') 

freetext2.jpg

The results are from the freetext index, and the Scimore DVDS table is not accessed to provide the results. Note that items with "nostore" property in the freetext index creation contains null values. They are still searchable, but the content is not available in the output.

As when querying sql tables, it is possible to restrict out rows and query over multiple rows.

execute freetext(      
select title, category, language, rating, directors from dvds_text     
where title = 'kuro*' and Category='material arts') 

freetext3.jpg

It is possible to do approximative searches using the tilde sign. Here for directors that approximate like "takesi".

execute freetext( 
select score, directors, sku from dvds_text  
where directors = 'takesi~') 

freetext4.jpg

Note the score column. It's a column containing a weight from the search engine ranged from 0 to 1. It denotes the precision or quality of the results. Results from the freetext index is by default sorted by weight descending.

Using the special keyword "##query", we can also compose queries similar to query strings in google:

execute freetext(  
select score, title, category, language, rating, actors, directors from dvds_text  
where ##query = '+Directors:eastwood Actors:Cli*') 

freetext5.jpg

Note that the results are sorted by "score", the weighted score from freetext - on how good each row matches the search criterias.

Paging and other sorting are possible using "restrict" and "order by". In this example we order by rating, and ask for 2 results starting from result 4.

execute freetext(  
select score, title, category, language, rating, actors, directors from dvds_text  
where ##query = '+Directors:eastwood Actors:Cli*'
restrict 4, 2 
order by rating) 

freetext6.jpg

We can join the free text results with the normal scimoredb sql tables, to provide additional output or filtering.

select freetextindex.score, dvds.* from (     
execute freetext(          
select SKU, score from dvds_text          
where ##query = '+(Directors:eastwood) Actors:Clint')) freetextindex 
inner join dvds on freetextindex.sku = dvds.sku order by dvds.price desc 

freetext7.jpg

With ScimoreDB its possible to do facet searches. A drill down of a result set. You can read more about in this short introduction to facet [search in sql].

Further references: ยขFreetext Documentation.