SQLite FTS contains and suffix matches

移动互联 2013-09-03

SQLite is used byDash to search throughdocset indexes. Originally, Dash used LIKE
queries which were fast enough, but became increasingly slower as more docsets were added.

SQLite FTS
is amazingly fast, but allows only prefix (e.g. query*
) matches by default. For Dash, I needed to persuade it to also perform contains matches (e.g. *query*
) or suffix matches (e.g. *query
).

How it works

It’s simple, for each term I want to be able to search, I store all of its suffixes.

First of all, the table structure:

CREATE VIRTUAL TABLE searchIndex USING FTS4(suffixes)

Add the term NSString
:

INSERT INTO searchIndex(suffixes) VALUES("NSString SString String tring ring ing ng g")

Search using suffix queries:

SELECT * FROM searchIndex WHERE suffixes MATCH 'string';

Or contains queries:

SELECT * FROM searchIndex WHERE suffixes MATCH 'str*';

Downsides

The only downside I could find was that the database got too large. To avoid this, I compress
the data into its actual term.

The compress and uncompress functions behave in this way:

compress("NSString SString String tring ring ing ng g")
-> NSString

uncompress("NSString")
-> NSString SString String tring ring ing ng g

This compression reduces the database size to what it would be if only the actual terms were added (without all the suffixes).

Speed results

Searching over 1,110,381 terms (in 102 docsets) using contains queries:

Search for "string" using LIKE:    3.22 seconds
Search for "string" using FTS:     0.18 seconds

Search for "s" using LIKE:         6.87 seconds
Search for "s" using FTS:          0.22 seconds

Alternatives

I chose SQLite FTS because I was already familiar with SQLite and I also needed to work around some Dash-specific edge cases (e.g. how symbols are treated).

Depending on your project, these may be suitable alternatives:

  1. PostgreSQL’s wildspeed module
  2. For OS X or iOS apps: Search Kit

责编内容by:Kapeli Blog (源链)。感谢您的支持!

您可能感兴趣的

Android sqlite / BLOB Performance Issue Ever since I moved my data from ArrayList to a sqlite database on Android, I have a serious performance drop. There are no cursors left open that coul...
为什么 SQLite 用 C 编写? 简评:SQLite 官方出品。 C 是最好的选择 从 2000 年 5 月 29 日开始,SQLite 就选择了 C 语言。直到今天,C 也是实现 SQLite 这样软件库的最佳语言。 C 语言是实现 SQLite 最好的语言的原因包括: 性能。 兼容性。 ...
Selecting a row from the sqlite database using raw... I've created an external SQLite database that looks something like this: What I want to do in my Android application is to load all the values fr...
Handling Invalid SQLite Queries in Android I have a simple code that manages to successfully query an SQLite Database and convert that result from cursor to string in order to display it on scr...
sqlite 的分布式实现方案:rqlite rqlite 相关操作说明项目总为实现 sqlite 添加账号和密码,以满足信息安全的需求。 鉴于对 sqlite 实现添加账号和密码比较复杂和困难,项目中使用了分布式 rqlite ,故转而研究 rqlite 如何添加账号和密码。 项目概况: 开发语言:Python...