Full-text search with Cocoa
In Java land the answer to full-text search is usually Lucene. When building a Mac OS X or iPhone application, unfortunately, the answer is not that simple.
Recently I needed to build a search index over some data for an iPhone project and was a little surprised by the lack of options. Again my first thought was Lucene, more specifically its C port. Unfortunately that port was abandoned somewhere along the way, and a newer attempt has not even reached the alpha phase. So what to do? Port the Lucene Java code to Objective-C? That seemed a bit out of scope for the iPhone project. I found two options.
LuceneKit
Luckily, someone else has already ported Lucene 2.x to Objective-C, though for GNUstep. With only a little work I got it working on both Mac OS X and the iPhone. I forked the official svn repository via git-svn, applied my changes and added some examples for Mac OS X and the iPhone. It's available on GitHub. Here is how you use it.
// FIELD_CONTENT and FIELD_ID are your own NSString constants for the field names (not defined here)
LCFSDirectory *rd = [[LCFSDirectory alloc] initWithPath: @"/path/to/index" create: YES];
LCSimpleAnalyzer *analyzer = [[LCSimpleAnalyzer alloc] init];
LCIndexWriter *writer = [[LCIndexWriter alloc] initWithDirectory: rd
                                                        analyzer: analyzer
                                                          create: YES];
while (...) { // for each document to index
    NSString *content = @"...";
    LCDocument *doc = [[LCDocument alloc] init];
    // tokenized and searchable, but not stored in the index
    LCField *f1 = [[LCField alloc] initWithName: FIELD_CONTENT
                                         string: content
                                          store: LCStore_NO
                                          index: LCIndex_Tokenized];
    // stored so it can be read back from a hit, but not searchable
    LCField *f2 = [[LCField alloc] initWithName: FIELD_ID
                                         string: @"the id"
                                          store: LCStore_YES
                                          index: LCIndex_NO];
    [doc addField: f1];
    [f1 release];
    [doc addField: f2];
    [f2 release];
    [writer addDocument: doc];
    [doc release];
}
[writer close];
[writer release];
[analyzer release];
[rd release];
The code above shows how to create the index. If at all possible, do this ahead of time rather than at runtime in the app. All your application then needs to do is open the index in read-only mode.
LCFSDirectory *rd = [[LCFSDirectory alloc] initWithPath: @"/path/to/index" create: NO];
And you are ready to do some searching.
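// assumes 'searcher' is an LCIndexSearcher opened on rd
// a term query is not analyzed, so lowercase the input to match the
// terms LCSimpleAnalyzer produced at indexing time
LCTerm *t = [[LCTerm alloc] initWithField: FIELD_CONTENT text: [searchText lowercaseString]];
LCTermQuery *tq = [[LCTermQuery alloc] initWithTerm: t];
LCHits *hits = [searcher search: tq];
LCHitIterator *iterator = [hits iterator];
while ([iterator hasNext]) {
    LCHit *hit = [iterator next];
    NSString *docId = [hit stringForField: FIELD_ID]; // 'id' is an Objective-C type name
    NSLog(@"%@ -> %@", hit, docId);
}
int results = [hits count];
[tq release];
[t release];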
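Under the hood, the index built above is an inverted index: the analyzer breaks each tokenized field into terms, and a term query simply looks up the documents listed under one exact term. The C sketch below illustrates that idea only; it is not LuceneKit's implementation. It assumes LCSimpleAnalyzer behaves like Lucene's SimpleAnalyzer (split on non-letters, lowercase everything), and every name in it (Index, index_add, index_lookup, the fixed limits) is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical fixed limits that keep the sketch free of heap allocation. */
#define MAX_TERMS 64
#define MAX_DOCS 16
#define MAX_TERM_LEN 32

/* A posting list: one term and the documents that contain it. */
typedef struct {
    char term[MAX_TERM_LEN];
    int docs[MAX_DOCS];
    int ndocs;
} Posting;

/* The inverted index: all known terms with their posting lists. */
typedef struct {
    Posting postings[MAX_TERMS];
    int nterms;
} Index;

static Posting *find_posting(Index *ix, const char *term) {
    for (int i = 0; i < ix->nterms; i++)
        if (strcmp(ix->postings[i].term, term) == 0)
            return &ix->postings[i];
    return NULL;
}

/* Record that document doc contains term. Documents must be added in
   increasing order, so a repeated term only needs a check against the
   last entry of its posting list. */
static void add_term(Index *ix, const char *term, int doc) {
    Posting *p = find_posting(ix, term);
    if (p == NULL) {
        assert(ix->nterms < MAX_TERMS);
        p = &ix->postings[ix->nterms++];
        strcpy(p->term, term);
        p->ndocs = 0;
    }
    if (p->ndocs > 0 && p->docs[p->ndocs - 1] == doc)
        return;
    assert(p->ndocs < MAX_DOCS);
    p->docs[p->ndocs++] = doc;
}

/* Analyze text like a simple analyzer (runs of letters, lowercased;
   overlong terms are truncated) and index every term under doc. */
static void index_add(Index *ix, int doc, const char *text) {
    char term[MAX_TERM_LEN];
    size_t len = 0;
    for (const char *c = text; ; c++) {
        if (*c != '\0' && isalpha((unsigned char)*c)) {
            if (len < MAX_TERM_LEN - 1)
                term[len++] = (char)tolower((unsigned char)*c);
            continue;
        }
        if (len > 0) {
            term[len] = '\0';
            add_term(ix, term, doc);
            len = 0;
        }
        if (*c == '\0')
            break;
    }
}

/* Term query: exact match on one term, no analysis. Returns the number of
   matching documents and points *docs at their ids (NULL if none). */
static int index_lookup(Index *ix, const char *term, const int **docs) {
    Posting *p = find_posting(ix, term);
    *docs = p ? p->docs : NULL;
    return p ? p->ndocs : 0;
}
```

With two documents indexed, a lookup for "search" can match both while "Search" matches nothing, because the query term never passes through the analyzer. That is why query text has to be normalized the same way the indexed text was.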
Unfortunately, the Objective-C port is still quite alpha. I ran into problems when indexing larger chunks of data. It doesn't look like a big thing to fix, but I didn't have the time to look into it.
sqlite
So what about using sqlite? It does provide full-text search, but the version on the iPhone does not have the feature compiled in. Bummer! No problem, though: you can simply ship your own build of sqlite. I found the easiest way to do this is to download the amalgamation source and add it directly to the Xcode project. It's really just one large .c file (sqlite3.c). To enable full-text search, all you need to do is add a define at the top of that file.
#define SQLITE_ENABLE_FTS3
You are now ready to go on the iPhone, but you still need to build the database itself. For that you also need an sqlite on Mac OS X that supports the FTS3 virtual table syntax. Again, just use the amalgamation source and build it with
CFLAGS="-DSQLITE_ENABLE_FTS3=1" ./configure
make
make install
To run the new sqlite, set DYLD_LIBRARY_PATH to the folder that contains the shared library (the libsqlite3.dylib file):
export DYLD_LIBRARY_PATH=/path/to/dylib:$DYLD_LIBRARY_PATH
Next, write the SQL that creates and fills the database. sqlite uses a special virtual table syntax for full-text indexes:
CREATE VIRTUAL TABLE content_search USING FTS3(id, content);
INSERT INTO content_search VALUES ('someid', 'content without stopwords');
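If you generate content.sql from your data, every single quote inside a value has to be doubled, since that is how SQL (and sqlite) escapes quotes in string literals; otherwise a word like "don't" breaks the INSERT. A minimal C sketch, where sql_quote and format_insert are hypothetical helpers that write into caller-supplied buffers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write s into out as an SQL string literal: wrap it in single quotes and
   double every embedded single quote. Returns the literal's length, or -1
   if out (of size cap) is too small; out is unspecified on failure. */
static int sql_quote(const char *s, char *out, size_t cap) {
    size_t n = 0;
    assert(cap > 0);
    if (n + 1 >= cap)
        return -1;
    out[n++] = '\'';
    for (; *s != '\0'; s++) {
        size_t need = (*s == '\'') ? 2 : 1;
        if (n + need >= cap)
            return -1;
        if (*s == '\'')
            out[n++] = '\'';   /* the escaping quote */
        out[n++] = *s;
    }
    if (n + 1 >= cap)
        return -1;
    out[n++] = '\'';
    out[n] = '\0';
    return (int)n;
}

/* Build one INSERT statement for the content_search table. Returns the
   statement's length, or -1 if a value or the statement does not fit. */
static int format_insert(const char *id, const char *content,
                         char *out, size_t cap) {
    char qid[256], qcontent[4096];
    if (sql_quote(id, qid, sizeof qid) < 0 ||
        sql_quote(content, qcontent, sizeof qcontent) < 0)
        return -1;
    int n = snprintf(out, cap, "INSERT INTO content_search VALUES (%s, %s);",
                     qid, qcontent);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

When values go through a prepared statement instead, as with sqlite3_bind_text in the search code further down, sqlite handles this for you and no escaping is needed.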
For better performance and a smaller index, remove stopwords from the content before inserting it. Then load the SQL file into a new database:
sqlite3 content.db < content.sql
Once you have the database file, make sure it is included in your project's bundle. That's where you will open it from:
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"content" ofType:@"db"];
Then, on application launch, open the database and prepare the search statement:
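#import "sqlite3.h" // the header that ships with the amalgamation

sqlite3 *database;
sqlite3_stmt *statement;
if (sqlite3_open([filePath UTF8String], &database) == SQLITE_OK) {
    // snippet() takes the FTS3 table name as its first argument
    const char *sql = "select id, snippet(content_search, '[', ']', '...') as extract from content_search where content match ?";
    if (sqlite3_prepare_v2(database, sql, -1, &statement, NULL) != SQLITE_OK) {
        NSLog(@"failed to prepare statement: %s", sqlite3_errmsg(database));
    }
} else {
    NSLog(@"failed to open database: %s", sqlite3_errmsg(database));
}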
You can then use the statement to search the content and step through the result set:
NSString *searchText = @"...";
sqlite3_bind_text(statement, 1, [searchText UTF8String], -1, SQLITE_TRANSIENT);
BOOL found = NO;
while (sqlite3_step(statement) == SQLITE_ROW) { // one iteration per matching row
    found = YES;
    const char *docId = (const char *)sqlite3_column_text(statement, 0);
    const char *extract = (const char *)sqlite3_column_text(statement, 1);
    NSLog(@"found id '%s': %s", docId, extract);
}
if (!found) {
    NSLog(@"not found");
}
sqlite3_reset(statement);
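For search-as-you-type it can help to turn the user's input into a prefix query before binding it: FTS3 treats a trailing * on a term as a prefix match, and separate terms are implicitly ANDed. A C sketch with a hypothetical helper, make_prefix_query, that keeps only runs of ASCII letters and digits (so stray quotes in the input cannot produce a malformed MATCH expression), lowercases them (FTS3 only treats uppercase OR as an operator) and appends * to each term:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Turn free-form user input into an FTS3 prefix query, e.g.
   "Ful-text SEA" -> "ful* text* sea*". Only ASCII letters and digits are
   kept, lowercased; every term gets a trailing '*'. Returns the query
   length, or -1 if out (of size cap) is too small. */
static int make_prefix_query(const char *input, char *out, size_t cap) {
    size_t n = 0;
    int in_term = 0;
    assert(cap > 0);
    for (const char *c = input; ; c++) {
        if (*c != '\0' && isalnum((unsigned char)*c)) {
            /* starting a new term after a previous one: add a separator */
            if (!in_term && n > 0) {
                if (n + 1 >= cap)
                    return -1;
                out[n++] = ' ';
            }
            if (n + 1 >= cap)
                return -1;
            out[n++] = (char)tolower((unsigned char)*c);
            in_term = 1;
        } else {
            if (in_term) {   /* a term just ended: mark it as a prefix */
                if (n + 1 >= cap)
                    return -1;
                out[n++] = '*';
                in_term = 0;
            }
            if (*c == '\0')
                break;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

The result is what you would pass to sqlite3_bind_text in place of the raw search text. Input without any letters or digits yields an empty string, which is worth checking for before running the query.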
So, as a final word: I was really impressed by sqlite, but its full-text search engine is quite limited. If you need more flexibility (like a different stemmer or search ranking), LuceneKit might be the way to go. I bet the fixes are not that hard, and it would be great to see the code find its way "back" to the Lucene project. At least it is already released under the Apache License 2.0.