Sphinx – An Opensource free text search engine
November 24, 2008
Ever came across a situation where you have multi-million rows with a text field lying around in your MySQL or Postgre database and you need to perform a text search? The most straightforward option is to use the good old SQL construct ‘LIKE’ but thats not practical if you have tons of rows. For instance, consider a situation where you have, say 2 Million rows each with one text field averaging 12Kb of data per text field.
MySQL has a built-in free text search which is pretty cool, but even that becomes slow on large quantities of data and doesn’t works well with special characters, in fact it doesn’t works at all with special characters except for single quotes (‘) and underscore (_). I was using free text search but had to perform alot of formatting on the existing data to make it compatible for searching with MySQL free text since the data I had included different Ids of the form p1,p2 etc and non english words.
I was pretty happy with MySQL free texts performance (compared to the LIKE keyword) but even then it was annoying to wait for a single search which was taking ~100 seconds on ~2M rows. Therefore our quest for improving performance continued until we found Sphinx.
Sphinx is Godsend really. It has reduced our search time from ~100 seconds to less then 1 second. Isn’t that amazing?
Sphinx is really easy to install and implement. You can find a comprehensive documentation here: http://www.sphinxsearch.com/docs/
Download Sphinx from here: http://www.sphinxsearch.com/downloads.html
Sphinx runs as a daemon and offers a really cool PHP API to connect to the daemon, which is pretty simple to use. All you need to do is install Sphinx, write up a configuration file which points to your data, includes a query to fetch data and other Sphinx configurations. Once installed and configured, you need to run the Sphinx indexer tool which will index your data. Indexing with Sphinx is real quick, it took Sphinx 10 minutes to index 2M rows taking up ~1.2GB of Hard disk space. You can even work around with live indexing to index data as it is generated but i have yet to figure out a decent way to re-index changing data, hence for now I have my data re-indexed every week (10 minutes is no big time).
There is a Windows version available but I haven’t used it, I have worked on the Linux version and I am all thumbs up for it.
I will be posting a simple installation guide and how I implemented it pretty soon.
If anyone else has used Sphinx to solve their searching problems, please enlighten me as to how we can more effectively utilize the power of Sphinx
hey dude, i came across this when i was helping a friend of mine and it reminded me of when you told me about it. i tried it out and i agree, its really easy to use and really helpful. hope you’re doin well, take care
Thanks for sharing. I will definitely give it a try!