Blog: Getting To Know Propel 1.5: A Search Engine In One Line

The Propel Team – 05 May 2010

The ability to reuse elements of a query was dramatically improved with Propel 1.5. A great example of this new flexibility is how easy it is to build a full-text search engine. Or, to use the 1.5 vocabulary, a text filter. Let’s see how to allow full-text search to a Bookstore with a single line.

The Naive Approach

A full-text search engine for books should return results where the search input appears in the book title, or in the book summary. Databases support simple regular expression comparison, so the fastest way to implement the search engine looks like:

class BookQuery extends BaseBookQuery{  public function filterByText($text)  {    $pattern = '%' . $text . '%';    return $this      ->where('Book.Title like ?', $pattern)      ->orWhere('Book.Summary like ?', $pattern);  }}

The new filterByText() method uses where() and orWhere(), which are part of the relational API, the secondary set of methods offered by ModelCriteria. It also returns the current object, so it can be chained together with other query methods. Now, adding a full-text search to a query is really a one-liner:

$books = BookQuery::create()  ->filterByText('pride')  ->orderByTitle()  ->find();

The $books collection gets hydrated from the results of the following SQL query:

SELECT book.* from `book`WHERE (book.TITLE LIKE '%pride%' OR book.SUMMARY LIKE '%pride%')ORDER BY book.TITLE ASC;

You can use the new filterByText() method to count books matching the string, or to look for authors of books matching the string:

$authors = AuthorQuery::create()  ->useBookQuery()    ->filterByText('pride')  ->endUse()  ->orderByLastName()  ->find();

Tip: If you’re a symfony user, you can even use the new filter method as an admin generator filter with sfPropel15Plugin. Just add the text filter to your list, and you’re done:

# in modules/book/config/generator.ymlconfig:  filter:    display: [text]

Using An Index

The previous approach is naive, because it doesn’t scale. When the book table reaches more than a few thousand rows, a SQL query using LIKE and OR is likely to hit the slow query limit. The usual workaround is to build a table of searchable words, and to use this (indexed) table for full text searches.

So let’s add a book_index table, related to the book table by a book_id foreing key. The index table also features a word column, with an index.

<table name="book_index" phpName="BookIndex">  <column name="id" type="INTEGER" required="true" primaryKey="true" autoIncrement="true" />  <column name="book_id" required="true" type="INTEGER" />  <foreign-key foreignTable="book" onDelete="cascade">    <reference local="book_id" foreign="id" />  </foreign-key>  <column name="word" type="VARCHAR" required="true" primaryString="true" />  <index>    <index-column name="word" />  </index></table>

This table should fill up automatically each time a new Book is added. The simplest way to do this is to use a postSave() hook in the Book ActiveRecord class:

class Book extends BaseBook{  public function postSave(PropelPDO $con = null)  {    // delete previous words from this book    BookIndexQuery::create()      ->filterByBook($this)      ->delete($con);    // build the list of words for this book    $titleWords = preg_split('/\W/', $this->getTitle(), null, PREG_SPLIT_NO_EMPTY);    $summaryWords = preg_split('/\W/', $this->getSummary(), null, PREG_SPLIT_NO_EMPTY);    $words = array_unique(array_merge($titleWords, $summaryWords));    // Save the words for this book    foreach ($words as $word) {      $index = new BookIndex();      $index->setBook($this);      $index->setWord($word);      $index->save($con);    }  }}

Tip: All the database operations inside postSave() use the connection object ($con), to guarantee transactional integrity and better performance.

Now the BookQuery::filterByText() method is even easier to write:

class BookQuery extends BaseBookQuery{  public function filterByText($text)  {    return $this      ->useBookIndexQuery()        ->filterByWord($text)      ->endUse();  }}

And the full-text search engine scales. Besides, the syntax to use the engine didn’t change:

$books = BookQuery::create()  ->filterByText('pride')  ->orderByTitle()  ->find();

Query methods also bring the benefit of easy refactoring, since the actual implementation of a filter is encapsulated inside a method.

Using A Search Engine

The new approach won’t give very accurate results on a text search - don’t use it for a real world application. Most notably, there is no notion of relevancy that could be used to order the results, placing the books most likely to match the query at the top. Also, the index table fills up with lots of useless (because non-discriminating) words like “the”, or “and”. And a search for “man” won’t return any book featuring “men”. And the index soon grows very large, so that even a solid MySQL server can’t handle it with reasonable response time.

This is because there is much more to full-text search than just building an index. From stop words to stemming, the domain of text search is very complex. You can safely assume that you won’t be able to build a good quality search engine on your own - and that means that you should use an existing solution instead.

And there are a lot of cheap search solutions, including many open-source ones. One could use PostreSQL’s excellent Full Text Search capabilities, or Zend_Search_Lucene, or even Google, to provide the search feature for a bookstore. It doesn’t really matter for the present exercise. Let’s just assume that the external search engine supports queries through a Web Service.

Such search engines often return results in XML, including references to the unique identifiers of the indexed documents. In case of a bookstore, the search engine result would probably feature book ids, among other data like an excerpt from the matching content, or the matching accuracy. That would make the filterByText() method look like:

class BookQuery extends BaseBookQuery{  protected static $searchUri = 'http://myengine.mydomain.com/search?q=';  public function filterByText($text)  {    $sxe = new SimpleXMLElement(self::$searchUri . $text, NULL, true);    $bookIds = array();    foreach ($xse->results->book as $book) {      $bookIds []= (int) $book->id;    }    return $this      ->filterById($bookIds);  }}

BookQuery::filterById() accepts an array of primary keys, which translates into a SQL IN (). So the filterByText() method still returns a modified BookQuery matching the $text pattern, even though the actual searching took place somewhere else. And the search API for the books didn’t change:

$books = BookQuery::create()  ->filterByText('pride')  ->orderByTitle()  ->find();

Conclusion

Even if the code samples presented here are more a proof-of-concept than real life implementations, this is a good example of the flexibility of the new Query API introduced in Propel 1.5. Not only can you build complex logic inside a simple query method, you can also reuse this logic very easily, without adding complexity to the public API of the model. And if you can build a search engine in a few minutes, imagine the wonders that can come out if you spend as much time with Propel as you used to do with SQL…