Blog: Introducing archivable behavior, and why soft_delete is deprecated

The Propel Team – 29 August 2011

Propel 1.6 introduces a new behavior called archivable. It gives model objects the ability to be copied to an archive table. By default, the behavior archives objects on deletion, acting as a replacement of the soft_delete behavior. So why is it exciting?

The idea behind the soft_delete behavior

The soft_delete behavior promises to "override the deletion methods of a model object to make them 'hide' the deleted rows but keep them in the database." In order to achieve this soft deletion, the behavior adds a deleted_at column to the table it is applied on, and sets this column to the current date whenever an object is deleted:

$book->delete();
// translates into MySQL as
// UPDATE book SET book.deleted_at = CURTIME() WHERE book.id = 123;

Also, it modifies the read queries to ignore all objects having a not null deleted_at column value, so only not deleted objects show up.

$books = BookQuery::create()->find();
// SELECT * FROM book WHERE book.deleted_at IS NULL;

It makes it easy to recover deleted objects (by setting the deleted_at date to NULL again), and to select the soft deleted objects for further treatment.

soft_delete behavior shortcomings

It looks smart at first sight, but in reality the soft_delete behavior is flawed. If you have been using it a lot, you may have already spotted some of it shortcomings:

  • Performance: Deleted objects are still present in the table. The table size increases even when you delete objects to clean it up. So read queries become slower when the number of records in the table grows. Adding new indexes to the table to optimize queries can be counter-productive if you forget to put the deleted_at column in the indexes, because every query includes a snippet looking like WHERE deleted_at IS NULL.
  • Doesn't work all the time: Read queries sometimes return objects that have been soft deleted. This usually happens when using paginate(), or joinWith(). It all comes down to the algorithm used to "hide" deleted objects: add a WHERE deleted_at IS NULL condition to the next query. It doesn't work if there is more than one query (which is the case for paginate()) or for joined tables (which is the case for joinWith()).
  • Not compatible with unique constraints: Whether you have a non-autoincremental primary key, or a unique index, the addition of a delete_at column breaks you model.
  • Not consistent across database vendors: If two tables have the soft_delete behavior, and share a foreign key with ON DELETE CACADE, you may expect that soft deleting in the main table also soft deletes related records in the second table. It works on SQLite, or if you set propel.emulateForeignKeyConstraints to true in the build.properties. But it doesn't work by default on MySQL for instance.

There are more shortcomings in some very specific use cases. For instance, should the postDelete() event be fired after a soft deletion? One could say yes, one could say no, and both would be right in their particular use case.

All these shortcomings are not specific to the Propel implementation. In fact, the soft_delete behavior for Doctrine 1, the paranoid behavior for Propel in symfony 1, various attempts at doing the same for Doctrine2, and the source of them all, the acts_as_paranoid behavior for Rails ActiveRecord all have the same flaws. The shortcomings come from the principle of the added deleted_at column, and can't be fixed by implementation.

I'll write that again: the soft_delete behavior can't be fixed. It's a leaky abstraction. To achieve the same functionality, the paradigm must change.

Introducing the archivable behavior

If you want to be able to recover deleted objects, a better idea would be to put these into another repository. In database terms, this translates to "copy records to another table". This is the idea behind archivable. It provides a new ActiveRecord method, archive(), to persist a copy of the current object into an archive table.

$book = new Book();
$book->setTitle('War and Peace');
$book->save();
// INSERT INTO book (title) VALUES ('War and Peace');
$book->archive();
// INSERT INTO book_archive (id, title) VALUES (123, 'War and Peace');

Here, the archive() method first checks the existence of a book_archive record with the same primary key. If found, it updates it; if not found, it inserts a new one. In both cases, the book_archive record columns copy the values from the book record.

The archive() method is easy to trigger before the deletion; as a matter of fact, the archivable behavior does that on delete():

$book->delete();
// INSERT INTO book_archive (id, title) VALUES (123, 'War and Peace');
// DELETE FROM book WHERE book.id = 123;

Even if delete() triggers archive(), it is easy to bypass the archivable and do a hard delete:

// hard delete a book
$book->deleteWithoutArchive()

None of the shortcomings described above plague the archivable behavior, because a call to delete() actually deletes records from the main table. So the archivable behavior can be seen as a fixed soft_delete behavior providing the same functionality.

The archivable behavior landed in the Propel 1.6 branch last week, and will be bundled in the next stable version (1.6.2). As for every behavior included in the Propel core, it is heavily (unit) tested, and fully documented (see the archivable behavior documentation).

More than soft deletion

The merit of a good idea is that it offers more benefits than just the reason for its creation. archivable covers the requirements of soft_delete, and much more.

First, you can archive an object without deleting it. Whether you need a copy for important actions, or to get a backup for dangerous operations, archiving is often a good idea.

// archive an existing book
$book->archive();
// the book still exists in the main table
echo $book->isDeleted(); // false

Of course, you probably already have a backup of the whole database somewhere to avoid data loss. But archivable provides a more atomic way to archive data, and to recover it:

// find the archived version of an existing book
$archivedBook = $book->getArchive();
// populate a book based on an archive
$book = new Book();
$book->populateFromArchive($archivedBook);
// restore an object to the state it had when last archived
$book->restoreFromArchive();

The Query class also gets an archive() method to archive a set of objects:

// archive all books by Leo Tolstoi
BookQuery::create()
  ->useAuthorQuery()
    ->filterByName('Leo Tolstoi')
  ->endUse()
  ->archive();

You can override the archive() method to add custom logic, for instance to also archive related objects:

class Book extends BaseBook
{
  public function archive(PropelPDO $con = null)
  {
    // archive the book reviews
    BookReviewQuery::create()
      ->filterByBook($this)
      ->archive($con);
    // archive the current object
    return parent::archive($con);
  }
}

This method is called both when archiving a single object, and when archiving a set of object through the Query's archive() method.

You can even store the archive table in another database than the one containing the main table. The archive database can be in another server, or even in another datacenter, providing additional security to your archived data.

Why did you delete objects in the first place?

"Archive" is a better abstraction than "soft delete". But not all the time. Why did you want to keep deleted objects in the first place? You probably used the "delete" verb because your model didn't provide a better verb to suit your requirements. In that case, you should enhance your model to give it the ability you need, and leave delete() alone. In other terms, improve your domain model. Udi Dahan, who often blogs about Service-Oriented Architectures and Domain Models, puts it this way:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

So there are many ways to circumvent the soft_delete shortcomings, and probably not good reason to keep using this behavior. For the better, soft_delete is now deprecated in Propel 1.6.