Blog: If your model classes are empty, you didn't get the best of ORM

The Propel Team – 27 August 2010

I see a lot of projects carrying very lightweight model classes: the stub ActiveRecord and Query classes generated by Propel are still empty long after the project has started. This is not only a sign that the developers put the code in the wrong place, but also that they have not yet grasped the ORM paradigm. A few examples illustrate this.

The ActiveRecord Classes Are Where The Record Manipulation Should Be

In a phone book application, a Person has a first name, a last name, a gender, and a marital status. In the view layer of the application, the developer wrote a helper function that displays a Person’s identity, which applies a few presentation rules according to the available data:

function getPersonIdentity($person)
{
  if ($gender = $person->getGender()) {
    if (strtolower($gender) == 'male') {
      $title = 'Mr. ';
    } else {
      if ($person->getMaritalStatus() == 'married') {
        $title = 'Mrs. ';
      } else {
        $title = 'Miss ';
      }
    }
  } else {
    $title = '';
  }
  if ($person->getFirstName() && $person->getLastName()) {
    return $title . $person->getFirstName() . ' ' . $person->getLastName();
  } elseif ($person->getLastName()) {
    return $title . $person->getLastName();
  } elseif ($person->getFirstName()) {
    return $title . $person->getFirstName();
  } else {
    return 'Mr. Nobody';
  }
}

This helper function could be broken down into several smaller functions to increase reusability. For instance, the block that determines the title of a person could be turned into a standalone getTitle() function:

function getTitle($person)
{
  if (!$gender = $person->getGender()) {
    return false;
  }
  if (strtolower($gender) == 'male') {
    return 'Mr.';
  } else {
    if ($person->getMaritalStatus() == 'married') {
      return 'Mrs.';
    } else {
      return 'Miss';
    }
  }
}

In this last function, the line that determines if a woman is married could also be isolated:

function isMarried($person)
{
  return $person->getMaritalStatus() == 'married';
}

Note that this function now works for any person, whereas the original code checked the marital status only for women.

The process of isolating functions is good for reusability, but quite bad for code maintenance. All these standalone helper functions pollute the global namespace, and even if they live in a common helper file, they seem to have nothing in common. Or do they?

They actually share one important thing: their parameter, a Person instance. This should ring a bell and draw your attention to the ActiveRecord model classes. With a little refactoring, the first helper function can be entirely moved into the Person class, to make it fully reusable – including in parts of the application that don’t have access to the helper functions of the view layer:

class Person extends BasePerson
{
  const SINGLE = 'single';
  const MARRIED = 'married';

  public function hasMaritalStatus()
  {
    return null !== $this->getMaritalStatus();
  }

  public function isMarried()
  {
    return $this->getMaritalStatus() == self::MARRIED;
  }

  public function isSingle()
  {
    return !$this->isMarried();
  }

  const FEMALE = 'female';
  const MALE = 'male';

  public function hasGender()
  {
    return null !== $this->getGender();
  }

  public function isFemale()
  {
    return strtolower($this->getGender()) == self::FEMALE;
  }

  public function isMale()
  {
    return !$this->isFemale();
  }

  public function getTitle()
  {
    if (!$this->hasGender()) {
      return false;
    }
    if ($this->isMale()) {
      return 'Mr.';
    } else {
      if (!$this->hasMaritalStatus() || $this->isMarried()) {
        return 'Mrs.';
      } else {
        return 'Miss';
      }
    }
  }

  public function getFullName()
  {
    if ($this->getFirstName() && $this->getLastName()) {
      return $this->getFirstName() . ' ' . $this->getLastName();
    } elseif ($this->getLastName()) {
      return $this->getLastName();
    } elseif ($this->getFirstName()) {
      return $this->getFirstName();
    } else {
      return false;
    }
  }

  const UNKNOWN_NAME = 'Mr. Nobody';

  public function getIdentity()
  {
    if (!$fullName = $this->getFullName()) {
      return self::UNKNOWN_NAME;
    }
    if ($title = $this->getTitle()) {
      return $title . ' ' . $fullName;
    } else {
      return $fullName;
    }
  }
}

That’s a lot of new methods, but now they are bundled together in a single place, and this is where they belong. The developer can unit test them, and reuse them very easily across the whole application.
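As a rough illustration of that testability, here is a minimal sketch that checks the title rules without a database. The BasePerson class below is a hypothetical hand-written stand-in for the class Propel generates (its constructor is an assumption made for the sketch), and the Person class is a trimmed copy that keeps only the title logic. A real project would more likely use a test framework such as PHPUnit than a plain loop.

```php
<?php
// Hypothetical stand-in for the Propel-generated BasePerson class, so the
// sketch runs without a database. Only the getters used below are stubbed.
class BasePerson
{
    private $gender;
    private $maritalStatus;

    public function __construct($gender = null, $maritalStatus = null)
    {
        $this->gender = $gender;
        $this->maritalStatus = $maritalStatus;
    }

    public function getGender() { return $this->gender; }
    public function getMaritalStatus() { return $this->maritalStatus; }
}

// Trimmed copy of the Person class from this post (title rules only).
class Person extends BasePerson
{
    const MARRIED = 'married';
    const FEMALE = 'female';

    public function hasMaritalStatus() { return null !== $this->getMaritalStatus(); }
    public function isMarried() { return $this->getMaritalStatus() == self::MARRIED; }
    public function hasGender() { return null !== $this->getGender(); }
    public function isFemale() { return strtolower($this->getGender()) == self::FEMALE; }
    public function isMale() { return !$this->isFemale(); }

    public function getTitle()
    {
        if (!$this->hasGender()) {
            return false;
        }
        if ($this->isMale()) {
            return 'Mr.';
        }
        return (!$this->hasMaritalStatus() || $this->isMarried()) ? 'Mrs.' : 'Miss';
    }
}

// Each row: gender, marital status, expected title.
$cases = array(
    array('male', 'single', 'Mr.'),
    array('Female', 'married', 'Mrs.'),
    array('female', null, 'Mrs.'),
    array('female', 'single', 'Miss'),
    array(null, 'married', false),
);

// Count the rows whose computed title differs from the expected one.
$failures = 0;
foreach ($cases as $case) {
    list($gender, $status, $expected) = $case;
    $person = new Person($gender, $status);
    if ($person->getTitle() !== $expected) {
        $failures++;
    }
}
echo $failures === 0 ? "all title rules hold\n" : "$failures title rule(s) broken\n";
```

Because the rules live in the model class, none of these checks needs the view layer or a database connection.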

These new methods make the code much easier to read. Even if you don’t know how the marital status is stored in the database, you can use the isMarried() method. In practice, these methods abstract the storage structure, and offer an easy-to-use interface to the stored data.

This refactoring to the ActiveRecord class is a pretty basic OOP technique, but many developers tend to overlook it. Some of them come to ORMs with a simple PHP background, and they are not used to spotting which code is part of the model. Others see the ORM classes as a place to put database queries and nothing else.

The Query Classes Are Where The Queries Should Be

In a CMS application, a Section has many Articles. In order to display the list of latest articles, the developer wrote a getPublishedArticles() method in the Section ActiveRecord class:

class Section extends BaseSection
{
  public function getPublishedArticles()
  {
    return ArticleQuery::create()
      ->filterBySection($this)
      ->filterByPublishedAt(array('max' => time()))
      ->orderByPublishedAt('desc')
      ->find();
  }
}

But the piece of logic that determines if an article is published or not must be repeated to count the published articles. Or, it could be required in another model, for instance to find the published articles by an author. Therefore, this piece of logic should be written in the ArticleQuery class. After all, it’s an Article filter:

class ArticleQuery extends BaseArticleQuery
{
  public function published()
  {
    return $this->filterByPublishedAt(array('max' => time()));
  }
}

This new method already shows a great virtue: you can unit test it. It also has a meaningful name that expresses domain logic rather than storage logic. Imagine that the published_at column were named art_pub_date, and you will get a better idea of the benefit of a meaningful name.
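One possible way to test such a filter in isolation is to replace the generated base class with a stub that records the calls it receives. The sketch below assumes a hypothetical hand-written BaseArticleQuery with a public $calls array; it is not the class Propel generates, only a stand-in that makes the example run without a database.

```php
<?php
// Hypothetical recording stub standing in for the Propel-generated
// BaseArticleQuery. It stores each filter call instead of building SQL.
class BaseArticleQuery
{
    public $calls = array();

    public static function create()
    {
        return new static();
    }

    public function filterByPublishedAt($value)
    {
        $this->calls[] = array('filterByPublishedAt', $value);

        return $this; // keep the fluent interface
    }
}

// The domain filter from this post, unchanged apart from the base class.
class ArticleQuery extends BaseArticleQuery
{
    public function published()
    {
        return $this->filterByPublishedAt(array('max' => time()));
    }
}

// Capture the clock before and after, since published() calls time() itself.
$before = time();
$query = ArticleQuery::create()->published();
$after = time();

list($method, $value) = $query->calls[0];

// The filter must target published_at with an upper bound close to "now".
$ok = $method === 'filterByPublishedAt'
    && $value['max'] >= $before
    && $value['max'] <= $after;

echo $ok ? "published() filters on the current time\n" : "unexpected filter\n";
```

The test never mentions the column name directly in application code, which is the point of giving the filter a domain name.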

Now the ActiveRecord method is easier to write and read:

class Section extends BaseSection
{
  public function getPublishedArticles()
  {
    return ArticleQuery::create()
      ->filterBySection($this)
      ->published()
      ->orderByPublishedAt('desc')
      ->find();
  }
}

The developer should even go further and package all the filtering logic into the Query class:

class ArticleQuery extends BaseArticleQuery
{
  public function published()
  {
    return $this->filterByPublishedAt(array('max' => time()));
  }

  public function recent()
  {
    return $this->orderByPublishedAt('desc');
  }

  public function recentlyPublished()
  {
    return $this->recent()->published();
  }
}

Also, the generated BaseSection::getArticles() method already filters by the current Section and terminates the query: these pieces of code should be reused rather than rewritten. And since the generated Foreign Key getters accept a Query object as a parameter, you could write the getPublishedArticles() method in a single line:

class Section extends BaseSection
{
  public function getPublishedArticles()
  {
    return $this->getArticles(ArticleQuery::create()->recentlyPublished());
  }
}

So what happened here? The ActiveRecord class got stripped of most of its code in favor of the Query class. And the ActiveRecord class no longer manipulates columns. It deals with expressive filters rather than database conditions.

Here is a good rule of thumb: If you’re using filterByXXX(), orderByXXX(), useXXXQuery(), or find() in an ActiveRecord class, you should probably move some code to the Query class. Your ActiveRecord classes should only use meaningful filters, and let the Query classes offer reusability and testability to filtering logic.
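To make the rule of thumb concrete, here is a small sketch in which a Section both lists and counts its published articles through the same published() filter. Everything named Base* below is a hypothetical in-memory stand-in written for this example (including the matches() helper and the array-based articles), not the code Propel generates; the countPublishedArticles() method is likewise an illustrative addition.

```php
<?php
// Hypothetical in-memory stand-in for the generated BaseArticleQuery.
class BaseArticleQuery
{
    protected $maxPublishedAt = null;

    public static function create()
    {
        return new static();
    }

    public function filterByPublishedAt($range)
    {
        $this->maxPublishedAt = $range['max'];

        return $this;
    }

    // Illustrative helper: true when an article row satisfies the filter.
    public function matches(array $article)
    {
        return null === $this->maxPublishedAt
            || (null !== $article['published_at']
                && $article['published_at'] <= $this->maxPublishedAt);
    }
}

// Hypothetical stand-in for the generated BaseSection and its FK accessors.
class BaseSection
{
    private $articles;

    public function __construct(array $articles)
    {
        $this->articles = $articles;
    }

    public function getArticles($query = null)
    {
        if (null === $query) {
            return $this->articles;
        }

        return array_values(array_filter($this->articles, array($query, 'matches')));
    }

    public function countArticles($query = null)
    {
        return count($this->getArticles($query));
    }
}

class ArticleQuery extends BaseArticleQuery
{
    public function published()
    {
        return $this->filterByPublishedAt(array('max' => time()));
    }
}

// The ActiveRecord class only combines meaningful filters.
class Section extends BaseSection
{
    public function getPublishedArticles()
    {
        return $this->getArticles(ArticleQuery::create()->published());
    }

    public function countPublishedArticles()
    {
        return $this->countArticles(ArticleQuery::create()->published());
    }
}

$now = time();
$section = new Section(array(
    array('title' => 'Old news', 'published_at' => $now - 3600),
    array('title' => 'Scheduled', 'published_at' => $now + 3600),
    array('title' => 'Draft', 'published_at' => null),
));

$published = $section->getPublishedArticles();
$count = $section->countPublishedArticles();
```

Neither Section method touches a column name: if the definition of "published" changes, only ArticleQuery::published() needs to be edited.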

The Model Classes Are Where The Model Logic Should Be

Breaking down large methods is just an OOP technique that favors reusability. Moving methods to the model classes is just an OOP technique to package code in a logical way. But there is more to ORMs than simple programming techniques.

The result of the refactorings illustrated in this post is a set of classes that carry domain logic. They translate a set of rules – how to form a title, how to extract published articles – into simple methods with expressive names. Names that even the final customer can understand.

The refactorings actually resulted in an API to the project’s domain logic. This is what makes a true Domain Model.

After a while, developers who repeat this kind of refactoring change their coding habits. They don’t start with data in a table with a PHP interface; instead, they start by designing an object model of the customer’s domain. They see the database storage of domain objects as a simple consequence of a need for persistence.

An ORM is just a set of tools helping developers to write their domain logic more easily. Don’t let the relational databases get in your way; think Object-Oriented Programming, and embrace the Domain-Driven Design paradigms.