Realsolve Logo
Articles
 

Avoiding ORM Performance Pitfalls

Tips for making your Hibernate applications perform well.

By Phil Zoio, 4 September, 2006

Object Relational Mapping (ORM) libraries such as Hibernate and Toplink provide powerful tools for bridging the divide between your Java objects and their persistent representation in a relational database. By freeing you from the responsibility of handcrafting SQL and other data access code, ORM can provide a tremendous productivity boost for your project. But this act of shielding the developer from having to deal directly with data access logic brings a significant danger. The use of ORM entails a loss of visibility, and to an extent control, over the precise nature of the interactions between your application code and the database. The result can be a substantial reduction in your application's performance. On the other hand, with an intelligent and careful approach, you can not only mitigate against this risk, but take advantage of ways to improve performance which are not easily available to JDBC-based applications.

Relational Tables and Object Graphs

At the heart of ORM performance problems is a fundamental conceptual difference in the way persistent entities are represented in the relational world versus the object world. In the object world, entities are accessed by navigating object associations.

Lets consider a domain most of us are familiar with, that of a large organisation, consisting of individuals belonging to departments grouped within divisions. Our relational model of this domain can be expressed through tables related using foreign key constraint. For example, in our employee example, the EMPLOYEE table has a foreign key DEPARTMENT_ID which correlates to primary key entries in the DEPARTMENT table.

Suppose we wanted to list employees in the organisation. In a relational world, we do this by issuing a select statement on the EMPLOYEE table:

select emp.name from employee emp order by emp.name

If we wanted department or division details for employees, we do a join on the DEPARTMENT and DIVISION tables.

select emp.name, dept.name, div.name, org.name
from employee emp
inner join department dept on employee.departmentId=dept.id 
inner join division div on dept.divisionId=div.id 
inner join organisation org on div.organisationId=org.id 
order by emp.name

In a pure object world, we'd have a different approach. We'd use object association (expressed in the form of a getter) to retrieve a collection of divisions for the organisation. We'd iterate through each collection to obtain the collection of departments. From each department we'd then obtain the list of employees for a similar iterative operation. In code it would look something like this.

for (Division division : divisions)
{
    Set<Department> departments = division.getDepartments();
    
    for (Department department : departments)
    {
        Set<Employee> employees = department.getEmployees();
        for (Employee employee : employees)
        {
            doSomethingWithEmployee(employee);
        }
    }
}

From a performance point of view, this method of data retrieval can be very inefficient in a relational world, if applied naively. If each entity retrieval involves a separate SQL select operation, we'd need a very large number of SQL statement to retrieve employee data for even a relatively small organisation.

Fortunately, good ORM solutions provide powerful techniques for allowing object data to be retrieved efficiently from a relational database. How to use these techniques and avoid the surrounding pitfalls is what this article is all about.

The Example Application.
For concrete examples of the some of the points described below, take a look at the example application. The examples use Hibernate and Spring DAOs, and include a number of JUnit test cases.

Tip 1: Use and Master The Query API

ORM libraries provide two mechanisms for data retrieval. The one we have already seen is retrieval through mapped associations. With Hibernate, these are configured using mapping files which map Java getter and setter methods to database relationships. For example, the one to many relationship of departments to employees is specified using the following entry:

<set name="employees">
    <key column="departmentId" />
    <one-to-many class="Employee" />
</set>

and the corresponding entry for employee end of the relationship is shown below:

<many-to-one name="department" column="departmentId" class="Department" 
    not-null="true" lazy="true">
</many-to-one>

Data retrieval through relationship navigation is used to retrieve the employees in the example we saw in the previous section. In fact, in this example it is possible to navigate to any object using associations alone. However, data retrieval using association mappings alone is not efficient. For this reason, good ORM libraries provide powerful query APIs whose sole role in life is efficient data access. Hibernate's query API, for example, offers much simpler mechanisms for commonly performed operations while also allowing you to leverage most if not all of the power of the underlying database's SQL API.

The Query API of all good ORMs allow you to selectively retrieve collections of persistent objects from an arbitrary point in the relationship hierarchy, rather than having to navigate to it via associations. For example, to retrieve the list of employees in our organisation, we could use the following code:

Session session = null;
try
{
    session = getSession();
    List<Employees> employees = session.createQuery("from Employee").list());
}
finally
{
    closeSession(session);
}

For OO purists, the query API is a compromise, because it departs from the oft-stated goal of transparent persistence. However, for efficient data access some mastery over the query API is necessary. Query APIs typically offer a number of capabilities which can have performance benefits, such as:

Fetching using joins deserves particular attention as one of the most important capabilities of the query API, which should be put to use in any non-trivial ORM application that expects decent performance. We discuss this next.

Good ORM libraries have sophisticated query APIs which allow you to selectively and efficiently retrieve and update objects and their underlying data. Prefer to use queries over mapping relationships for data retrieval, and put in the effort required to become proficient in the query API.

Tip 2: Use The Fetch Join

SQL joins are commonly used to efficiently retrieve data from different tables of a relational database. Good ORM solutions support joins by allowing objects from multiple persistent classes or entities to be loaded using a single query. Hibernate calls this a Fetch Join, while Toplink calls this Join Reading.

The fetch join allows fetching a significant portion of the object graph using a single SQL select. Once this has occurred, the objects can be accessed in a way that is natural in the object world, as shown in the previous example.

In our previous employee example, the fetch join can be used to optimise data access. Using Hibernate, the full list of employees, together with the associated department and division data can be retrieved using the HQL:

from Organisation organisation 
join fetch organisation.divisions division 
join fetch division.departments department 
join fetch department.employees 
where organisation.id = ?

The corresponding SQL for the query would be:

select organisati0_.id as id0_, divisions1_.id as id1_, department2_.id as id2_, 
employees3_.id as id3_, organisati0_.name as name0_0_, 
divisions1_.name as name1_1_, divisions1_.organisationId as organisa3_1_1_, 
divisions1_.organisationId as organisa3_0__, 
divisions1_.id as id0__, department2_.name as name2_2_, 
department2_.divisionId as divisionId2_2_, 
department2_.departmentId as departme4_1__, 
department2_.id as id1__, employees3_.name as name3_3_, 
employees3_.departmentId as departme3_3_3_, 
employees3_.employeeId as employeeId2__, 
employees3_.id as id2__ from organisation organisati0_ 
inner join division divisions1_ on organisati0_.id=divisions1_.organisationId 
inner join department department2_ on divisions1_.id=department2_.departmentId 
inner join employee employees3_ on department2_.id=employees3_.employeeId 
where organisati0_.id=?

Certainly not pretty, which explains a good part of the appeal of the much more concise HQL!

A key to efficient ORM applications is that the object subgraph populated using the fetch join contains exactly the data that the application will subsequently use. If the fetch retrieves more than is required, this means that the database is doing more work than is necessary for the requirements of the application. If it retrieves less than is required, then extra selects will be needed to retrieve the missing data.

Again, this requirement is a something of a compromise, because the data access layer needs to be written with one eye on how the end user will use the retrieved objects. For efficient ORM applications, this compromise is necessary.

The HQL shown above would not be necessary if you were only interested in retrieving divisions and departments for the organisation. In this case, the HQL used would be of the form:

from Organisation organisation 
join fetch organisation.divisions division 
join fetch divisions.departments 
where organisation.id = ?
Use the fetch join (as it is called in Hibernate) to selectively retrieve the data your application needs on a per use case basis while avoiding the N+1 selects problem.

Tip 3: Use Lazy Initialisation

Implementing efficient fetch strategies using ORM relies on the ability to lazily initialise associations. Without lazy initialisation, it would not be possible to retrieve subgraphs of associated objects. Let's return to our previous example, where we only want to retrieve the divisions and departments from an organisation. Without lazy initialisation, all the employees for each department would be loaded, even though they are not required.

Hibernate supports lazy initialisation of associations through an attribute in the mapping file. For example, the one to many relationship of departments to employees can be explicitly specified as lazy as shown below:

<set name="employees" inverse="true" lazy = "true">
    <key column="departmentId" />
    <one-to-many class="Employee" />
</set>

The corresponding entry for employee end of the relationship is shown below:

<many-to-one name="department" column="departmentId" class="Department" 
    not-null="true" lazy="true">
</many-to-one>

In Hibernate, lazy initialisation involves the use of a proxy to represent an uninitialised association. For collection associations, such as the employees property of Department, is an instance of a Hibernate Collection class. For single value associations, such as Employee.department, the proxy takes the form of an a subclass of the related persistent class (Department), generated at runtime using the CGLIB byte code enhancement library.

You may be asking when should you use lazily initialized associations, and when you should use eagerly initialised associations. For performance reasons, the answer is straightforward. Apart from one notable exception, which we discuss in the Cacheing section, lazy initialisation should always be used.

Versions of Hibernate to version 2.x used eager initialisation of associations by default. This was a mistake, which was corrected from version 3 and later, which use lazy initialisation by default. Hibernate 3.x makes it simpler to implement efficient data access as long as you follow the mantra: avoid using the lazy = "false" attribute. There is one exception to this rule, when cacheing is used. As we will discuss, cacheing allows eager initialisation of proxies to be achieved via in-memory lookups, instead of SQL selects, bringing performance gains in certain circumstances.

Apply lazy initialisation of object relationships, except where there is a very good reason to do otherwise.

Tip 4: Avoid The Open Session In View Pattern

When using lazily initialised associations, it is only a matter of time before you encounter a LazyInitializationException. This is especially true when developing web applications which use Hibernate persistent objects in the web presentation tier. LazyInitializationExceptions arise when you attempt to use objects which have not been initialised. In our sample application, an example would occur if we attempted to display employee data after returning from the business tier a collection of departments without the employee association initialised.

From a users point of view, the result is pretty ugly; developers will understandably look for ways to avoid this occuring. A commonly used technique used to "solve" the LazyInitializationException pattern is the Open Session In View Pattern. To understand how it works, consider what happens when your application code attempts to use an uninitialized proxy. If the Hibernate session which initially loaded the object containing the proxy is still open, then an extra SQL select is issued to load the associated object or collection. This will not be the case in the web tier of multi-tiered application with a strict separation of tiers. The session that originally loaded the entity is no longer available to finish the job, and the result is a LazyInitializationException. The Open Session In View Pattern works by holding the session open during web view rendering. It is typically implemented using a Servlet filter. An example is Spring's OpenSessionInViewFilter.

Open Session In View should be regarded as an anti-pattern; not only because it breaks the encapsulation of data access from the web tier, but because it allows for inefficient data access if not used sparingly. Reliance on Open Session in View can easily result in large number of SQL selects being executed from the web tier, a reappearance of the N+1 Selects problem. If used, the Open Session In View should be considered a last resort, with resulting data access in the web tier considered a bug in the application.

Do not place too much reliance on the Open Session in View pattern. Use it only as a last resort to avoid more embarrassing user interface bugs.

Use Appropriate Cacheing

One of the most powerful features of good ORM implementations is cacheing. Cacheing allows retrieval of certain data via SQL calls to the database to be replace with in-memory lookups. In large applications, the database is typically hard to scale and is often a performance bottlenec, so the performance benefits of cacheing can be considerable. Broadly speaking, there are two types of cacheing. Distributed cacheing can be used to cache objects across multiple JVMs, and is suitable to for read-write data. Local, or process level cacheing, is more suitable for read-only or infrequently updated data, and is only suitable for cacheing frequently updated data in the context of a single JVM.

The really low hanging fruit is to be found with process level cacheing of read-only data, or data which is only updated periodically. Virtually every database applications contains reference tables, whose data is read-only or infrequently data. Let's consider the employees in our Hibernate example application. Each employee has a nationality which is recorded as one of the entries in the Country table. The Hibernate mapping entry in Country.hbm.xml is shown below:

<many-to-one name="country" column="countryId" class="Country" not-null="false" lazy = "false">
</many-to-one>

Now suppose we now want to populate the nationality association in our earlier query which returned employee data. We could achieve this using the extra fetch join condition:

... join fetch employee.country

The result, of course, is an extra join in the executed SQL select statement. There is a problem with this solution. The database is having to do more work than is necessary, because each time it performs a join on the Country table it returns the same data. A better solution is to turn on cacheing for the Country table, and to eagerly load nationality association. We do the latter by adding the lazy = "false" attribute to the association's mapping file entry, as shown below:

<many-to-one name="country" column="countryId" class="Country" 
    not-null="true" lazy = "false">
</many-to-one>

Setting up cacheing requires a few more entries, but is very straightforward:

With cacheing set up, the first time the employee data is retrieved, a number of SQL selects will be issued to retrieve the Country data. However, subsequent requests will result in no additional recursive SQL, since the eagerly loaded association data will be retrieved from memory, specifically, from Hibernate's "second-level cache".

Use of cacheing in this case allows us to simplify our query without suffering from the N+1 selects problem (once the first load has occurred). For entities which have relationships with many reference data entities, the benefits in terms of performance and simpler queries can be significant.

Simplify your queries and improve performance at little cost by locally cacheing reference object, that is, data backed by read-only or infrequently updated objects.

The usage of the second form of cacheing, that of distributed read-write cacheing, can provide additional benefits for certain applications. A full discussion is beyond the scope of this article. It is worth noting that maintaining a distributed cache can involve significant overhead, and may involve substantial effort, know-how and testing to configure correctly for best performance.

Tip 5: Use Logging

Because ORM tools do a significant amount of data access behind the scenes, interactions with the database are less visible than with JDBC applications. As a developer you can respond in two ways. Either you leave the job to the ORM library, trusting it to do the job for you, or you can take a close interest in what it is doing, and make sure it is working correctly and in the way you expect it to. For applications that require good performance, the only option is to turn on logging during development.

With Hibernate, the simplest way to enable logging is to use the hibernate.show_sql property. This prints to the console, line-by-line, the SQL executed by Hibernate. Enabling SQL logging can be particularly useful in identifying situations in which an unexpected large number of queries are being executed.

More sophisticated logging, which also captures input parameters, is possible through a tool such as p6spy.

Turn on logging, and use this to understand better how the ORM library is accessing the database, and in particular, how often it is doing so.

Tip 6: Optimise Selectively

The tips described in the previous sections describe some habits that are helpful if adopted as part of the standard development process. Using these strategies, it should be possible to develop applications which with decent performance, whose performance should match or at worst not significantly lag behind that of applications created using JDBC. At times, special tuning strategies may be necessary to achieve higher levels of performance. It may be necessary, for example, to tune individual queries. Mapping and configuration files entries can be adjusted. Further discussion on these kinds of changes is beyond the scope of this article. It is worth noting that the benefits of advanced optimisation techniques will be of little use unless they are backed up by sound development practice.

Get the basics right first, and only then attempt to apply more esoteric performance tuning optimisations.

Summary

Object Relational Mapping tools can offer a tremendous boost in productivity to Java projects which talk to a database over plain JDBC methods. However, if applied without thought to performance implications, they results can be disastrous in terms of the application's ability to perform well and scale. In this article we've recommended some practices which can be very helpful in allowing you to recognise the performance pitfalls often associated with ORM applications, and in doing so, develop ORM applications with performance which rivals or even exceeds that of JDBC-based applications.

This is a working document. If there are any errors, or if you disagree with anything which I have said, or have any suggestions for improvements please email me at philzoio@realsolve.co.uk.