I really like Hibernate. Sure, it can be frustrating at times, the learning curve can be steep, and some of the exceptions you get when you do stupid things aren’t all that helpful. But the frustration and the steep learning curve are manifestations of the fact that it’s tackling some really thorny problems. And the cryptic-exception issue does get a little better with each release.
There is one thing, however, that has bothered me since I started using Hibernate several years ago. A deep bother. Not a “this is a nuisance, but I can live with it” bother, but more of a “this is so fundamental an issue that there just has to be a better solution to it than is being offered by Hibernate’s authors” bother.
The problem: bridging the gap between object identity and database identity. In most modern database schemas, tables are defined with a numeric primary key column, which is populated through some unique key assignment mechanism (identity columns in SQL Server or MySQL, sequences in Oracle, etc.).
This works great at the database layer. But problems arise in the JVM (or CLR for you NHibernate folks). In order for objects representing database entities (those managed by Hibernate) to be handled properly in collections (including Hibernate’s cache), objects that have the same database identity need to be recognized as equal as well. In Java, this means overriding equals() and hashCode() such that any two objects that represent the same database entity will be equal, and return the same hash code.
The “obvious” solution, at first blush, would be to base equals() and hashCode() on entity ID values. This works great for entities loaded by Hibernate, which represent existing rows in a database, all with unique IDs. The problem has to do with new entities that you create programmatically, which haven’t yet been saved. Those entities will typically have a null ID value – and if you base equality on ID values, any two unsaved instances will be considered equal.
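To make the collision concrete, here is a minimal sketch in plain Java with no Hibernate involved. The class IdBasedCustomer and its fields are hypothetical names invented for illustration; the only assumption is that equals() and hashCode() are based solely on a nullable ID.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity: names are illustrative, not taken from any real schema.
class IdBasedCustomer {
    private Long id;            // stays null until the row is inserted
    private final String name;

    IdBasedCustomer(String name) { this.name = name; }

    String getName() { return name; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdBasedCustomer)) return false;
        // Two null IDs compare as equal here.
        return Objects.equals(id, ((IdBasedCustomer) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);   // 0 for a null ID
    }
}

class IdCollisionDemo {
    // Adds two different unsaved customers and reports how many the set kept.
    static int countUnsaved() {
        Set<IdBasedCustomer> customers = new HashSet<>();
        customers.add(new IdBasedCustomer("Alice"));
        customers.add(new IdBasedCustomer("Bob"));   // treated as a duplicate of Alice
        return customers.size();
    }
}
```

Calling IdCollisionDemo.countUnsaved() returns 1, not 2: the second customer is silently dropped because both instances have a null ID.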
There are actually two problems when dealing with unsaved entity objects. The first, as stated above, is that they don’t yet have a unique identity – so multiple new instances will collide in collections like Maps and Sets. The other problem is that the identity of these objects will be set at some point, which changes the value that hashCode() returns. Hash-based collections rely on an object’s hash code staying the same for as long as the object is stored in them, so an entity placed in a Set before it is saved can no longer be found there once its ID has been assigned.
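The second problem can be sketched the same way. The class IdBasedOrder below is hypothetical, and the call to setId() merely stands in for whatever happens when the entity is saved and the database hands back a generated key.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity whose equality depends only on a database-generated ID.
class IdBasedOrder {
    private Long id;   // null until saved

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdBasedOrder)) return false;
        return Objects.equals(id, ((IdBasedOrder) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

class HashDriftDemo {
    // Returns whether the set can still find the order after its ID changes.
    static boolean foundAfterSave() {
        Set<IdBasedOrder> orders = new HashSet<>();
        IdBasedOrder order = new IdBasedOrder();
        orders.add(order);     // stored under the hash code of a null ID
        order.setId(42L);      // stand-in for ID assignment at save time
        return orders.contains(order);   // looked up under the new hash code
    }
}
```

HashDriftDemo.foundAfterSave() returns false even though the very same object is still sitting in the set, because the set recorded the old hash code when the object was added.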
To avoid all of these problems, Hibernate’s authors have long recommended a workaround – use “natural keys” instead of identity values for equals() and hashCode(). The problem with this is that many entities simply don’t have a natural key – that is, a set of one or more attributes whose values uniquely identify the entity. And those that do often fail the hashCode() requirement that the value should not change over time.
There are other possible solutions I’ve seen or come up with myself, but all had at least one glaring weakness. It seemed that there was no silver bullet for dealing with database-generated identity values in Hibernate.
And therein lies the silver bullet. Don’t use the database (or Hibernate) to generate identity values.
I stumbled across a wonderful article on O’Reilly’s OnJava site that quite clearly describes this problem (in more detail than I do here), and offers what I believe to be the only real solution. Check it out at http://www.onjava.com/pub/a/onjava/2006/09/13/dont-let-hibernate-steal-your-identity.html.
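As a rough sketch of what generating the identity in application code could look like, the example below assigns a UUID when the object is constructed. UUIDs are my assumption for the ID source here, and the class UuidAccount is hypothetical. The Hibernate mapping itself is omitted; in a real setup the ID would need to be mapped as an application-assigned identifier, and Hibernate would need a way to restore it when loading rows.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity that owns its identity from the moment it is created.
class UuidAccount {
    private final String id;   // never null, never changes

    // New entity: the ID is generated in the JVM, not by the database.
    UuidAccount() {
        this(UUID.randomUUID().toString());
    }

    // Existing entity: stands in for an instance rebuilt from a stored row.
    UuidAccount(String existingId) {
        this.id = existingId;
    }

    String getId() { return id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidAccount)) return false;
        return id.equals(((UuidAccount) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class UuidIdentityDemo {
    // Two unsaved accounts stay distinct in a set.
    static int countUnsaved() {
        Set<UuidAccount> accounts = new HashSet<>();
        accounts.add(new UuidAccount());
        accounts.add(new UuidAccount());
        return accounts.size();
    }

    // A copy reloaded with the same ID is recognized as the same entity.
    static boolean reloadedCopyMatches() {
        UuidAccount original = new UuidAccount();
        Set<UuidAccount> accounts = new HashSet<>();
        accounts.add(original);
        UuidAccount reloaded = new UuidAccount(original.getId());
        return accounts.contains(reloaded);
    }
}
```

Because the ID exists before the object ever touches a collection and never changes afterward, equals() and hashCode() behave the same way before and after saving. The trade-off is a wider key column than a numeric sequence would need.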
And now, as I look out the window at the late February snowstorm, I think maybe it’s time for me to hibernate…
Good post. I feel like defining the identities oneself might handcuff Hibernate. I’ll have to read the article you referenced to learn more.
Hibernate does provide the flexibility to do this, but for some reason it seems not to have become standard practice. It is, admittedly, a bit more complicated to set up, and you could introduce some performance problems if you aren’t careful. But it does seem to be the only bulletproof solution to the problem.
Excellent article, although I do not view this as specific to Hibernate. UUIDs are a cool solution. Gotta love the math.