A month back we started migrating an enterprise timekeeping and invoicing system to GAE. Initially, the application was deployed at the enterprise on Tomcat. The opportunity was to port the application to GAE and make use of the public cloud. We chose GAE for the expected ease of migration and for not having to worry about infrastructure maintenance. As you will have read many times across many sites, the Google datastore is not an RDBMS. The App Engine datastore is a schemaless object datastore with a query engine and atomic transactions. The people behind GAE have done a good job of abstracting away the details of the underlying implementation by supporting JDO and JPA. However, the truth remains that not all the functionality of JPA/JDO can be supported, because there is no RDBMS under the hood. Keeping this in mind, we went ahead with the migration.
Let us look at what the partial relationship model looked like before we started migrating. This is a bare minimum stripped down version of the model relevant for this post and easy to understand since it is something that we all can associate with.
As you will notice, there is a 1:n relationship between the Department and the User. Likewise, there is an m:n relationship between the User and the UserRole. Again, the User can have multiple ProjectAssignment(s), and so on. With JPA/JDO on an RDBMS it would be easy to map these relationships and then ensure that the related entities are fetched when a query is run.
So far so good with the traditional way of thinking. But if you apply the same thinking in the datastore world, expect trouble.
The datastore has the concept of entity groups. An entity group is a hierarchy of parent-child relationships between entities.
To create an entity in a group, you declare that the entity is a child of another entity already in the group. An entity all by itself without a parent is a root entity. A root entity without any children exists in an entity group by itself.
Each entity has a path of parent-child relationships from a root entity to itself (the shortest path being no parent).
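To make the parent-child path idea concrete, here is a minimal plain-Java sketch. EntityKey is a hypothetical class we made up for illustration, not the App Engine SDK's Key class; it only models a kind, a numeric id and an optional parent, and treats two entities as being in the same entity group when they share a root.

```java
import java.util.Objects;

/** Hypothetical stand-in for a datastore key (NOT the SDK's Key class). */
final class EntityKey {
    final EntityKey parent; // null for a root entity
    final String kind;
    final long id;

    EntityKey(EntityKey parent, String kind, long id) {
        this.parent = parent;
        this.kind = Objects.requireNonNull(kind);
        this.id = id;
    }

    /** Walks up the parent chain; an entity without a parent is its own root. */
    EntityKey root() {
        EntityKey k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    /** The full path from the root entity down to this one. */
    String path() {
        String self = kind + "(" + id + ")";
        return parent == null ? self : parent.path() + "/" + self;
    }

    /** Two entities share an entity group when they share a root. */
    boolean sameEntityGroup(EntityKey other) {
        return root().path().equals(other.root().path());
    }
}

class EntityGroupDemo {
    public static void main(String[] args) {
        EntityKey development = new EntityKey(null, "Department", 1);
        EntityKey hr = new EntityKey(null, "Department", 2);
        EntityKey alice = new EntityKey(development, "User", 42);
        EntityKey bob = new EntityKey(hr, "User", 42);

        System.out.println(alice.path());                       // Department(1)/User(42)
        System.out.println(alice.sameEntityGroup(development)); // true
        System.out.println(alice.sameEntityGroup(bob));         // false
    }
}
```

Note that alice and bob share the numeric id 42 and are still different entities, because the parent is part of each one's path.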
So, for starters, let us consider this: what if you want to get the list of all the active Users in the system with a traditional query like
getEntityManager().createQuery("SELECT u FROM User u WHERE u.active = true");
Inside a datastore transaction you cannot do this. Why? Because the query refers to more than one entity group, and a single transaction is confined to one entity group; within a transaction only queries scoped to one entity group (ancestor queries) are allowed. Entity groups exist because the datastore has to support a massively scalable architecture: one entity group could be present on one node and another one miles apart on a different node.
There are other issues too with the relationships. For example if the Department has an Id which is Long, can the User also have an Id which is Long?
Yes, you would say instantly if you have just coded an application for JPA/JDO. For the datastore, when the User is a child of the Department, the answer is NO. Why?
The datastore has to store the parent-child relationship in the child: the child's key carries the key of its parent. Hence the child's Id field must be of a type that can hold the parent's Id along with its own, and a plain Long cannot do that. So what could this field for the child be? Well, it could either be a String-encoded key or the datastore's own Key type.
OK, let us look at one more. If you look at the ProjectAssignment entity in the above diagram, you will see that it has a relationship with both the User and the Project. It sits on the "many" side of a 1:n relationship with each of them. Sounds fine, right?
Not in the datastore world. In the datastore world, a child can only have one parent. What this translates to is that the ProjectAssignment can hold the parent key of either the User or the Project, but not both.
So how do we resolve these issues with the relationships?
The simple answer to the hard question is that you don't solve for relationships, you simply remove them. Now, if you look at the whiteboard image again, you will see a lot of places where we have drawn a double line across the line showing a relationship. These are the relationships that we had to break because of one issue or the other.
And then how do we maintain relationships?
Datastore provides an easy mechanism called unowned relationships where you can maintain relationships manually. You can still manage these relationships using Key values in place of instances (or Collections of instances) of your model objects. The caveat is that the datastore does not guarantee referential integrity with these Key references, but the use of Key makes it very easy to model (and then fetch) any relationship between two objects.
So you might ask: what about the relationships that we have not discarded, the one between Department and User, for example? Well, we broke that too today.
The reason is that once you have associated a User with a Department, they belong to the same entity group. An entity's parent is part of its key, so once an entity has been created inside an entity group, that relationship cannot be changed. Aargh! So if, hypothetically, you are transferred from the Development department to the HR department, we cannot do that with the current mapping structure in the datastore. That is the reason we had to break that relationship too and make it unowned.
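Here is a rough plain-Java sketch of what keeping relationships as key references looks like. Everything in it is hypothetical and made up for illustration: FakeDatastore is just a HashMap standing in for the real datastore, keys are plain strings, and the class and field names are our own. It is not the App Engine API or our production model; it only shows the shape of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the datastore: key string -> entity. */
final class FakeDatastore {
    private final Map<String, Object> entities = new HashMap<>();

    void put(String key, Object entity) {
        entities.put(key, entity);
    }

    /** Returns empty when the key points at nothing: no referential integrity. */
    <T> Optional<T> get(String key, Class<T> type) {
        Object found = entities.get(key);
        return type.isInstance(found) ? Optional.of(type.cast(found)) : Optional.empty();
    }
}

final class Department {
    final String name;

    Department(String name) {
        this.name = name;
    }
}

final class User {
    final String name;
    String departmentKey; // unowned reference: a key, not a Department instance

    User(String name, String departmentKey) {
        this.name = name;
        this.departmentKey = departmentKey;
    }
}

final class ProjectAssignment {
    final String userKey;    // reference to a User
    final String projectKey; // reference to a Project; neither one is a parent

    ProjectAssignment(String userKey, String projectKey) {
        this.userKey = userKey;
        this.projectKey = projectKey;
    }
}

class UnownedDemo {
    public static void main(String[] args) {
        FakeDatastore ds = new FakeDatastore();
        ds.put("Department(1)", new Department("Development"));
        ds.put("Department(2)", new Department("HR"));
        User alice = new User("Alice", "Department(1)");
        ds.put("User(42)", alice);
        ds.put("ProjectAssignment(7)", new ProjectAssignment("User(42)", "Project(5)"));

        // Transfer: the user's own key is untouched, only the reference changes.
        alice.departmentKey = "Department(2)";
        System.out.println(ds.get(alice.departmentKey, Department.class)
                .map(d -> d.name).orElse("<missing>")); // HR

        // Project(5) was never stored, and nothing stopped us from pointing at it.
        ProjectAssignment pa = ds.get("ProjectAssignment(7)", ProjectAssignment.class).get();
        System.out.println(ds.get(pa.projectKey, Object.class).isPresent()); // false
    }
}
```

The trade-off is visible in the last two lines of the demo: the ProjectAssignment can point at both a User and a Project, and the User can change departments, but it is up to the application to notice a reference that leads nowhere.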
The conclusion is that we are quite certain that for a non-trivial application like the one we are migrating, involving invoices and payments too, unowned relationships are the way to go. Our bet is that, with the complex queries you would be firing and the entity groups you would want to traverse, it makes more sense to plan your model with unowned relationships rather than owned ones, irrespective of whether you use JPA or JDO.
Please leave your thoughts on this if you have been able to work with relationships in a non-trivial manner without the need to break them.