Monday, November 02, 2009

Databases and Separation of Concerns

A continuing source of pain on projects is Object Relational Mappings and databases. A contributor to this pain is the mixture of two concerns, this mixture seems to occur on nearly every project that uses a database. I think spending a moment to think about these two different kinds of usage is worthwhile. So what are these two concerns? A. Persist State Needed for Recovery This is just working state saved so that when the application restarts we can continue processing. For example perhaps the current state of a customer order or work in progress on a very long running calculation. B. Data saved for Reporting and Querying This is data saved so it can be queried later on, perhaps to allow end of month reports to be generated or tracking of user trends. It is not needed to recover the working state of an application. Many teams try to overload (A) to achieve (B), the sorts of problems this can cause are i) Data Volume - the volumes of data needed for (A) tend to be smaller, it's current working data as opposed to historical data. This can show itself as performance issues for the application as queries become slow over time. ii) The object design needed for (A) and (B) is not necessarily the same. This often shows itself as fields or object relationships being created with names like "history" or "recordOf", so an object design created for (A) becomes overloaded with things needed for (B). This again causes performance issues as the number of objects and data getting pulled into memory by the ORM can increase. It also means a simple state update can touch a lot of tables as we try to achieve (B) at the same time, chances are the indexes needed for historical queries can start to impact the update speed for these. iii) Confusing Code As with any other area where we fail to achieve separation of concerns the code can become confusing, for example state change operations become implicitly overload to create and persist data needed for historical reasons. iv) Archive of historical data becomes problematic, so you can't cleanly identify what data in the DB can be safely moved out to a historical database or deleted without impacting the functionality of the application itself. It wont always help but separation of these two concerns can provide clarity and address some kinds of performance issues. I certainly think it's worth calling out these concerns as different kinds of requirement even if you end up using the same implementation for both.

0 comments: