
Shared Database vs. Messaging Architecture


I was down the pub with a friend of mine yesterday and we started discussing the architecture in use at the company he works for. The conversation centred on the pros/cons of a shared database architecture versus a distributed architecture of independent applications. We couldn't reach a consensus, so I'd like to hear people's opinions on the pros/cons of both approaches.

Basically, the company that he works for has a large architecture with many different applications. Some applications share a single database between them. For example, there is one application that provides a UI for users to alter reference data, and another application that uses and accesses the same reference data. I believe the data access code is actually written as a shared library, i.e. both applications use a common code set that one pulls in as a dependency and that is redeployed with each.

There are also other applications whose databases are used by other applications over direct JDBC connections, with data access code that is not shared between the two apps but duplicated (erghh!).

My question is about the pros/cons of this architecture vs. an architecture where each application keeps its "master" data in a silo. If application x requires data from application y, it uses web services or some messaging technology to receive that data.

The messaging approach would introduce a problem: reference data 'codes' (or foreign keys) that are currently used within other applications' databases would now have to be fetched from another source. In the current architecture, the 'decodes' for these codes can change at any time and the change is reflected in the external application immediately. With messaging, we would instead need either a master/slave relationship where the data is copied, or an alternative where application x has to query application y just to display the decode values.
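For illustration, the "copy the data" option could be sketched as a read-only local cache in application x that applies change messages published by application y. This is a minimal sketch under stated assumptions: the `ReferenceDataChanged` message and the `LocalReferenceCache` class are hypothetical names, and the broker that would deliver the messages is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceDataChanged:
    """Hypothetical message published by application y (the owner)
    whenever a reference-data decode is added or changed."""
    code: str
    decode: str


class LocalReferenceCache:
    """Application x's read-only copy of y's reference data.

    x never edits this data itself; it only applies y's change messages,
    so displaying a decode needs no call to y at render time."""

    def __init__(self, snapshot: dict[str, str]) -> None:
        # Initial load, e.g. from a bulk export taken when x starts up.
        self._decodes = dict(snapshot)

    def apply(self, message: ReferenceDataChanged) -> None:
        # Upsert: new codes are added, existing decodes are overwritten.
        self._decodes[message.code] = message.decode

    def decode(self, code: str) -> str:
        # Fall back to the raw code if y has not (yet) announced it.
        return self._decodes.get(code, code)
```

The trade-off is the one described in the question: lookups are local and cheap, but a changed decode only becomes visible in x once y's message has been applied, not at the instant y's data changes.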

I have read Enterprise Integration Patterns, and whilst it does give some examples of the advantages of messaging, I'm not convinced.

Thanks Iain


Solution

  • The advantages of message-based integration over a shared database are difficult to articulate, but I will attempt to do so here:

    There is the inevitable argument where the DBAs want to model all the relationships between the entities so that the data is 100% consistent at all times. On the other hand, you have the developers warning the DBAs about the tight coupling that emerges from a monolithic architecture, and about how applications bound to master tables cannot be changed easily.

    I think both of these arguments only scratch the surface, and building a system that is easy to change is challenging regardless of how you do the integration. I want to put forward a different kind of argument for SOA and message-based integration.

    What it comes down to is this:

    1. Shared database integration is generally driven by a "big system" view of the world.
    2. Message-based integration is generally driven by a "small system" view of the world.

    How many times have you come across large systems with hundreds of users that do many, many different jobs supporting multiple and diverse business functions? I come across them all the time. They seem to be the staple of enterprise software at the moment.

    One thing all these systems seem to have in common is that they are very expensive to change. And one of the reasons for this is, as Joe R says in his answer, tight coupling.

    However, coupling is something of a loaded term and I think there are two very different types of coupling we need to consider.

    The first can be thought of as technological coupling: vertical coupling inside the technology stack (usually n-tiered), between one tier and another.

    So we have coupling between the database and the data access layer of an application, coupling between the data access layer and the business logic layer, and so on. It seems to be generally accepted that such coupling is bad or wrong, but, thinking rationally, should we not expect, or even welcome, a high degree of coupling between, say, the User DTO, the UserRepository class, and the User database table?
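    To make that concrete, here is a minimal sketch of the three pieces named above, assuming an in-memory SQLite table and illustrative columns (`id`, `name`, `email`) that are not from the original. The DTO fields, the SQL column lists, and the table definition all describe the same entity, so adding a column naturally touches all three; that is the kind of vertical coupling being argued for as expected.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """DTO: mirrors the columns of the User table one-to-one."""
    id: int
    name: str
    email: str


class UserRepository:
    """Data access layer: translates between User rows and User DTOs."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS User "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def add(self, user: User) -> None:
        self._conn.execute(
            "INSERT INTO User (id, name, email) VALUES (?, ?, ?)",
            (user.id, user.name, user.email),
        )

    def get(self, user_id: int) -> Optional[User]:
        row = self._conn.execute(
            "SELECT id, name, email FROM User WHERE id = ?", (user_id,)
        ).fetchone()
        # The SELECT column order must match the DTO's field order.
        return User(*row) if row else None
```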

    Let's consider what coupling actually means at the implementation level. Coupling happens when concepts which "belong" to one thing leak into another thing. This leakage is inevitable when you have multiple layers basically talking to each other about the same business entity.

    The second kind of coupling, and the one I'd like to address, can be thought of as business capability coupling, also known as horizontal coupling. This is where we have concepts belonging to one business capability leaking into another business capability.

    It is my assertion that this horizontal coupling is encouraged by the use of databases as an integration platform.

    As an example, imagine a typical back-end system supporting an e-commerce website. You would generally have inventory, ordering, pricing, and CRM as your core capabilities.

    If we model this domain inside a single database, we are in effect coupling different capabilities together. Every foreign key constraint potentially increases the degree of coupling between these capabilities. In fact, the system by this point can already be thought of as several different "services" integrated across a shared database.
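    A small, hypothetical schema shows how this happens. The table and column names below are assumptions for illustration: CRM owns `customer`, inventory owns `product`, and ordering's `customer_order` table holds foreign keys into both. Once those constraints exist, CRM can no longer delete its own customer rows without taking ordering's data into account.

```python
import sqlite3


def create_shared_schema() -> sqlite3.Connection:
    """One database holding CRM, inventory and ordering tables together."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Owned by CRM
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Owned by inventory
        CREATE TABLE product (
            id INTEGER PRIMARY KEY, sku TEXT NOT NULL, stock INTEGER NOT NULL
        );
        -- Owned by ordering, but its foreign keys reach into the other two
        CREATE TABLE customer_order (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            product_id INTEGER NOT NULL REFERENCES product(id)
        );
        """
    )
    return conn


def try_delete_customer(conn: sqlite3.Connection, customer_id: int) -> bool:
    """A CRM-side operation; returns False if ordering's rows block it."""
    try:
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        # "FOREIGN KEY constraint failed": another capability depends on this row.
        return False
```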

    This is the "big system" picture of the world, which is supported and encouraged by linking different areas of your enterprise together using huge 500+ table databases.

    Contrast this with the "small system" picture of the world where, in our example back-end system, inventory, ordering, pricing, and CRM are completely separate applications, each with its own technology stack, its own project team, its own release schedule, and its own database.

    Each application, or service, will have its own understanding of what a given entity is, shaped by the business capability it supports.

    An example of this is the "User". CRM are going to have a very different definition of User than ordering or fulfilment. Ordering only cares about the user in terms of what the user is buying. CRM cares about other stuff like customer buying patterns, and fulfilment cares about name, address, etc. This is not easily achieved with a single User table in a shared database.
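    As a rough sketch, each service might keep its own model of the user, sharing nothing but an identifier. The class and field names here are illustrative assumptions, not a prescribed design; in practice each class would live in a separate service backed by its own database.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class OrderingCustomer:
    """Ordering: the user only as someone who is buying things."""
    customer_id: str
    basket: list[str] = field(default_factory=list)  # product SKUs


@dataclass
class CrmCustomer:
    """CRM: the user as a relationship with buying patterns to track."""
    customer_id: str
    loyalty_tier: str = "standard"
    lifetime_spend: Decimal = Decimal("0")


@dataclass
class FulfilmentRecipient:
    """Fulfilment: the user only as somewhere to send a parcel."""
    customer_id: str
    name: str
    address: str
```

    The only thing the three agree on is `customer_id`; everything else is free to change independently within each service.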

    To me, this picture is preferable to the shared database route, and the main reason is that the resulting system will better model the actual business processes it is supposed to support. One of the main tenets of DDD is that a system should resemble, as much as possible, the business that owns it.

    In a typical business, these various capabilities are not implemented across the layers of big, enterprise-spanning teams, but by small vertical teams, often completely autonomous, who communicate among themselves and with other vertical teams, typically by sending requests or directives, or by letting other teams know that a certain process or task has been started or completed.

    OK, but without the shared database, the website now relies on data from all of these different services for its UI. It still needs to display this data together on the same screen. How can the website's "presentation" layer assemble all of this and render it to the UI?

    More importantly, what if CRM wants to know when a customer orders something? What if ordering wants to know when the prices of products change, or when products are out of stock in the inventory? If these services are completely separate then how can they exchange data?

    Addressing the UI question first: this can be done with composite UIs. There are many techniques for this, but suffice it to say it's a relatively well-known landscape and not really our focus here.
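    A deliberately tiny sketch of the idea, assuming each service exposes a function that renders its own HTML fragment; `pricing_widget` and `inventory_widget` are hypothetical stand-ins for calls to the pricing and inventory services. The website only arranges the fragments and never queries any service's data directly.

```python
from typing import Callable


# Hypothetical stand-ins: in a real system each would call its own service.
def pricing_widget(product_id: int) -> str:
    return f'<div class="price">Price for product {product_id}</div>'


def inventory_widget(product_id: int) -> str:
    return f'<div class="stock">Stock for product {product_id}</div>'


def compose_page(product_id: int, widgets: list[Callable[[int], str]]) -> str:
    """Stitch service-owned fragments into one page, in the given order."""
    body = "\n".join(widget(product_id) for widget in widgets)
    return f"<html><body>\n{body}\n</body></html>"
```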

    As for the second question, how these services communicate: well, they exchange messages. What kind of messages? Events. Events are published by one system so that they can be consumed by any other system that is interested in them.
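    The publish/subscribe mechanics can be sketched in a few lines. This `EventBus` is a hypothetical, in-process stand-in for a real message broker: delivery is synchronous, nothing is persisted, and failures are not retried, all of which a production broker would have to handle. What it does show is that the publisher neither knows nor cares which systems are listening.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """In-process stand-in for a broker: publishers never see subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        # Deliver to every interested subscriber; having none is fine too.
        for handler in self._subscribers[event_type]:
            handler(payload)
```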

    In our e-commerce example, kinds of events could be:

    1. OrderPlaced
    2. CustomerUpgradedToGold
    3. ProductDiscounted
    4. StockExhausted
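    One way to model these events, sketched here with illustrative field names and a hypothetical gold-tier rule (a lifetime spend of 1000), is as immutable messages that each service interprets in its own terms. Ordering keeps its own view of prices and availability, while CRM tracks spend and publishes `CustomerUpgradedToGold` when the threshold is crossed.

```python
from dataclasses import dataclass
from decimal import Decimal

GOLD_THRESHOLD = Decimal("1000")  # hypothetical business rule


# Field names on these events are illustrative assumptions.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total: Decimal


@dataclass(frozen=True)
class CustomerUpgradedToGold:
    customer_id: str


@dataclass(frozen=True)
class ProductDiscounted:
    sku: str
    new_price: Decimal


@dataclass(frozen=True)
class StockExhausted:
    sku: str


class OrderingService:
    """Keeps its own view of prices and availability, fed only by events."""

    def __init__(self) -> None:
        self.prices: dict[str, Decimal] = {}
        self.unavailable: set[str] = set()

    def handle(self, event: object) -> None:
        # Events this service has no interest in are simply ignored.
        if isinstance(event, ProductDiscounted):
            self.prices[event.sku] = event.new_price
        elif isinstance(event, StockExhausted):
            self.unavailable.add(event.sku)


class CrmService:
    """Tracks spend per customer and decides, by its own rules, who is gold."""

    def __init__(self) -> None:
        self.spend: dict[str, Decimal] = {}

    def handle(self, event: object) -> list[object]:
        """Returns any events CRM publishes in response."""
        if not isinstance(event, OrderPlaced):
            return []
        before = self.spend.get(event.customer_id, Decimal("0"))
        after = before + event.total
        self.spend[event.customer_id] = after
        # Publish the upgrade only on the order that crosses the threshold.
        if before < GOLD_THRESHOLD <= after:
            return [CustomerUpgradedToGold(event.customer_id)]
        return []
```

    Neither service reads the other's tables: everything ordering knows about prices and stock arrived as an event, and CRM's gold rule can change without ordering noticing.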

    These events have business meaning. This gives the small system approach an additional benefit: the integration medium itself has business meaning and can be expressed in business language, which lends itself well to Scrum and agile methodologies.

    So, to finally answer the OP's question, I don't think there is much difference between the shared database and messaging integration approaches from a technological perspective. Both require the same kinds of abstractions and semantics. But I do think there is a huge difference in the driving forces behind them, and adopting more of a small systems mindset provides better business value overall.