Jazoon 2012: CQRS – Trauma treatment for architects

4. July, 2012

A few years ago, concurrency and scalability were a hype. Today, it’s a must. But how do you write applications that scale painlessly?

Command and Query Responsibility Segregation (CQRS) is an architectural pattern to address these problems. In his talk, Allard Buijze gave a good introduction. First, some of the problems of the standard approach. Your database, everyone says, must be normalized.

That can lead to a couple of problems:

  • Historic data changes
  • The data model is neither optimized for writes nor for queries

The first problem can result in a scenario like this. Imagine you have a report that tells you the annual turnover. You run the report for 2009 in January, 2010. You run the same report again in 2011 and 2012 and each time, the annual turnover of 2009 gets bigger. What is going on?

The data model is in third normal form. This is great, no data duplication. It’s not so great when data can change over time. So if your invoices point to the products and the products point to the prices, any change of a price will also change all the existing invoices. Or when customers move, all the addresses on the invoices change. There is no way to tell where you sent something.

The solution is to add “valid time range” to each price, address, …, which makes your SQL hideous and helps to keep your bug tracker filled.

It will also make your queries slow since you will need lots and lots of joins. These joins will eventually get in conflict with your updates. Deadlocks occur.

On the architectural side, some problems will be much easier to solve if you ignore the layer boundaries. You will end up business logic in the persistence layer.

Don’t get me wrong. All these problems can be solved but the question here is: Is this amount of pain really necessary?

CQRS to the rescue. The basic idea is to use two domain models instead of one. Sounds like more work? That depends.

With CQRS, you will have more code to maintain but the code will be much more simple. There will be more tables and data will be duplicated in the database but there will never be deadlocks, queries won’t need joins in the usual case (you could get rid of all joins if you wanted). So you trade bugs for code.

How does it work? Split your application into two main parts. One part takes user input and turns that into events which are published. Listeners will then process the events.

Some listeners will write the events into the database. If you need to, you will be able to replay these later. Imagine your customer calls you because of some bug. Instead of asking your customer to explain what happened, you go to the database, copy the events into a test system and replay them. It might take a few minutes but eventually, you will have a system which is in the exact same state as when the bug happened.

Some other listeners will process the events and generate more events (which will also be written to the database). Imagine the event “checkout”. It will contain the current content of the shopping cart. You write that into the database. You need to know what was in the shopping basket? Look for this event.

The trick here is that the event is “independent”. It doesn’t contain foreign keys but immutables or value objects. The value objects are written into a new table. That makes sure that when you come back 10 years later, you will see the exact same shopping cart as the customer saw when she ordered.

When you need to display the shopping cart, you won’t need to join 8 tables. Instead, you’ll need to query 1-2 tables for the ID of the shopping cart. One table will have the header with the customer address, the order number, the date, the total and the second table will contain the items. If you wanted, you could add the foreign keys to the product definition tables but you don’t have to. If that’s enough for you, those two tables could be completely independent of any other table in your database.

The code to fill the database gets the event as input (no database access to read anything from anywhere) and it will only write to those two tables. Minimum amount of dependencies.

The code to display the cart will only need to read those two tables. No deadlocks possible.

The code will be incredibly simple.

If you make a mistake somewhere, you can always replay all the events with the fixed code.

For tests, you can replay the events. No need to a human to click buttons in a web browser (not more than once, anyway).

Since you don’t need foreign keys unless you want to, you can spread the data model over different databases, computers, data centers. Some data would be better in a NoSQL repository? No problem.

Something crashes? Fix the problem, replay the events which got lost.

Instead of developing one huge monster model where each change possibly dirties some existing feature, you can imagine CQRS as developing thousands of mini-applications that work together.

And the best feature: It allows you to retroactively add features. Imagine you want to give users credits for some action. The idea is born one year after the action was added. In a traditional application, it will be hard to assign credit to the existing users. With CQRS, you simply implement the feature, set up the listeners, disable the listeners which already ran (so the action isn’t executed again) and replay the events. Presto, all the existing users will have their credit.

Related:


Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?

Parallelism.

In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid using RPC/RMI. Try lure you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASEDrop ACID And Think About Data
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak in places where you can’t have it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Trying locks and clever concurrency will only get you more bugs, unmaintainable code and a heart attack.

Dependencies kill a project faster and more efficiently than almost any other technique. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”

Obvious programming has a value: You can see what happens. It has a price: Boiler plate code. You can try to hide this but it will still leak. Hibernate is a prefect example for this. Yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It does generate proxies to save you from seeing the query but that can break equals().

Same applies to RMI. When RMI decides that you can’t handle the message, then you won’t even see it. In many cases, a slightly “unusual” message (like one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network in your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources that are caused adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message
Claim checks allow to resend a message again after a timeout without having it processed twice by the consumer.

CALM and BASE refer to the fact that you can only have two of the tree CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.