Jazoon 2012: CQRS – Trauma treatment for architects

4. July, 2012

A few years ago, concurrency and scalability were hype. Today, they are a must. But how do you write applications that scale painlessly?

Command and Query Responsibility Segregation (CQRS) is an architectural pattern to address these problems. In his talk, Allard Buijze gave a good introduction. First, some of the problems of the standard approach. Your database, everyone says, must be normalized.

That can lead to a couple of problems:

  • Historic data changes
  • The data model is neither optimized for writes nor for queries

The first problem can result in a scenario like this. Imagine you have a report that tells you the annual turnover. You run the report for 2009 in January 2010. You run the same report again in 2011 and 2012, and each time the annual turnover for 2009 gets bigger. What is going on?

The data model is in third normal form. This is great, no data duplication. It’s not so great when data can change over time. So if your invoices point to the products and the products point to the prices, any change of a price will also change all the existing invoices. Or when customers move, all the addresses on the invoices change. There is no way to tell where you sent something.

The solution is to add “valid time range” to each price, address, …, which makes your SQL hideous and helps to keep your bug tracker filled.

It will also make your queries slow since you will need lots and lots of joins. Sooner or later, those joins will conflict with your updates, and deadlocks occur.

On the architectural side, some problems will be much easier to solve if you ignore the layer boundaries. You will end up with business logic in the persistence layer.

Don’t get me wrong. All these problems can be solved but the question here is: Is this amount of pain really necessary?

CQRS to the rescue. The basic idea is to use two domain models instead of one. Sounds like more work? That depends.

With CQRS, you will have more code to maintain but the code will be much simpler. There will be more tables and data will be duplicated in the database, but there will never be deadlocks, and queries won't need joins in the usual case (you could get rid of all joins if you wanted). So you trade bugs for code.

How does it work? Split your application into two main parts. One part takes user input and turns that into events which are published. Listeners will then process the events.
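
A minimal sketch of that split in Java (all names are invented for illustration, not from the talk): the command side validates the input and publishes an event; listeners subscribe to a bus and react.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative names only.
record LineItem(String productName, int quantity, long priceInCents) {}

record CartCheckedOut(String cartId, List<LineItem> items) {}   // the published event

interface EventListener { void on(Object event); }

class EventBus {                                   // naive in-memory publish/subscribe
    private final List<EventListener> listeners = new CopyOnWriteArrayList<>();
    void subscribe(EventListener listener) { listeners.add(listener); }
    void publish(Object event) { listeners.forEach(l -> l.on(event)); }
}

class CheckoutHandler {                            // the command side
    private final EventBus bus;
    CheckoutHandler(EventBus bus) { this.bus = bus; }

    void checkout(String cartId, List<LineItem> items) {
        // validate the input here, then describe what happened as an event
        bus.publish(new CartCheckedOut(cartId, items));
    }
}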

Some listeners will write the events into the database. If you need to, you will be able to replay these later. Imagine your customer calls you because of some bug. Instead of asking your customer to explain what happened, you go to the database, copy the events into a test system and replay them. It might take a few minutes but eventually, you will have a system which is in the exact same state as when the bug happened.
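
Building on the hypothetical bus above, the storing and replaying listener can be as small as this (just a sketch; a real event store would append to a database table):

import java.util.ArrayList;
import java.util.List;

class EventStore implements EventListener {
    private final List<Object> log = new ArrayList<>();   // stand-in for an event table

    @Override
    public void on(Object event) { log.add(event); }

    // Re-publish the recorded history, e.g. into the bus of a test system,
    // to rebuild the exact state the production system had when the bug happened.
    void replayInto(EventBus target) {
        for (Object event : log) {
            target.publish(event);
        }
    }
}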

Some other listeners will process the events and generate more events (which will also be written to the database). Imagine the event “checkout”. It will contain the current content of the shopping cart. You write that into the database. You need to know what was in the shopping basket? Look for this event.

The trick here is that the event is “independent”. It doesn’t contain foreign keys, only immutable data and value objects. The value objects are written into a new table. That makes sure that when you come back 10 years later, you will see the exact same shopping cart as the customer saw when she ordered.

When you need to display the shopping cart, you won’t need to join 8 tables. Instead, you’ll query one or two tables by the ID of the shopping cart. One table holds the header with the customer address, the order number, the date and the total; the second table contains the items. If you wanted, you could add foreign keys to the product definition tables, but you don’t have to. If that’s enough for you, those two tables can be completely independent of any other table in your database.

The code to fill the database gets the event as input (no database access to read anything from anywhere) and it will only write to those two tables. Minimum amount of dependencies.

The code to display the cart will only need to read those two tables. No deadlocks possible.
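
A hedged sketch of such a projection with plain JDBC, building on the CartCheckedOut event above (table and column names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class CartProjection implements EventListener {
    private final Connection connection;
    CartProjection(Connection connection) { this.connection = connection; }

    @Override
    public void on(Object event) {
        if (!(event instanceof CartCheckedOut cart)) {
            return;                               // not interested in other events
        }
        // Write only the two denormalized read-model tables; nothing is read from anywhere.
        try (PreparedStatement header = connection.prepareStatement(
                 "insert into cart_header (cart_id, total) values (?, ?)");
             PreparedStatement item = connection.prepareStatement(
                 "insert into cart_item (cart_id, product_name, quantity, price) values (?, ?, ?, ?)")) {
            long total = cart.items().stream()
                    .mapToLong(i -> (long) i.quantity() * i.priceInCents()).sum();
            header.setString(1, cart.cartId());
            header.setLong(2, total);
            header.executeUpdate();
            for (LineItem i : cart.items()) {
                item.setString(1, cart.cartId());
                item.setString(2, i.productName());
                item.setInt(3, i.quantity());
                item.setLong(4, i.priceInCents());
                item.executeUpdate();
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}

Displaying the cart is then one select from cart_header plus one from cart_item, both filtered by cart_id.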

The code will be incredibly simple.

If you make a mistake somewhere, you can always replay all the events with the fixed code.

For tests, you can replay the events. No need for a human to click buttons in a web browser (not more than once, anyway).

Since you don’t need foreign keys unless you want to, you can spread the data model over different databases, computers, data centers. Some data would be better in a NoSQL repository? No problem.

Something crashes? Fix the problem, replay the events which got lost.

Instead of developing one huge monster model where each change possibly dirties some existing feature, you can imagine CQRS as developing thousands of mini-applications that work together.

And the best feature: It allows you to retroactively add features. Imagine you want to give users credits for some action. The idea is born one year after the action was added. In a traditional application, it will be hard to assign credit to the existing users. With CQRS, you simply implement the feature, set up the listeners, disable the listeners which already ran (so the action isn’t executed again) and replay the events. Presto, all the existing users will have their credit.
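
With the hypothetical pieces sketched above, the retroactive part boils down to replaying the stored events into a bus that only the new listener is subscribed to:

class CreditListener implements EventListener {
    @Override
    public void on(Object event) {
        if (event instanceof CartCheckedOut cart) {
            // look up the customer behind cart.cartId() and grant the credit (omitted)
        }
    }
}

class RetroactiveCredit {
    static void grantCreditForPastCheckouts(EventStore store) {
        EventBus replayBus = new EventBus();          // separate bus: the existing
        replayBus.subscribe(new CreditListener());    // listeners stay disabled,
        store.replayInto(replayBus);                  // so nothing runs twice
    }
}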



Peace Between Java and SQL

23. November, 2011

There are various attempts to get Java and SQL to behave with each other. We have JDBC, OR mappers like Hibernate and EclipseLink, language support like in Groovy. All of those have advantages and drawbacks.

JDBC is powerful but low-level. The API is not really friendly. You need to write a lot of boilerplate code for even simple tasks.

Languages like Groovy wrap JDBC to make simple tasks simple. The code becomes much more readable but changes in the database schema become runtime errors.

OR mappers try to turn a relational database into an OO database. It works better than you’d expect, but it also causes odd problems and leaks into the design of your code: you must not use the ID field in equals(), hiding the session in a thread-local variable can cause exceptions when you use lazy loading, and failing to understand the requirements of the OR mapper causes spurious bugs. At least the OR mappers will complain when the schema changes.

Enter jOOQ. It’s like an OO wrapper for JDBC (see the sketch after the list):

  • You get all the power of JDBC if you need it
  • The readability of a fluent interface
  • The database schema is part of the code (so you get compile time errors if it changes)
  • You can iterate over results as if they were a plain Java collection
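
A small, hedged example of what that looks like with a recent jOOQ API. The BOOK table, its columns and the in-memory H2 database are assumptions here; with jOOQ’s code generator the table and field references would be generated constants instead of strings, which is where the compile-time errors come from.

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.Record;
import org.jooq.Result;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class JooqExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:demo", "sa", "")) {
            DSLContext create = DSL.using(connection, SQLDialect.H2);

            // Fluent and readable; with generated classes, renaming a column
            // in the schema turns into a compile-time error here.
            Result<Record> books = create.select()
                    .from(DSL.table("BOOK"))
                    .where(DSL.field("PUBLISHED_IN").gt(2000))
                    .orderBy(DSL.field("TITLE"))
                    .fetch();

            for (Record book : books) {        // Result behaves like a plain Java List
                System.out.println(book.get("TITLE"));
            }
        }
    }
}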



Jazoon 2011, Day 2 – NoSQL – Schemaless Data-stores Not Only for the Cloud – Thomas Schank

26. June, 2011

NoSQL – Schemaless Data-stores Not Only for the Cloud – Thomas Schank

Thomas gave an overview of some NoSQL databases and their theoretical background.

The main points are: SQL databases get inefficient as the data grows, especially when you need to split the data between instances (how do you join tables across two DB servers? Even if you can, performance is going to suffer).

But there are new problems: Data can be inconsistent for a while (keyword: MVCC).

OTOH, these databases don’t need locks and, as Amazon demonstrated, any kind of lock will eventually become a bottleneck:

Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost. If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.

Werner Vogels, Amazon CTO and Vice President

And since the servers can heal inconsistencies, broken connections don’t mean the end of the world. The acronym of the hour is BASE: Basically Available, Soft-state, Eventually consistent.

Interesting stuff, especially since every company owns a super-computer today. My own team has one with 64 cores, 64GB of RAM and 16TB of disk space sitting in 8 boxes spread under the desks of the developers. Not much but it only costs $8 000. And if we need more, we can simply buy another node. Unfortunately, it’s hard to leverage this power. I’ll come back to that in a later post.

One thing to take with you: If you can’t stand JavaScript but you need to write it, have a look at CoffeeScript.



Jazoon 2011, Day 1 – Opening Keynote

26. June, 2011

Opening Keynote

The opening keynote was “Platforms in the Cloud: Where Will Your Next Application Run?” by David Chappell. He put a lot of the bits and pieces of cloud computing into perspective: private and public clouds, when a cloud makes sense, and why people use clouds. Some use it because it’s a way to avoid their own IT, which says a lot. He also put a couple of frameworks and products next to each other to make it easier to see through all the fog.

Personally, I agree with him: Cloud computing is the next step. It solves one of the basic problems in computers today: You have too much computer power when you don’t need it and too little when you do.

Actually, I hope that CC won’t only make life easier for the business but also for developers. More on that in my next installment of TNBT – The Next Best Thing.

Some highlights from the talk:

Start-ups need to fail fast or scale fast. So clouds are perfect for them: cheap, scalable.

In the long run PaaS (Platform as a Service) will win over IaaS (Infrastructure as a Service). There are already many companies which offer PaaS by tailoring an IaaS VM to do what you need.

When it comes to NoSQL, that means “not only SQL”. Most applications need a mix of SQL and non-SQL data sources. For example, an MP3 cloud player will keep the song titles and other metadata in an SQL table (so you can easily sort and search) but the songs will be in a non-SQL store.

Another use case for cloud computing is off-site backup. That puts your data at risk of being copied, but which is more hazardous for your company: that a competitor might be able to break the encryption, or that the data is lost forever? If you lose your business data, you’ll probably go bankrupt faster.

I talked to him after the presentation and he made an odd comment about open source (“Oh, you’re one of those open source guys. Don’t you have children to feed?”). My guess is he makes the same mistake as many people: Free software is free as in freedom, not as in beer. You can change it but there is no reason to give it away for free. Some people do but that only means they have some other means to generate revenue.


Logging JDBC with slf4j

7. October, 2010

 


If you use slf4j in your project and need to log JDBC events, then have a look at log4jdbc. It has all the usual features: Timing statements, showing all arguments, mapping SQL to connections. It can even log stack traces if you need to know where a JDBC call was made. Sample output:

16:32:56.162 [INFO ] jdbc.connection - 1. Connection opened  java.sql.DriverManager.getConnection(DriverManager.java:525)
16:32:56.168 [DEBUG] jdbc.connection - open connections:  1 (1)
16:32:56.169 [DEBUG] jdbc.audit - 1. Connection.new Connection returned   java.sql.DriverManager.getConnection(DriverManager.java:525)
16:32:56.284 [DEBUG] jdbc.audit - 1. PreparedStatement.new PreparedStatement returned   com.avanon.basic.db.XPreparedStatement.prepare(XPreparedStatement.java:84)
16:32:56.292 [DEBUG] jdbc.audit - 1. Connection.prepareStatement(SELECT * FROM V_RCSA_40_4_CTRL_RATING) returned net.sf.log4jdbc.PreparedStatementSpy@423606  com.avanon.basic.db.XPreparedStatement.prepare(XPreparedStatement.java:84)
16:32:56.342 [DEBUG] jdbc.audit - 1. PreparedStatement.clearParameters() returned   com.avanon.basic.db.SetParameters.clear(SetParameters.java:144)
16:32:56.343 [DEBUG] jdbc.sqlonly -  com.avanon.basic.db.XPreparedStatement.executeQuery(XPreparedStatement.java:71)
1. SELECT * FROM V_RCSA_40_4_CTRL_RATING
16:32:56.350 [INFO ] jdbc.sqltiming - SELECT * FROM V_RCSA_40_4_CTRL_RATING  {executed in 7 msec}
16:32:56.356 [DEBUG] jdbc.audit - 1. PreparedStatement.executeQuery() returned net.sf.log4jdbc.ResultSetSpy@2c5444  com.avanon.basic.db.XPreparedStatement.executeQuery(XPreparedStatement.java:71)
16:32:56.412 [DEBUG] jdbc.audit - 1. PreparedStatement.close() returned   com.avanon.basic.db.DBUtil.close(DBUtil.java:114)
16:32:56.418 [INFO ] jdbc.connection - 1. Connection closed  com.avanon.basic.db.DBUtil.close(DBUtil.java:129)
16:32:56.418 [DEBUG] jdbc.connection - open connections:  none

The strange “1.” is the connection number.
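
To get output like this, you route your connections through log4jdbc’s spy driver. A minimal sketch (the Oracle URL and credentials are only an example; adapt them to your database):

import java.sql.Connection;
import java.sql.DriverManager;

public class Log4jdbcSetup {
    public static void main(String[] args) throws Exception {
        // The spy driver delegates to the real driver and logs everything it sees.
        Class.forName("net.sf.log4jdbc.DriverSpy");

        // Prefix the usual JDBC URL with "jdbc:log4jdbc:".
        try (Connection connection = DriverManager.getConnection(
                "jdbc:log4jdbc:oracle:thin:@localhost:1521:XE", "scott", "tiger")) {
            // use the connection as usual; the jdbc.* loggers shown above produce the output
        }
    }
}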

Alternatively, check out jdbcdslog.


Extremely complex queries in BIRT reports

30. September, 2010

Sometimes, SQL just isn’t enough. What do you do when you just can’t get the SQL to run in under an hour while a little piece of Java code does the same query in a few seconds? Put the result into a report table and then run the report against that table.
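
A hypothetical sketch of that approach (all names invented): do the expensive part in Java, dump the result into a flat staging table, and let the BIRT data set run a trivial SELECT against it.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class ReportStaging {
    // Writes pre-computed figures into a flat table the report can query directly.
    void fill(Connection connection, Map<String, Long> turnoverByCustomer) throws SQLException {
        try (PreparedStatement insert = connection.prepareStatement(
                "insert into report_turnover (customer, turnover) values (?, ?)")) {
            for (Map.Entry<String, Long> entry : turnoverByCustomer.entrySet()) {
                insert.setString(1, entry.getKey());
                insert.setLong(2, entry.getValue());
                insert.executeUpdate();
            }
        }
    }
}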


Finding unindexed foreign key columns

15. December, 2009

If you’re using Oracle and you have tables with foreign keys, then you must remember to add indexes to the foreign key columns in the referencing (child) tables. If your schema has more than two tables, it’s hard to make sure all the necessary indexes exist. Fret no more and let Oracle do (most of) the work for you:

select table_name, constraint_name,
       cname1 || nvl2(cname2,','||cname2,null) ||
       nvl2(cname3,','||cname3,null) || nvl2(cname4,','||cname4,null) ||
       nvl2(cname5,','||cname5,null) || nvl2(cname6,','||cname6,null) ||
       nvl2(cname7,','||cname7,null) || nvl2(cname8,','||cname8,null)
              columns
    from ( select b.table_name,
                  b.constraint_name,
                  max(decode( position, 1, column_name, null )) cname1,
                  max(decode( position, 2, column_name, null )) cname2,
                  max(decode( position, 3, column_name, null )) cname3,
                  max(decode( position, 4, column_name, null )) cname4,
                  max(decode( position, 5, column_name, null )) cname5,
                  max(decode( position, 6, column_name, null )) cname6,
                  max(decode( position, 7, column_name, null )) cname7,
                  max(decode( position, 8, column_name, null )) cname8,
                  count(*) col_cnt
             from (select substr(table_name,1,30) table_name,
                          substr(constraint_name,1,30) constraint_name,
                          substr(column_name,1,30) column_name,
                          position
                     from sys.user_cons_columns ) a,
                  sys.user_constraints b
            where a.constraint_name = b.constraint_name
              and b.constraint_type = 'R'
            group by b.table_name, b.constraint_name
         ) cons
   where col_cnt > ALL
           ( select count(*)
               from sys.user_ind_columns i
              where i.table_name = cons.table_name
                and i.column_name in (cname1, cname2, cname3, cname4,
                                      cname5, cname6, cname7, cname8 )
                and i.column_position <= cons.col_cnt
              group by i.index_name
           )

Isn’t it a beauty? Thanks to Tom.