Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?

Parallelism.

In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid using RPC/RMI. Try lure you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASEDrop ACID And Think About Data
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak in places where you can’t have it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Trying locks and clever concurrency will only get you more bugs, unmaintainable code and a heart attack.

Dependencies kill a project faster and more efficiently than almost any other technique. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”

Obvious programming has a value: You can see what happens. It has a price: Boiler plate code. You can try to hide this but it will still leak. Hibernate is a prefect example for this. Yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It does generate proxies to save you from seeing the query but that can break equals().

Same applies to RMI. When RMI decides that you can’t handle the message, then you won’t even see it. In many cases, a slightly “unusual” message (like one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network in your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources that are caused adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message
Claim checks allow to resend a message again after a timeout without having it processed twice by the consumer.

CALM and BASE refer to the fact that you can only have two of the tree CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.


Peace Between Java and SQL

23. November, 2011

There are various attempts to get Java and SQL to behave with each other. We have JDBC, OR mappers like Hibernate and EclipseLink, language support like in Groovy. All of those have advantages and drawbacks.

JDBC is powerful but low-level. The API is not really friendly. You need to write a lot of boiler plate code for even simple tasks.

Languages like Groovy wrap JDBC to make simple tasks simple. The code becomes much more readable but changes in the database schema become runtime errors.

OR mappers try to turn a relational database into a OO database. It works better than you’d expect but it also causes odd problems and leaks into design of your code: You must no’t use the ID field in equals, hiding the session in a thread-local variable can cause exceptions when you use lazy loading, failing to understand the requirements of the OR mapper causes spurious bugs. At least the OR mappers will complain when the schema changes.

Enter jOOQ. It’s like a OO wrapper for JDBC:

  • You get all the power of JDBC if you need it
  • The readability of a fluent interface
  • The database schema is part of the code (so you get compile time errors if it changes)
  • You can iterate over results as if they were a plain Java collection

Related:


Jazoon 2010, day 2

7. June, 2010

This is my report of day 2 (see my posts about day 1).

Total Cost of Ownership by Ken Schwaber

This talk was basically about Scrum and the fact that you can’t get something of value for free. Or to put it another way around: If you save some time today by quickly hacking up a feature, you’re gonna pay in the future. There is even an interest on this, so the later you have to pay, the more expensive it will get.

So the next time your boss asks you to do something “quick”, ask him whether (s)he is aware of the total cost and whether (s)he is really willing to pay it.

Unleash your processor(s) by Václav Pech

We all know that CPU’s don’t get faster, they just reproduce faster. PCs sold today have 4 cores (and each core can execute two threads at the same time). In 2012, intel is planning to release a 50 core chip and that’s just peanuts to what you can find on your graphics card (which can have roughly 600 very simple CPUs on a single chip).

The main problem here is that we, as humans, are extremely good at parallel processing at the hardware level (most of our body continues to works while we talk, think, eat, etc.) but we’re extremely bad when thinking about parallel problems.

Concepts from the HPC world and mainframes come to the rescue: Actors, Fork/Join, Parallel Arrays, Agents and Dataflow.

The idea is to get away from the tedious synchronization and use data structures which are already thread-safe and then write simple algorithms which are invoked by a framework on an as-needed basis. Imagine you have a huge amount of images to scale to thumbnails. The algorithm is always the same and it works independent of the input. So you can allocate a number of generic worker threads. Each of them gets a copy of the algorithm at runtime plus the arguments (image and thumbnail file name).

Then you have an algorithm to traverse the directory tree which produces the input and output file names. Instead of doing everything yourself, you take a parallel array and add file names as your tree workers discover them. This will trigger the thumbnail workers.

The interesting thing here: No synchronization. You don’t even write the threads. All you do is a single call:

inParallel (filenames, thumbnailer);

The thumbnailer is just as simple:

public class Thumbnailer extends Actor {
    public void act (Object item) {
        File input = ((File[])item)[0];
        File output = ((File[])item)[0];
        ... insert favorite scaler here ...
    }
}

See? No synchronized, volatile or extends Thread. Can’t wait? Check out JSR-166y

JavaFX: Designer developer workflow by Martin Gunnarsson and Pär Sikö

Tough one since I couldn’t decide where to go. HTML 5 with WebSockets? Maven 3.0?

Mouth-wetting talk about what you could do with software if everyone was just a little bit more open. In the talk, they showed how you could draw something in Photoshop and then export the design and use it directly in JavaFX.

It also showed some of the new features of JavaFX 1.3 which seems to follow the historical model of Java: 1.2 is the first version which is really usable. But it’s nice to see some progress in the Java world at last. I just wished we’d have got these five years ago when it would have mattered :-)

Lunch break. :-)

JavaFX – The condemned live longer by Andreas Fürer, David Sauter and Daniel Seiler

Along the same lines as the previous talk but this time, it shows some of the dark sides of JavaFX. Mostly boils down to: If you want to do fancy graphics in JavaFX, just do it. If you want to use it for more traditional UIs , then think again. Everything but the most simple components are still missing and those which exist sometimes have ugly bugs. :-(

My conclusion: Immature, oversold technology (or in managese: Sun’s bold response to the threads imposed by RCP and Flex/AIR).

Patterns and Practices in Hibernate by Patrycja Wegrzynowicz

Hibernate might be the most successful OR mapper for Java but it’s not the most simple (which is partly because of the documentation and that the problem itself isn’t very simple). I own the standard book about Hibernate, too, and I can agree that it explains in detail all the great features of Hibernate but it doesn’t answer “Why would I use that?”

Patrycja did a code review of the examples in the book and came up with lots of small problems and a couple of major ones (like code which doesn’t lock the rows in the correct order leading to an illegal state in the database). I talked to her and she wants to put the results of her code review online. This would mean we’d get a project with correct examples for using Hibernate.

She also mentioned SOLID which is an acronym made up from acronyms and stands for:

S – SRP (Single responsibility principle),
O – OCP (Open/closed principle),
L – LSP (Liskov substitution principle),
I – ISP (Interface segregation principle),
D – DIP (Dependency inversion principle)

This blog also explains it very well.

Migration to JPA – real life experience by Jan Sliwa

It’s always interesting when marketing hits the real world and all those buzz words are stripped to the bones. Jan talked about how to build a Java application which connects data centers all over Europe which contain sensitive data (medical records). To make the data secure, they applied a simple solution: The personal data is stored on the computers of the responsible doctor and only the medical files are saved on the servers. This means that the medical data itself is anonymous.

Two of the problems they encountered were:

  1. Creating an EntityManagerFactory is expensive. When do I open/close one? Is one enough for the whole application? Do I need a pool?
  2. How do I know whether an object is detached from the session?

He also talked about problems during testing. Maybe he should read my blog more often :-)

Managed JPA in an OSGi framework – getting the best of both worlds by Tim Ward

OSGi is a framework for the paranoid. By default, it hides everything. So how do you expose your model to both the JPA framework and all the other places where it is used?

Tim explains the problems they encountered and how they solved them. My conclusion: For your problem, OSGi is not the solution. Spring and similar frameworks have shown how to do DI properly and Maven has shown how to handle dependencies. OSGi more and more feels like a remnant from the cold war where no one trusted anyone.

That’s for all for day 2. Next: day 3.


Testing the Impossible: Shifting IDs

7. October, 2009

So you couldn’t resist and wrote tests against the database. With Hibernate, no less. And maybe some Spring. And to make things more simple, toString() contains the IDs of the objects in the database. Now, you want to check the results of queries and other operations using my advice how to verify several values at once.

During your first run, your DB returns some values, you copy them into a String. When you run the tests again, the tests fail because all the IDs have changed. Bummer. Now what? Just verify that the correct number of results was returned?

Don’t. assertEquals (3, list.size()) doesn’t give you a clue where to search for the error when the assert fails.

The solution is to give each ID a name. The name always stays the same and when you look at the test results, you’ll immediately know that the ID has been “fixated” (unlike when you’d replace the number with another, for example). Here is the code:

import java.util.*;

/** Helper class to fixate changing object IDs */
public class IDFixer
{
    private Map<String, String> replacements = new HashMap<String, String> ();
    private Map<String, Exception> usedNames = new HashMap<String, Exception> ();
    
    /** Assign a name to the number which follows the string <code>id=</code> in the object's
     * <code>toString()</code> method. */
    public void register (Object o, String name)
    {
        if (o == null)
            return;
        
        String s = o.toString ();
        int pos = s.indexOf ("id=");
        if (pos != -1)
            pos = s.indexOf (',', pos);
        if (pos == -1)
            throw new RuntimeException ("Can't find 'id=' in " + s);
        String search = s.substring (0, pos + 1);
        
        pos = search.lastIndexOf ('=');
        String replace = search.substring (0, pos + 1) + name + ",";
        
        add (name, search, replace);
        
        registerSpecial (o, name);
    }

    /** Add a search&replace pattern. */
    protected void add (String name, String search, String replace)
    {
        if (usedNames.containsKey (replace))
            throw new RuntimeException ("Name "+name+" is already in use", usedNames.get (replace));
        
        //System.out.println ("+++ ["+search+"] -> ["+replace+"]");
        usedNames.put (replace, new Exception ("Name was registered here"));
        replacements.put (search, replace);
    }
    
    /** Allow for special mappings */
    protected void registerSpecial (Object o, String name)
    {
        // NOP
    }

    /** Turn a <code>Collection</code> into a <code>String</code>, replacing all IDs with names. */
    public String toString (Collection c)
    {
        StringBuilder buffer = new StringBuilder (10240);
        String delim = "";
        for (Object o: c)
        {
            buffer.append (delim);
            delim = "\n";
            buffer.append (o);
        }
        return toString (buffer);
    }
    
    /** Turn a <code>Map</code> into a <code>String</code>, replacing all IDs with names. */
    public String toString (Map m)
    {
        StringBuilder buffer = new StringBuilder (10240);
        String delim = "";
        for (Iterator iter=m.entrySet ().iterator (); iter.hasNext (); )
        {
            Map.Entry e = (Map.Entry)iter.next ();
            
            buffer.append (delim);
            delim = "\n";
            buffer.append (e.getKey ());
            buffer.append ('=');
            buffer.append (e.getValue ());
        }
        return toString (buffer);
    }
    
    /** Turn an <code>Object</code> to a <code>String</code>, replacing all IDs with names.
     * 
     *  <p>If the object is a <code>Collection</code> or a <code>Map</code>, the special collection handling methods will be called. */
    public String toString (Object o)
    {
        if (o instanceof Collection)
        {
            return toString ((Collection)o);
        }
        else if (o instanceof Map)
        {
            return toString ((Map)o);
        }
        
        if (o == null)
            return "null";

        String s = o.toString ();
        for (Map.Entry<String, String> entry: replacements.entrySet ())
        {
            s = s.replace (entry.getKey (), entry.getValue ());
        }
        
        return s;
    }

    public boolean knowsAbout (String key)
    {
        return replacements.containsKey (key);
    }
}

To use the IDFixer, call register(object, name) to assign a name to whatever follows the number after id=. The method will search for this substring in the result of calling object.toString().

If you call IDFixer.toString(object) for a single object or a collection, it will replace id=number with id=name everywhere. If you have references between objects, you can register additional pattern to be replaced by overriding registerSpecial().

To make it easier to compare lists and maps in assertEquals(), each item gets its own line.


Traits for Groovy/Java

25. June, 2009

I’m again toying with the idea of traits for Java (or rather Groovy). Just to give you a rough idea if you haven’t heard about this before, think of my Sensei application template:

class Knowledge {
    Set tags;
    Knowledge parent;
    List children;
    String name;
    String content;
}
class Tag { String name; }
class Relation { String name; Knowledge from, to;

A most simple model but it contains everything you can encounter in an application: Parent-child/tree structure, 1:N and N:M mappings. Now the idea is to have a way to build a UI and a DB mapping from this code. The idea of traits is to implement real properties in Java.

So instead of fields with primitive types, you have real objects to work with:

    assert "name" == Knowledge.name.getName()

These objects exist partially at the class and at the instance level. There is static information at the class level (the name of the property) and there is instance information (the current value). But it should be possible to add more information at both levels. So a DB mapper can add necessary translation information to the class level and a Hibernate mapper can build on top of that.

Oh, I hear you cry “annotations!” But annotations can suck, too. You can’t have smart defaults with annotations. For example, you can’t say “I want all fields called ‘timestamp’ to be mapped with an java.sql.Timestamp“. You have to add the annotation to each timestamp field. That violates DRY. It quickly gets really bad when you have to do this for several mappers: Database, Hibernate, JPA, the UI, Swing, SWT, GWT. Suddenly, each property would need 10+ annotations!

I think I’ve found a solution which should need relatively few lines of code with Groovy. I’ll let that stew for a couple of days in by subconscious and post another article when it’s well done :)


Jazoon: Distributed Client/Server Persistence

26. June, 2008

In his talk, Alexander Snaps presented a framework called Hölchoko which allows to cache objects from the server on the client. This is a bit like Gears but for Hibernate. No magic bullet, just a layer over the OR mapper to push objects over the wire, cache them in a local DB and make the merge with the server more simple once you’re connected again.

See his blog for more details.


What’s Wrong With Java Part 2

21. July, 2007

OR Mapping With Hibernate

After the model, let’s look at the implementation. The first candidate is the most successful OR mapper combination in the Java world: Hibernate.

Hibernate brings all the features we need: It can lazy-load ordered and unordered data sets from the DB, map all kinds of weird relations and it lets us use Java for the model in a very comfortable way: We just plain Java (POJO‘s actually) and Hibernate does some magic behind the scenes that connects the objects to the database. What could be more simple?

Well, an OO language which is more dynamic, for example. Let’s start with a simple task: Create a standalone keyword and put that into the DB. This is simple enough:

// Saving <tt>Keyword</tt> in database
Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);

(Please ignore the session object for now.)

That was easy, wasn’t it? If you look at the log, you’ll see that Hibernate sent an INSERT statement to the DB. Cool. So … how do we use this new object? The first, most natural idea, would be to use the object we just saved:

// Saving <tt>Knowledge</tt> with a keyword in the database
Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

Unfortunately, this doesn’t work. It does work in your test but in the final application, the Keyword is created in the first transaction and the Knowledge in the second one. So Hibernate will (rightfully) complain that you can’t use that keyword anymore because someone else might have changed it.

Now, what? You have to ask Hibernate for a copy of every object after you closed the transaction in which you created it before you can use it anywhere else:

   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
  10:
  11:
Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);
kw = dao.loadById (kw.getId ());

Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

How to save Knowledge with a keyword in the database with transactions

Why do we have to load an object after just saving it? Well … because of Java. Java has very strict rules what you can do with (or to) an object instance after it has been created. One of them is that you can’t replace methods. So what, you’d think. In our case, things aren’t that simple. In our model, the name of a Knowledge instance is a Keyword. When you look at the code, you’ll see the standard setter. But when you run it, you’ll see that someone loads the item from the KEYWORD table. What is going on?

   1:
   2:
   3:
public void setName (Keyword name) {
    this.name = name;
}

setName() method

Behind the scenes, Hibernate replaces this method by using a proxy object, so it can notice when you change the model (setting a new name). The most simple soltuion would be to replace the method setName() in session.save() with calls the original setter and notifies Hibernate about the modification. In Python, that’s three lines of code. Unfortunately, this is impossible in Java.

So to get this proxy objects, you must show an object to Hibernate, let it make a copy (by calling save()) and then ask for the new copy which is in fact a wrapper object that behaves just like your original object but it also knows when to send commands to the database. Simple, eh?

Makes me wonder why session.save() doesn’t simply return the new object when it is more safe to use it from now on … especially when you have a model which is modified over several transactions. In this case, you can easily end up with a mix of native and proxy objects which will cause no end of headache.

Anyway. This approach has a few drawbacks:

  • If someone else creates the object, calls your code and then continues to do something with the original object (because people usually don’t expect methods to replace objects with copies when they call them), you’re in deep trouble. Usually, you can’t change that other code. You loose. Go away.
  • The proxy object is very similar but not the same as the original object. The biggest difference is that it has a different class. This means, in equals(), you can’t use this.getClass == other.getClass(). Instead, you have to use instanceof (the copy is derived from the original class). This breaks the contract of equals() which says that it must be symmetric.
  • If you have large, complex objects, copying them is expensive.
  • After a while, you will start to write factory methods that create the objects for you. The code is always the same: Create a simple object, save it, load it again and then return the copy. Apart from cut&paste, this means that you must not call new for some of your objects. Again, this breaks habits which leads to bugs.

All in all, the whole approach is clumsy. Really, it’s not Hibernate’s fault but the code is still ugly, hard to maintain (because it breaks the implicit rules we have become so used to). In Python, you just create the object and use it. The dynamic nature of Python allows the OR mapper to replace or wrap all the methods as it needs to and you never notice it. The code is clean, easy to understand and compact.

Another problem are the XML config files. Besides all the issues with Java XML parsers, it is always problematic to store the same information in two places. If you ever change your Java model, you better not forget to update the XML or you will get strange errors. You can’t refactor the model classes anymore because there is code outside the scope of your refactoring tool. And let’s not forget code completion which works pretty good for Java. Not so for XML files. If you’re lucky, someone has written a code completion for your type of XML config. Still, there will be problems. If there is a new version, your code completion will lag behind.

It’s like regexp: Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. — Jamie Zawinski

Fortunately, Sun solved this problem with JPA (or at least eased the pain). JPA allows to use annotations to store the mapping configuration in the class file itself. Apart from a few small problems (like setting up everything), this works pretty well. Code completion works perfectly because any IDE which has code completion will be able to use the latest and greatest version of your helper JARs without any customization. Just drop the new JAR in your classpath and you’re ready to do. Swell.

But there are more problems:

  • You must create a session object “somewhere” and hand it around. If you’re writing a webapp, this better be thread-safe. Not to mention you must be able to override this for tests.
  • The session object must track if you have already started a transaction and nest them properly or you will have to duplicate code because you can’t call existing methods if they use transactions.
  • Spring and AOP will help a lot but they also add another layer of complexity, you’ll have to learn another API, another set of rules how to organize your code, etc.
  • JAR file-size. My code is 246KB. The JARs it depends on take … 6’096KB, more than 40 times of my code. And I’m not even using Spring.
  • Even with JPA, Hibernate is not simple to use because Java itself is not simple to use.

In the end, the model was 5’400 LoC. A added a small UI to it using SWT/JFace which added 2’400 LoC.

If you look at the model in the previous installment, then the question is: Why do I need 5’000 LoC to write a program which implements an OR mapper for a model which has only three classes and 26 lines of code?

Granted, test cases and helper code take their toll. I could accept that this code needs four or five times the size of the model itself. Still, we have a gap.

The answer is that there are no or bad defaults. For our simple case, Hibernate could guess everything. Java could generate all the setters and getters, equals() and hashCode(). It’s no black magic to figure out that Relation has a reference to Knowledge so there needs to be a database table which stores this information. Sadly, defaults in Java are always “safe” rather than “clever”. This is the main difference to newer languages. They try to guess most of the stuff and then, you can fix those few exceptions that you always have. With Java, all the exceptions are handled but you have to do everyday stuff yourself.

The whole experience was frustrating, especially since I’m a seasoned Java developer. It took me almost two weeks to write the code for this small model mostly because because of a bug in Hibernate 3.1 and because I couldn’t get my mind around the existing documentation. Also, parent-child relations were poorly documented in the first Hibernate book. The second book explains this much better.

Conclusion: Use it if you must. Today, there are better ways.

Next stop: TurboGears, a Python web framework using SQL Objects.


Follow

Get every new post delivered to your Inbox.

Join 336 other followers