Rating of my Talk

30. July, 2007

The rating of my talk at the Jazoon just came in: 2.74 on a scale from 1 to 5. That’s even below average (3 would be average). Hm. Okay, I was sick and tried to put too much information into my 40 minutes. Anything else I can do better next time?


Quickly disabling tests

25. July, 2007

Ever needed to disable all (or most) tests in a JUnit test case?

How about this: Using the editor of your choice, search for “void test” and replace all of them with “void dtest” (“d” as in disabled). Now, you can simply enable the few tests you need to run by deleting the “d” again.

I’m also using “x” to take out tests that won’t run for a while. Using global search in the whole project, it’s also simple to find them again just in case you’re wondering if there are any disabled tests left.


Testing BIRT

23. July, 2007

I’m a huge fan of TDD. Recently, I had to write tests for BIRT, specifically for a bug we’ve stumbled upon in BIRT 2.1 that has been fixed in 2.2: Page breaks in tables.

The first step was to setup BIRT so I can run it from my tests.

public IReportEngine getEngine () throws BirtException
{
    EngineConfig config = new EngineConfig();
    config.setLogConfig("/tmp/birt-log", Level.FINEST);
    
    // Path to the directory which contains "platform"
    config.setEngineHome(".../src/main/webapp");
    PlatformConfig pc = new PlatformConfig ();
    pc.setBIRTHome(basepath);
    PlatformFileContext context = new PlatformFileContext(pc);
    config.setPlatformContext(context);
    
    Platform.startup(config);
    
    IReportEngineFactory factory = (IReportEngineFactory) Platform
    .createFactoryObject(IReportEngineFactory
        .EXTENSION_REPORT_ENGINE_FACTORY);
    if (factory == null)
 throw new RuntimeException ("Couldn't create factory");
    
    return factory.createReportEngine(config);
}

My main problems here: Find all the parts necessary to install BIRT, copy them to the right places and find out how to setup EngineConfig (especially the platform part).

public void renderPDF (OutputStream out, File reportDir,
        String reportFile, Map reportParam) throws EngineException
{
    File f = new File (reportDir, reportFile);
    final IReportRunnable design = birtReportEngine
        .openReportDesign(f.getAbsolutePath());
    //create task to run and render report
    final IRunAndRenderTask task = birtReportEngine
        .createRunAndRenderTask(design);
    
    // Set parameters for report
    task.setParameterValues(reportParam);
    
    //set output options
    final HTMLRenderOption options = new HTMLRenderOption();
    options.setOutputFormat(HTMLRenderOption.OUTPUT_FORMAT_PDF);
    options.setOutputStream(out);
    task.setRenderOption(options);
        
    //run report
    task.run();
    task.close();
}

I’m using HTMLRenderOption here so I could use the same code to generate HTML and PDF.

In my test case, I just write the output to a file:

public void testPageBreak () throws Exception
{
    Map params = new HashMap (20);
    ...
    
    File dir = new File ("tmp");
    if (!dir.exists()) dir.mkdirs();
    File f = new File (dir, "pagebreak.pdf");
    if (f.exists())
    {
 if (!f.delete())
     fail ("Can't delete "+f.getAbsolutePath()
         + "nMaybe it's locked by AcrobatReader?");
    }
    
    FileOutputStream out = new FileOutputStream (f);
    ReportGenerator gen = new ReportGenerator();
    File basePath = new File ("../webapp/src/main/webapp/reports");
    gen.generateToStream(out, basePath, "sewingAtelier.rptdesign"
        , params);
    if (!f.exists())
 fail ("File wasn't written. Please check the BIRT logfile!");
}

Now, this is no test. It’s only a test when it can verify that the output is correct. To do this, I use PDFBox:

    PDDocument doc = PDDocument.load(new File ("tmp", "pagebreak.pdf"));
    // Check number of pages
    assertEquals (6, doc.getPageCount());
    assertEquals ("Error on page 1",
            "...n" + 
            "...n" +
     ...
            "..."
            , getText (doc, 1));

The meat is in getText():

private String getText (PDDocument doc, int page) throws IOException
{
    PDFTextStripper textStripper = new PDFTextStripper ();
    textStripper.setStartPage(page);
    textStripper.setEndPage(page);
    String s = textStripper.getText(doc).trim();
    
    Pattern DATE_TIME_PATTERN = Pattern.compile("^dd.dd.dddd dd:dd Page (d+) of (d+)$", Pattern.MULTILINE);
    Matcher m = DATE_TIME_PATTERN.matcher(s);
    s = m.replaceAll("23.07.2007 14:02 Page $1 of $2");
    
    return fixCRLF (s);
}

I’m using several tricks here: I’m replacing a date/time string with a constant, I stabilize line ends (fixCRLF() contains String.replaceAll("rn", "n");) and do this page by page to check the whole document.

Of course, since getText() just returns the text of a page as a String, you can use all the other operations to check that everything is where or as it should be.

Note that I’m using MockEJB and JNDI to hand a datasource to BIRT. The DB itself is Derby running in embedded mode. This allows me to connect to directly a Derby 10.2 database even though BIRT comes with Derby 10.1 (and saves me the hazzle to fix the classpath which OSGi builds for BIRT).

@Override
protected void setUp () throws Exception
{
    super.setUp();
    MockContextFactory.setAsInitial();
    
    Context ctx = new InitialContext();
    MockContextFactory.setDelegateContext(ctx);
    
    EmbeddedDataSource ds = new EmbeddedDataSource ();
    ds.setDatabaseName("tmp/test_db/TestDB");
    ds.setUser("");
    ds.setPassword("");

    ctx.bind("java:comp/env/jdbc/DB", ds);
}

@Override
protected void tearDown () throws Exception
{
    super.tearDown();
    MockContextFactory.revertSetAsInitial();
}

Links:


What’s Wrong With Java Part 2b

23. July, 2007

To give an idea why I needed 5KLoC for such a simple model, here is a detailed analysis of Keyword.java:

LoC Used by
43 Getters and setter
40 XML Import/Export
27 Model
27 equals()/hashCode()
21 Hibernate mapping with annotations
14 Imports
2 Logging
174 Total

As you can see, boiler plate code like getter/setters and equals() need 70LoC or 40% (48% if you add imports). Mapping the model to XML is more expensive than mapping it to a database. In the next installment, we’ll see that this can be reduced considerably.

Note: This is not a series of articles about flaws in Hibernate or the Java VM, this is about the Java language (ie. what you type into your IDE and then compile with javac).


What’s Wrong With Java Part 2

21. July, 2007

OR Mapping With Hibernate

After the model, let’s look at the implementation. The first candidate is the most successful OR mapper combination in the Java world: Hibernate.

Hibernate brings all the features we need: It can lazy-load ordered and unordered data sets from the DB, map all kinds of weird relations and it lets us use Java for the model in a very comfortable way: We just plain Java (POJO‘s actually) and Hibernate does some magic behind the scenes that connects the objects to the database. What could be more simple?

Well, an OO language which is more dynamic, for example. Let’s start with a simple task: Create a standalone keyword and put that into the DB. This is simple enough:

// Saving <tt>Keyword</tt> in database
Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);

(Please ignore the session object for now.)

That was easy, wasn’t it? If you look at the log, you’ll see that Hibernate sent an INSERT statement to the DB. Cool. So … how do we use this new object? The first, most natural idea, would be to use the object we just saved:

// Saving <tt>Knowledge</tt> with a keyword in the database
Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

Unfortunately, this doesn’t work. It does work in your test but in the final application, the Keyword is created in the first transaction and the Knowledge in the second one. So Hibernate will (rightfully) complain that you can’t use that keyword anymore because someone else might have changed it.

Now, what? You have to ask Hibernate for a copy of every object after you closed the transaction in which you created it before you can use it anywhere else:

   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
  10:
  11:
Keyword kw = new Keyword();
kw.setType (Keyword.KEYWORD);
kw.setName ("test");

session.save (kw);
kw = dao.loadById (kw.getId ());

Knowledge k = new Knowledge ();
k.addKeyword (kw);

session.save (k);

How to save Knowledge with a keyword in the database with transactions

Why do we have to load an object after just saving it? Well … because of Java. Java has very strict rules what you can do with (or to) an object instance after it has been created. One of them is that you can’t replace methods. So what, you’d think. In our case, things aren’t that simple. In our model, the name of a Knowledge instance is a Keyword. When you look at the code, you’ll see the standard setter. But when you run it, you’ll see that someone loads the item from the KEYWORD table. What is going on?

   1:
   2:
   3:
public void setName (Keyword name) {
    this.name = name;
}

setName() method

Behind the scenes, Hibernate replaces this method by using a proxy object, so it can notice when you change the model (setting a new name). The most simple soltuion would be to replace the method setName() in session.save() with calls the original setter and notifies Hibernate about the modification. In Python, that’s three lines of code. Unfortunately, this is impossible in Java.

So to get this proxy objects, you must show an object to Hibernate, let it make a copy (by calling save()) and then ask for the new copy which is in fact a wrapper object that behaves just like your original object but it also knows when to send commands to the database. Simple, eh?

Makes me wonder why session.save() doesn’t simply return the new object when it is more safe to use it from now on … especially when you have a model which is modified over several transactions. In this case, you can easily end up with a mix of native and proxy objects which will cause no end of headache.

Anyway. This approach has a few drawbacks:

  • If someone else creates the object, calls your code and then continues to do something with the original object (because people usually don’t expect methods to replace objects with copies when they call them), you’re in deep trouble. Usually, you can’t change that other code. You loose. Go away.
  • The proxy object is very similar but not the same as the original object. The biggest difference is that it has a different class. This means, in equals(), you can’t use this.getClass == other.getClass(). Instead, you have to use instanceof (the copy is derived from the original class). This breaks the contract of equals() which says that it must be symmetric.
  • If you have large, complex objects, copying them is expensive.
  • After a while, you will start to write factory methods that create the objects for you. The code is always the same: Create a simple object, save it, load it again and then return the copy. Apart from cut&paste, this means that you must not call new for some of your objects. Again, this breaks habits which leads to bugs.

All in all, the whole approach is clumsy. Really, it’s not Hibernate’s fault but the code is still ugly, hard to maintain (because it breaks the implicit rules we have become so used to). In Python, you just create the object and use it. The dynamic nature of Python allows the OR mapper to replace or wrap all the methods as it needs to and you never notice it. The code is clean, easy to understand and compact.

Another problem are the XML config files. Besides all the issues with Java XML parsers, it is always problematic to store the same information in two places. If you ever change your Java model, you better not forget to update the XML or you will get strange errors. You can’t refactor the model classes anymore because there is code outside the scope of your refactoring tool. And let’s not forget code completion which works pretty good for Java. Not so for XML files. If you’re lucky, someone has written a code completion for your type of XML config. Still, there will be problems. If there is a new version, your code completion will lag behind.

It’s like regexp: Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. — Jamie Zawinski

Fortunately, Sun solved this problem with JPA (or at least eased the pain). JPA allows to use annotations to store the mapping configuration in the class file itself. Apart from a few small problems (like setting up everything), this works pretty well. Code completion works perfectly because any IDE which has code completion will be able to use the latest and greatest version of your helper JARs without any customization. Just drop the new JAR in your classpath and you’re ready to do. Swell.

But there are more problems:

  • You must create a session object “somewhere” and hand it around. If you’re writing a webapp, this better be thread-safe. Not to mention you must be able to override this for tests.
  • The session object must track if you have already started a transaction and nest them properly or you will have to duplicate code because you can’t call existing methods if they use transactions.
  • Spring and AOP will help a lot but they also add another layer of complexity, you’ll have to learn another API, another set of rules how to organize your code, etc.
  • JAR file-size. My code is 246KB. The JARs it depends on take … 6’096KB, more than 40 times of my code. And I’m not even using Spring.
  • Even with JPA, Hibernate is not simple to use because Java itself is not simple to use.

In the end, the model was 5’400 LoC. A added a small UI to it using SWT/JFace which added 2’400 LoC.

If you look at the model in the previous installment, then the question is: Why do I need 5’000 LoC to write a program which implements an OR mapper for a model which has only three classes and 26 lines of code?

Granted, test cases and helper code take their toll. I could accept that this code needs four or five times the size of the model itself. Still, we have a gap.

The answer is that there are no or bad defaults. For our simple case, Hibernate could guess everything. Java could generate all the setters and getters, equals() and hashCode(). It’s no black magic to figure out that Relation has a reference to Knowledge so there needs to be a database table which stores this information. Sadly, defaults in Java are always “safe” rather than “clever”. This is the main difference to newer languages. They try to guess most of the stuff and then, you can fix those few exceptions that you always have. With Java, all the exceptions are handled but you have to do everyday stuff yourself.

The whole experience was frustrating, especially since I’m a seasoned Java developer. It took me almost two weeks to write the code for this small model mostly because because of a bug in Hibernate 3.1 and because I couldn’t get my mind around the existing documentation. Also, parent-child relations were poorly documented in the first Hibernate book. The second book explains this much better.

Conclusion: Use it if you must. Today, there are better ways.

Next stop: TurboGears, a Python web framework using SQL Objects.


What’s Wrong With Java, Part 1

11. July, 2007

As I promised at the Jazoon, I’m starting a series of posts here to present the reasons behind my thoughts on the future on Java. To do this, I’ll develop a small knowledge management application with the name Sensei, the Japanese word for “teacher”. The final goal is to have three versions in Java, Python an Groovy at the same (or at least a similar) level.

The application will sport the usual features: A database layer, a model and a user interface. Let’s start with the model which is fairly simple.

We have knowledge nodes that contain the knowledge (a short name and a longer description), keywords to mark knowledge nodes and relations to connect knowledge nodes. Additionally, we want to be able to organize knowledge nodes in a tree. Here is the UML:

To make it easier to search the model for names, I collect them in the keyword class. So a knowledge node has no String field with the name but a keyword field instead. The same applies to the relations. Here is what the Java model looks like:

   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
  10:
  11:
  12:
  13:
  14:
  15:
  16:
  17:
  18:
  19:
  20:
  21:
  22:
  23:
  24:
  25:
  26:
class Keyword {
    public enum Type {
        KEYWORD,
        KNOWLEDGE,
        RELATION
    };

    static Set<Keyword> allKeywords;
    
    Type   type;
    String name;
}

class Knowledge {
    Keyword         name;
    String          knowledge;
    Knowledge       parent;
    List<Knowledge> children;
    Set<Keyword>    keywords;
}

class Relation {
    Keyword   name;
    Knowledge from;
    Knowledge to;
}

UML of Sensei Model

Note that I omitted all the usual cruft (private fields, getters/setters, equals/hashCode).

This model has some interesting properties:

  1. There is a recursive element. Many OR mappers show lots of example how to map a String to a field but for some reason, examples with tree-like structures are scarce.
  2. It contains 1:N mappings where N can be 0 (keywords can be attached to knowledge nodes but don’t have to be), 1:N mappings where N is always >= 1 (names of knowledge nodes and relations) and N:M mappings (relations between knowledge nodes).
  3. It contains a field called “from” which is an SQL keyword.
  4. There are sorted and unsorted mappings (children and keywords of a knowledge node).
  5. Instances of Keyword must be garbage collected somehow. When I delete the last knowledge node or relation with a certain name, I want the keyword deleted with it. This is not true for pure keywords, though.

To summarize, this is probably the smallest meaningful model you can come up with to test all possible features an OR mapper must support.

Let’s have a look at the features we want to support:

  1. Searching knowledge nodes by name or keyword or relation
  2. Searches should by case insensitive and allowing to search for substrings
  3. Adding child nodes to an existing knowledge node
  4. Reordering children
  5. Moving a child from one parent to another
  6. Adding and removing relations between nodes
  7. Adding and removing keywords to a knowledge node

In the next installment, we will have a look how Java and Hibernate can cope with these demands.


Building HTML

9. July, 2007

There are two ways to generate HTML: The right way and the JSP way. Enough has been said about JSP, what’s the right way?

The right way is to have a programming language that is flexible enough to merge HTML and code painlessly. Groovy does it this way:

 def builder = new MarkupBuilder ();
 builder.html {
    head {
        title getTitle()
    }
    body {
        genBody (builder)
    }
 }

What is going on here?

First of all, you must know that you can omit several things in Groovy: parentheses or semi-colon, for example. If we add these, the example becomes less readable but better understandable for Java developers:

 def builder = new MarkupBuilder ();
 builder.html () {
    head () {
        title (getTitle());
    };
    body () {
        genBody (builder);
    };
 };

So obviously, in the first line, a method html() is called. Now, you need to know that the code in curly brackets is a closure and that closure is added as the last parameter to the call of the method in front of it. This means the code in the curly brackets is passed into html() and can be executed any time html() wants to do it. It can even invoke the closure several times. In Java, the definition of html would be: html(Closure c).

The same happens with head(), title() (which takes a String argument) and body(). Now where are these methods defined?

Nowhere. MarkupBuilder() is a class which defines a “catch all” method called invokeMethod. Whenever Groovy cannot find the right method to invoke, it will check if a class defines

 Object invokeMethod(String methodName, Object args)

and invoke this method instead. The method gets the name of the original method and all the original parameters in a list.

In our case, MarkupBuilder.invokeMethod() will use the method name as the HTML element name. That’s it. A little bit of flexibility in the parser got us a long way towards HTML without making the code unreadable.

Next, we need a flexible way to pass HTML attributes. In Groovy, named arguments are supported:

 void genBody (MarkupBuilder builder) {
     builder.p (style:'font-weight: bold;', 'Impressive.')
 }

This will call MarkupBuilder.invokeMethod() with “p” as the method name and a list consisting of the map with the style and a string. This information will be used to build the element and it’s content.

Of course, the builder will stream the output (which is very simple since it can wrap your code when it needs to: essentially it will just warp all the virtual methods you call), there is no way to forget a closing element (your program won’t compile anymore) and splitting complex HTML into several methods is a cinch.

Life can be so simple with a little bit of flexibility.

Further reading: Using MarkupBuilder for Agile XML creation


Follow

Get every new post delivered to your Inbox.

Join 339 other followers