Never Rewrite From Scratch

24. April, 2024

In many projects, there is code which is so bad that no one wants to touch it. Eventually, the consensus is “we need to rewrite this from scratch”.

TL;DR: Never do this in one big step. Break it down into pieces of 1-4 hours of work. Do each on the side while doing your normal work. Start today, not when you need to add a feature to or fix a bug in the messy code.

How to Make Rewrites Work

The goals we need to achieve:

  • Doing feature work during the rewrite must be easy.
  • The new code must be substantially better.
  • It would be great if we could stop halfway. Not that we have to, but the choice would be valuable.

If you think about this from a customer perspective: They pay you for (working) features, not for keeping your work area clean. The latter is a given. How would you feel if your favorite restaurant put “Cleaning spillover: $50” on your next bill?

“Stop the world” rewrites should be avoided; they are incredibly dangerous. Imagine you estimate that the rewrite will take two weeks. After two weeks, you notice you need more time. How much? Well … one week … maybe? You get another week, during which you find something unexpected that hints at why the code was so messy in the first place. What now? Ask for another month? Or admit defeat? Imagine how the meeting will go where you explain the situation to your manager. You have already spent 15 days on this. If you stop now, those will be wasted. Your lifetime will be wasted. And on top of this, you will still have the bad code. You will have spent a lot of money and gained nothing at all.

That’s why the rewrite must have a manageable impact on the productivity of the team. Not negligible, but we must always stay in control of how we spend our effort.

The easiest and most reliable way to make code better is to cut it into smaller pieces and to add unit tests along the way. A few lines of code are always easier to understand than a thousand. Tests document our expectations in ways that nothing else can.

Example

You have this god class that loads data from several database tables, processes it and then pumps the output into many other tables and external services.

Find the code which processes the data and move it into a new class. All fields become constructor parameters, all local variables become method parameters. You now have a simpler god class and a transformer. Write one or two unit tests for the transformer. Make sure you don’t need a database connection – if the transformer fetches more data, move that into a new helper and pass it as constructor parameter. Or load all the data in advance and pass a map into the transformer.
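As a minimal sketch of that extraction (all names here – OrderTransformer, tax rates as the pre-loaded data – are invented for illustration): the former field becomes a constructor parameter, the former local variables become method parameters, and the transformation ends up as a pure method that needs no database connection in tests.

```java
import java.util.List;
import java.util.Map;

// Extracted from a hypothetical god class: former fields become constructor
// parameters, former local variables become method parameters.
class OrderTransformer {
    private final Map<String, Double> taxRates; // was a field loaded from the DB

    OrderTransformer(Map<String, Double> taxRates) {
        this.taxRates = taxRates; // data is loaded in advance and passed in
    }

    // Pure transformation: a unit test only needs a map and a list.
    double grossTotal(List<Double> netPrices, String country) {
        double rate = taxRates.getOrDefault(country, 0.0);
        double net = 0.0;
        for (double price : netPrices) {
            net += price;
        }
        return net * (1.0 + rate);
    }
}
```

A test constructs the transformer with a small map and asserts on the result – no database, no fixtures, done in minutes.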

Let’s see what we did in more abstract terms.

How to rewrite bad code

In general, find an isolated area of functionality (configuration, extracting data, validation, transforming or loading into destination) and cut it out with the least amount of changes elsewhere. Ideally, you should use your IDE’s refactoring “move to new method in a new class.”

What have we achieved:

  • We cut the complicated mess into two pieces, one much easier to understand than before.
  • We have invested just a little bit of time.
  • We now understand the mess a bit better.
  • The remaining mess is less code, making it easier to deal with in the future.
  • We now have more code under test than before.
  • The extracted code is much, much easier and faster to test than the old mess.
  • When there is a bug in the extracted code, we can now write a test for it and fix it much more efficiently.
  • There is only a small risk that we introduced new bugs, or none at all if we used an automated refactoring.
  • We can merge this into the shared/production branch right away. No need for a long-living rewrite branch.
  • If it is valuable, we can add more tests. But we don’t have to. This gives us options.

Rinse & repeat. After a few rounds of this, you will begin to understand why the old code is messy. By that time, you will have moved all the irrelevant stuff into unit-tested code. So fixing the core of the mess will be much easier because now,

  • it’s much less code to deal with,
  • a lot of complexity will be gone,
  • many unit tests have your back, and
  • you will have begun to understand this code despite its lack of good documentation and its messiness.

The best parts:

  • No additional stress when doing it this way.
  • If you make a mistake, you won’t have wasted a lot of the client’s money AND your team’s lifetime AND your reputation AND you can easily revert it.
  • You can stop at any time – there is never a “if we stop now, it’s all for naught” situation.
  • Several people can do this in parallel with just a little bit of coordination.
  • Management and customers are fine with tidying up for a few hours every week.
  • If something else is more important, you can switch focus. If the customer needs a new feature here, you can spend more time extracting the least messy stuff because it will make you more efficient. Or not. Again, you get a choice that you didn’t have before.
  • You now have something valuable you can work on when you’re blocked for one hour.
  • You get to fix it (eventually).
  • At the end, everyone will be happy.

It just takes a bit longer from start to finish. Treat yourself to a bag of sweets when the mess is good enough. You deserve it.

If you’re a manager: Organize lunch/dinner for the whole team. They deserve it.

Next, let us look at whether we should do this at all.

Why We Should Rewrite

Rewrites don’t bring immediate business value. From a developer perspective, the rewrite is absolutely necessary. From a customer perspective, it’s a waste of time: “Spend my money to get the same functionality?”

Cleanliness

So let’s think about this as cleanliness. Everyone has to clean their room, desk, body once in a while. You don’t have to shower three times a day. But once a week is definitely not enough. So there is a sweet spot between several times a day and once per week.

Why? How? When?

Why do we do it? Because dirt accumulates over time, and if you don’t tidy up regularly, the cost of tidying up suddenly explodes PLUS you won’t be able to do your work efficiently.

How long do we do it? You shower for 10-30 minutes, not a whole day. So spend at most 10% of your working time on this. I find that half an hour every day works great.

When do we start? Now. Really. Right now. Don’t let the dirt evolve into cancer and kill you. Or in software terms: Which one is better?

  1. You start improving the worst part of your code base today. After a few fixing rounds, you get a high-priority bug/feature which has to be fixed/implemented right now.
  2. You get a high-priority bug/feature which has to be fixed/implemented right now. It’s in the worst part of the code base. You haven’t done any tidying there yet.

The lesson here is: Tidy up regularly and in short bursts. Focus on where you expect changes in the near future over code that hasn’t changed for a long time.

Looking at software in general: Spending several consecutive days on cleanup is too much. Go without cleanup for a week and the software will soon start to reek. Aim for at least one hour per week and at most four.

Least Messy

Apply this to rewrites: Today, you have a huge mess. Find a part in it that is least “messy” and fix that. Some time later, depending on your workload, do it again.

I define “least messy” like this:

  • Most independent – changes here will affect the least amount of code.
  • Easiest to understand – try to find something that does one thing only.
  • Can be extracted within one to four hours of work, including at least one new unit test.

We now know how to rewrite and when to do it. But one question is still open: What makes rewrites fail?

Why Rewrite From Scratch Fails

Many developers believe that a rewrite from scratch is the only possible solution for really bad code. More importantly, they think they can fix it within a certain – usually low – time budget. We have the old code to guide us, right? It should be easier the second time around, right? With all that we’ve learned since?

Usually not. You don’t understand the code: this is the main reason why you want to scrap it! It has hidden issues which drive the “badness”. You don’t have a reliable list of all features. Lastly, you either have to lie to your manager about the effort or you won’t get permission.

Let’s look at each of those in more detail.

Bad code is hard to understand

The first reason why you want to rewrite from scratch is that you don’t understand the bad code. This makes it harder to change and drives your urge to get rid of it.

For the same reason, it will also slow you down during the rewrite. The rule of thumb with bad code is: “If it took N hours to write the first time, it will take ~N hours to rewrite from scratch”. This only applies when

  • you have a competent team,
  • everyone involved in writing the bad code is still there.

You can do better, if

  • all the requirements that ever went into this code are readily available,
  • there is good documentation,
  • not much time has passed since the time the mess was created.

But usually, the code is messy precisely because the first two were missing the first time around. And the third doesn’t hold either, since no one has dared to touch the code for a long time because it always caused problems.

For these reasons, the bad code will slow you down instead of helping you during the rewrite. But that’s not all.

Hidden Design Issues

Why is the code so bad? There is a reason for that. No matter how bad it looks today, it was written by smart, competent people just like you. What happened?

Often, it was written with assumptions that turned out to be somewhat wrong. Not totally off target, just not spot on. Like how complex the underlying problem is. The original design didn’t solve the problem in an efficient way. The code didn’t work well to begin with, time pressure built up, eventually the team had to move on. Code had to be made to work “somehow” to meet a deadline.

Do you understand today where you went wrong the first time? Do you know how to solve it now? Without this, your attempt to rewrite will produce “bad code, version 2”. Or at best “slightly better code, way over budget”. On top of that, you might even know less today than you did the first time.

Lack of Information

The third reason is that you don’t have good documentation. The knowledge of most of the features will be lost or hidden in old bug/feature tickets and outdated wiki pages.

Since you can’t trust the code, you will have to painstakingly rebuild the knowledge that went into the first version without many of the information sources you had the first time. Many developers involved in the first version have left. Even if they are still around: The reasons for most of the decisions will be long forgotten by now.

Therefore, information-wise, you probably start off worse than when someone made this mess the first time. Which was some time ago. The bad code has festered into an ugly mess of thousands of lines of code, workarounds and hasty bug fixes. How long will it take to clean this up?

Realistic Estimates

The last reason is that a rewrite takes longer than a few days. If this wasn’t the case, you’d have solved the problem already – no one argues about a rewrite that takes just a few hours.

Here, we have a psychological problem. No one knows how long it took to write the original code – it “evolved.” Maybe a month? Half a year?

Well, we know better this time, so it has to take less time. We do? Why? Okay, how much less? Well … this code is so bad, it hurts so much … it has to go! … it’s embarrassing to even talk about this … how much is management willing to … and you’re doomed. Instead of giving an honest estimate, you try to find a number that will green-light the attempt. Or you give an honest estimate and management will (correctly) say “No.”

Challenges

You will face many challenges. I’ve listed suggestions for how to handle them.

My boss/client won’t let me

Argue that you need time to clean your work area, just like a carpenter needs to sweep the chips from the floor between projects. When too much dirt accumulates, you can’t work quickly or safely. Which means new features will either be more expensive or they will have more bugs.

We don’t have time for this!

One picture says more than a thousand words in this case: https://hakanforss.wordpress.com/2014/03/10/are-you-too-busy-to-improve/

It’s so bad, we can’t fix individual parts of it!

Well, let me know how it went.

For everyone else: This is in production. So it can’t be that bad. As in: it’s not killing you right now. It’s just very painful and risky to make changes there. Despite how bad the whole is, a lot of thought and effort went into the individual changes. Often more than elsewhere, because extra care was taken since this was a dangerous area. This also means that it would be a terrible waste to throw everything away just because it looks like a huge reeking dump of garbage from a distance. You know how they fix oil spills? They put a barrier up and then it’s one dirty bird / beach at a time.

So look at the messy code. Try to see what you can salvage today. Keep all the good stuff that you can reuse. Clean it. Keep chipping away at the huge pile. Move carefully so it can’t come crashing down. As your knowledge grows, the remaining work shrinks. Eventually, you will be able to replace whole code paths. And one day, guaranteed, the huge pile will become a molehill that you can either stomp into the ground with the heel of your boot or … ignore.

While I’m at it, let me just fix this as well!

You will often feel the urge to go on cleaning after you started. Just one more warning. Oh, and I can extract this, now! And I know how to write five more unit tests.

Set a time limit and learn to stick to it. If you have more ideas how to improve things, write them down. A comment in the code works well since someone else might pick it up. If you clean code for three days, other people won’t praise you. Imagine it the other way around: There are so many important things to do right now and your colleague just spent three days cleaning up compiler warnings?

Also, remember the 80:20 rule: Most clean-ups will only take a bit of time. As soon as you get into the “hard to fix” area, you’re spending more and more effort. Eventually, the clean-up will cost more than you’ll ever benefit from it. Keeping it time-boxed will prevent you from falling into this trap.

I don’t have time to write a unit test

Come back when you have. Adding tests is an important part of the work. Skipping them is like a carpenter sweeping the chips under a rug. You need this test. Because …

Writing the unit test takes ages

Excellent! You have found a way to measure whether you’re doing it right or wrong. If writing the new unit test is hard, there is a problem that you don’t understand yet. The code you extracted has turned out to be much more dangerous than you thought. Great! Close your eyes and focus on that feeling. Learn to recognize it as early as possible. This emotion will become one of the most valuable tools in your career. Whenever you feel it, stop immediately. Get up. Get a coffee. Stare at the wall. Ask yourself “Why am I feeling this? Which mistake am I about to make?”

Now let’s look at reasons why the unit test is so hard to write.

The unit test needs a lot of setup

This indicates that you have an integration test, not a unit test. You probably failed to locate what I called “least messy” above. Document your findings and revert. Try to find a part to extract that has fewer dependencies.

The unit test needs complicated data structures

Looks like you need to improve the design of the data model. Check how you can make the different data classes more independent of each other. For example, if you want to write tests for the address of an invoice, you shouldn’t need order items. If improving your data model will make it more efficient to write the tests, stop the tidying here and clean the data model instead.

Option #2: Consider creating a test fixture with test data builders for your model classes. The builders should produce standard test cases. In your tests, you create the builder, then modify just the fields that your test needs and call build() to get a valid instance of your complex model.
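A sketch of such a builder (the Invoice model and its fields are invented for illustration): the defaults produce a standard, valid test case, and each test overrides only what it cares about.

```java
// A hypothetical model class; in real code this would be your domain object.
class Invoice {
    final String customer;
    final String country;
    final double total;

    Invoice(String customer, String country, double total) {
        this.customer = customer;
        this.country = country;
        this.total = total;
    }
}

// Test data builder: sensible defaults, fluent setters, build() at the end.
class InvoiceBuilder {
    private String customer = "ACME Corp"; // standard test values
    private String country = "CH";
    private double total = 100.0;

    InvoiceBuilder customer(String customer) { this.customer = customer; return this; }
    InvoiceBuilder country(String country) { this.country = country; return this; }
    InvoiceBuilder total(double total) { this.total = total; return this; }

    Invoice build() { return new Invoice(customer, country, total); }
}
```

A test that only cares about the country then reads `new InvoiceBuilder().country("DE").build()` – every other field stays at its valid default.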

Writing the unit test fails for another reason

Write a comment in a text editor describing what you tried and why it failed. Include all useful information, such as class names and stack traces. Revert your changes. Commit the comment.

You really can’t achieve much more here. Stop for now, do a feature, and resume tidying tomorrow. If you have an idea then how to improve this code: Do it. If not, tidy up elsewhere.

I can’t find anything to extract

Try to extract fewer lines of code. Sometimes, extracting a single line into a method with a good name helps tremendously in understanding complex code. This is counterintuitive: How can turning one line into four make the code easier to understand? Because of how the brain works: your brain doesn’t read characters, it looks for shapes and indentation. Reading a good method name is faster and more efficient than evaluating an 80-character expression in your head.
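For illustration, a hypothetical before/after (the shipping domain is invented): one dense condition becomes a tiny method whose name states the intent.

```java
class ShippingRules {
    // Before: callers had to evaluate this dense condition in their heads:
    //   if (weightKg > 30 || items > 10 || bulky) { ... }
    // After: one line became a small, named method; the call site reads as prose:
    //   if (needsFreightShipping(weightKg, items, bulky)) { ... }
    static boolean needsFreightShipping(double weightKg, int items, boolean bulky) {
        return weightKg > 30 || items > 10 || bulky;
    }
}
```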

Next, sort each code line into “fetching data from somewhere”, “transforming the data” and “loading the data into something”. For methods that mix two or three of those, try to split the method into two or three methods or classes where each does just one thing.
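The sorting exercise might look like this minimal sketch (all names invented; a map stands in for the database so the “transform” part stays a pure function):

```java
import java.util.List;
import java.util.Map;

class RevenueReport {
    // "Fetch": the only part that would talk to the database; a map stands in here.
    static List<Double> fetchAmounts(Map<Integer, Double> ordersTable) {
        return List.copyOf(ordersTable.values());
    }

    // "Transform": pure logic, trivially unit-testable.
    static double sumAbove(List<Double> amounts, double threshold) {
        double sum = 0.0;
        for (double amount : amounts) {
            if (amount > threshold) {
                sum += amount;
            }
        }
        return sum;
    }

    // "Load": pushes the result to its destination; here it just formats a line.
    static String toReportLine(double total) {
        return "Total: " + total;
    }
}
```

Each piece does one thing, so each piece can be tested (or replaced) on its own.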

Conclusion

The net effect of the above is that software developers tend to underestimate rewrites. The result: the rewrite costs much more than expected. Management is unhappy: “just can’t trust the estimates of developers”. Developers are unhappy: “management will not allow us to do this again” and “I put so much effort into this and everyone hates me for it”. The customer is very, very unhappy (“I paid how much to get what I already have??? And what about … ?? You postponed it? I needed that this week! Do you have any idea how much revenue I lost …”).

So the only solution which will work most of the time:

  • Cut the huge rewrite into small, manageable parts.
  • Each part should slightly improve the situation.
  • Add at least one unit test to each improved part.
  • Spend a small amount of your weekly work time on tidying.
  • Merge quickly.
  • Start today.

See Also


What are Software Developers Doing All Day?

6. November, 2021

Translate.

Mathematics? Nope. I use trigonometric functions like sin(x) to draw nice graphics in my spare time but I never use them at work. I used a logarithm last year to round a number but that’s about it. Add, multiply, subtract and divide is all the math that I ever do, and most of that is “x = x + 1”. If I have to do statistics, I use a library. No need to find the maximum of a list of values myself.

So what do we do? Really?

We translate mumble into Programming Language of the Year(TM).

Or, more diplomatically: We try to translate the raw and unpolished rambling of clients into the strict and unforgiving rules of a programming language.

We’re translators. Like those people who translate between human languages. We know all the little tricks for expressing ourselves, and what you can and can’t easily express. After a while, we can distinguish between badly written code and the other kind, just like an experienced journalist.


Jazoon 2012: CQRS – Trauma treatment for architects

4. July, 2012

A few years ago, concurrency and scalability were hype. Today, they’re a must. But how do you write applications that scale painlessly?

Command and Query Responsibility Segregation (CQRS) is an architectural pattern to address these problems. In his talk, Allard Buijze gave a good introduction. First, some of the problems of the standard approach. Your database, everyone says, must be normalized.

That can lead to a couple of problems:

  • Historic data changes
  • The data model is neither optimized for writes nor for queries

The first problem can result in a scenario like this: Imagine you have a report that tells you the annual turnover. You run the report for 2009 in January 2010. You run the same report again in 2011 and 2012, and each time, the annual turnover of 2009 gets bigger. What is going on?

The data model is in third normal form. This is great, no data duplication. It’s not so great when data can change over time. So if your invoices point to the products and the products point to the prices, any change of a price will also change all the existing invoices. Or when customers move, all the addresses on the invoices change. There is no way to tell where you sent something.

The solution is to add “valid time range” to each price, address, …, which makes your SQL hideous and helps to keep your bug tracker filled.

It will also make your queries slow since you will need lots and lots of joins. These joins will eventually get in conflict with your updates. Deadlocks occur.

On the architectural side, some problems will be much easier to solve if you ignore the layer boundaries. You will end up with business logic in the persistence layer.

Don’t get me wrong. All these problems can be solved but the question here is: Is this amount of pain really necessary?

CQRS to the rescue. The basic idea is to use two domain models instead of one. Sounds like more work? That depends.

With CQRS, you will have more code to maintain but the code will be much simpler. There will be more tables and data will be duplicated in the database, but there will never be deadlocks, and queries won’t need joins in the usual case (you could get rid of all joins if you wanted). So you trade bugs for code.

How does it work? Split your application into two main parts. One part takes user input and turns that into events which are published. Listeners will then process the events.

Some listeners will write the events into the database. If you need to, you will be able to replay these later. Imagine your customer calls you because of some bug. Instead of asking your customer to explain what happened, you go to the database, copy the events into a test system and replay them. It might take a few minutes but eventually, you will have a system which is in the exact same state as when the bug happened.

Some other listeners will process the events and generate more events (which will also be written to the database). Imagine the event “checkout”. It will contain the current content of the shopping cart. You write that into the database. You need to know what was in the shopping basket? Look for this event.

The trick here is that the event is “independent”. It doesn’t contain foreign keys but immutables or value objects. The value objects are written into a new table. That makes sure that when you come back 10 years later, you will see the exact same shopping cart as the customer saw when she ordered.

When you need to display the shopping cart, you won’t need to join 8 tables. Instead, you’ll need to query 1-2 tables for the ID of the shopping cart. One table will have the header with the customer address, the order number, the date, the total and the second table will contain the items. If you wanted, you could add the foreign keys to the product definition tables but you don’t have to. If that’s enough for you, those two tables could be completely independent of any other table in your database.

The code to fill the database gets the event as input (no database access to read anything from anywhere) and it will only write to those two tables. Minimum amount of dependencies.

The code to display the cart will only need to read those two tables. No deadlocks possible.

The code will be incredibly simple.
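As a sketch of that fill code (all names invented; Java records, so this assumes Java 16+): the event carries value objects instead of foreign keys, and the projection writes to its two “tables” with the event as its only input.

```java
import java.util.ArrayList;
import java.util.List;

// The event is "independent": it carries value objects, no foreign keys.
record CartItem(String product, int quantity) {}
record CheckoutEvent(String orderId, String address, List<CartItem> items) {}

// The projection gets the event as its only input (no reads from anywhere)
// and writes to exactly two "tables" – lists stand in for them here.
class CartProjection {
    final List<String> headerTable = new ArrayList<>();
    final List<String> itemTable = new ArrayList<>();

    void on(CheckoutEvent event) {
        headerTable.add(event.orderId() + "|" + event.address());
        for (CartItem item : event.items()) {
            itemTable.add(event.orderId() + "|" + item.product() + "|" + item.quantity());
        }
    }
}
```

The display side then reads only those two tables; neither side can deadlock against the other.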

If you make a mistake somewhere, you can always replay all the events with the fixed code.

For tests, you can replay the events. No need for a human to click buttons in a web browser (not more than once, anyway).

Since you don’t need foreign keys unless you want to, you can spread the data model over different databases, computers, data centers. Some data would be better in a NoSQL repository? No problem.

Something crashes? Fix the problem, replay the events which got lost.

Instead of developing one huge monster model where each change possibly dirties some existing feature, you can imagine CQRS as developing thousands of mini-applications that work together.

And the best feature: It allows you to retroactively add features. Imagine you want to give users credits for some action. The idea is born one year after the action was added. In a traditional application, it will be hard to assign credit to the existing users. With CQRS, you simply implement the feature, set up the listeners, disable the listeners which already ran (so the action isn’t executed again) and replay the events. Presto, all the existing users will have their credit.
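A toy version of that replay, with invented names and plain strings standing in for real event objects: the new listener simply runs over the stored history.

```java
import java.util.List;

// A brand-new listener, written a year after the "checkout" events started
// being recorded. Replaying the stored history applies the rule retroactively.
class CreditListener {
    private int credits = 0;

    void on(String event) {
        if (event.startsWith("checkout:")) {
            credits += 10; // the new rule: 10 credits per checkout
        }
    }

    int credits() { return credits; }

    static int replayAll(List<String> storedEvents) {
        CreditListener listener = new CreditListener();
        for (String event : storedEvents) {
            listener.on(event);
        }
        return listener.credits();
    }
}
```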



Jazoon 2012: Improving system development using traceability

4. July, 2012

When you develop software, you will ask yourself these questions (quoted from here):

  • Is it still possible to accept a late change request? What would be the impact?
  • What is the overall level of completion of the system or a component?
  • Which components are ready for testing?
  • A failure occurs because the system is erroneous. What parts of the system should I check?

In his talk “Improving system development using traceability”, Ömer Gürsoy shows an approach to answer these. The idea is to trace changes end-to-end: From the idea over requirements to design, implementation, tests, bug reports and the product manual. For this to work, you’ll need to

  • Analyze
  • Document
  • Validate
  • Manage

At itemis, they developed tooling support. A plug-in for Eclipse can track changes in all kinds of sources (text documents, UML diagrams, requirement DSLs) and “keep them together”. It can answer questions like “who uses this piece of code?”

The answer will tell you where you need to look to estimate the impact of a change. That helps to avoid traps like underestimation or missing surveillance.

Today, the plug-in shows some promise but there are rough edges left. The main problem is integration with other tools. The plug-in supports extension points to add any kind of data source but that only helps if the data source is willing to share. The second problem is that it doesn’t support versioning right now. It’s on the feature list.

On the positive side, it can create dependencies from a piece of text (say a paragraph in a text file). If you edit other parts of the text file, the tool will make sure the dependency still points to the right part of the text. So you can make notes during a meeting. Afterwards, you can click on the paragraphs and link them to (new) requirements or parts of the code (like modules) that will be affected. Over time, a graph of dependencies will be created that helps you to keep track of everything that is related to some change and how it is related: Where did the request come from? Which code was changed?

Always keep in mind that tracking everything isn’t possible – it would simply be too expensive today. But you can track your most important or most dangerous changes. That would give you the most bang for the buck. To do that, you must know what you must track and why.

A feature that I’d like to see is automatic discovery. Especially Java source code should be easy to analyze for dependencies.


Jazoon 2012: Divide&Conquer: Efficient Java for Multicore World

29. June, 2012

Not much new in the talk “Divide&Conquer: Efficient Java for Multicore World” by Sunil Mundluri and Velmurugan Periasamy.

Amdahl’s law shows that you can’t get an arbitrary speed-up when running part of your code in parallel. In practice, you can expect serial code to execute 2-4 times faster if you run it with, say, the fork/join framework of Java 7. This is due to setup and join costs, and the fact that the tasks themselves don’t get faster – you just execute more of them at the same time. So if a task takes 10 seconds and you can run all of them in parallel, the total execution time will be a bit over 10s.

If you want to use fork/join with Java 6, you can add the jsr166y.jar to your classpath.
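A minimal fork/join sketch using the standard java.util.concurrent API: summing an array by splitting the range until chunks are small enough to compute directly. For small inputs, the setup and join costs mentioned above dominate; the speed-up only shows on large arrays and several cores.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this, don't split further
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) { // small chunk: sum it directly
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) / 2; // otherwise split the range in half
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute right here, then wait for left
    }

    static long parallelSum(long[] data) {
        return new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
    }
}
```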

Again, functional programming makes everything simpler. With Java 8 and lambda expressions, syntactic sugar will make things even more readable, but at a price.

You might want to check one of today’s new languages like Xtend, Scala or Groovy to get these features today with Java 6.


Jazoon 2012: Syntactic Salt and Sugar

29. June, 2012

Syntactic Salt and Sugar was a presentation given by James Gould and Alex Holmes. They were talking about some recent developments and whether they are good (sugar) or bad (salt).

DSLs

DSLs are becoming ubiquitous. Everyone wants, needs and does DSLs today. But think of this for a moment: Is SQL a DSL?

Scary thought, eh? It’s certainly a limited language, but since it’s Turing complete, the limits are more in the pain of writing queries than in the fact that it’s a language designed to query data sets.

The advantage of DSLs is that you can fine tune them to your domain. That can help to avoid a lot of confusion.

But …

  • There are five people on this planet who can develop a nice syntax that is easy to use, easy to read, easy to understand and mostly consistent. Guido van Rossum is one of them. You’re not.
  • It’s easy to fall for the “one more feature” trap in a DSL. The most important property of a DSL is that it’s limited. It’s not a general purpose programming language.
  • Getting the syntax right is very, very hard. It’s easy to define syntax in the Xtext grammar editor – as long as you blissfully ignore the consumers of your DSL. As soon as you try to make their lives easier, all hell will break loose. Do you allow trailing commas? How do you handle ambiguities? Did you make sure all error messages make sense? Is it still readable? Can you add features without breaking all existing code?
  • YALTL – Yet another language to learn

Default Methods in Java 8

In Java 8, you can add method bodies to methods defined in interfaces:

public interface Foo {
    default String getName() { return "Foo"; }
}

Finally, you can have mixins in Java. Yay ^_^

Now, some people will wonder: Isn’t that multiple inheritance?

Yup. And as usual, because of some “features” of Java, they had to implement this in a … surprising way. What does this code print?

public interface A {
    default String getName() { return "A"; }
}

public interface B {
    default String getName() { return "B"; }
}

public class C implements A, B {
    public static void main(String[] args) {
        System.out.println(new C().getName());
    }
}

Nothing – it doesn’t compile because the compiler can’t decide which method to call. But this one compiles:

public interface A {
    default String getName() { return "A"; }
}

public interface B {
    default String getName() { return "B"; }
}

public interface C extends B {}

public class D implements A, C {
    public static void main(String[] args) {
        System.out.println(new D().getName());
    }
}

If you’re wondering: Instead of inheriting directly from “B”, I added a new interface “C”. In the pre-release build shown at the talk, “A” was now “closer” and the program printed “A”.

That means changes in A or C can silently modify the behavior of D. If you’re lucky, the compiler will refuse to compile it (the released version of Java 8 is stricter and rejects this case as ambiguous, too). *sigh*

No Free Lunch

Again, it’s easy to see that each feature comes with a cost attached.


Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?

Parallelism.

In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid RPC/RMI; they lure you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASE (see “Drop ACID And Think About Data”)
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak, and in places where you can least afford it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Playing with locks and clever concurrency tricks will only get you more bugs, unmaintainable code and a heart attack.

Dependencies kill a project faster and more reliably than almost anything else. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”
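One way to sketch that separation (all names here are hypothetical): the business logic is a pure function that computes a result and knows nothing about databases; the caller owns persistence and infrastructure failures.

```java
import java.math.BigDecimal;

public class SeparationSketch {
    // Pure business logic: no database access, no infrastructure
    // error handling - trivial to unit-test.
    static BigDecimal applyDiscount(BigDecimal amount, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent out of range: " + percent);
        }
        return amount.multiply(BigDecimal.valueOf(100 - percent))
                     .divide(BigDecimal.valueOf(100));
    }

    public static void main(String[] args) {
        // The caller decides what happens with the result
        // (store it, send it, discard it) - not the BL.
        BigDecimal discounted = applyDiscount(new BigDecimal("200"), 25);
        System.out.println(discounted);
    }
}
```

A unit test for `applyDiscount` needs no mocks and no database – which is exactly the point.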

Obvious programming has a value: You can see what happens. It has a price: boilerplate code. You can try to hide the boilerplate but it will still leak. Hibernate is a perfect example of this. Yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It generates proxies to save you from seeing the query, but that can break equals().
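A hedged illustration of the equals() trap – plain Java, no Hibernate involved; the generated proxy is simulated with an ordinary subclass: an equals() that compares getClass() stops working as soon as one side is a generated subclass of the entity.

```java
public class ProxyEqualsDemo {
    static class Entity {
        private final long id;
        Entity(long id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            // getClass()-based equality: common advice, but proxy-hostile.
            if (o == null || getClass() != o.getClass()) return false;
            return id == ((Entity) o).id;
        }

        @Override
        public int hashCode() { return Long.hashCode(id); }
    }

    // Stand-in for a generated proxy: same data, different runtime class.
    static class EntityProxy extends Entity {
        EntityProxy(long id) { super(id); }
    }

    public static void main(String[] args) {
        Entity plain = new Entity(42);
        Entity proxied = new EntityProxy(42);
        System.out.println(plain.equals(proxied)); // false - surprise!
    }
}
```

An `instanceof`-based equals() would survive the proxy, but then equals() is no longer symmetric across arbitrary subclasses – there is no free lunch here either.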

The same applies to RMI. When RMI decides that you can’t handle a message, you won’t even see it. In many cases, a slightly “unusual” message (say, one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network to your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources caused by adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message
Claim checks allow you to resend a message after a timeout without having it processed twice by the consumer.
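A minimal sketch of handling the timeout on the caller’s side (names illustrative; Java 8’s CompletableFuture): bound the wait and decide what to do when it expires, instead of blocking forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {
    static String callRemote(CompletableFuture<String> pending) {
        try {
            // Never block forever on a (possibly networked) call.
            return pending.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller decides: fall back, retry, or resend -
            // with a claim check so the consumer can deduplicate.
            return "fallback";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A future that never completes simulates a dead host.
        System.out.println(callRemote(new CompletableFuture<>()));      // "fallback"
        System.out.println(callRemote(
                CompletableFuture.completedFuture("ok")));              // "ok"
    }
}
```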

CALM and BASE refer to the fact that you can only have two of the three CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.
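The event-stream idea can be sketched in a few lines (all names hypothetical): instead of storing the cart’s current state, store what happened and rebuild the state by replaying the events – which is also how you restore a cart after a shard dies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CartReplay {
    // An event records what happened, not the resulting state.
    static class ItemEvent {
        final String sku;
        final int delta; // +1 added, -1 removed
        ItemEvent(String sku, int delta) { this.sku = sku; this.delta = delta; }
    }

    // Rebuild the current cart by replaying the event stream in order.
    static Map<String, Integer> replay(List<ItemEvent> events) {
        Map<String, Integer> cart = new LinkedHashMap<>();
        for (ItemEvent e : events) {
            cart.merge(e.sku, e.delta, Integer::sum);
            if (cart.get(e.sku) <= 0) cart.remove(e.sku);
        }
        return cart;
    }

    public static void main(String[] args) {
        List<ItemEvent> stream = new ArrayList<>();
        stream.add(new ItemEvent("book", +1));
        stream.add(new ItemEvent("book", +1));
        stream.add(new ItemEvent("pen", +1));
        stream.add(new ItemEvent("pen", -1)); // the user removed it again
        System.out.println(replay(stream));   // {book=2}
    }
}
```

The events are append-only and immutable, so they fit the earlier points, too: cheap to queue, cheap to share, cheap to reprocess.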


Commenting Code

1. March, 2012

A lot of people say “you must comment your code.”

Kevlin Henney wrote an excellent piece on this topic in 97 Things Every Programmer Should Know: “Comment Only What the Code Cannot Say”.

It really boils down to the last sentence: “Comment what the code cannot say, not simply what it does not say.”

There are various reasons why people demand comments:

  1. They are not fluent in the programming language or don’t know enough to read the code. There is nothing wrong with the code – the readers simply don’t know enough to understand it.
  2. The code is broken in some way and you need the comment to make sure people don’t break it even more.
  3. The comment explains something that no one will see from the code.

Only #3 is a valid reason for comments. #1 is just adding noise for people who shouldn’t touch the code anyway. #2 means you should refactor the code to make its intent clear – adding comments will only make things worse.
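A small, invented illustration of the distinction: the first comment merely restates the code (noise, reason #1), while the second says something the code cannot – the “why”.

```java
import java.util.concurrent.ThreadLocalRandom;

public class CommentSketch {
    public static long boundedDelayMillis(int attempt) {
        // BAD: restates what the code already says.
        // Shift 1 left by attempt, multiply by 100.
        long backoff = (1L << attempt) * 100;

        // GOOD: says what the code cannot.
        // Cap at 30s: our load balancer (hypothetical setup) drops idle
        // connections after 35s, so longer waits would always reconnect.
        long capped = Math.min(backoff, 30_000);

        // GOOD: jitter avoids thundering-herd retries after an outage.
        return capped / 2 + ThreadLocalRandom.current().nextLong(capped / 2 + 1);
    }

    public static void main(String[] args) {
        System.out.println(boundedDelayMillis(3));
    }
}
```

Delete the first comment and nothing is lost; delete the second and the next maintainer will “simplify” the cap away.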


Open Source As Good As Proprietary Software

28. February, 2012

The Coverity Scan 2011 Open Source Integrity Report (registration necessary) says: “Open source quality is on par with proprietary code quality, particularly in cases where codebases are of similar size.”

Which isn’t that surprising considering that it’s the same people who write both.

But there are a couple of hard numbers in the report which are interesting:

Linux 2.6 has about 0.62 defects per 1000 lines of code (KLOC) which Coverity says “is roughly identical to that of its proprietary codebase counterparts.” They don’t name names, but I guess the counterparts are Windows and Mac OS X. Those have 0.64 defects per KLOC.

The industry average is 1.0 defects per KLOC, which matches well with my (more anecdotal) knowledge that the best software developers make about 3-4 mistakes per KLOC, of which 75% are found during development – leaving roughly 0.75-1 defect per KLOC in the shipped code.


Using Maven to Patch Third Party Code

26. October, 2011

If you have the source of the dependency, patching the code is simple: Just create a small Maven project that compiles the source with your changes. Since your changes are probably small, you only need a few tests – what’s the point of testing code you didn’t touch? The build will be simple, too (just compile the sources you need, no fancy resource processing/filtering).

But what if you don’t have the sources? Jakub Holý has a solution: Hacking A Maven Dependency with Javassist to Fix It