Never Rewrite From Scratch

24. April, 2024

In many projects, there is code which is so bad that no one wants to touch it. Eventually, the consensus is “we need to rewrite this from scratch”.

TL&DR: Never do this in one big step. Break it down into 1-4 hour work pieces. Do each on the side while doing your normal work. Start today, not when you need to add a feature to or fix a bug in the messy code.

How to Make Rewrites Work

The goals what we need to achieve:

  • Doing feature work during the rewrite must be easy.
  • The new code must be substantially better.
  • It would be great if we could stop halfway. Not that we have to, but the choice would be valuable.

If you think about this from a customer perspective: They pay you for (working) features, not for keeping your work area clean. The latter is a given. How would you feel if your favorite restaurant would put “Cleaning spillover: $50” on your next bill?

“Stop the world” rewrites should be avoided. They are also incredibly dangerous. Imagine, you estimate the rewrite takes two weeks. After two weeks, you notice you need more time. How much? Well … one week … maybe? You get another week. During which you find something unexpected which gives you an idea why the code was so messy in the first place. What now? Ask for a month more? Or admit defeat? Imagine how the meeting will go where you explain to your manager the situation. You have already spent 15 days on this. If you stop now, those will be wasted. Your lifetime will be wasted. And on top of this, you will still have the bad code. You will have spent a lot of money and gained nothing at all.

That’s why the rewrite must have a manageable impact on the productiveness of the team. Not negligible, but we must always stay in control on how we spend our effort.

The easiest and most reliable ways to make code better is to cut it into smaller pieces and to add unit tests on the way. A few lines of code are always easier to understand than a thousand. Tests document our expectations in ways that nothing else can.

Example

You have this god class that loads data from several database tables, processes it and then pumps the output into many other tables and external services.

Find the code which processes the data and move it into a new class. All fields become constructor parameters, all local variables become method parameters. You now have a simpler god class and a transformer. Write one or two unit tests for the transformer. Make sure you don’t need a database connection – if the transformer fetches more data, move that into a new helper and pass it as constructor parameter. Or load all the data in advance and pass a map into the transformer.

Let’s see what we did in more abstract terms.

How to rewrite bad code

In general, find an isolated area of functionality (configuration, extracting data, validation, transforming or loading into destination) and cut it out with the least amount of changes elsewhere. Ideally, you should use your IDE’s refactoring “move to new method in a new class.”

What have we achieved:

  • We cut the complicated mess into two pieces, one much easier to understand than before.
  • We have invested just a little bit of time.
  • We now understand the mess a bit better.
  • The remaining mess is less code, making it easier to deal with in the future.
  • We now have more code under test than before.
  • The extracted code is much, much easier and faster to test than the old mess.
  • When there is a bug in the extracted code, we are now much more efficient to write a test for it and fix it.
  • There is only a small risk that we introduced new bugs or even none when we used a refactoring.
  • We can merge this into the shared/production branch right away. No need for a long-living rewrite branch.
  • If it is valuable, we can add more tests. But we don’t have to. This gives us options.

Rinse & repeat. After a few times of this, you will begin to understand why the old code is messy. But at that time, you will have moved all the irrelevant stuff to unit tested code. So fixing the core of the mess will be much easier because now,

  • it’s much less code to deal with,
  • a lot of complexity will be gone,
  • many unit tests have your back, and
  • you will have begun to understand this code despite it’s lack of good documentation and it’s messiness.

The best parts:

  • No additional stress when doing it this way.
  • If you make a mistake, you won’t have wasted a lot of the client’s money AND your teams lifetime AND your reputation AND you can easily revert it.
  • You can stop at any time – there is never a “if we stop now, it’s all for naught” situation.
  • Several people can do this in parallel with just a little bit of coordination.
  • Management and customers are fine with tidying up for a few hours every week.
  • If something else is more important, you can switch focus. If the customer needs a new feature here, you can spend more time extracting the least messy stuff because it will make you more efficient. Or not. Again, you get a choice that you didn’t have before.
  • You now have something valuable you can work on when you’re blocked for one hour.
  • You get to fix it (eventually).
  • At the end, everyone will be happy.

It just takes a bit longer from start to finish. Treat yourself to a bag of sweets when the mess is good enough. You deserve it.

If you’re a manager: Organize lunch/dinner for the whole team. They deserve it.

Next, let us look at whether we should do this at all.

Why We Should Rewrite

Rewrites don’t bring immediate business value. From a developer perspective, the rewrite is absolutely necessary. From a customer perspective, it’s a waste of time: “Spend my money to get the same functionality?”

Cleanliness

So let’s think about this as cleanliness. Everyone has to clean their room, desk, body once in a while. You don’t have to shower three times a day. But once a week is definitely not enough. So there is a sweet spot between several times a day and once per week.

Why? How? When?

Why do we do it? Because dirt accumulates over time and if you don’t tidy up regularly, the cost of tiding up suddenly explodes PLUS you won’t be able to do your work efficiently.

How long do we do it? You shower for 10 – 30 minutes, not a whole day. So spend at most 10% of your working time on this. I find that half an hour every day is working great.

When do we start? Now. Really. Right now. Don’t let the dirt evolve into cancer and kill you. Or in software terms: Which one is better?

  1. You start improving the worst part of your code base today. After a few fixing rounds, you get a high priority bug/feature which has to be fixed/implemented right now
  2. You get a high priority bug/feature which has to be fixed/implemented right now. It’s in the worst part of the code base. You haven’t done any tidying there, yet.

The lesson here is: Tidy up regularly and in short bursts. Focus on where you expect changes in the near future over code that hasn’t changed for a long time.

Looking at software in general: Spending several consecutive days on cleanup is too much. Going without cleanup for a week and the software will soon starts to reek. Aim for at least one hour per week and at most four hours.

Least Messy

Apply this to rewrites: Today, you have a huge mess. Find a part in it that is least “messy” and fix that. Some time later, depending on your workload, do it again.

I define “least messy” like this:

  • Most independent – changes here will affect the least amount of code.
  • Easiest to understand – try to find something that does one thing only.
  • Can be extracted within one to four hours of work, including at least one new unit test.

We now know how to rewrite and when to do it. But one question is still open: What makes rewrites fail?

Why Rewrite From Scratch Fails

Many developers believe that rewrite from scratch is the only possible solution for really bad code. More importantly, they think they can fix it within a certain – usually low – time budget. We have the code to guide us, right? Should be easier second time around, right? With all that we’ve learned since?

Usually not. You don’t understand the code: this is the main reason why you want to scrap it! It has hidden issues which drive the “badness”. You don’t have a reliable list of all features. Lastly, you either have to lie to your manager about the effort or you won’t get permission.

Let’s look at each of those in more detail.

Bad code is hard to understand

The first reason why you want to rewrite from scratch is that you don’t understand the bad code. This makes it harder to change and drives your urge to get rid of it.

For the same reason, it will also slow you down during the rewrite. The rule of thumb with bad code is: “If it took N hours to write the first time, it will take ~N hours to rewrite from scratch”. This only applies when

  • you have a competent team,
  • everyone involved in writing the bad code is still there.

You can do better, if

  • all the requirements that ever went into this code are readily available,
  • there is good documentation,
  • not much time has passed since the time the mess was created.

But usually, the messy code is in this state because the first two were missing the first time around and the latter isn’t true since no one dared to touch the code because it always caused problems.

For these reasons, the bad code will slow you down instead of helping you during the rewrite. But that’s not all.

Hidden Design Issues

Why is the code so bad? There is a reason for that. No matter how bad it looks today, it was written my smart, competent people just like you. What happened?

Often, it was written with assumptions that turned out to be somewhat wrong. Not totally off target, just not spot on. Like how complex the underlying problem is. The original design didn’t solve the problem in an efficient way. The code didn’t work well to begin with, time pressure built up, eventually the team had to move on. Code had to be made to work “somehow” to meet a deadline.

Do you understand today where you went wrong the first time? Do you know how to solve it, now? Without this, your attempt to rewrite will produce “bad code, version 2”. Or at best “slightly better code, way over budget”. In addition to those two, you might even know less today than the first time.

Lack of Information

The third reason is that you don’t have good documentation. The knowledge of most of the features will be lost or hidden in old bug/feature tickets and outdated wiki pages.

Since you can’t trust the code, you will have to painstakingly rebuild the knowledge that went into the first version without many of the information sources you had the first time. Many developers involved in the first version have left. Even if they are still around: The reasons for most of the decisions will be long forgotten by now.

Therefore information wise, you probably start worse off than when someone made this mess the first time. Which was some time ago. Bad code festered into an ugly mess of thousands of lines of code, workarounds and hasty bug fixes. How long will it take to clean this up?

Realistic Estimates

The last reason is that a rewrite takes longer than a few days. If this wasn’t the case, you’d have solved the problem already – no one argues about a rewrite that takes just a few hours.

Here, we have a psychological problem. No one knows how long it took to write the original code – it “evolved.” Maybe a month? Half a year?

Well, we know better this time, so it has to take less time. We do? Why? Okay, how much less? Well … this code is so bad, it hurts so much … it has to go! … it’s embarrassing to even talk about this … how much is management willing to … and you’re doomed. Instead of giving an honest estimate, you try to find a number that will green-light the attempt. Or you give an honest estimate and management will (correctly) say “No.”

Challenges

You will face many challenges. I’ve listed suggestions how to handle them.

My boss/client won’t let me

Argue that you need time to clean your work area, just like a carpenter needs to sweep the chips from the the floor between projects. When too much dirt accumulates, you can’t work quickly or safely. Which means new features will either be more expensive or they will have more bugs.

We don’t have time for this!

One picture says more then a thousand words in this case: https://hakanforss.wordpress.com/2014/03/10/are-you-too-busy-to-improve/

It’s so bad, we can’t fix individual parts of it!

Well, let me know how it went.

For everyone else: This is in production. So it can’t be that bad. As in it’s not killing you right now. It’s just very painful and risky to make changes there. Despite how bad the whole is, a lot of thought and effort went into the individual changes. Often more than elsewhere because extra care was taken since this was a dangerous area. This also means that it would be a terrible waste to throw everything away just because it looks a like huge reeking dump of garbage from a distance. You know how they fix oil spills? The put a barrier up and then, it’s one dirty bird / beach at a time.

So look at the messy code. Try to see what you can salvage today. Keep all the good stuff that you can reuse. Clean it. Keep chipping away at the huge pile. Move carefully so it can’t come crashing down. As your knowledge grows, the remaining work shrinks. Eventually, you will be able to replace whole code paths. And one day, guaranteed, the huge pile will become a molehill that you can either stomp into the ground with the heel of your boot or … ignore.

While I’m at it, let me just fix this as well!

You will often feel the urge to go on cleaning after you started. Just one more warning. Oh, and I can extract this, now! And I know how to write five more unit tests.

Set a time limit and learn to stick to it. If you have more ideas how to improve things, write them down. A comment in the code works well since someone else might pick it up. If you clean code for three days, other people won’t praise you. Imagine it the other way around: There are so many important things to do right now and your colleague just spent three days cleaning up compiler warnings?

Also, remember the 80:20 rule: Most clean ups will only take a bit of time. As soon as you get in the “hard to fix” area, you’re spending more and more effort. Eventually, the clean up will cost more than you’ll ever benefit from it. Keeping it time boxed will prevent you from falling into this trap.

I don’t have time to write a unit test

Come back when you have. Adding tests is an important part of the work. It’s like a carpenter sweeping the chips under a rug. You need this test. Because …

Writing the unit test takes ages

Excellent! You have found a way to measure whether you’re doing it right or wrong. If writing the new unit test is hard, there is a problem that you don’t understand yet. The code you extracted has turned out to be much more dangerous than you thought. Great! Close your eyes and focus on that feeling. Learn to recognize it as early as possible. This emotion will become one of the most valuable tools in your career. Whenever you feel it, stop immediately. Get up. Get a coffee. Stare at the wall. Ask yourself “Why am I feeling this? Which mistake am I about to make?”

Now let’s look at reasons why the unit test is so hard to write.

The unit test needs a lot of setup

This indicates that you have an integration test, not a unit test. You probably failed to locate what I called “least messy” above. Document your findings and revert. Try to find a part to extract that has fewer dependencies.

The unit test needs complicated data structures

Looks like you need to improve the design of the data model. Check how you can make the different data classes more independent of each other. For example, if you want to write tests for the address of an invoice, you shouldn’t need order items. If improving your data model will make it more efficient to write the tests, stop the tidying here and clean the data model instead.

Option #2: Consider creating a test fixture with test data builders for your model classes. The builders should produce standard test cases. In your tests, you create the builder, then modify just the fields that your test needs and call build() to get a valid instance of your complex model.

Writing the unit test fails for another reason

Write a comment in a text editor what you tried and why it failed. Include all useful information including class names and stack traces. Revert your changes. Commit the comment.

You really can’t achieve much more here. Stop for now, do a feature, and resume tidying tomorrow. If you have an idea then how to improve this code: Do it. If not, tidy up elsewhere.

I can’t find anything to extract

Try to extract fewer lines of code. Sometimes, extracting a single line into a method with a good name helps tremendously understanding complex code. This is counter intuitive: How can turning one line into four make the code easier to understand? Because how the brain works: You brain doesn’t read characters, it looks for indentation. Reading a good method name is faster and more efficient than running a 80 character expression in your head.

Next, sort each code line into “fetching data from somewhere”, “transforming the data” and “loading the data into something”. For methods that mix two or three of those, try to split the method into two or three methods or classes where each does just one thing.

Conclusion

The net effect of the above is that software developers tend to underestimate rewrites. The result: the rewrite costs much more than expected. Management is unhappy: “just can’t trust the estimates of developers”. Developers are unhappy: “management will not allow us to do this again” and “I put so much effort into this and everyone hates me for it”. The customer is very, very unhappy (“I paid how much to get what I already have??? And what about … ?? You postponed it? I needed that this week! Do you have any idea how much revenue I lost …”).

So the only solution which will work most of the time:

  • Cut the huge rewrite into small, manageable parts.
  • Each part should slightly improve the situation.
  • Add at least one unit test to each improved part.
  • Spend a small amount of your weekly work time on tidying.
  • Merge quickly.
  • Start today.

See Also


Chained Unit Tests – CUT

29. March, 2023

The CUT approach allows to test logically related parts or to gradually replace integration tests with pure unit tests.

Let’s start with the usual app: There is a backend server with data and a frontend application. Logically speaking, those are connected but the backend is using a Java and the frontend uses TypeScript. At first glance, the only way to test this is to

  1. Set up a database with test data.
  2. Start a backend server.
  3. Configure the backend to talk to the database.
  4. Start the frontend.
  5. Configure the frontend to talk to the test backend.
  6. Write some code which executes an operation in the frontend to test the whole.

There are several problems with this:

  • If the operation changes the database, you sometimes have to undo this before you can run the next test. The usual example is a test which checks the rendering of a table of users and another test which creates a new user.
  • The test executes millions of lines of code. That means a lot of causes for failures which are totally unrelated to the test. The tests are flaky.
  • If something goes wrong, you need to analyze what happened. Unlike with unit tests, the problem can be in many places. This takes much more time than just checking the ~ 20 lines executed by a standard unit test.
  • It’s quite a lot of effort to make sure you can render the table of users.
  • It’s very slow.
  • Some unrelated changes can break these tests since they need the whole application.
  • Plus several more but we have enough for the moment.

CUT is an approach that can help here.

Step 1: Rendering in the Frontend

Locate the code which renders the table. Ideally, it should look like this:

  1. Fetch list of elements from backend using REST
  2. Render each element

Change this code in such a way that the fetching is done independent of the rendering. So if you have:

renderUsers() {
    const items = fetchUsers();
    return items.map((it) => renderUser(it));
}

replace that with this:

renderUsers() {
    const items = fetchUsers();
    return renderUserItems(items);
}
renderUserItems(items) {
     return items.map((it) => renderUser(it));
}

At first glance, this doesn’t look like an improvement. We have one more method. The key here is that you can now call the render method without fetching data via REST. Next:

  1. Start the test system.
  2. Use your browser to connect to the test system.
  3. Open the network tab.
  4. Open the users table in your browser.
  5. Copy the response of fetchUsers() into a JSON file.
  6. Write a test that loads the JSON and which calls renderUserItems().

This now gives you a unit test which works even without a running backend.

We have managed to cut the dependency between frontend and backend for this test. But soon, the test will give us a false result: The test database will change and the frontend test will run with outdated input.

Step 2: Keeping the test data up-to-date

We could use the steps above to update the test data every time the test database changes. But a) that would be boring, b) we might forget it, c) we might overlook that a change affects the test data, d) it’s tedious, repetitive manual work. Let’s automate this.

  1. Find the code which produces the JSON that fetchUsers() asks for.
  2. Write a unit test that connects to the test database, calls the code and compares the result with the JSON file in the frontend project.

This means we now have a test which fails when the JSON changes. So in theory, we can notice when we have to update the JSON file. There are some things that are not perfect, though:

  • If the test fails, you have to replace the content of the JSON file manually.
  • It needs a running test database.
  • The test needs to be able to find the JSON file which means it must know the path to the frontend project.

Step 2 a: Update the JSON file

There are several solutions to this:

  • Use an assertion that your IDE recognizes and which shows a diff when the test fails. That way, you can open the diff, check the changes, copy the new output, open the JSON file, paste the new content. A bit tedious but if you use keyboard shortcuts, it’s just a few key presses and it’s always the same procedure.
  • Add a flag (command line argument, System property, environment variable) which tells the test to overwrite the JSON when the test fails (or always, if you don’t care about wear&tear of your hardware). Since all your source code is under version control, you can check see the diff there and commit or revert.
    • Optional: If the file doesn’t exist, create it. This is a bit dangerous but very valuable when you have a REST endpoint with many parameters and you need lots of JSON files. That way, the first version gets created for you and you can always use the diff/copy/paste pattern.

You probably have concerns that mistakes could slip through when people mindlessly update the JSON without checking all the changes, especially when there are a lot.

In my experience, this doesn’t matter. For one, it will rarely happen.

If you have code reviews, then it should be caught there.

Next, you have the old version under version control, so you can always go back and fix the issue. Fixing it will be easy because you now have a unit test that shows you exactly what happens when you change the code.

Remember: Perfection is a vision, not a goal.

Step 2 b: Cut away the test database

Approaches to achieve this from cheapest to most expensive:

  • Fill the test database from CSV files. Try to load the CSV in your test instead of connecting to a real database.
  • Use an in-memory database for the test. Use the same scripts to set up the in-memory database as the real test database. Try to load only the data that you need.
    • If the two databases have slightly different syntax, load the production script and then patch the differences in the test to make the same script work for both.
  • Have a unit test that can create the whole test database. The test should verify the contents and dump the database in a form which can be loaded by the in-memory database.
  • Use a Docker image for the test database. The test can then run the image and destroy the container afterwards.

Step 2 c: Project organization

To make sure the backend tests can find the frontend files, you have many options:

  • Use a monorepo.
  • Make sure everyone checks out the two projects in the same folder and using the same names. Then, you can just go one up from the project root to find the other project.
  • Use an environment variable, System property or config file to specify the path. In the last case, make sure the name of the config file contains the username (Java: System property user.name) so every developer can have their own copy.

What else can you do?

There are several more things that you can add as needed:

  • Change fetchUsers() so you can get the URL it will try to fetch from. Put the URL into a JSON file. Load the JSON in the backend and make sure there is a REST endpoint which can handle this URL. That way, you can test the request and make sure the fetching code in the frontend keeps working.
  • If you do this for every REST endpoint, you can compare the list from the tests against the list of actual endpoints. That way, you can delete unused endpoints or find out which ones don’t have a test, yet.
  • You can create several URLs with different parameters to make sure the fetching code works in every case.

Conclusion

The CUT approach allows you to replace complex, slow and flaky integration tests with fast and stable unit tests. At first, it will feel weird to modify files of another project from a unit test or even trying to connect the two projects.

But there are several advantages which aren’t obvious:

  1. You now have test data for the default case. You can create more test cases by copying parts of the JSON, for example. This means you no longer have to keep all edge cases in your test database.
  2. This approach works without understanding what the code does and how it works. It’s purely mechanical. So it’s a great way to start writing tests for an unknown project.
  3. This can be added to existing projects with only small code changes. This is especially important when the code base has few or no tests since every change might break something.
  4. This is a cheap way to create test data for complex cases, for example by loading the JSON and then duplicating the rows to to trigger paging in the UI rendering. Or you can duplicate the rows and the randomize some fields to get more reasonable test data. Or you can replace some values to test cases like very long user names.
  5. It gives you a basis for real unit tests in the frontend. Just identify the different cases in the JSON and pick one example for each case. For example, if you have normal and admin users and they look different, then you need two tests. If there is special handling when the name is missing, add one more test for that. Either get the backend to create the fragments of the JSON for you or load the original JSON and then filter it. Make sure you fail the test when expected item is no longer in the list.
  6. The first test will be somewhat expensive to set up. But after that, it will be cheap to add more tests, for example for validation and error handling, empty results, etc.

Why chained unit test? Because they connect different things in a standard way like the links of a chain.

From a wider perspective, they allow to verify that two things will work together. We use the same approach routinely when we expect the compiler to verify that methods which we call exist and that the parameters are correct. CUT allows to do the same for other things:

  • Code and end user documentation.
  • Code and formulas in Excel files.
  • Validation code which should work exactly the same in frontend and backend.

When to put generated code under version control

29. June, 2022

Many people think that when a computer generates code, there is no point to put it under version control. In a nutshell: If you generate the code once with a tool that you’re confident with, there is no point to put under version control. If you need to tweak a lot, version control will make your life so much easier.

Decision tree:

  • Do you need to tweak the options the code generator until everything works? If so, then yes.
  • How confident are you with using the code generator? If not very, then yes.
  • Is the code generator mature? Then not.

Some background: Let’s compare a home-grown code generator which is still evolving with, say, the Java Compiler (which generates byte code). The latter is developed by experienced people, tested by big companies and used by thousands of people every day. If there is a problem, it was already fixed. The output is stable, well understood and on the low end of the “surprise” scale. It has only a few options that you can tweak and most of them, you’ll never even need to know about. No need to put that under version control.

The home grown thing is much more messy. New, big features are added all the time. Stuff that worked yesterday breaks today. No one has time for writing proper tests. In this kind of situation, you will often need to compare today’s output with a “known good state”. There is a dozen of roughly understood config options for many things that might make sense if you were insane. Putting the generated code under version control in this situation is a must have since it will make your life easier.

The next level is that the code generator itself is mature bit it offers a ton of config options. Hypothetically, you will know the correct ones to use before you use the generator for the first and only time. Stop laughing. In practice, your understanding of config options will evolve. As you encounter bugs and solutions, you will need to know what else a config change breaks. Make your life easy and use version control: Config change, regenerate, look at diff, try again.

In a similar fashion, learning to use the code generator in an efficient and useful way will take time. You will make mistakes and learn from them. That won’t stop a co-worker from making the same mistakes or other ones. Everyone in the team has to learn to use the tool. Version control will prevent from one person breaking things for others.

How

Write a parameterized unit test which generates the code in a temporary folder. In the end, each file should be a single test which compares the freshly generated version with the one in the source tree.

Add one test at the end which checks that the list of files in both folders is the same (to catch newly generated files and files which have to be deleted).

Add a command line option which overwrites the source files with the ones created by the test. That way, you can both catch unexpected changes in your CI builds and efficiently update thousands of files when you want.

The logic in the test should be:

expected = content freshly generated file
actual = content  of the file in the source tree 
      or just the file name if the file doesn't exist (makes it
      easier to find the file when the test itself is broken).

if expected != actual, then
    if (overwrite) then copy expected to actual
    assert expected == actual

Use a version of the assert that shows a diff in your IDE. That way, you can open the file in your IDE and use copy&paste out of the diff window to fix small changes to get a feeling how they work.

Or you can edit the sources until they look the way they should and then tweak config options until the tests confirm that the code generator now produces the exact desired result.

Bonus: You can tweak the generated code in your unit test. It’s as simple as applying patches in the “read content of the freshly generated file” step. One way you can use this is to fix all the IDE warnings in the generated code to get a clean workplace. But you can also patch any bugs that the code generator guys don’t want to fix.

Workaround

If you don’t want to put all generated code under version control, you can create a spike project to explore all the important features. In this spike, you create an example for every feature you need and put the output under version control. That way, you don’t have to put millions of lines under version control.

The drawback is that you need a team of disciplined individuals who stick to the plan. In most teams, this kind of discipline is shot in the back by the daily business. If you find yourself in a mess after a few weeks: Put everything under version control. It’s a bit of wasted disk space. Say, $10 per month. If you have to discuss this with the team for more than five minutes, the discussion was already much more expensive.


Another Reason to Avoid Comments

16. April, 2022

There is this old saying that if you feel you have to write a comment to explain what your code does, you should rather improve your code.

In the talk above, I heard another one:

A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.
– Kevlin Henney


What are Software Developers Doing All Day?

6. November, 2021

Translate.

Mathematics? Nope. I use the trigonometric functions like sin(x) to draw nice graphics in my spare time but I never use them at work. I used logarithmic last year to round a number but that’s about it. Add, multiply, subtract and divide is all the math that I ever do and most of that is “x = x + 1”. If I have to do statistics, I use a library. No need to find the maximum of a list of values myself.

So what do we do? Really?

We translate mumble into Programming Language of the Year(TM).

Or more diplomatic: We try to translate the raw and unpolished rambling of clients into the strict and unforgiving rules of a programming language.

We’re translators. Like those people who translate between human languages. We know all the little tricks how to express ourselves and what you can and can’t easily express. After a while, we can distinguish between badly written code and the other kind, just like an experienced journalist.


When stupidity meets cleverness

31. October, 2021

we get a potency of impossibilities.


Good and Bad Tests

16. January, 2017

How do you distinguish good from bad tests in your code?

Check these criteria. Good tests

  • Nail down expectations
  • Monitor assumptions
  • Help to locate the cause of a failure
  • Document usage patterns
  • Allow to change code
  • Allow to verify changes
  • Are short (LOC + time)

Bad tests

  • Waste development time
  • Execute many, many lines of code
  • Prevent code changes
  • Need more time to write than the code they test
  • Need a lot of code to set up
  • Take ages to execute
  • Are hard to run

Expectations

There are a lot of checks in your compiler. Those help to catch mistakes you make. Do the same with your tests. There are a lot of things that compilers don’t check: File encodings, existence of files, existence of config options, types of config options.

Use tests to nail down your expectations. Read config files and validate the odd option.

Create a test which collects the whole configuration of your program and checks it against a known state. Check that each config option is set only once (or at least that it has the same value in all places).

When you need to translate your texts, add tests which make sure that you have all the texts that you need, that texts are unique, etc.

Assumptions

Convention over configuration only works when everyone agrees what the convention is. Conventions are assumptions. Your brain has to know them since they are no longer in the code. If this approach fails for you, write a test that validates your assumptions.

Check that code throws the exceptions that you expect.

If you have found a bug in a framework and added a workaround, add a test which fails when the bug is fixed. Add a comment “If this test fails, you can remove the workaround.”

Speed

The world speeds up. No one can afford slow tests, tests that are hard to understand, hard to maintain, tests that get in the way of Get-Things-Done™. Make sure you can run your tests at the touch of a button. Make sure they never fail unless they should. Make sure they fail when they should. Make sure they are small (= execute fast, few lines to understand, little code to write, easy to change, cheap to delete). Ten tests, each asserting a single fact, are better than one test that asserts ten facts. If your tests run for more than ten seconds, you lose.

Documentation

There is code rot. But long before that, there is documentation rot. Who has time to update the comments and documentation after a code change?

Why not document code usage in tests? Tests tell you the Truth™. Why give someone 100 pages of words they can’t trust when you can give them 100 unit tests they can execute?

Conclusion

Make your life easier. Stop wasting time in your debugger, begging for production log files, running code in your head. Write a good test today, it will watch your back for as long as the project lives. Write a thousand good tests and they will be like an army of angels, warding you from suffering, every time you press that button.


Testing Fonts for Software Developers

11. September, 2015

Characters that you need to be able to distinguish clearly:

0O – Zero and upper case o
l1I – Lower case L, one, upper case i
Z2 – Upper case z and two
S5 – Upper case s and five
G6 – Upper case g and six
B8 – Upper case b and eight
71 – seven and one
lI – Lower case L, upper case i
vy – lower case v and lower case y

Just copy and paste into your favorite code editor.

Fonts you should try:


TNBT: Proactive IDEs

13. February, 2015

Imagine this situation: You’re working on some code and you get an exception when you run the unit tests. Next to the output is a link with the text: “User Joe had the same exception two months ago and fixed it with the commit b8cfda02.”

How would that work? We’re using big data for all kinds of things, tracking customer happiness, searching the Internet and discovering terrorist threats (or not).

Standard development teams have about 10 people. That means you have a super computer with 40-80 cores, 160 GB of RAM and 20 TB of disk space connected with a fast LAN in your office already. That beast is usually idling while it waits for the developers to press keys. It would be pretty simple to install a clustered log analyzer on this hardware which simply reads all the log files and reports which Maven and running JUnit test creates. It would be as simple to connect the same database to your version control. That means this system could track all the errors and exceptions that you get when you run unit tests or the whole application.

This information could then be used to detect when someone in the team gets a new exception plus the change sets which fixes them. If the system detects an exception which it has seen before, it can tell you which developer has fixed it or who is currently working on it – instead of wasting your time, you could see the code which contains the solution or ask someone who has already solved the problem.

With proper filtering, the data could be split into internal and framework code. That way, the system could report to library projects where consumers struggle most.

On the large scale of things, this system can tell you which parts of the system are most brittle.

As usual with big data, there are some downsides. The same system would tell you which developer breaks the code most often. Who writes the worst code. If your manager isn’t able to see the human value in his charges, this might not be your best bet.

Related Articles:

  • The Next Best Thing – Series in my blog where I dream about the future of software development

Jazoon 2013 – 33 things you want to do better

25. October, 2013

Jazoon 2013 badgeTom Bujok listed a lot of methods, technologies and frameworks that you should be aware about in his talk “33 things you want to do better” (slides on speakerdeck)

At the beginning he reminded us how quickly a well designed system goes bad due to hurried changes. We need to be aware of our technical debt and we need to allocate time to spend on reducing it (slides 3-12).

As an example, car batteries are easy to find. They are a replacement part, designers and engineers make it easy to find. Compare this to the configuration of your project. If you need to change it, how easy is it to find the file that needs to be changed and then the place in the file?

Another important point is skills. In most other professions, you have some mastery of a skill before you use it. You train hundreds of hours before you play your first football game. In Software, we show you a computer, we show you the programming language of the year (not necessarily this year’s). There is no time to master the tools you have to use from day one (slides 13-15).

“We are what we repeatedly do; excellence then, is not an act but a habit.” – Aristotle

Or as Wikipedia defines it:

“Habits […] are routines of behavior that are repeated regularly and tend to occur subconsciously.”

Stop wondering why you always make the same mistakes – they’re habits. Eliminate them ASAP (slide 19):

Bad Habits – Katherine Murdock “The Psychology of Habit”:

  • Recognize bad habits and eliminate them ASAP
  • The older you get the more difficult it is to remove a bad habit
  • Each repetition leaves its mark!

Turning bad habits into good ones – Dr. Michael Roussell, PhD.:

  • You can’t erase a habit, you can only overwrite one.
  • Insert the new habits into the current habit loops

Bad Habits

Configure your IDE properly and remove bad defaults. Replace “ex.printStackTrace();” with “throw new RuntimeException(ex.getMessage(), ex);” (slides 43-45).

One bad habit is empty catch blocks with “can never happen” comments. If you see one during a code review, replace it with “System.exit(-1);”. It can never happen, right? Right? (slides 46-47).

Note: I have create a “ShouldNotHappenException” for this case 🙂

Another one is to make every method in a static helper class public. Maybe some of them can be package private? (slide 48)

Learn about other good habits. Read books like “Effective Java” (Joshua Bloch) and “Clean Code” (Robert C. Martin) (slide 49)

Use code reviews to notice bad habits and to spread knowledge in your team but prevent blame games. (slide 73)

Learn the keyboard shortcuts of your IDE (slide 78)

Remember (slide 79):

“Any jackass can kick down a barn, but it takes a good carpenter to build one.” – Sam Ryburn

Projects You Should Know

Project lombok and lombok-pg – In a nutshell, these hook into the Java compiler and generate additional bytecode when certain annotations are present. Bored with getters, setters, hashCode() and equals() plus a nice toString()? Use @Data (slides 21-28).

Guava (slides 29-33)) is a great library with many tools that you have been missing in Java for years. You might also want to look at commons-lang.

Want to use lambda expressions but can’t upgrade to Java 8? Then lambdaj is for you (slides 34-39).

Logging? slf4j (40-42). Especially nice when combined with the @Slf4j annotation from lombok.

Bored to write all that boiler plate code to create all those services and managers that form your app? Look at Guice or Spring.

Use Spock to make tests more compact and easier to understand. (slides 50-52)

Unitils contains all the helper functions that we always missed in JUnit and Hamcrest (53-56).

JUnitParams will help you run tests with different parameters (57-59).

Need to wait for something during a test? Awaitility will help. (60-61)

When mocking isn’t enough and you need to inject code during a test, Byteman is the tool you want to look at (62-63)

Getting bored writing boiler plate code in Java to make a compiler happy? Have a look at Groovy. (64-67)

How about adding dependencies to your scripts? Try Grape. (68-69)

Is your build a mess? Do you feel Maven is too verbose or too limiting? Gradle might be for you. (70-72)

Version control is slowing you down? Have a look at Mercurial or Git (75)

Use bash or Python to automate man-/menial work. If you’re on Windows, look at Babun or Cygwin. (76-77).