When to put generated code under version control

29. June, 2022

Many people think that when a computer generates code, there is no point to put it under version control. In a nutshell: If you generate the code once with a tool that you’re confident with, there is no point to put under version control. If you need to tweak a lot, version control will make your life so much easier.

Decision tree:

  • Do you need to tweak the options the code generator until everything works? If so, then yes.
  • How confident are you with using the code generator? If not very, then yes.
  • Is the code generator mature? Then not.

Some background: Let’s compare a home-grown code generator which is still evolving with, say, the Java Compiler (which generates byte code). The latter is developed by experienced people, tested by big companies and used by thousands of people every day. If there is a problem, it was already fixed. The output is stable, well understood and on the low end of the “surprise” scale. It has only a few options that you can tweak and most of them, you’ll never even need to know about. No need to put that under version control.

The home grown thing is much more messy. New, big features are added all the time. Stuff that worked yesterday breaks today. No one has time for writing proper tests. In this kind of situation, you will often need to compare today’s output with a “known good state”. There is a dozen of roughly understood config options for many things that might make sense if you were insane. Putting the generated code under version control in this situation is a must have since it will make your life easier.

The next level is that the code generator itself is mature bit it offers a ton of config options. Hypothetically, you will know the correct ones to use before you use the generator for the first and only time. Stop laughing. In practice, your understanding of config options will evolve. As you encounter bugs and solutions, you will need to know what else a config change breaks. Make your life easy and use version control: Config change, regenerate, look at diff, try again.

In a similar fashion, learning to use the code generator in an efficient and useful way will take time. You will make mistakes and learn from them. That won’t stop a co-worker from making the same mistakes or other ones. Everyone in the team has to learn to use the tool. Version control will prevent from one person breaking things for others.

How

Write a parameterized unit test which generates the code in a temporary folder. In the end, each file should be a single test which compares the freshly generated version with the one in the source tree.

Add one test at the end which checks that the list of files in both folders is the same (to catch newly generated files and files which have to be deleted).

Add a command line option which overwrites the source files with the ones created by the test. That way, you can both catch unexpected changes in your CI builds and efficiently update thousands of files when you want.

The logic in the test should be:

expected = content freshly generated file
actual = content  of the file in the source tree 
      or just the file name if the file doesn't exist (makes it
      easier to find the file when the test itself is broken).

if expected != actual, then
    if (overwrite) then copy expected to actual
    assert expected == actual

Use a version of the assert that shows a diff in your IDE. That way, you can open the file in your IDE and use copy&paste out of the diff window to fix small changes to get a feeling how they work.

Or you can edit the sources until they look the way they should and then tweak config options until the tests confirm that the code generator now produces the exact desired result.

Bonus: You can tweak the generated code in your unit test. It’s as simple as applying patches in the “read content of the freshly generated file” step. One way you can use this is to fix all the IDE warnings in the generated code to get a clean workplace. But you can also patch any bugs that the code generator guys don’t want to fix.

Workaround

If you don’t want to put all generated code under version control, you can create a spike project to explore all the important features. In this spike, you create an example for every feature you need and put the output under version control. That way, you don’t have to put millions of lines under version control.

The drawback is that you need a team of disciplined individuals who stick to the plan. In most teams, this kind of discipline is shot in the back by the daily business. If you find yourself in a mess after a few weeks: Put everything under version control. It’s a bit of wasted disk space. Say, $10. If you have talk about it with the team for more than five minutes, that’s already more expensive.


Children can become anything they want

26. June, 2022

The difference is that some people think “they” means “the children” while other people think it means themselves.


%d bloggers like this: