Remember when I wondered how slow it is to copy data around in memory?
This time, I needed to load a 2MB file from disk and examine it for certain patterns.
The original code from the Jazzy spell checker worked like this: it did a binary search in the file for a code. Each line would start with the code, separated from the correct word by an asterisk:
AT*wide
AT*widow
AT*width
AT*wight
AT*wit
The search algorithm would seek somewhere in the file, skip the current (probably partial) line, read the next one, and compare its code against the one we were looking for. If our code was smaller, we would seek backward; if it was larger, we would seek forward. Standard binary search.
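A minimal sketch of that scheme in Java, with illustrative names (the real Jazzy code differs). Note that a line at the very start of the file can never be found this way, because the seek-and-skip step always discards the line it lands in; boundary conditions like that are exactly what makes this approach tricky.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileBinarySearch {

    // Binary search over byte offsets: seek to the middle, discard the
    // (probably partial) line we landed in, read the next full line, and
    // compare its code against the target. Names are illustrative.
    static String search(RandomAccessFile file, String code) throws IOException {
        long low = 0;
        long high = file.length();
        while (low < high) {
            long mid = (low + high) / 2;
            file.seek(mid);
            file.readLine();                     // skip the partial line
            String line = file.readLine();       // the next complete line
            if (line == null) {                  // ran off the end of the file
                high = mid;
                continue;
            }
            int star = line.indexOf('*');
            if (star < 0) {                      // malformed line: search left
                high = mid;
                continue;
            }
            int cmp = code.compareTo(line.substring(0, star));
            if (cmp == 0) {
                return line.substring(star + 1); // the word after the asterisk
            } else if (cmp < 0) {
                high = mid;                      // our code is smaller: go left
            } else {
                low = mid + 1;                   // our code is larger: go right
            }
        }
        return null;                             // not found (or on the first line)
    }
}
```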
Only it didn’t work. The test took 4 seconds to load the file without finding anything. And debugging recursive algorithms isn’t nearly as pleasant as reading them …
So I considered a different approach: I would load the whole file, character by character, and remember the lines with the same code in an index. A good time to have a look at NIO, especially the grep example.
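A simplified sketch of that indexing idea in Java. The CodeIndex name is mine, and for brevity this version keeps the word lists in memory; the approach described here would instead record file offsets in the index and load 1KB chunks on demand.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CodeIndex {
    // Maps each code to the words that follow it. A real implementation
    // would store offsets into the file here instead of the words.
    private final Map<String, List<String>> index = new HashMap<>();

    public CodeIndex(Path file) throws IOException {
        // Read the whole file once and index every line by its code.
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int star = line.indexOf('*');
            if (star < 0) continue;          // skip malformed lines
            index.computeIfAbsent(line.substring(0, star), k -> new ArrayList<>())
                 .add(line.substring(star + 1));
        }
    }

    // All words sharing the given code, already stripped of the code pattern.
    public List<String> lookup(String code) {
        return index.getOrDefault(code, List.of());
    }
}
```

One pass over the file replaces all the seeking, and lookups afterwards never touch the disk in this in-memory variant.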
Question: If it takes 4 seconds to seek ten times in a RandomAccessFile and read about 2000 bytes from it, how long would it take to read it character by character, examine each line, build an index and then load whole chunks (say, 1KB each) from the file when a word needs to be looked up? Plus the additional work of removing the code pattern from the chunk to produce a list of words …
Answer: 0.3 seconds. That’s more than ten times faster. And I could probably optimize the code some more.
Conclusion: When it comes to performance, measurement beats superstition.
And the new code is easy to understand, and to test, too! ^_^