Good vs Bad Legacy Code
Given a set of specs (specifications), an engineer usually tries to come up with the optimal architecture for those specs.
As we all know, specs tend to change. The true understanding of a problem usually comes after you’ve started working on it. As Mike Tyson put it, everyone has a plan until they get punched in the mouth. So the people facing the problem will come back to the engineer with new, updated specs over the course of the project.
When you design an architecture, you make assumptions that you’re not always fully aware of. And if I have learned one thing over the years, it’s that a new spec usually challenges a low-level assumption you made about the system, which means keeping the architecture “optimal” for the updated specs would require heavy refactoring. This is Murphy’s Law at work. Put more simply, the unexpected often comes in the worst possible way.
So, inevitably, once you have started implementing your architecture, as soon as a few specs are updated, a lot of the code you wrote qualifies as “legacy” from a purely theoretical point of view.
The problem is, you have deadlines, so you can’t spend your life refactoring everything to keep your architecture “clean and optimal” with respect to the new specs. So you make compromises. If you’re lucky, your company’s specification process is a 2-way process, where product managers define the underlying problems and engineers propose solutions that take the legacy code into account. A decision is then made collectively, once the trade-offs have been properly laid out.
Then, at some point, the constraints from legacy accumulate, and it becomes harder and harder to come up with solutions that can be implemented quickly and cleanly and that answer the product managers’ problems.
This is when a serious, precisely targeted refactor should be considered.
To conclude, good legacy code is code that was written after a 2-way specification process, where business needs and legacy architecture were properly taken into account. Good legacy code is understood and accounted for.
Bad legacy code is code written under tight deadlines, with no possibility for engineers to be involved in the specification process. Those situations tend to yield quick-and-dirty, undocumented hacks, and quickly turn the codebase into spaghetti code.
But even in a well-run company (with a 2-way specification process), it can be hard to strike the right balance between accommodating legacy and adjusting the architecture to acknowledge new requirements and all their ramifications. It is not always clear whether one has gone too far in accommodating legacy, and usually it all comes down to experience.
Software is nothing but a caricature of those who wrote it. It is thus no wonder that code quality is deeply correlated with the quality of the processes that led to its writing.
Originally published at fruty.io on July 30, 2018.