I was asked to give a short talk at work about a refactoring process called “Branch by Abstraction”. Not knowing anything about this technique or how it related to the work I’m currently doing, I created the following notes. Some of the description and the images are taken from the Martin Fowler explanation available here and the “remix” of that article available here.
What problem are we trying to solve with this technique? The scenario is that we want to undertake a large-scale refactoring on a piece of software. However, we still want to push out regular releases while the refactoring is in progress, so we can’t break the software whilst the work is under way. At any point we want to be able to push out a new update with everything working.
One solution might be to use feature branching. In this case we’d create a new branch for the refactoring, and upon completion we’d merge the code back into the trunk. This might be a viable solution, or we might decide that we don’t want to keep changes on a long-running branch, or that the merging will be painful enough that we’d prefer to take an approach that avoids it. This might well be the decision if the refactoring is sufficiently large that it takes weeks or months to complete. The arguments for and against branching are too big and painful to get into here.
An alternative is to use “Branch by Abstraction”. The name is slightly misleading as no branching is involved. Instead this technique describes an alternative to branching where all development work continues to be done on the trunk, which is kept in a stable state throughout. Changes are instead performed by introducing abstraction.
What does this mean? Imagine this is the initial state of the system. We have several “clients” making use of the flawed “supplier” code:
The first thing we’d do is to create an abstraction layer around the code we want to refactor. This might be a simple interface which calls into the flawed supplier code. The clients are then migrated over to use the new abstraction layer. For the sake of a simple example, imagine it is trivial to move all of the clients behind the abstraction layer:
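A new supplier implementation is then built behind the same abstraction layer, and the clients are switched over to it. At the point where nothing is reliant on it, the initial flawed supplier code can be deleted, along with the abstraction layer, which has now fulfilled its purpose.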
So the underlying idea is to create a single abstraction of the supplier, and then to create multiple implementations of that abstraction which can exist side-by-side. We can then migrate from the old implementation to the new implementation at our own speed, until it has been replaced completely.
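As a minimal sketch of that idea in Python: one abstraction, two implementations living side by side, and a client that only knows about the abstraction. All of the names here (`Supplier`, `LegacySupplier`, `NewSupplier`, `Client`, `price_for`) are hypothetical and chosen purely for illustration; they don’t come from either article.

```python
from abc import ABC, abstractmethod


class Supplier(ABC):
    """Hypothetical abstraction layer that the clients depend on."""

    @abstractmethod
    def price_for(self, item: str) -> int:
        """Return the price of an item in pence."""


class LegacySupplier(Supplier):
    """Stands in for the flawed code we want to replace."""

    def price_for(self, item: str) -> int:
        return len(item) * 100


class NewSupplier(Supplier):
    """Stands in for the replacement, built alongside the old code."""

    def price_for(self, item: str) -> int:
        return 100 * len(item)


class Client:
    """A client that only ever talks to the abstraction."""

    def __init__(self, supplier: Supplier) -> None:
        self._supplier = supplier

    def quote(self, item: str) -> str:
        return f"{item}: {self._supplier.price_for(item)}"


# Either implementation can be plugged in without touching the client.
old_quote = Client(LegacySupplier()).quote("tea")
new_quote = Client(NewSupplier()).quote("tea")
```

Because the client never names a concrete supplier, swapping the implementation is a change in one place rather than in every client.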
Step-by-step, the process would be (copied directly from this article):
- Create an abstraction over the part of the system that you need to change.
- Refactor the rest of the system to use the abstraction layer.
- Create new classes in your new implementation, and have your abstraction layer delegate to the old or the new classes as required.
- Remove the old implementation.
- Rinse and repeat the previous two steps, shipping your system in the meantime if desired.
- Once the old implementation has been completely replaced, you can remove the abstraction layer if you like.
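The third step, where the abstraction layer delegates to the old or the new classes as required, might look something like the sketch below part-way through a migration. The class and method names (`OrderStore`, `OldOrders`, `NewOrders`, `find`, `save`) are made up for this example, and a real system would obviously do more than return dictionaries.

```python
class OldOrders:
    """Hypothetical old implementation: still owns everything not yet migrated."""

    def find(self, order_id: int) -> dict:
        return {"id": order_id, "source": "old"}

    def save(self, order: dict) -> str:
        return "saved by old"


class NewOrders:
    """Hypothetical new implementation: only `find` has been rewritten so far."""

    def find(self, order_id: int) -> dict:
        return {"id": order_id, "source": "new"}


class OrderStore:
    """Abstraction layer: each operation delegates to whichever side is ready."""

    def __init__(self) -> None:
        self._old = OldOrders()
        self._new = NewOrders()

    def find(self, order_id: int) -> dict:
        # Already migrated, so delegate to the new classes.
        return self._new.find(order_id)

    def save(self, order: dict) -> str:
        # Not migrated yet, so keep delegating to the old classes.
        return self._old.save(order)


store = OrderStore()
found = store.find(7)
saved = store.save(found)
```

Each time another operation is rewritten, only the matching method in the abstraction layer changes, and the system stays shippable in between.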
Code which already follows decent SOLID principles, especially the dependency inversion and interface segregation principles, will be easier to refactor using these techniques, as the interface makes a natural place to introduce the abstraction layer.
In practice it may be too big a step to move every client behind the abstraction layer in one go. There is nothing to stop us picking one client which only makes use of a small portion of the supplier and making the change there first. This diagram shows an example of a possible first step when refactoring a more complex case:
Although not mentioned in the Martin Fowler article, it’s possible to also use feature toggles to switch between the two implementations, or to slowly roll out the new supplier to individual accounts.
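A rough sketch of the per-account rollout, assuming the toggle is nothing more than a set of account IDs that have been opted in. The names (`ToggledSupplier`, `enabled_accounts`, `fetch`) are hypothetical, and a real feature-toggle system would normally read this from configuration rather than hard-coding it.

```python
class OldSupplier:
    """Hypothetical old implementation."""

    def fetch(self, account_id: str) -> str:
        return f"old:{account_id}"


class NewSupplier:
    """Hypothetical new implementation."""

    def fetch(self, account_id: str) -> str:
        return f"new:{account_id}"


class ToggledSupplier:
    """Abstraction layer that picks an implementation per account."""

    def __init__(self, old, new, enabled_accounts) -> None:
        self._old = old
        self._new = new
        # Assumption: the toggle is a simple set of opted-in account IDs.
        self._enabled = set(enabled_accounts)

    def fetch(self, account_id: str) -> str:
        impl = self._new if account_id in self._enabled else self._old
        return impl.fetch(account_id)


supplier = ToggledSupplier(OldSupplier(), NewSupplier(), enabled_accounts={"acme"})
acme_result = supplier.fetch("acme")
other_result = supplier.fetch("globex")
```

Rolling the change out further is then a matter of adding accounts to the set, and rolling it back is a matter of removing them.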
What are the benefits of this approach? As mentioned, the goal is to keep the code deployable at any stage. The code “works” at all times, so only the team involved in the refactoring is affected by the change. It avoids merging. Confidence may be higher when committing micro-changes on a regular basis, as opposed to a single larger change at the end of a branch. Adding the abstraction may also help improve the modelling of the application in its own right.