A bit of history…
“Konrad, we have to talk!”, the Programme Manager said rushing into my office. I don’t like it when Programme Managers move fast: it is usually a bad sign. “We have an issue with the release of the next module of our financial system. Its code refers to some core libraries which we have significantly modified since last release. Now those amendments need to be merged with the live version of the module. You will lead a team that will work on the integration task”.
I already knew about this issue, along with several others, because I’d been in the programme for more than nine months. I was keen to do the job as I had heard numerous complaints from other architects about the many technical aspects that had not been properly taken care of, or had been endlessly postponed because they’d been valued less than business features. It wasn’t divine intervention that led the customer to suddenly realise that those things mattered: it was simply that the deadline for the release was looming closer and the ground was starting to slide from under the customer’s feet. I set about building a list of the initial backlog of work in collaboration with my colleagues. It was only the tip of the iceberg and at that stage we had no idea of the extent of the ice lurking underneath. We had to:
- combine the changes in the database scheme and the data integration logic between modules, each of which had evolved this code in a different and mutually-exclusive way
- evolve the folder-structure of branches for a relatively huge codebase to a common format, so that the folders are named consistently and located in the same place of the directory tree
- upgrade third party libraries, the CMS framework and our core libraries in all modules to the same version
- write additional integration tests and, where these proved too costly, perform manual regression in order to confirm that the result of our merger was stable
- resolve deployment issues, determine where release steps had not been automated and then automate them
Furthermore, during the process, we discovered that our deployment scripts were not maintainable. When we tried to fix them, we entered a vicious cycle – each problem we uncovered led to another. At some point builds started to take more than forty minutes, which further aggravated our situation.
After a few months of unforeseen work at a frantic pace, we successfully delivered the project, thanks largely to a great team. The most important lesson to emerge from the history of that struggle emerged was that a vast amount of effort and problems can be avoided by having a proper branching strategy and by a continuous integration (CI) process. Here are ten pretty good practices which I found useful.
Pretty good Practices for Branching and Merging
Use the standard Source Control folder structure correctly
In the development environment I work in, SVN is very popular version control system (VCS). I noticed that several problems were caused by developers not understanding the standard folder structure and its purpose. We have three main folders:
- Trunk, the first one, is usually used in two ways: as a read-only reflection of ‘live’ or as an active development branch.
- Tagsis meant to be read-only; it is an immutable set of shortcuts to shipped releases and it should never be used for coding.
- Finally, the Branches folder should contain hotfixes, experimental branches and the intermediate results of parallel feature development.
Know the strategy used in your project
Communication will be clearer, and misunderstandings will be less frequent, if the team uses terminology and jargon correctly and consistently. It is not just the architect who is responsible for promoting this, but the entire delivery team. For example, we have three strategies which I believe every member of the development team should be aware of.
- “Branch by release”, sometimes called the “staircase model” is the oldest one where, for every planned release, we have a separate branch.
- “Branch by feature”, in turn, has distinct branches for different user stories (US). This occasionally comes in handy, such as when we have to do a challenging task such as a cross-cutting upgrade and we do not want to risk the stability of our code by mixing upgrade check-ins with other stuff.
- Lastly, we have the “branch by abstraction” approach, which has recently been given much more attention due to the “continuous delivery” (CD) buzz. This strategy is founded on the concept of a single branch for everything. Features that are new, but not yet finished, can be enabled or disabled by feature-toggles. We can use this device to release different versions of our application from a single branch, by changing its configuration. It is very convenient, because it frees us from the merging task, but it is not appropriate for every project.
Try to minimise the number of branches
No matter which strategy is in use, it is always beneficial to minimise the number of branches that are required. Fewer branches means less merges, reduced conflict and diminished misunderstanding within a team, in the same way that, the fewer the layers of the architecture, the lower the cost. In the extreme case, this might be reduced to a single branch with feature toggles. However, on larger code-bases, branching allows changes to be isolated, and so helps teams to avoid disturbing each other. Short-lived feature branches can also work well – especially when used for risky refactoring or experimentation. It is only when the branches remain in existence for too long that the likelihood of merge-conflicts mounts.
Predict release dependencies
When working with a few branches of code that reflect subsequent releases, it is good to be able to predict dependencies and make decisions that will make subsequent merges as easy as possible. There are questions that need to be answered as soon as possible. Which release will go first? What will be the direction of merges and their frequency? On which branch should a particular core component be implemented so that it is most useful and less risky for everybody? If left unanswered, these become so-called “last mile” delivery problems. We have to try to foresee potential issues and plan branches / merges accordingly.
Conduct regular merges
Deployments are still sporadic in the enterprise environment, even though start-ups and community projects are revolutionising delivery techniques. It will take time to change the release strategy for the majority of companies. As a consequence, merges might happen rarely and I’ve already alluded to the sort of problems that this leads to. Even if the release cycle is extended, it is much better to conduct the merges regularly; not necessarily automatically, because you then lose the chance to review the code and get familiar with changes of other teams. I recommend weekly merges as a good frequency to start with. Those merges should be performed by the team that is responsible for the target branch of the merge – they know best when to do it and how to resolve any conflicts.
Think about the impact of the choice of repository
Every VCS has its advantages and its drawbacks; these will affect the decisions you make about the merging process and the Source-Control strategy you choose. SVN or Git manage merging and branching are better than TFS. TFS’s Auto-merge feature is poorer than in competing products and, occasionally, the results are so bad that some of my colleagues have decided to not use this feature at all. Moreover, there is no possibility of synchronising the three branches – the tool will always detect some difference, even though it is only in the metadata. TFS tried to prevent the misuse of merging by limiting this process to branches that are in parent-child relationship. Although this initially seems like a safer option, in reality it is restrictive. TFS’s “Baseless merge” is a way to work around this, but it is an alternative that is available only from the command-line and so leaves users with the feeling that you need several “hacks” to work with the tool. My experience comes from work with the 2010 edition, so please bear in mind that this might already have been fixed.
Coordinate changes of shared components
Merges might be difficult; they may take days and require expertise in code. It is occasionally almost impossible, especially for binary files or XML, such as the one used by SSIS’s DTSX files. That is why we have to avoid them wherever possible by planning and coordinating the changes to elements of the application that pose a risk – especially shared elements such as the database or core libraries. Mitigation techniques can include:
- Brain-storming sessions to identify all possible implementation options
- Organising meetings to discuss the impact of changes
- Creating core teams that are responsible for shared components
At Objectivity, one of teams that faced a particularly complex challenge used a branch graph, hanging on the wall, to discuss and plan future releases. I really liked that practice.
Develop methods to simplify merges
Undoubtedly, there will be times when merging cannot be avoided. Certain methods may help to shorten the time before a merge, and minimise the number and scope of code-conflicts. It may help to structure the folders to isolate the code on which teams will simultaneously work.
When continuously writing database upgrade scripts, each identified by a unique version, you may come across script version conflict during merge. It will happen, for example, when teams working independently on separate branches have written two scripts, upgrading to the same DB version, which contain different database changes. With more file conflicts the situation is even worse as usually upgrades are not idempotent and they have to be run in a planned sequence, giving you no place in-between to insert conflicting scripts. To prevent this, simply reserve individual version ranges for the teams. Alternatively, use separate schemes or databases so that each team owns part of the DB. That helps to mitigate conflicts but is typically not sufficient in enterprise apps requiring data sharing.
When a system gets large, it is likely to need to be modularized; its components, whether shared or not, should be kept in separate branches, compiled into binary form (DLL or NuGet, for example) and used in dependent projects as external libraries. This way a component’s code is held in a single place; it does not require merging and the build is faster, as the components are compiled just once. If those methods fail or are too costly, the fall-back is the above-described change coordination.
Promote “CI for everybody”
Every active branch should have a separate CI process to ensure its quality and to make it easy to change and maintain. Without CI and automated tests, the system becomes harder to modify due to the prolonged feedback loop. If branches survive too long without a merge, then the merge-related bugs are discovered too late, when some of the detail about the changes within the branch has been forgotten. This rule is very intuitive, yet not always followed.
Include merging in estimation
Good agile teams use “definition of done” (DoD) to decide when a particular feature can be considered as “done”, so that there are fewer misunderstandings within a team or between a team and customer about the progress. I found it useful to consider whether merging activity should be included into DoD: sometimes it is irrelevant or obvious but at times it counts a lot.