Continuous Integration in a Huge Multi-Module Project

Posted on Wednesday, November 19, 2008


The benefits of Continuous Integration cannot be discounted in any environment, whether you are working on a small project or a huge one. The difference lies in how continuous integration is set up and configured in a multi-module project.

Multiple modules bring additional complexity in terms of versions, collaborating teams, release cycles, separate sprints and so on. In an ideal world none of this would be necessary: we would have just one module for the entire system, and every developer check-in would trigger the CI build. Of course, the CI box would also need unlimited processing power, load-distribution capability and unlimited storage to build a system of any size in under 10 minutes.

The magic number of 10 minutes comes from the XP practice for CI, which rightly states that if a build takes more than 10 minutes, the value of CI is lost. With longer builds, developers check in less frequently, because nobody wants to wait, say, 2 hours for the result; they would rather check in their code at the end of the day, or worse, not wait for the build to complete. A broken build, which would happen often, would then be fixed only the next day. Now imagine that the team is distributed across time zones and people step into the office only to struggle with a broken build.

Let us switch to the real world and agree that building a huge system in under 10 minutes is a far-fetched dream. If you work in a huge enterprise, where various complex systems collaborate, you will agree with me.

So the best way to build a huge system is to divide it into multiple smaller systems or modules and build each module separately, so that each module builds in under 10 minutes.

Taking a simplified view of the real world, let us assume that we have successfully divided our system into 3 modules.

System = Σ modules (A, B, C)


Each module is developed and maintained by a separate scrum team, so we have 3 scrum teams.

The dependencies between the modules are as follows: module A is independent, module B depends on A, and module C depends on both A and B.
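These dependency rules form a small directed acyclic graph. Although each module is built on its own cycle, any process that has to build several modules together (for example a first-time bootstrap) must respect this graph, and a topological sort gives a valid order. A minimal Java sketch, assuming the graph is held in a plain map; `ModuleGraph`, `example` and `buildOrder` are hypothetical names, not part of any CI tool:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the example's module dependencies: each key is a
// module and each value is the set of modules it depends on.
final class ModuleGraph {

    static Map<String, Set<String>> example() {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("A", Set.of());          // A is independent
        deps.put("B", Set.of("A"));       // B depends on A
        deps.put("C", Set.of("A", "B"));  // C depends on A and B
        return deps;
    }

    // Kahn's topological sort: every module appears after all the modules it
    // depends on. Throws if a dependency is unknown or the graph has a cycle.
    static List<String> buildOrder(Map<String, Set<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();         // unbuilt dependencies per module
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (String module : deps.keySet()) {
            pending.put(module, deps.get(module).size());
            dependents.put(module, new ArrayList<>());
        }
        for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
            for (String dep : entry.getValue()) {
                if (!deps.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown module: " + dep);
                }
                dependents.get(dep).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) ready.add(entry.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.poll();
            order.add(module);
            for (String dependent : dependents.get(module)) {
                // One fewer unbuilt dependency; queue the module once none remain.
                if (pending.merge(dependent, -1, Integer::sum) == 0) ready.add(dependent);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("Cyclic dependency between modules");
        }
        return order;
    }
}
```

With the example graph, `ModuleGraph.buildOrder(ModuleGraph.example())` returns `[A, B, C]`. A cycle, such as A depending on C, is rejected, which is also an early signal that the module split has gone wrong.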


Each module has its own build cycle and frequency, and because the system is divided into 3 modules, each module can be built within the 10-minute limit.

Staging Area

Once a module is built, it is deployed to the repository area. The repository area is versioned, so every build of module A is kept: A1.0, A1.01-Snapshot, A1.02-Snapshot, A1.1-RC1, A1.1-RC2, A1.1 and so on.
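Because every build lands in the repository under its own label, tooling needs a way to tell which version is newer. The sketch below assumes the naming scheme in the list above (module letter, decimal number, optional `-Snapshot` or `-RCn` suffix) and orders labels so that the list comes out in the sequence shown. The class name and the ordering rules are assumptions for illustration, not Maven's own version comparison:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser and ordering for labels such as "A1.0", "A1.01-Snapshot",
// "A1.1-RC2" or "A1.1". Assumed rules (not Maven's own versioning rules):
// the number is compared as a decimal, so 1.02 < 1.1, and for the same number
// Snapshot < RC1 < RC2 < ... < the final release.
final class ModuleVersion implements Comparable<ModuleVersion> {
    private static final Pattern LABEL = Pattern.compile(
            "([A-Za-z]+)(\\d+(?:\\.\\d+)?)(?:-(?i:(snapshot)|rc(\\d+)))?");

    final String module;
    final BigDecimal number;
    final int stage;      // 0 = snapshot, 1 = release candidate, 2 = release
    final int candidate;  // RC number, 0 when this is not a release candidate
    final String label;

    private ModuleVersion(String module, BigDecimal number, int stage, int candidate, String label) {
        this.module = module;
        this.number = number;
        this.stage = stage;
        this.candidate = candidate;
        this.label = label;
    }

    static ModuleVersion parse(String label) {
        Matcher m = LABEL.matcher(label);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised version label: " + label);
        }
        boolean isSnapshot = m.group(3) != null;
        boolean isCandidate = m.group(4) != null;
        int stage = isSnapshot ? 0 : isCandidate ? 1 : 2;
        int rc = isCandidate ? Integer.parseInt(m.group(4)) : 0;
        return new ModuleVersion(m.group(1), new BigDecimal(m.group(2)), stage, rc, label);
    }

    // Compares versions of the same module; the module name itself is ignored.
    @Override
    public int compareTo(ModuleVersion other) {
        int byNumber = number.compareTo(other.number);
        if (byNumber != 0) return byNumber;
        int byStage = Integer.compare(stage, other.stage);
        if (byStage != 0) return byStage;
        return Integer.compare(candidate, other.candidate);
    }

    @Override
    public String toString() {
        return label;
    }
}
```

Sorting the six labels above, from any starting order, with `java.util.Collections.sort` yields A1.0, A1.01-Snapshot, A1.02-Snapshot, A1.1-RC1, A1.1-RC2, A1.1.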

Because A does not depend on B or C, building A simply produces A's build artifact and posts it to the repository.


B depends on A. Whenever the continuous integration build of B triggers, it depends on a particular version of A in the repository area, so B always builds against a repository version of A. Say we are building a new version of B, B1.01-RC1; this version would depend on version A1.1-RC1, as shown in the diagram.


A problem arises when B depends on A1.1-RC1 while Team A has already produced a newer version, A1.1-RC2. In this scenario, either someone has to check the repository area manually for new versions, or the build process compares the current dependency with the latest release and issues a warning. A Maven mojo (a plugin goal) makes this process easier.
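The check described above boils down to comparing each pinned dependency with the newest release in the repository. The following is a hedged sketch of that comparison only, not the Maven mojo itself; `DependencyCheck`, the message format and the assumption that the repository lists versions in publication order are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical staleness check a module's build could run before compiling.
// Assumption: the repository lists each module's versions in the order they
// were published, oldest first; "-Snapshot" builds are not treated as releases.
final class DependencyCheck {

    static List<String> warnings(Map<String, String> pinned,
                                 Map<String, List<String>> repository) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, String> dep : pinned.entrySet()) {
            List<String> published = repository.getOrDefault(dep.getKey(), List.of());
            int pinnedAt = published.indexOf(dep.getValue());
            if (pinnedAt < 0) {
                messages.add("ERROR: " + dep.getValue() + " is not in the repository");
                continue;
            }
            // Warn only if a release was published after the pinned version.
            int latestAt = latestReleaseIndex(published);
            if (latestAt > pinnedAt) {
                messages.add("WARNING: built against " + dep.getValue()
                        + " but " + published.get(latestAt) + " is available");
            }
        }
        return messages;
    }

    // Index of the newest non-snapshot version, or -1 if there is none.
    private static int latestReleaseIndex(List<String> published) {
        for (int i = published.size() - 1; i >= 0; i--) {
            if (!published.get(i).toLowerCase(Locale.ROOT).endsWith("-snapshot")) {
                return i;
            }
        }
        return -1;
    }
}
```

For B pinned to A1.1-RC1 while the repository holds `[A1.0, A1.1-RC1, A1.1-RC2]`, `warnings` returns one message: `WARNING: built against A1.1-RC1 but A1.1-RC2 is available`. Whether the build merely warns or fails on such a message is a policy decision for each team.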

Similarly, C depends on A and B and follows the same process, depending on released versions of A and B in the repository. The more modules a module depends on, the more important it becomes to keep up to date with their latest released versions.

Some would argue that this is only pseudo continuous integration, because the system as a whole is not being continuously integrated. To an extent they are correct. The key point is that you have to walk a tightrope: either you integrate the complete system continuously and miss the 10-minute build target, or you break it into smaller modules and make the builds manageable. One way to get closer to full continuous integration is to run an intelligent background process that builds against all the latest versions present in the repository area and reports any issues. This covers the case where individual modules have not yet updated their dependencies to the latest versions in the repository.
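One way to read the background process described above: walk the modules in dependency order, resolve each dependency to the newest version in the repository rather than the pinned one, build, and report failures. A minimal sketch under those assumptions; `LatestIntegrationRun` and the `buildStep` callback (which stands in for the real compile-and-test step) are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical background "latest of everything" integration run.
// Assumptions: modules arrive in dependency order (e.g. A, B, C), the
// repository lists each module's versions oldest first, and buildStep stands
// in for whatever compiles and tests a module against the given dependency
// versions, returning true on success.
final class LatestIntegrationRun {

    static List<String> run(List<String> buildOrder,
                            Map<String, List<String>> dependsOn,
                            Map<String, List<String>> repository,
                            BiPredicate<String, Map<String, String>> buildStep) {
        List<String> issues = new ArrayList<>();
        for (String module : buildOrder) {
            // Resolve every dependency to the newest version in the repository.
            Map<String, String> resolved = new LinkedHashMap<>();
            boolean missing = false;
            for (String dep : dependsOn.getOrDefault(module, List.of())) {
                List<String> published = repository.getOrDefault(dep, List.of());
                if (published.isEmpty()) {
                    issues.add(module + " skipped: no version of " + dep + " in the repository");
                    missing = true;
                    break;
                }
                resolved.put(dep, published.get(published.size() - 1));
            }
            if (!missing && !buildStep.test(module, resolved)) {
                issues.add(module + " fails against " + resolved);
            }
        }
        return issues;
    }
}
```

Because the run only reports problems instead of blocking anyone's check-in, it can take longer than 10 minutes without slowing the per-module builds. A report such as `C fails against {A=A1.1-RC2, B=B1.01-RC1}` tells Team C which upgrade will break them before they adopt it.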

Apart from these technical challenges, teams split along module lines face softer challenges too. Each team is now an individual scrum team, which adds a communication cost. A Scrum of Scrums helps, but you have to accept that there are now multiple teams and that an us-versus-them mentality will spring up however hard you try. Each module should treat the modules it depends on as third-party libraries, which makes the process easier. The handoff between modules happens through well-defined, mutually agreed-upon interfaces.
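Treating an upstream module as a third-party library means depending only on its published interface. A minimal Java sketch of such a contract; the domain (`CustomerDirectory`, `InMemoryCustomerDirectory`, `InvoiceGreeting`) is invented for illustration, and in a real setup the interface would ship in module A's artifact while the consumer lives in module B:

```java
import java.util.Map;
import java.util.Optional;

// --- Published by module A (hypothetical contract) ---
// Module B compiles only against this interface, as it would against a
// third-party library; Team A may change the implementation freely.
interface CustomerDirectory {
    Optional<String> nameOf(String customerId);
}

// --- Module A's implementation, not referenced by B's code ---
final class InMemoryCustomerDirectory implements CustomerDirectory {
    private final Map<String, String> names;

    InMemoryCustomerDirectory(Map<String, String> names) {
        this.names = Map.copyOf(names);
    }

    @Override
    public Optional<String> nameOf(String customerId) {
        return Optional.ofNullable(names.get(customerId));
    }
}

// --- Module B code that depends on A only through the contract ---
final class InvoiceGreeting {
    private final CustomerDirectory directory;

    InvoiceGreeting(CustomerDirectory directory) {
        this.directory = directory;
    }

    String greet(String customerId) {
        return directory.nameOf(customerId).map(n -> "Dear " + n).orElse("Dear customer");
    }
}
```

Because `InvoiceGreeting` only sees `CustomerDirectory`, Team B can test against a stub such as `id -> Optional.empty()` without waiting for Team A's next release candidate.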