29 May 2018

Continuous Integration, it's about time

My feed has been full of strange stories and interesting posts lately. This morning I found this conversation about Continuous Integration and it got me thinking about that practice. Specifically, it got me thinking about what I was taught and how it has changed in practice.

Fifteen years ago I stumbled upon the practice of build automation entirely by accident. I was running a small team of distributed developers and I didn't want to wait six weeks for them to get their development environments working. I wrote a massive Ant file that assembled all the tooling, configuration, and resources necessary to start development on day one. It took 90 minutes to run the setup-all target. When it was done you'd have Ant, Eclipse, DB2, Oracle, WebSphere, all the code, everything built, and any and all tests executed. I even went so far as to build a server that, given an Ant build script, would run it for you (presuming it could get ahold of and configure the resources). I didn't know what I was doing. I'd never heard of CruiseControl, CI, or any of that. I was just solving a problem.

As inelegant as my solution was, it worked. I was optimizing the initialization-of-work phase of my project. In retrospect, it was because I had no idea what we were going to do. However, I was looking at the south side of the northbound donkey. In typical fashion, once we started work, each of us had our 'area' of the application and we mostly didn't step on each other. Along the edges I managed to stay involved and smooth over the conflicts. Some pretty severe BDUF happened to minimize issues. In the end it all worked out.

Fast forward about a year, and I got introduced to the world of Extreme Programming. I'd seen some books (I think I had read the foreword to XP Explained and discarded it by this point) and I learned about this strange practice called continuous integration. It made total sense.

The way I heard it was: integrate everything as often as possible and test it to make sure it works before you do anything else. This led me to development practices like pulling code 3x a day and integrating it into what I was working on...no matter what. So every morning I'd sync with source control (CVS and Subversion back then) and resolve all my conflicts. Then I'd run everything and make sure it still worked. I'd do the same after lunch, and once more before I knocked off. I wasn't even paying attention to the CI server -- I couldn't even tell you now what server we had, though I suspect it was either Hudson or CruiseControl. In any case, I did this pretty consistently and you know what, it didn't slow me down a bit. In fact, I'm pretty sure it saved me a lot of pain in the long run.

I read the aforementioned article this morning and was reminded of this. In that article there is an example of a merge conflict (kudos to the author at Apimhub for taking on that topic, it's painful to describe cleanly) and the author proposes at least one strategy for dealing with the issue -- a meeting. Back in the early aughts that would have been my answer too. However, I don't think I've had one of those kinds of discussions in nearly a decade. Here's why.

First, you should be rebasing or merging any branch, if you use such a thing, very frequently. Depending on the rate of change in your repo, 2-3x a day. If you are doing that, you'll likely see the intersection well before you've gone down the rabbit hole into integration hell.
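
One way to notice when a sync is due is to ask git how far your branch has drifted from master. What follows is only a minimal sketch, assuming git is on your PATH and the shared branch is called master. The helper names and the 10-commit threshold are made up for illustration; pick whatever fits the rate of change in your repo.

```python
import subprocess


def parse_drift(rev_list_output):
    """Parse the output of `git rev-list --left-right --count master...HEAD`.

    Git prints two whitespace-separated counts: commits only on master
    (how far behind we are) and commits only on HEAD (how far ahead).
    """
    behind, ahead = (int(n) for n in rev_list_output.split())
    return behind, ahead


def sync_is_due(behind, threshold=10):
    """Hypothetical rule of thumb: sync once master has moved `threshold` commits."""
    return behind >= threshold


def current_drift(base="master"):
    """Ask git for the drift between `base` and the current checkout."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_drift(out)
```

Calling current_drift() inside a repository returns a (behind, ahead) pair. If the first number keeps climbing, that is your cue to rebase or merge before the rabbit hole gets any deeper.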

Second, the integrator takes responsibility for the integration. So, whoever is doing their local merge or rebase takes responsibility for making it all work out. That doesn't mean don't ask for help, just that the only one hung up should be the receiver of a pull from master. I know this sounds cruel, but if you have 10 developers banging away on different parts of the system you can't reasonably stop all of them when there is an issue.

This last bit has some interesting consequences. One is that it should cause everyone to make smallish commits and smallish changes. That might not happen until someone gets bitten by the mother of all refactors (which coincidentally isn't a smallish thing), but once it does people will adapt to this mode PDQ. It will also likely change the modularization of the system (for the better). In order to avoid conflicts, developers will start isolating their components from the other components of the system so that the integration point is very clearly defined and as small as possible. This will minimize friction when trying to integrate. (I recognize that there can be some negatives too, but that will need to wait for another discussion.)

The third thing that prevents the BDUF conversations and coordination is community. We should not develop in isolation if at all possible. That is, we should be in more or less constant contact with the other developers on our team. That is what Slack, Gmail, etc. are for. Or if we're all in the same room, that rarely used voice thing we have. Every day we have standup, and every day we talk about what each of us has done and will do. We should all have some general sense of where those things intersect. If we don't, we should step back and think about that too (maybe another discussion here?). Also, generally, we sort of know in most cases whose turf is whose. Joe is the 'server guy', Terri is the 'UI dev', and Sara is omnipresent. If we have concerns about the intersection of what we are doing we should proactively discuss them with the team.

Anyway, back to the general Continuous Integration discussion.

Overall, Continuous Integration is about time. It's very much about saving you time, but it's also about where you spend your time. The way I heard it back in the day was: rather than spend six months at the end of a project integrating everything, spread that time out across the project timeline, doing a little integration at a time, and you will, overall, reduce the total time spent integrating AND you (probably) won't fail to integrate.

As I've described it, maybe you think CI will increase the amount of time you spend doing development. It might. But you aren't the only one we are worried about. We are worried about the overall viability of the system in production and its sustainability within the organization.

A person/pair might spend, say, 10% more time working on any given task as a result of the repeated rebase/merge step, but if that translates into nearly zero tail-end effort to merge to master and a coherent design in the end, it was well worth it. Waiting to do the integration at the end of a long branch lifecycle will just move the integration time to one place and likely cause others to get drawn into it. So one person/pair's 10% becomes two person/pairs' 10%. Worse, if a significant issue arises with the design at the time of integration, then in the name of expedience and getting things done some design tragedy might occur, leaving us with code that works but is ugly, hard to understand, etc. That could lead to costly maintenance concerns later.
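
To make that arithmetic concrete, here is a back-of-the-envelope sketch. The numbers (a 40-hour task, a second person/pair drawn in for the same share of time) are illustrative assumptions, not measurements, and the function names are made up for this post.

```python
def continuous_cost(task_hours, overhead=0.10):
    """Extra hours one person/pair spends on frequent rebase/merge."""
    return task_hours * overhead


def deferred_cost(task_hours, overhead=0.10, pairs_drawn_in=2):
    """Extra hours when the same integration work lands at the end of a
    long-lived branch and pulls other people into it."""
    return task_hours * overhead * pairs_drawn_in


task = 40  # hypothetical task size, in hours
print(continuous_cost(task))  # 4.0
print(deferred_cost(task))    # 8.0
```

This deliberately leaves out the design tragedy part. The cost of living with ugly code afterwards is much harder to put a number on, which is sort of the point.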

So, consider that Continuous Integration isn't about servers, or even about building artifacts or build automation. Continuous Integration is about saving you time in the long run; it's about coherent design and product viability. CI servers, artifacts, etc. are all great things, but that isn't what the practice is really about. Our goal, as always, is to develop the best solution to a problem in a manner that is sustainable by the organization. CI is just one small but important part of reaching that goal.

I'll save feature toggles and other integration strategies for another discussion. I hope this post helps you in some way.