Rich Dammkoehler's Mini Blog: July 2017

31 July 2017

It's not going to happen

Sometimes we need to realize that 'It's not going to happen'.

There are times when we've committed to a solution or a timeline and, for any number of reasons, it's not going to happen. We need to recognize when that is the case and adjust our plans.

A key component of that recognition is communication. If we don't communicate to others that something isn't going to happen they cannot adjust their plans. This can throw a project team into chaos.

Sometimes, it just isn't going to happen.

28 July 2017

Side Effects

Side effects are an often overlooked consideration when coding. A side effect occurs when a function modifies the state of something outside its own scope. That is, it changes the value of a global, a static, or an argument. Ideally, functions should not do this.

However, here is a conundrum: how do you modify the state of the system otherwise? For example, if your code is updating the value of a transfer object, how do you change its value, since it is technically outside of your scope?

One technique is to clone the input and modify the clone. This avoids the side effect of modifying the object passed in. Instead, you create a new object as a copy of the object received, modify it, and return it. If you do this consistently throughout the system, functions no longer have side effects on their arguments.

An approach like that has some powerful consequences. For one, in a multithreaded system there is no risk of two threads concurrently modifying the same object. That's typically a good thing. Another benefit is that, as a caller, you can rely on the state of your arguments being unchanged after a call. This also allows you to make all of your objects immutable. Immutability gives you the assurance that a specific instance's value or state has not changed.
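As a minimal sketch of the clone-and-modify idea, assuming a hypothetical Transfer object with made-up fields (account, amount) and a made-up apply_fee function, the standard library's frozen dataclasses and dataclasses.replace give both the copy and the immutability:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transfer:
    """Hypothetical transfer object; frozen=True makes instances immutable."""
    account: str
    amount: int


def apply_fee(transfer: Transfer, fee: int) -> Transfer:
    # Build a modified copy instead of mutating the argument.
    return replace(transfer, amount=transfer.amount - fee)


original = Transfer(account="A-1", amount=100)
charged = apply_fee(original, fee=5)
# original still holds amount=100; charged is a separate object with amount=95.
```

Because Transfer is frozen, an accidental `transfer.amount = ...` inside a function raises an error instead of silently changing the caller's object.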

In the end, however, this will force you to be very concise in your method declarations. Typically we don't return complicated structures; when we call a function we generally expect only one value or object back from it. This is also a pretty good design approach.

26 July 2017

It's too Big

An ongoing topic is the size of a class, function, or test. I think for each of us there is a size limit of some sort, but it varies. What I am clear on is that too big is too big.

For me, functions with more than 2-3 logical branches are too big.

Classes that are more than about 75 lines are too big.

Tests that are more than 10 lines are too big.

The size of things is a feedback mechanism we often don't pay close enough attention to. When we feel that pressure, it's time to split things up.

24 July 2017

Shadows

We've all seen code where variable names shadow something; that is, the local variables have the same name as a class member or a built-in. Python and Ruby both allow this, but you shouldn't do it.

Shadows create confusion for the reader. One recent example was a bit of code where several functions' arguments shadowed the members of the class. It required the reader to slow down and think about which value was actually being used. Worse, in conversation it had to be discussed more than once which value was being used. Even the original author had trouble keeping track. Ultimately we renamed the function parameters and discovered we could eliminate most of the duplication and just use the class member.

In all, the shadows probably doubled the time it took us to complete the work just because of the confusion they caused.
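A small illustration of that kind of cleanup, using a hypothetical Invoice class (the names are invented for this sketch, not taken from the code in question). In Python the member is always reached through self, so the confusion here lives in the reader's head rather than in the interpreter, but the effect on readability is the same:

```python
class Invoice:
    """Hypothetical class used only to illustrate the naming problem."""

    def __init__(self, subtotal: float, tax_rate: float) -> None:
        self.subtotal = subtotal
        self.tax_rate = tax_rate

    # Before: the parameter shares its name with the member, so the reader
    # has to stop and work out which tax_rate the expression uses.
    def total_shadowed(self, tax_rate: float) -> float:
        return self.subtotal * (1 + tax_rate)

    # After: the parameter turned out to be redundant, so the method
    # just uses the member.
    def total(self) -> float:
        return self.subtotal * (1 + self.tax_rate)
```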

21 July 2017

Hobgoblins

Back in college there were a couple of people who liked to remind the world that 'Consistency is the hobgoblin of little minds'. They were forgetting an important word in that quote: foolish. The original quote says 'A foolish consistency is the hobgoblin of little minds' -- Emerson

I think Emerson was after a slightly different point, thinking about thinking and not elaborating on foolish versus wise. However, there is something here that I find very interesting.

What we do

As developers we spend a lot of time trying to be consistent. We name things using patterns. We use Patterns in our code to make things similar. We define standards and enforce them. And we have static analysis tools to make sure we follow 'the rules'.

What we fail at

What we seem to fail to do is reconsider the 'Why' behind our consistency. In my opinion, at least in this case, we can distinguish foolish from wise by making this evaluation.

My Best Example

I grew up in a Dijkstra philosophy zone. I will attribute some (or all) of my belief system to him, though he hasn't been my only influence. From this basis I used to believe that a function should only ever have one return point, period. As a result I spent a lot of time writing my code with single return points and 'fixing' other people's code to ensure that it had only one return point. I did this for the first 18 years of my career, to the point that I frequently ranted at other developers for failing to do this while extolling the virtues of the single return point.

In roughly 2010 my team challenged my single return point thinking. In a fit of self-examination I realized that while the single return point thinking makes sense in some cases, it isn't necessarily universally true. For example, in a three-line function it's not likely I'll lose track of the return points. Another example might be a matcher that checks 20-30 fields for equality; it could exit on the first violation just as easily as setting a local variable and exiting at the end.
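The matcher case can be sketched both ways. The function names and the dictionary-based records below are assumptions made for illustration; both functions return the same answer and differ only in where they exit:

```python
from typing import Any, Mapping, Sequence


def matches_single_return(
    left: Mapping[str, Any], right: Mapping[str, Any], fields: Sequence[str]
) -> bool:
    # Single return point: track the outcome in a local and exit at the end.
    result = True
    for name in fields:
        if result and left[name] != right[name]:
            result = False
    return result


def matches_early_return(
    left: Mapping[str, Any], right: Mapping[str, Any], fields: Sequence[str]
) -> bool:
    # Multiple return points: exit on the first field that differs.
    for name in fields:
        if left[name] != right[name]:
            return False
    return True
```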

I still adhere to the single return point thinking for the most part as part of my personal discipline; not always, but usually. But I've given up the ranting.

Consider this...

Almost everything we do should occasionally be reconsidered from the perspective of 'Why'. One great way of finding these things is to pair with a more junior developer and try to explain to them the 'Why' of every decision we make. Obviously it would be inefficient to discuss absolutely everything, but if we throw those discussions into our conversation we can discover things we do that aren't necessarily good, just 'the way we've always done it'. It can also help us discover false premises or, better yet, new and better ways to do things.

19 July 2017

Scheduling Debt Payments

Over time all active projects accumulate debt to some degree. When development is most active the debt typically stays pretty low (unless you create the debt intentionally). Once a product is released to production, however, and you aren't paying as much attention to its internals, it can start to fall behind the innovation curve. As you develop better solutions to common problems, faster techniques for building, or better ways to skin some cat or another, your 'old code' falls behind.

It isn't reasonable to keep updating every single repository, although it would be great if you could. But some changes are necessary. That is, some innovations might require you to update every instance in existence. For example, a shared database API, or a build tool.

So how do you get that done?

Well, one simple thing you can do is schedule it. For example, declare that Friday afternoon is the time the whole team (or teams) stops and updates all these things. If you aren't innovating that fast, do it every two weeks or once a month. That is one effective and simple non-technological way to get it done.

A better solution might be to create hard API boundaries on things and update packages. If you follow semver strictly this should be workable and give you a more automatic means of delivery. But this isn't always reasonable; it might be harder to make a package than it would be to make the change.

Lastly, automation. If the change is something like 'update the pinned dependency in the requirements.prod.txt file of all repositories', you should script that. Make it automatic and easy. Include in that all the necessary SCM changes. And if possible, find a way to do mass PR approvals. This is especially true if the change is something you will do frequently.
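A rough sketch of such a script, under several assumptions: every repository is checked out side by side under one directory, pins are written exactly as `name==version` on their own line, and the function names (update_pin, update_repositories) are invented here. It only rewrites the files; the SCM steps (branch, commit, PR) are left out and would need to be added for your own tooling.

```python
import re
from pathlib import Path
from typing import List


def update_pin(requirements_text: str, package: str, new_version: str) -> str:
    """Return the text with `package==<anything>` replaced by the new pin.

    Only matches lines that are exactly `package==version`; lines with
    comments, extras, or environment markers are left alone.
    """
    pattern = re.compile(rf"^{re.escape(package)}==\S+$", flags=re.MULTILINE)
    # A function replacement avoids any backslash processing in the new text.
    return pattern.sub(lambda _match: f"{package}=={new_version}", requirements_text)


def update_repositories(root: Path, package: str, new_version: str) -> List[Path]:
    """Rewrite requirements.prod.txt in each checkout directly under root."""
    changed = []
    for req_file in sorted(root.glob("*/requirements.prod.txt")):
        old_text = req_file.read_text()
        new_text = update_pin(old_text, package, new_version)
        if new_text != old_text:
            req_file.write_text(new_text)
            changed.append(req_file)
    return changed
```

Returning the list of changed files gives the follow-up SCM step something concrete to commit, and makes it easy to see which repositories were already up to date.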

17 July 2017

Keeping up with the Joneses

Over the past two years of development we've evolved a rather extensive code base with dozens of independent libraries using dozens of tools, modules, and plugins. It is pretty impressive and constantly evolving and getting better.

One thing I have observed though is that we struggle to keep things updated. We have template repositories we can use to kick start projects. What we seem to fail to do is keep those templates up to date. That is, we come up with an innovation in one repository and fail to move it back into the template. I'm seeing the same thing with our more generic code as well.

I'm not complaining; it's a lot of code in a lot of places and it can be difficult to keep everything up to date. That said, everyone (my current team and the rest of the world) needs to remain diligent about these kinds of updates.

Not keeping these things up to date causes at least two issues. The first and most obvious one is that the benefits of an innovation don't propagate to others if it isn't shared. The second, more subtle one, is that it causes confusion. You find yourself asking: why has this project done X when the template implies Y? X seems better than Y, so is the template stale? Or is there a problem with X? Which one should I use in my project Z?

So, keeping up with the Joneses becomes pretty important as your systems grow.

14 July 2017

Deprecation

I learned this word in 1996 when Java introduced the @deprecated tag. I looked it up. 'Pray for the removal of evil'. I think deprecation is a great way to communicate that something is no longer good enough. Though I continue to see one issue: we deprecate things, but they never truly go away.

In my current work we have been working diligently to consolidate a number of common objects and tools into various shared repositories in order to reduce duplication and maintenance. That's typically a good thing.

Over the past couple of days, however, I've discovered a flaw in our plan. We have enough things happening that we can't get all changes applied everywhere; it's just not reasonable. So as we improve things and move things around we need a way to communicate that something has been improved. I have not found an official @deprecated tag in Python or Ruby, although there are a number of packages available for creating such a tag.

Our mistake has been not using anything to indicate what has been deprecated. We're going to have to change that.
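One way to mark things in Python without a third-party package is a small decorator built on the standard library's warnings module. This is only a sketch; the decorator name, the example function, and the replacement it points to (shared_lib.parse_date) are all hypothetical:

```python
import functools
import warnings


def deprecated(reason: str):
    """Hypothetical decorator: emit a DeprecationWarning on every call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("use shared_lib.parse_date instead")  # hypothetical replacement
def parse_date_old(text: str) -> str:
    return text.strip()
```

Note that Python's default warning filters hide most DeprecationWarning messages, so callers may only see them when running with `-W default` or under a test runner that surfaces warnings.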

12 July 2017

Timing

In every project, timing is important. It is impossible to manage concurrent streams of work efficiently if you don't have the timing down. This is why predictability is a necessity for large projects. If two teams are working on related parts of a system, you cannot predict when the integration can happen if you cannot predict when each module is completed. That is an important consideration on a large multi-module code base.

10 July 2017

My Favorite IDE

Clearly it would be emacs.

When it isn't emacs, it's vim.

Be reasonable?

OK, I use PyCharm and RubyMine a lot these days. I think it's because I've gotten lazy, or maybe tired. IDEs are great: they have refactoring tools, and they can find things like method usage quickly for you. If you have a huge project they are invaluable.

The Complaint

IDEs are too slow. I haven't found a way to total it up yet but I spend a lot of my day waiting for things. Waiting for the IDE to start, waiting for the IDE to index the project, waiting for the IDE to load dependencies.

Solutions

I don't have any really great solutions, but here is what I've been doing. First, I offloaded as much as I could to other machines. I have four at the moment, but I really use three of them. I use my main machine for my IDE. Most everything else is running on another machine. That is, Slack, Messages, email, everything I don't have to have right in front of me. So most of the time I have an IDE or two, Chrome, and several services like RabbitMQ and Postgres running. This works OK, but it isn't the best solution.

Next, I leave my IDE open all the time. I practically never shut it down. That gets me past start up time. PyCharm starts and opens a repo in about 15 seconds on my workstation. Not bad. But if I have to keep reopening things, that chews up several minutes of my day. So I trade RAM for speed. Once a repo is open, I don't close it.

As a consequence I have to keep my browser tabs to a minimum. Right now I have 30 open tabs in Chrome. Some days I'll hit 100, but I notice that chews up all my memory -- forcing me to close tabs or IDEs. Tabs always lose. I've started using a Tab Suspender plugin to calm Chrome down.

I also use tmux to minimize the number of terminal windows I have open. I don't think I really save that much memory by doing this but it does cut down on clutter.

What the world needs...

First off, MacBook Pros with 32 GB of RAM. Second, blazingly fast drives and CPUs. Third, the thing we can control a bit more, really fast IDEs. And lastly, core tools that are quick.

Things like pip and gem are slow. About one third of build time is spent on dependencies (when building clean).

At least another third is spent on static analysis tools like pylint, pep8, bandit, etc.

As a community we need to find a way to make these tools faster and easier to use.

07 July 2017

Tools should be fast

So a little while back I was talking about the speed of your test suite. That is really important. But your tools should be fast too. I recall a presentation at CodeMash by Gary Bernhardt where he talked about tool speed and his dev environment. Then today I recall waiting several times for tools. Starting up an IDE, or opening a new project, and it takes 20 seconds or more.

How much time do we waste each day starting up our environments, or waiting for a CI job because the server is roughly the equivalent of a toaster?

Too much time is wasted waiting on tools!

I think it is really important to make sure we have the most powerful tools and equipment at our disposal. I typically buy the fastest machine I possibly can (although I'm still waiting for a real improvement from Apple). Still, don't go cheap on the hardware. To follow that up though, the tools need to be fast.

How fast?

You should be able to open a repo in 10 seconds. You should be able to start a Docker container in under 10 seconds. A VM should take less than 10 seconds to start. It should all take less than 10 seconds. And we should be able to do all these things simultaneously in 10 seconds or less.

Why?

Well, for one, we're simply wasting time. But second, it is too easy to get distracted while we are waiting for things to happen. I don't know how many times I've gotten bored waiting for an IDE to start and gone off to read my email or check a server status while I wait. That sounds like I'm being efficient, but really I'm being distracted.

So, don't go cheap on the hardware and look for tools that are fast. If your tools aren't fast, complain, loudly, to the manufacturer.

05 July 2017

Static Analysis is a waste of time

What I mean is, each time you run static analysis it wastes time. We need a proactive approach to a great many things that static analysis does for us.

IDEs

IDEs should format and correct (automatically) the vast majority of our mistakes. Eclipse used to do a pretty good job of this. All IDEs should do a great job of this. Most don't.

Coding Mistakes

Most of us make them. Wouldn't it be great if our IDEs ran analysis on the code in the background and told us when we'd made a mistake? Some tools do this (often by virtue of a plugin), but often they are hard to configure, harder to use, and slow down the IDE. We need a better way. Something in the background that doesn't slow the IDE down too much and gives us a clean and simple report of the issues. I think we can do better.

Cyclomatic Complexity and the TCB

Issues like Cyclomatic Complexity and the Try-Catch-Bury should get flagged immediately. These are easy to spot and our IDEs should just throw up a dialog or a fly-over telling us that we're bad programmers right there on the spot.
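To show how cheap such a check can be, here is a sketch that flags one form of Try-Catch-Bury in Python source using the standard library's ast module. It assumes 'bury' means an except block whose body is nothing but `pass`; the function name is invented, and a real tool would also want to catch handlers that only log or return.

```python
import ast
from typing import List


def find_buried_exceptions(source: str) -> List[int]:
    """Return the line numbers of except blocks whose body is only `pass`."""
    tree = ast.parse(source)  # parses the code without executing it
    buried = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                buried.append(node.lineno)
    return sorted(buried)
```

Because it works on a parsed tree and never runs the code, a check like this could be run on every save rather than as a separate analysis step.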

Why...

All of this static analysis is good stuff. We need it to help us make sure we're building our software well. But the amount of time we waste adding new lines to files or spaces in expressions or extra parens (adding and removing) is distracting us from getting things done. So much of what we do could be automated away and made easier. We'd get a lot of our time back so we could think about real issues.

03 July 2017

A single expression

While I'm on the topic of code and code clarity, I should mention logical expressions. I often see complicated expressions with multiple variables ANDed or ORed together. This isn't very helpful or clear. Each of these should be extracted to a method with a clear and simple name. You want to do this because it brings clarity to the code. It also results in several small, reusable methods.

You should apply this technique everywhere, not just in if statements but in loops as well. Ideally, each of these extracted methods has only one logical test within it. Also, the logic should be as simple as possible. Avoid negatives and especially double negatives. Each condition should be as simple and clear as possible so that the reader doesn't need to put forth much effort.
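As an illustration, here is a before-and-after sketch using a hypothetical Order class (the fields and method names are made up for the example). Each extracted method holds one positively phrased test, and the combined check gets a name of its own:

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order used only for illustration."""
    paid: bool
    in_stock: bool
    address: str

    def is_paid(self) -> bool:
        return self.paid

    def is_in_stock(self) -> bool:
        return self.in_stock

    def has_address(self) -> bool:
        return bool(self.address)

    def can_ship(self) -> bool:
        # Composes the three single-test methods under one clear name.
        return self.is_paid() and self.is_in_stock() and self.has_address()


def status_before(order: Order) -> str:
    # Before: the reader has to decode the whole expression in place.
    if order.paid and order.in_stock and order.address != "":
        return "shipped"
    return "held"


def status_after(order: Order) -> str:
    # After: the condition reads as a single statement of intent.
    if order.can_ship():
        return "shipped"
    return "held"
```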