High-performing, ambitious engineering teams always care about “culture”. Culture is the set of unspoken rules that defines the behavior of a team. It is not written down, and it is often visible only in the actions of your team members, especially when no one is watching.
You cannot describe your culture to another. You can only signal it.
One of the many outcomes of a net-positive engineering culture is software hygiene. How do you signal this to your team?
Let’s turn to the Broken Windows Theory, first put forward in 1982, to guide us. As its authors put it in their original article in The Atlantic:
Social psychologists and police officers tend to agree that if a window in a building is broken and is left unrepaired, all the rest of the windows will soon be broken. This is as true in nice neighborhoods as in rundown ones. Window-breaking does not necessarily occur on a large scale because some areas are inhabited by determined window-breakers whereas others are populated by window-lovers; rather, one un-repaired broken window is a signal that no one cares, and so breaking more windows costs nothing.
What could be a stronger signal than a painfully visible problem in your code base, your automation, or your infrastructure that has gone unaddressed for a long time?
Alerts that buzz every day but provoke no response.
Bugs that recur every so often but are worked around each time.
Automation so flaky that it rarely works.
The next alert you add is highly likely to be just as poorly designed and noisy. Engineers stop worrying about the bugs they ship because they can always invent ingenious workarounds. People stop using or adding to automation, retreating to the comfort of their slow but reliable manual workflows.
New engineers join your team and are at first repulsed by it all. Most, however, soon regress to the mean and leave broken windows of their own behind.
The culture of apathy spreads, and before you know it, your street is full of broken windows.
What do you do?
As an individual contributor
Let’s introduce ourselves to another maxim, the Boy Scout Rule, loosely stated as:
Always leave the campground cleaner than you found it.
(PS: I am certain Girl Guides do the same.)
Irrespective of who is responsible for the code, automation, or tooling you find yourself working with, make it a tad better for the next person who ends up down there. If enough engineers follow this practice, you end up with a more hygienic environment for everyone to thrive in.
Let’s say, for example, you are deep in the bowels of a module. You notice exceptions that are swallowed without a trace, and the functions you are updating are littered with hard-coded literals. What do you choose to do, given that your change throws no new exceptions and only reuses the existing literals?
If you miss this opportunity to handle those exceptions and log them with the right severity, there is a high likelihood that more swallowed exceptions will appear in this module over time. Similarly, literals will keep showing up in more places instead of being defined once as constants. You can start to change the landscape with one small act of voluntary effort.
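To make this concrete, here is a minimal Python sketch of such a campground cleanup. The function names, the `config` dictionary, and the timeout values are hypothetical and invented purely for illustration. The rewrite keeps the original fallback behavior; it only gives the literals names and makes failures leave a trace at a severity that matches how much they matter.

```python
import logging

logger = logging.getLogger(__name__)


# Before (hypothetical): the exception is swallowed without a trace and
# the numbers carry no explanation of where they came from.
def connection_timeout_before(config):
    try:
        timeout = float(config["timeout"])
    except Exception:
        timeout = 30
    return min(timeout, 300)


# After: the same results for callers, but with named constants and
# logging that distinguishes an expected case from a misconfiguration.
DEFAULT_TIMEOUT_SECONDS = 30.0  # hypothetical default
MAX_TIMEOUT_SECONDS = 300.0     # hypothetical upper bound


def connection_timeout_after(config):
    raw = config.get("timeout")
    if raw is None:
        # Not configuring a timeout is normal, so DEBUG is enough.
        logger.debug("no timeout configured; using default of %.0fs",
                     DEFAULT_TIMEOUT_SECONDS)
        return DEFAULT_TIMEOUT_SECONDS
    try:
        timeout = float(raw)
    except (TypeError, ValueError):
        # A malformed value is a misconfiguration worth surfacing.
        logger.warning("ignoring invalid timeout %r; using default of %.0fs",
                       raw, DEFAULT_TIMEOUT_SECONDS)
        timeout = DEFAULT_TIMEOUT_SECONDS
    return min(timeout, MAX_TIMEOUT_SECONDS)
```

The choice of DEBUG for a missing value and WARNING for a malformed one is a judgment call for this illustration; the point is that the decision is now made deliberately rather than by an empty `except`.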
As a leader
Leaders often reach for incentive models (blame the behavioral psychologists for this!), so let’s start with the obvious one: rewarding your Boy Scouts and Girl Guides and celebrating their campground makeovers.
If, however, you are seeking stronger signals, consider abolishing the term “Technical Debt” on your team. After all, engineers coined this term to explain the value of working on hygiene problems to the powers-that-be responsible for the financial well-being of the business.
When faced with a street full of broken windows, do you call it a crime scene or a playground? On a team that severely lacks hygiene and has grown comfortable hiding its problems behind the shield of “technical debt”, the right word is malpractice.
As Kris Brandow rightly calls out in an episode of a Changelog podcast, updating your vocabulary to invoke “malpractice” when tests are left broken, alerts aren’t fixed, and code is unreadable alters the conversation from “we are moving fast and acquiring tech debt, and so it’s OK” to “we are moving fast but performing malpractice, is that OK?”.
What better change can you drive as a leader than challenging the rationalization of every broken window?
Hat tips
To my friends and colleagues for introducing me to these concepts: Vivek Kannan for the Broken Windows Theory and Geeth Alladi for the Boy Scout Rule.