I have a love-hate relationship with open source. It’s definitely more love than hate.
I’ve been an advocate of open source for much of my career and even built a successful open source project many years ago (successful within the ColdFusion community it was built for, anyway). I also depend on open source projects daily: many come from large companies or organizations, but many others are maintained by individuals or small groups, most of whom donate their time.
Anyway, as I said, it’s mostly love. So where does the hate come from? Documentation. Or a lack thereof.
In this article I want to make the case that your undocumented open source code imposes a significant cost on the development community. In many cases, I’d argue, the drawbacks outweigh the benefits, meaning your good intentions may actually have a net negative value for the community.
In order to make my case, let me first indulge in an analogy.
Author’s note: Some readers appear to have taken offense at the title, which I intended humorously, not seriously. My goal here is to get you to seriously consider the impact documentation (or the lack of it) can have on the community, not to get you to stop writing open source.
As a society, we generally agree that recycling is a good thing. Of course, there are disagreements over specifics, but, on the whole, people have come to believe in the value of recycling.
We try to recycle as much as we can nowadays, with many of us moving to single stream recycling. Single stream recycling ended the days when cardboard and aluminum had to be separated into individual bins, replacing that practice with a single bin where you place every recyclable item, even materials that were previously not accepted.
The thing is, recycling of aluminum and cardboard was relatively cheap and easy. However, single stream recycling means that the waste management company has to sort all of the items in the recycling bin – separating the garbage from the recycling and also separating the different types of recyclable materials. As you can imagine, this is a complicated and expensive endeavor.
As a labor-intensive activity, recycling is an increasingly expensive way to produce materials that are less and less valuable.
Recyclers have tried to improve the economics by automating the sorting process, but they’ve been frustrated by politicians eager to increase recycling rates by adding new materials of little value. The more types of trash that are recycled, the more difficult it becomes to sort the valuable from the worthless.
In fact, one study noted that the dollar cost of single stream recycling was $250 per ton compared to $45 per ton for landfilling.
This is complicated by many people choosing to simply dump lots of unrecyclable garbage into the single stream recycling bin. For most people, this is done with good intentions. They figure that the more we recycle, the better, so why not err on the side of recycling when unsure about a particular item’s recyclability?
My goal here isn’t to actually debate single stream recycling versus landfilling – I am definitely in favor of recycling.
However, I want to make the analogy that GitHub has effectively become the single stream recycling bin of the developer community. We’ve come to believe in the value of open source and become comfortable with the ease of pushing code to GitHub – so much so that many of us are now simply dumping all of our garbage into the bin. End users (i.e. other developers) end up becoming the waste management company forced to pick through the garbage to find the useful items – and, unfortunately, this also comes at a great cost.
The major issue usually isn’t the code itself but rather the fact that it is completely undocumented, only partially documented, or poorly documented. Much like sorting our recyclables, documentation requires additional effort that raises the barrier to releasing open source, but it also reduces the cost to the developers who end up using it.
Similar to the person who throws garbage in the recycle bin because they are unsure of its value as a recyclable, developers tend to share their code with altruistic motives. We’ve been led to believe that the act of sharing code, in and of itself, has intrinsic value – but it also has costs.
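Let’s look at it this way. Imagine there are two types of developers who might use your project: person 1 relies on documentation to decide whether a project fits their needs, while person 2 is willing to dig through the source code to find out.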
Despite the altruistic motives behind sharing, the lack of good documentation means the project doesn’t help person 1 at all. Meanwhile, it leads person 2 down a rabbit hole of time spent digging through code to see whether the project actually solves their problem. If it doesn’t, we’ve completely wasted their time; even if it does, we’ve forced them to spend far more time than necessary getting there.
Even from a “selfish” perspective, not properly documenting projects is counterproductive and costly. A lack of documentation tends to indicate a project that is either “not ready for prime time” or at high risk of abandonment. Many developers (often myself included) will not be willing to tough it out or take the chance, meaning whatever effort you spent in sharing is wasted.
The ones who do take a chance often end up frustrated, leading them to file issues or post negatively about your project, which is clearly not the reaction you hoped for when you decided to share it. You may end up either battling issues caused by confusion or abandoning the project out of your own frustration with the negative feedback.
Let’s take, for instance, the world of static site generators, an area in which I have particular interest and experience. At the time of writing, there are 422 of them. I have personally used about 12 (including many of the most prominent ones) and have presented reviews of them at various events to help people decide which ones to consider.
In my reviews, my primary complaint is that, for a large majority of the ones I have tested, the documentation ranges from nearly nonexistent to poor. Without naming names, some include only installation and configuration instructions, with no usage documentation. Others have a getting started guide but nothing more; you’re apparently expected to read the code if you want anything beyond the very basics. In many cases, a quick search of their issues turns up numerous questions stemming from confusion over how a feature works or is intended to be used.
All in all, this creates a frustrating and often needlessly time-consuming experience for the developer choosing to use the project (I know it did for me). Even the sheer number of options carries a cost: the resulting choice overload can make the process of picking the right one stressful and prone to failure.
The point is, many of these projects are costing the very developers they intend to help valuable time and needless frustration.
I should be clear that I am not blaming GitHub for this problem (though one could argue that they do encourage the behavior). I also do think that sharing your code is, in general, a noble effort. But we’ve perhaps glorified it to the point that we’re now a “share first, ask questions later” community.
Image: The Far Side by Gary Larson
Developers’ skill levels and proficiency are often judged by the sheer number of public projects they have. Beyond just the admiration of their peers, this can even impact their ability to land a job.
I work for a startup and we look at github before anything else. Basically we look for any ‘full’ projects, so someone at least knows how complex even a small side project can be.
braunshaver via Reddit
By looking at one’s Github repositories, you can almost immediately tell if he’s an expert or beginner of a specific field. Also number of repositories, frequency of contributions activities and maybe number of followers/followings can also reflect how passionate the owner is about programming.
This culture that pressures people to share more is creating a growing problem.
As of February 2016, GitHub reported over 12 million users and over 31 million repositories. Even if every user had created a repository (which is highly unlikely), that would still work out to roughly 2.6 repositories per user. My guess is that a substantial majority of these are public, meaning there are likely tens of millions of public repositories.
Recent surveys indicate that there are around 11 million professional developers in the world; including non-professionals raises the number to 18 million. Even if only 60% of the total repositories were public (roughly 18.6 million), we’d still have a public repository for every professional or non-professional developer in the world. In my opinion, this indicates that we’ve clearly overshot the target on code sharing.
By taking the time to document a project before releasing it, you answer a number of important questions about it, most importantly, why anyone other than yourself would need it. What problem does it solve? Who is the target audience (e.g., what skills are needed to use it)? And even: what do I hope to gain by sharing this, and how committed am I to maintaining it for the community?
Look, I admit, I have been guilty of this too. But, together, perhaps we can clean up our mess. Here’s what I suggest as fixes.
(Feel free to share your own suggestions in the comments.)
So (as Mike Jang pointed out in his Fluent 2016 session) if your project is one of the many projects whose documentation literally says “read the code” or one of the millions of others that have little to no documentation – we appreciate your good intentions, but let’s fix this problem together.
Header image courtesy of Kristian Bjornard