Your Open Source Project is Considered Harmful

I have a love-hate relationship with open source. It’s definitely more love than hate.

I’ve been an advocate of open source for much of my career and even built a successful open source project many years ago (within the context of the ColdFusion community for which it was built anyway). I also depend on open source projects on a daily basis – many from large companies or organizations, but also many maintained by individuals or small groups of individuals, most of whom donate their time.

Anyway, as I said – it’s mostly love. So where’s the hate come from? Documentation. Or a lack thereof.

In this article I want to make the case that your undocumented open source code incurs a significant cost to the development community. In many cases, I’d argue, the drawbacks outweigh the benefits, so that your good intentions may actually have a net negative value to the community.

In order to make my case, let me first indulge in an analogy.

Author’s note: Some readers appear to have taken offense at the title which I intended humorously, not seriously. My goal here is to get you to seriously consider the impact documentation (or lack of it) can have on the community – not to get you to stop writing open source.

The Cost of Recycling

As a society, we generally agree that recycling is a good thing. Of course, there are disagreements over specifics, but, on the whole, people have come to believe in the value of recycling.

We try to recycle as much as we can nowadays – with many of us moving to single stream recycling. Single stream recycling ended the days when cardboard and aluminum had to be separated into individual bins, replacing those bins with a single one where you place every recyclable item – even materials that were previously not accepted.

The thing is, recycling of aluminum and cardboard was relatively cheap and easy. However, single stream recycling means that the waste management company has to sort all of the items in the recycling bin – separating the garbage from the recycling and also separating the different types of recyclable materials. As you can imagine, this is a complicated and expensive endeavor.

As a labor-intensive activity, recycling is an increasingly expensive way to produce materials that are less and less valuable.

Recyclers have tried to improve the economics by automating the sorting process, but they’ve been frustrated by politicians eager to increase recycling rates by adding new materials of little value. The more types of trash that are recycled, the more difficult it becomes to sort the valuable from the worthless.

The Reign of Recycling

In fact, one study noted that the dollar cost of single stream recycling was $250 per ton compared to $45 per ton for landfilling.

This is complicated by many people choosing to simply dump lots of unrecyclable garbage into the single stream recycling bin. For most people, this is done with good intentions. They figure that the more we recycle, the better, so why not err on the side of recycling if I’m unsure about a particular item’s recyclability?

Great. But What Does This Have to Do with Open Source?

My goal here isn’t to actually debate single stream recycling versus landfilling – I am definitely in favor of recycling.

However, I want to make the analogy that GitHub has effectively become the single stream recycling bin of the developer community. We’ve come to believe in the value of open source and become comfortable with the ease of pushing code to GitHub – so much so that many of us are now simply dumping all of our garbage into the bin. End users (i.e. other developers) end up becoming the waste management company forced to pick through the garbage to find the useful items – and, unfortunately, this also comes at a great cost.

The major issue isn’t usually the code itself but rather the fact that it is either completely undocumented, only partially documented or poorly documented. Much like sorting our recyclables, documentation requires additional effort that increases the barrier to open source, but it also reduces the cost to the end user developer.

The Cost of Undocumented Code

Similar to the person who throws garbage in the recycle bin because they are unsure of its value as a recyclable, developers tend to share their code with altruistic motives. We’ve been led to believe that the act of sharing code, in and of itself, has intrinsic value – but it also has costs.

Let’s look at it this way. Imagine there are two types of developers who might use your project.

  1. The developer who will just leave because of the lack of documentation.
  2. The developer who is willing to try to figure it out.

Despite the altruistic motives in sharing, the lack of good documentation meant that the project didn’t help person 1 at all. Meanwhile, it led person 2 down a rabbit hole of time spent digging through code to see if the project actually solves their problem – if it doesn’t, we’ve now completely wasted their time, but, even if it does, we’ve forced them to spend far more time than necessary getting there.

Even from a “selfish” perspective, not properly documenting projects is counterproductive and costly. A lack of documentation tends to indicate a project that is either “not ready for prime time” or at high risk of abandonment. Many developers (often myself included) will not be willing to tough it out or take the chance, meaning whatever effort you spent in sharing is wasted.

The ones that do take a chance can often end up frustrated, leading them to submit issues or post negatively about your project – this is clearly not the reaction you hoped for when you decided to share it in the first place. You can either end up battling issues that are caused by confusion, or abandoning the project out of your own frustration with the negative feedback.

For Example

Let’s take, for instance, the world of static site generators, as I have a particular interest and experience in this. Currently, there are 422 of them. I have personally used about 12 (including many of the most prominent ones) and have presented reviews of them at various events to help people decide which they should consider using.

In my reviews, my primary complaint is that the documentation ranges from nearly non-existent to poor on a large majority of the ones I have tested. Without naming names here, there are some that only include installation and configuration, but no usage documentation. There are others that have a getting started guide but nothing more – you’re apparently expected to read the code if you want anything beyond just the very basics. In many cases, a quick search of their issues brings up numerous questions that are related to confusion over how a feature works or is intended to be used.

All in all, this creates a frustrating and often needlessly time consuming experience for the developer choosing to use the project (I know it did for me). Even the sheer number of options is costly, creating a choice overload that can make the process of choosing the right option stressful and prone to failure.

The point is, many of these projects are costing the very developers they intend to help valuable time and needless frustration.

We’ve Created This Culture

I should be clear that I am not blaming GitHub for this problem (though one could argue that they do encourage the behavior). I also do think that sharing your code is, in general, a noble effort. But we’ve perhaps glorified it to the point that we’re now a “share first, ask questions later” community.

Image: The Far Side by Gary Larson

Developers’ skill levels and proficiency are often judged by the sheer number of public projects they have. Beyond just the admiration of their peers, this can even impact their ability to land a job.

I work for a startup and we look at github before anything else. Basically we look for any ‘full’ projects, so someone at least knows how complex even a small side project can be.

braunshaver via Reddit

Or…

By looking at one’s Github repositories, you can almost immediately tell if he’s an expert or beginner of a specific field. Also number of repositories, frequency of contributions activities and maybe number of followers/followings can also reflect how passionate the owner is about programming.

How to Make Github as Your New Resume

This culture that pressures people to share more is creating a growing problem.

As of February 2016, GitHub reported over 12 million users and over 31 million repositories. If every user actually created a repository (which is highly unlikely), you’d still end up at close to 3 repositories per user. My guess is that a substantial majority of these are public, meaning that there are somewhere in the tens of millions of public repositories.

Recent surveys indicate that there are somewhere around 11 million professional developers in the world. Including non-professionals bumps the number to 18 million. Even if just under 60% of the total repositories are public (about 18 million of the 31 million), we’d still have a public repository for every professional or non-professional developer in the world. In my opinion, this indicates that we’ve clearly overshot the target on code sharing.
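
For anyone who wants to check that back-of-the-envelope math, here is a rough sketch in Python. It uses only the rounded figures quoted above (12 million users, 31 million repositories, 18 million developers), so treat the results as approximations; the constant and function names are made up for illustration.

```python
# Rounded figures quoted in the article; all of them are approximations.
GITHUB_USERS = 12_000_000
GITHUB_REPOS = 31_000_000
ALL_DEVELOPERS = 18_000_000  # professionals plus non-professionals


def repos_per_user(repos: int, users: int) -> float:
    """Average number of repositories per registered user."""
    return repos / users


def public_share_needed(repos: int, developers: int) -> float:
    """Fraction of repositories that must be public to reach one per developer."""
    return developers / repos


if __name__ == "__main__":
    print(f"Repositories per user: {repos_per_user(GITHUB_REPOS, GITHUB_USERS):.2f}")
    print(f"Public share needed: {public_share_needed(GITHUB_REPOS, ALL_DEVELOPERS):.1%}")
```

With these inputs the average comes out to about 2.58 repositories per user, and roughly 58% of repositories would need to be public to reach one public repository per developer.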

By taking the time to document a project before releasing it, you answer a number of important questions about your project – most importantly, why anyone outside of yourself would need it. What problem does it solve? Who is the target audience (e.g., what skills are needed to use it)? And even, what do I hope to gain by sharing this and how committed am I to maintaining it for the community?

How Do We Fix It?

Look, I admit, I have been guilty of this too. But, together, perhaps we can clean up our mess. Here’s what I suggest as fixes.

  • It’s ok to keep personal projects local or private. If you can’t afford an account to make your projects private, there’s nothing wrong with using Git locally without pushing the code to GitHub. (Note: edited for clarity based upon comments)
  • Note public projects that aren’t intended or ready for public use. If you push a project that you don’t actually intend for people to use (for example, if this is the code for your blog on GitHub Pages or a personal project that you needed to host), note in the readme that, while it is open, it really isn’t intended for public consumption – so, use at your own risk. (Note: edited for clarity based upon comments)
  • Otherwise, make installation, getting started and usage documentation a minimal requirement for launch (until “launch”, you can note in the readme that the project is in development and not ready for public consumption – again, use at your own risk). All three types of documentation are minimally necessary. Too many projects leave off at the very basic getting started, or even just the installation and configuration. This almost invites user frustration because it welcomes them into the pool but fails to teach them to swim.
  • Lastly, as I said in my Ignite presentation in 2013, we as a community need to value contributions to documentation as much as we value contributions to code. Many projects like jQuery, AngularJS, Jekyll and even Telerik’s NativeScript include ways (and often guidelines) for contributing to documentation.
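
As a rough illustration of the "minimal requirement for launch" idea above, here is a small Python sketch that checks a readme for the three kinds of documentation. It rests on assumptions that are not from any particular project: the readme is Markdown, headings start with `#`, and the sections are titled something like "Installation", "Getting Started" and "Usage". The function and constant names are hypothetical.

```python
import re

# Hypothetical section names; real projects may title these differently.
REQUIRED_SECTIONS = ("installation", "getting started", "usage")


def missing_sections(readme_text: str) -> list[str]:
    """Return the required sections that have no matching heading in the readme."""
    # Assumption: Markdown-style headings, i.e. lines that start with one or more '#'.
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+[ \t]*(.+)$", readme_text, flags=re.MULTILINE)
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(section in heading for heading in headings)
    ]


if __name__ == "__main__":
    sample = (
        "# My Project\n\n"
        "## Installation\n\nRun the installer.\n\n"
        "## Getting Started\n\nCreate a site.\n"
    )
    print(missing_sections(sample))  # ['usage']
```

A check like this only tells you that a heading exists, not that the section under it is any good, so it is a nudge rather than a substitute for actually writing the docs.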

(Feel free to share your own suggestions in the comments.)

So (as Mike Jang pointed out in his Fluent 2016 session) if your project is one of the many projects whose documentation literally says “read the code” or one of the millions of others that have little to no documentation – we appreciate your good intentions, but let’s fix this problem together.

Header image courtesy of Kristian Bjornard

Comments

  • arvash

    If you want good documentation, write some.

    • This is absolutely one of the *worst* comments I hear in Open Source circles. I do not mean to pick on you in any way, but this attitude is exactly one of the problems that plagues OS. If your *intent* is for people to use your project, simply putting it out there w/ little to no documentation can cause more harm than good. Brian makes an excellent point about the second developer who has to dig into your project to see if it fits their needs – wasting time (possibly) because proper documentation wasn’t included.

      In the end, it comes down to your intent. If you want to release something that is helpful to other people, documentation, along with testing, actively responding to bug reports/PRs, is a *required* part of the process. Far too many people treat documentation as optional.

      As an aside, I’m not saying you are *entirely* wrong. If docs exist, but have issues, then yes, folks can actively help out to improve the docs. GitHub has actually made some great strides here in making that easier. But that assumes the docs are *actually* written in the first place!

      • arvash

        The problem is people using a project, but not contributing to documentation. I don’t use projects that don’t have documentation unless I absolutely must, and if I do I write a few sentences here and there. No documentation is an extremely clear communication of no intent for people to use except in emergency. It means it likely was not tested, was not reviewed, and was not thought through by more than one person. It’s the epitome of swim at your own risk. You cannot (and should not) expect a person working at no pay to work in any particular way, if you want it you must help.

        • remotesynth

          If only it were just the projects that were completely undocumented. For example, of the 12 or so static site generators I’ve used, all of them clearly intended to have some degree of public consumption. They also often had a home page and some install instructions in the readme (sometimes even a logo). It was from there that they all fell down completely.

          My point here isn’t to dictate how people should work, but to ask an open source developer, if you are creating a project that you hope people will use, why are you making it so difficult and frustrating for them to do so simply by not documenting. You put forth all this effort, why waste it by not documenting. Even from a self-centric point of view, this is counterproductive.

        • “The problem is people using a project, but not contributing to documentation. ” And again – this is the type of statement that turns people off from OS. “Oh, you had trouble using it? Your fault.” While there are *definitely* lazy users out there, and *definitely* people who won’t RTFM, going first to the “it is the user’s fault” or “they should write the docs themselves” is indicative of a project that I want no part of.

          (And again – the assumption here is that we are talking about a project that you want folks using. For personal stuff you don’t want people using, I still think the responsibility is on us to make it clear, and it is something I’m going to try to be better at.)

  • Brian, I’m definitely with you on this article, and I too have used GitHub for ‘personal projects’. I do not begrudge them charging for private repos, but I think most folks are simply going to go the public route. I’m going to take your advice and start making these “public, but for me only” repos more clear in a README file.

  • I agree that project owners should write some documentation before sharing a project, but I disagree that personal or unfinished projects should not be published to Github. Github makes it infinitely easier to share projects, and I use it as a form of code backup. Maybe this reflects on how I personally use Github but I don’t go searching around for projects on Github. If I am looking for code to do something, I look for recommendations from others that link to Github. So if I publish an unfinished project to Github, I don’t expect anyone to look at it since I haven’t promoted it in any way. Does that make sense?

    • remotesynth

      Yup. I have updated the wording to be more clear. I think those projects *can* be posted where it makes sense, but make it clear in the readme that this project is not intended to be documented, maintained or supported. It’s a simple sentence or two that can help make the consumer’s choice more clear.

  • While I agree that documentation is super important, and many open source projects (I’m looking squarely at you, Ruby community!) seriously fall down in this regard, I think your prescription of making all personal projects private is totally the wrong way to fix the problem.

    I can’t tell you how many times I’ve come upon solutions to various problems from a Github code search – if those ‘personal projects’ or even gists, had been kept private, I’d never have found them.

    To me, intent is important. If your project is intended for wide use, documentation is totally critical. If I had a dollar for every time a Rubyist said “Use the source, Luke!” I’d be a wealthy man.

    • remotesynth

      You’re right. My wording was a bit more black and white than I intended. I do think you can post projects that aren’t generally intended to be documented, supported or maintained (like the personal projects you mention), just note it clearly in a disclaimer in the readme. Help the consumer make a better educated choice.

      Fwiw, I have updated the wording to hopefully make that clearer.

  • I have to admit, while I can certainly see where you’re coming from and have felt many of these pain points myself, I think the underlying premise of the article is flawed. There’s a presumption in the article that open source is by-and-large for the consumer. That “dumping your trash” is harmful because it makes life difficult for those developers who might stumble on it. I’d say that’s certainly *one type* of open source project, but there are many others.

    I’d also argue that GitHub does not have an explicit (or even implicit) understanding that code shared must be easily consumable any more than Twitter content or Facebook content must always be written for the reader. GitHub is a place for anyone to publish whatever they want for whatever purpose they want to. Perhaps the onus here should not be on the publishers to ensure the code is easy to consume, but on the consumers to ensure that what they’re consuming is appropriate for that use.

    When it comes down to it, my biggest issue is with this: “Keep personal projects local or private. If you can’t afford an account to make your projects private, there’s nothing wrong with using Git locally without pushing the code to GitHub.” Where does it say GitHub is only for stuff you want other people to use? When did the community decide this? Why is it the author’s responsibility to ensure what they post is suitable for you? What if I want to make use of issues, or collaborate with a small number of people? What if I want my dad to be able to see my stuff? What if I want to back my things up instead of keeping them local? That’s the beauty of a public platform – I can do what I want on it. It’s not *my job* to make sure *my stuff* works out well *for you*.

    Aside from the rant above, I’d also argue there absolutely is value in posting personal code that’s never intended for widespread use. On a number of occasions I’ve found myself wondering, “how did someone else use this class?” or “how did they implement this algorithm?” On many occasions I’ve been able to answer those questions, learn some things, and derive value out of “personal” code by using the GitHub code search. And that’s really the whole point. Making everything that can be public, public, has benefits for both producers and consumers of open source that outweigh “I only want to see mature, well-documented projects when I browse through GitHub.”

    All that said, I do think there are some valuable points for projects that do fit into the category of intended for widespread use. Making sure you have good documentation, properly setting expectations for the project and what it can/should do, and properly denoting where in the development stage something is are all very valuable. Perhaps the biggest thing GitHub can do to help here is introduce a formal mechanism by which a project’s purpose can be clearly communicated.

    • remotesynth

      The onus *is* currently on the consumer. I’m arguing that this is a waste of people’s time and energy.

      However, I have clarified my wording on the personal projects. I don’t intend to say that everyone should withhold their personal projects – just be a little thoughtful in the way you post them. If you don’t really intend to maintain them for public use, then make it clear with some sort of disclaimer in the readme. It’s a simple way to help the consumer know that you are not intending to necessarily document, maintain or support this project. I’d say, you’d even save yourself some potential grief from frustrated users.

      Using that guideline, you could post whatever you want, just help the consumer make an educated choice. I wasn’t arguing that there was only one type of project – in fact, my preferred solution would be some way within GitHub to note the status of a project that could communicate this to a user (even, perhaps, communicate that project is available but not maintained or even abandoned). This could be beneficial when filtering results. Alas, it does not exist.

  • Kevin Jones

    I largely disagree with this post, but understand where you are coming from, I think.

    First, this seems to single out open source projects, which I think is done because of how visible they are. I’ve used quite a number of commercial, supported, libraries that also had horrible documentation. Support tickets asking for clarification would sometimes take days to turn around. I know Telerik tries to stand apart from that, but in general I don’t find documentation any more lacking for open source than for closed and commercial. So I don’t see why open source is getting any particular focus as being poorly documented. We draw our conclusions from our experiences.

    The recycling analogy is flawed insomuch that recycling is mandatory in my county. I’m quite free to look at an open source project and walk away from it. I suppose I’ll just go build SOLR or ElasticSearch from scratch myself since I think their documentation is lacking, but they remain one of the best ways to use Lucene for search indexing.

    Somehow this post pivots into GitHub as a resume, and keeping personal things private is where it really fell off the rails for me. GitHub is not a depot for well-maintained open source projects. If you don’t like what I’m doing with my code, then don’t look at it. That’s it. Move on, nothing to see here. This sort of attitude where we expect open source to be pristine and polished results in people being shamed for open sourcing their project ( https://harthur.wordpress.com/2013/01/24/771/ ) and that is, as a community, where we have failed.

    • remotesynth

      I’d agree with you that many commercial products have poor documentation, though we usually vote with our wallet there – meaning that, unlike the open source code I discuss here, there’s a disincentive to ignore documentation.

      I would say there is no such thing as a perfect analogy. That being said, recycling isn’t mandatory everywhere. Also, I do note that you can walk away, but even in walking away you’ve at least invested time in determining you should (and you could argue that you may be walking away from an ideal solution and may not even know it). My point is, if the goal of SOLR or ElasticSearch (to use your example) is to provide a useful tool for the community, the simple choice not to properly document may be what’s causing it to fail in that mission (and I am not picking on those projects, I don’t know anything about their documentation specifically).

      My point on the GitHub as a resume portion was that we’ve created a share first mentality. I’m not saying you can’t post your small personal projects, but help with a little change to your readme that this is an experimental or personal project and not really intended for public consumption.

  • One can lobby for standards in the free range, frontier-like world of open source, but expecting/setting standards is a fool’s errand. Open source is open range software. Rider beware. There are no rules! Except that there are no rules. Open source is the wild west of software development. And I love it. Everyone is free to do it however they want. The result is that you typically get zero, or one or two amazing solutions among a mountain of horribly implemented and poorly communicated solutions. I’m ok with the system. It works. Because what can happen is a free solution can surpass the quality of paid solutions. And here balance is forced, making paid software an option but not king.

    The issue that is actually at hand is developer empathy. A developer will either release something for themselves and like minded developers (i.e. read source), which is fine. Or, they will release something that is welcoming to any type of developer. Both have value and place in the open source world. Both are needed. In fact I love the fact that a sub-quality solution can beat out a higher quality solution due simply to developer empathy (i.e. document everything well and make it dead easy for anyone to get started e.g. jQuery). This balancing mechanism is ideal. We need both the bad and the good (and everything in between) in the system.

    • remotesynth

      I disagree. You live in Idaho, a place that was once the wild west. However, I assume that you do in fact have laws now. It’s no longer lawless because it turns out that lawlessness is a very inefficient way to build strong communities.

      Now, I’m not saying that there should be a “law” (a GitHub enforced one in this case) – we obviously cannot (and don’t really want to) enforce such a thing in this scenario. However, I’d argue that community “guidelines” that encouraged sharing helped create this problem and they can help alleviate it (I am not naive enough to think we can eliminate it).

      I also don’t think explaining the intent of your project (even if that intent is that I don’t intend to maintain, support or document it) is a lot to ask.

      • Like I said, Lobbying. Yes. Community standards no. Recommendations, sure. Expectations, silly. Even today, standards are recommendations not expectations. Which many dislike, but like many things in this area, this is the best of the unideal choices.

        Even when it was the wild west we had laws. I was not suggesting complete lawlessness. We can’t steal in the OS world. Just like in the wild west, I was suggesting, one can *pretty much* do anything. With limits i.e. stealing and killing at will would still run you into the law. It needs to be as open as it can be, to be open source. IMO.

    • Mike Jang

      I think you’re correct in at least one of your points: good documentation is a form of developer (and deployer) empathy — and can help a project “beat out” a higher quality solution.

  • remotesynth

    A big thank you to everyone so far for their thoughtful comments – even when you largely disagree.

  • Loving this discussion because I am a Technical Writer (based in Northern Ireland). The thing is, this discussion goes on and on, yet we know from experience that Developers tend not to like documenting, they’ve almost never been trained as a Technical Writer, and they’re not usually allocated any time to write (if employed by someone else) or have no time (because they’re doing this as a side project, or trying to make a living). What’s the solution?
