If you use GitHub, you have probably “forked” a repo more times than you can count. It’s super common and the ideal way to interact with a git repository you don’t control. But the idea of forking in open source can have another meaning, a far more interesting meaning. It’s when you take an open source project and create a new open source project based on the first. It’s not as simple as clicking a button. The process is complicated and has a ton of corner cases.
When Sheogorath reached out to me and explained HedgeDoc has been forked 3 times, I knew it was a conversation that would be filled with lessons, warnings, and just interesting events. We chatted about the story of HedgeDoc and all the things that come with forking an open source project. Everything from infrastructure, licensing, and even naming (the hardest problem of all).
This episode is also available as a podcast, search for “Open Source Security” on your favorite podcast player.
What is a fork?
Forking an open source project seems like it should be easy enough; the whole point of open source is to allow us to copy and reuse things. But in the context of open source projects, forking actually represents an event that impacts everything from infrastructure to naming, licensing, and even security. The HedgeDoc project experienced this through its evolution from HackMD to CodiMD and finally to HedgeDoc.
The technical definition of a fork starts with creating a copy of a project. In HedgeDoc’s case, the project experienced several types of forks, each a little different than the one before.
A soft fork is what we call the forking and branching that happens during normal development workflows. A developer temporarily forks the codebase to add their changes, then eventually submits those changes back. There are even reasons to have a more official temporary fork that will exist for a long time. We see this all the time when a Linux distribution has a stable and a testing version. Eventually testing becomes the new stable, then development continues on testing until the next merge happens.
The story of HedgeDoc, HackMD, and CodiMD is what we call a hard fork, where the codebases permanently diverge. A hard fork happens when there is no intention of working with the original project or sending code changes back. The idea is you are taking the code of an existing project and making a brand new project that is totally different. A hard fork is when you learn how much infrastructure exists in a properly running project.
The open source license matters
While you can’t just fork something and change the license to a proprietary one (it’s a super complicated matter actually, I’ll see about getting someone to talk about open source licensing in the future), one of the things HackMD CE did was change the license from something permissive (MIT in this instance) to something much less permissive, AGPL. The project did this to prevent the non-community version from using their source code without giving back to the project.
The initial MIT license of the project enabled HackMD.io to create a proprietary fork. The decision to transition to AGPL arose from the need to keep the community contributions in the community. There have been a number of forks recently that were the result of organizations changing their license from something permissive to something commercial. A few recent examples are Redis to Valkey, Elasticsearch to OpenSearch, and Vault to OpenBao.
Not every fork happens because of a license change, but license changes seem to be driving many of the most recent forks.
Own your brand and infrastructure
One lesson Sheogorath brought to the discussion was to own your own infrastructure. And what we really mean by that is use domains you control. If you host your project on GitHub and you want to move to GitLab, it’s a lot less work if you don’t have a ton of GitHub URLs in your documentation (you would instead use URLs that point at your own domain).
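To make the "use domains you control" idea a bit more concrete, here is a minimal sketch of how you might audit documentation for links that point at a hosting provider instead of your own domain. None of this comes from HedgeDoc itself: the function name, the host list, and the sample text are all made up for illustration.

```python
import re

# Hosts we consider "not ours". This list is an assumption; adjust it
# to whatever hosting providers your project currently depends on.
THIRD_PARTY_HOSTS = ("github.com", "gitlab.com")

# Deliberately simple URL matcher: stops at whitespace and at the
# characters that usually close a link in Markdown or HTML.
URL_RE = re.compile(r"https?://[^\s)>\"']+")


def find_third_party_urls(text, hosts=THIRD_PARTY_HOSTS):
    """Return (line_number, url) pairs for URLs on hosts we do not control."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for url in URL_RE.findall(line):
            url = url.rstrip(".,;:")          # drop trailing sentence punctuation
            host = url.split("/")[2].lower()  # "https://host/path" -> "host"
            if any(host == h or host.endswith("." + h) for h in hosts):
                hits.append((lineno, url))
    return hits


# Hypothetical documentation snippet, invented for this example.
sample = (
    "Report bugs at https://github.com/example/project/issues\n"
    "Read the docs at https://docs.example.org/start\n"
)
for lineno, url in find_third_party_urls(sample):
    print(f"line {lineno}: {url}")
```

Every hit is a link that would need to be rewritten if the project ever changed hosts. Pointing those links at a redirect on your own domain means a future move only has to update the redirect.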
This feels equally relevant outside of just source code and projects today. There are a lot of pushes to host content via RSS and away from large social media sites. There’s something to be said about hosting content at a URL and service you can control.
Now, all that said, if you do fork a project and you have a new domain, you will have to re-brand everything in the code you just forked. This can mean new logos, new mascots, new names, new URLs, the list is huge. Getting out all the references to the old project is going to take a ton of work and a long time.
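As a rough illustration of why the rename is tedious, here is a small sketch of a case-aware find and replace. It is not how HedgeDoc did its rename; the function and the sample strings (including the file and variable names) are invented for this example.

```python
import re


def rebrand(text, old, new):
    """Replace `old` with `new`, matching case-insensitively.

    All-lowercase and all-uppercase matches keep their casing;
    any other casing gets `new` exactly as given.
    """
    pattern = re.compile(re.escape(old), re.IGNORECASE)

    def swap(match):
        found = match.group(0)
        if found.islower():
            return new.lower()
        if found.isupper():
            return new.upper()
        return new

    return pattern.sub(swap, text)


# Made-up sample text; the config file and variable names are hypothetical.
before = "Welcome to CodiMD. Edit codimd.conf or set CODIMD_PORT."
print(rebrand(before, "CodiMD", "HedgeDoc"))
```

A blind replace like this is only a starting point. Things like license attributions, changelog history, and links to the old project usually need to keep the old name, so every change still has to be reviewed by a person. Logos and mascots don't show up in a text search at all.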
Security (of course)
And of course the conversation brought up security vulnerabilities. When you have a forked project, a security vulnerability can affect both the original project and the fork. It’ll be important to keep an eye on the other project to make sure no important security fixes get missed.
The two projects share a lot of code. It’s very likely that they will both have common bugs and security vulnerabilities, maybe for years or even decades, until the code diverges enough. This gets even more wild because forks of projects tend not to get along all that well.
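One low-tech way to keep an eye on the other project is to skim its commit log for anything that looks security related. The sketch below assumes you feed it text in the `hash subject` format that `git log --oneline` produces, for example the commits on the other project's branch that your fork doesn't have yet. The keyword list, the function name, and the sample commits are all assumptions made for this example.

```python
import re

# Words that often show up in security fix commit subjects.
# This list is a guess; tune it for the project you are watching.
SECURITY_HINTS = re.compile(
    r"\b(security|cve-\d{4}-\d+|xss|csrf|vulnerab\w*|sanitiz\w*)\b",
    re.IGNORECASE,
)


def flag_security_commits(oneline_log):
    """Return (commit, subject) pairs whose subject looks security related.

    `oneline_log` is text with one "hash subject" entry per line.
    """
    flagged = []
    for line in oneline_log.splitlines():
        commit, _, subject = line.strip().partition(" ")
        if subject and SECURITY_HINTS.search(subject):
            flagged.append((commit, subject))
    return flagged


# Invented commit log; the hashes and the CVE number are placeholders.
sample_log = (
    "a1b2c3d Fix XSS in note title rendering\n"
    "9f8e7d6 Update README badges\n"
    "0c0ffee Bump dependency for CVE-0000-0000\n"
)
for commit, subject in flag_security_commits(sample_log):
    print(commit, subject)
```

Keyword matching will miss fixes that were committed quietly, so treat this as a first pass and not a replacement for following the other project's security advisories.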
The lessons
The HedgeDoc experience surfaced a lot of the challenges that come with managing open source forks: repository namespace control, control over infrastructure, implementation of proper access controls, even where to host the code. It’s a lot of things you probably won’t ever worry about unless you’re forking a large project. I’m not sure there’s a single lesson to take away from this. It’s a lot of little things, but little things that only matter if you’re forking a project.
The hardest part of this discussion is every fork will be different, so the lessons Sheogorath brought us won’t be the lessons you learn if you manage a fork. It’s one of those things where the real lesson is there’s no lesson.
Wrapping up
It was a pretty fun conversation. I learned a lot. There are plenty of things that happen when running and forking an open source project that I’m not aware of, and probably never will be. It’s nice to get to hear these things from someone who has lived through the challenges. Anyone looking to fork a large project can probably learn a thing or two, and if you do, let me know. I’m sure you’ll have some new lessons for us all!