Monorepos - The Good, Bad, and Ugly
A monorepo is a single version control repository that holds all the code, configuration files, and components required for your project (including services like search) and it’s how most projects start. However, as a project grows, there is debate as to whether the project's code should be split into multiple repositories.
As complexity increases, it can make sense to break the project out into several repositories. The structure of your team, your future plans for the project, and even which version control system you're using are all factors that go into making this decision.
In this article, you'll look at when having a monorepo is useful and explore the good, the bad, and the ugly in relation to the impact monorepos have on engineering culture.
Monorepos: The Good
With a monorepo, you never have to wonder where the particular piece of code you're looking for is. It's always in the same repository. This may seem like a relatively small concern, but when rolling out what should be a relatively simple code change requires you to make pull requests against multiple repositories, this particular advantage of a monorepo becomes more important.
For the same reason, large-scale refactors are much easier. If your codebase is split into multiple repos, you may have to make numerous pull requests as part of one refactor. In addition, you may have to time your release so that all these pull requests are merged and deployed at the same time.
However, if you use a monorepo, all your code changes can be part of one pull request and can be merged and deployed whenever the various infrastructure and support teams need the deployment to be ready.
If you use a CI/CD system, a monorepo makes it easier to work with. Because everything lives in one place, your CI/CD system can spin up the entire project to run tests or any other checks you need.
On the other hand, if you split your project across multiple repositories, you have to take one of two approaches. Either each repository needs to be run separately through CI/CD (and hopefully doesn't have any components that depend on code from the other repos) or your CI/CD system needs to handle the extra complexity of pulling in each repository and installing them in some sort of environment so they can work together.
If your project has dependencies such as Composer packages, Node modules, or other third-party components that need to be run, having all your code in a single repository can make installing and using these dependencies easier.
If your work is split across multiple repositories, you will need to ensure that each repository is updated every time one of your dependencies changes. In addition, you’ll need to ensure that you don't make any dependency-specific changes to one of the repos but not the others.
Since dependencies are already complex to manage, keeping that complexity low by keeping your code in a single repository is often a good compromise to make.
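To make the multi-repo problem concrete, here is a minimal sketch of a check that looks for the same dependency pinned to different versions in different repositories. The repository names, package versions, and the find_version_drift helper are all hypothetical and only there for illustration. It assumes you have already parsed each repository's manifest into a plain dictionary.

```python
from collections import defaultdict


def find_version_drift(manifests: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return dependencies that are pinned to different versions in different repos.

    `manifests` maps a (hypothetical) repository name to its dependency pins,
    for example the "dependencies" object parsed from a package.json.
    """
    versions_by_package: dict[str, dict[str, str]] = defaultdict(dict)
    for repo, dependencies in manifests.items():
        for package, version in dependencies.items():
            versions_by_package[package][repo] = version

    # Keep only the packages on which the repositories disagree.
    return {
        package: pins
        for package, pins in versions_by_package.items()
        if len(set(pins.values())) > 1
    }


# Hypothetical pins from three separate repositories.
manifests = {
    "web-frontend": {"lodash": "4.17.21", "react": "18.2.0"},
    "admin-panel": {"lodash": "4.17.15", "react": "18.2.0"},
    "search-service": {"lodash": "4.17.21"},
}

drift = find_version_drift(manifests)
for package, pins in drift.items():
    print(package, pins)
```

In a monorepo with a single shared manifest, this kind of drift is much harder to introduce in the first place, because there is only one place where the version is declared.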
A Successful Monorepo Implementation
Monorepos are often accused of not working well with large teams. However, Google, Facebook, Microsoft, Uber, Airbnb, and Twitter all employ very large monorepos, with varying strategies for scaling build systems and version control software to handle a large volume of code and daily changes. This suggests that the structure of the monorepo is what matters. If teams of hundreds or thousands of people can work successfully on monorepos, what makes a good monorepo structure?
Monorepos work when they are structured around the people working on them. Structuring them purely around the code and abstract best practices, without regard for who edits what, is where things can go wrong. For example, if all the code that one person needs to edit as part of a change is grouped logically together into components or a similar structure, you’ll be much better off when it comes to avoiding merge conflicts and streamlining developer productivity. Ultimately, this means enforcing separation of concerns and ensuring that code isn't duplicated across the codebase, since duplication forces one developer to make the same change in many places.
A project might be a good fit for a monorepo if concerns are well-separated already, and sub-projects are split into components that can be well-organized.
Monorepos: The Bad
Even though there are some advantages to monorepos, the drawbacks and limitations need to be considered prior to implementation. If your team hasn't worked with a monorepo before, or isn't familiar with some of the tooling that's been created to support monorepos, you might run into one of the following challenges.
Although a monorepo can make CI/CD simpler, CI/CD pipelines can take much longer to run once your monorepo grows large. Slow pipelines can start to affect developer productivity and, depending on how your CI/CD provider bills you for their services, can become expensive. If you're not prepared to spend at least some time optimizing your CI/CD pipeline as your project and your monorepo grow, you should strongly consider dividing up your project into multiple repositories.
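One common way to keep pipelines short is to run them only for the parts of the monorepo that a change touches. The sketch below shows the idea under some assumptions: the directory layout, the project names, and the affected_projects helper are hypothetical, and the list of changed files is hard-coded rather than read from version control.

```python
from pathlib import PurePosixPath


def affected_projects(changed_files: list[str], project_roots: list[str]) -> set[str]:
    """Return the project roots that contain at least one changed file."""
    roots = [PurePosixPath(root) for root in project_roots]
    affected = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for root in roots:
            # A file belongs to a project if the project root is the file's
            # parent directory or one of its ancestors.
            if root in path.parents:
                affected.add(str(root))
    return affected


# Hypothetical layout; in practice the changed paths might come from
# `git diff --name-only` against the target branch.
project_roots = ["services/search", "services/billing", "libs/auth"]
changed_files = [
    "services/search/index.py",
    "libs/auth/tokens.py",
    "README.md",
]

print(sorted(affected_projects(changed_files, project_roots)))
```

This sketch only looks at which directories changed. A real setup would also need to rebuild and retest the projects that depend on the changed ones, which is part of why pipeline optimization takes ongoing effort.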
Once you start building individual pieces of your monorepo rather than the whole thing, the build becomes more complicated: you have to ensure that these sub-builds occur in the correct order, instead of just building the entire repository at once.
This concern is similar to pulling multiple repositories into one pipeline if you have your repositories split up. But having code split into multiple repositories allows you to pull those other repositories in like dependencies and manage them in a way that can be more clear, especially if the alternative is a very large monorepo.
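The ordering problem described above is essentially a topological sort over the dependency graph of your sub-projects. Here is a minimal sketch using graphlib from the Python standard library (Python 3.9 or later). The project names and the build_order helper are hypothetical, and a dedicated monorepo build tool would normally work this out for you.

```python
from graphlib import TopologicalSorter


def build_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every project is built after its dependencies.

    `dependencies` maps each project to the projects it depends on.
    graphlib raises CycleError if the projects depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical sub-projects inside one monorepo.
dependencies = {
    "libs/auth": set(),
    "libs/http-client": {"libs/auth"},
    "services/search": {"libs/http-client"},
    "services/billing": {"libs/auth", "libs/http-client"},
}

order = build_order(dependencies)
print(order)
```

The exact order of independent projects (here, the two services) is not guaranteed. What matters is that each project appears after everything it depends on.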
If everyone is working in one repo, team communication becomes very important. Because it’s easier to refactor large pieces of the codebase with one single PR, these changes need to be communicated more clearly before they're rolled out. This ensures that there are no conflicts with what other team members are working on.
In addition, because one large repo can lead to larger pull requests, it's important to ensure that your team has time dedicated to reviewing them. You may want to require multiple individuals to review each set of changes. This wouldn't be as important if the changes were smaller and scoped more tightly, as they tend to be when code is split across multiple repositories.
An Unsuccessful Monorepo Implementation
An unsuccessful monorepo implementation, essentially, is the opposite of the successful implementation you looked at above.
If code is tightly coupled together, making one change requires edits in many unrelated files. Or, if there are a lot of pieces of the codebase that only work exactly how they're built right now, keeping those pieces in separate repositories may be the best way to go. Trying to force a codebase like this into a single monorepo without refactoring can lead to many of the challenges listed above.
In addition, if you have microservices that are meant to be deployed independently with independent teams working on them, a monorepo may not be the best choice for these either. If teams are unlikely to cross the boundary from their service into another service and codebases truly are separated, multiple repos can give you gains in developer efficiency that you wouldn't get working in a single monorepo.
As the debate surrounding monorepos rages on, you can see that having a single repository for your entire codebase can be a great asset to your team, but it doesn't come without drawbacks.
If you're not careful, you can find yourself with merge conflicts, long-running CI/CD pipelines, and increasingly complicated build processes.
However, if you can enforce separation of concerns in your code and implement developer processes that keep everyone on the same page about what's being worked on and what's being deployed, having all your code in the same repository might be the efficiency boost and workflow simplification you need.
(A big thank you to Keanan Koppenhaver for his contribution to this article)