git github

How can we handle private config files in public repositories?


I have a few projects on GitHub that I release in such a way that they're ready to be deployed by anyone. I have configuration files like:

my-project
├── config.yaml
└── dev.yaml

I would like to deploy from the same codebase except that I need to add a couple more files that should be private. The files would then look like this:

my-project
├── config.yaml
├── dev.yaml
├── confidential-stuff.yaml
└── more-confidential-stuff.yaml

This means that the files in the public project are a strict subset of the files in my private deployment.

How can we handle such private configuration files? With a private fork? If so, how do you ensure the two stay in sync? My goal is to reduce duplication, i.e., to use only one codebase if at all possible.

Edit: I could put the files in .gitignore, but then they would no longer be under version control and hence would not be visible to the deployment pipeline.


Solution

  • My usual advice is that local (or private) configuration information should be kept out of source control using techniques like templates in place of real config files.
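
    As a rough sketch of the template approach: commit a template such as config.yaml.template and list the rendered config.yaml in .gitignore. The file names, the @DB_PASSWORD@ placeholder, and the DB_PASSWORD variable are all made up for illustration; nothing in the question prescribes them.

    ```shell
    # render_config TEMPLATE OUTPUT
    # Fills the hypothetical @DB_PASSWORD@ placeholder in TEMPLATE with the
    # value of the DB_PASSWORD environment variable and writes OUTPUT.
    # TEMPLATE is committed; OUTPUT is assumed to be listed in .gitignore.
    render_config() {
        template=$1
        output=$2
        # Refuse to render a config with an empty secret.
        if [ -z "${DB_PASSWORD:-}" ]; then
            echo "DB_PASSWORD must be set" >&2
            return 1
        fi
        # Escape characters that are special in a sed replacement
        # (the '|' delimiter, '&', and backslash).
        escaped=$(printf '%s' "$DB_PASSWORD" | sed 's/[&|\\]/\\&/g')
        sed "s|@DB_PASSWORD@|$escaped|g" "$template" > "$output"
    }
    ```

    Each deployment would then run something like `render_config config.yaml.template config.yaml` with its own secret in the environment.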

    What's different here is that you want to hide the existence of entire config files, while still wanting them represented in source control.

    Your stated reason to have them in source control is visibility to the build pipeline. There are other ways to make a file available to your build pipeline, and if that's the only reason for source-controlling the files, then I recommend using one of those mechanisms instead. Details would depend on your build tools, but surely you can give the build process access to a shared folder and copy the files when they're needed.
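
    For instance, a build step along these lines could pull the two files from the question out of a shared folder before the build runs. This is only a sketch: the shared folder location and how your build tool invokes the step are assumptions that depend on your setup.

    ```shell
    # copy_private_config SHARED_DIR PROJECT_DIR
    # Copies the private config files from a shared folder (a hypothetical
    # location that only the build agent can read) into the checked-out
    # project. Fails if either file is missing so the build stops early.
    copy_private_config() {
        shared_dir=$1
        project_dir=$2
        for name in confidential-stuff.yaml more-confidential-stuff.yaml; do
            if [ ! -f "$shared_dir/$name" ]; then
                echo "missing private config: $shared_dir/$name" >&2
                return 1
            fi
            cp "$shared_dir/$name" "$project_dir/$name" || return 1
        done
    }
    ```

    With this approach it makes sense to list both file names in .gitignore, so a copied file cannot be committed to the public repository by accident.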

    On the other hand, if you need to sync historical versions of the files, that's a more difficult problem. That is, maybe at release 5 the required contents changed, but you still need the build process to use the old contents if you check out / build version 4.

    In that case, you could consider creating a branch to hold your build versions (including the private config files). There are two good reasons not to do this:

    (1) As a general rule, having different branches to store different sets of content can lead to problems. People coming from tools like TFVC sometimes make a general practice of having "a branch for this project and another branch for that project", which leads to trouble. Or people want a branch to represent a subset of the overall content, but when they merge it back they're surprised because a bunch of their files get deleted from master. Etc...

    In this case, with the "odd" branch being a superset of the other content, many of the usual problems won't apply, but it's still not a practice to consider without a very solid reason.

    (2) You'd have to be careful never to leak info from the "private" branch; there are a lot of potential ways to make a mistake. Obviously don't merge from that branch to any other branch, etc.

    How to keep the branch private? Well, if the hosting software for the remote supports branch-level permissions, that might help. If not, you never push the private branch to the public remote; instead maybe you have a second remote for your build process.
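
    A sketch of the second-remote setup, run inside your working clone. The remote name "private" is an arbitrary choice, and the URL is whatever private repository your build process can reach. Note that this only changes where a plain `git push` goes; it does not stop anyone from explicitly pushing the branch to the public remote.

    ```shell
    # setup_private_remote PRIVATE_URL BRANCH
    # Adds a second remote named "private" and pins BRANCH to it, so that
    # a plain `git push` from BRANCH targets the private remote, not origin.
    setup_private_remote() {
        private_url=$1
        branch=$2
        git remote add private "$private_url" || return 1
        # branch.<name>.pushRemote sets the push destination for this
        # one branch only; other branches keep pushing to origin.
        git config "branch.$branch.pushRemote" private || return 1
        # Publish the branch to the private remote and track it there.
        git push --set-upstream private "$branch"
    }
    ```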

    So the first time you're going to do a build, you check out the release version, create the "private build" branch, add the config files on that branch, and off you go. Then for each subsequent release version, you merge the release into the "private build" branch, editing the private config files as needed (either during the merge, or in a commit right before the merge).
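
    In git terms, that workflow might look roughly like the following. The branch name private-build is a placeholder, the release is assumed to be identified by a tag or other ref, and the private files are assumed to be sitting untracked in the working tree (and not listed in .gitignore).

    ```shell
    # start_private_build RELEASE_REF
    # First build: branch off the release and commit the private files.
    start_private_build() {
        release_ref=$1
        git checkout -b private-build "$release_ref" || return 1
        git add confidential-stuff.yaml more-confidential-stuff.yaml || return 1
        git commit -m "Add private config for $release_ref"
    }

    # update_private_build RELEASE_REF
    # Later builds: merge the new release into the private branch.
    # Merges only ever go in this direction, never back to a public branch.
    update_private_build() {
        release_ref=$1
        git checkout private-build || return 1
        git merge --no-edit "$release_ref"
    }
    ```

    If the private files need to change for a release, commit those edits on private-build before or after calling `update_private_build`, or resolve them during the merge.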