Tags: reactjs, git, github, git-submodules, glitch-framework

How do I commit/push my build folder into another git repository and not into the main repository?


So, I've recently made a React app which I have posted on GitHub. However, I would like to post the output (build folder after I run npm run build) to a Glitch application. Since all Glitch applications have a git repository, I thought that would be the best way to go about doing this. Here is my desired structure:

I've seen people using submodules, but I can't figure out how to make my main git repo ignore the build folder and have the submodule just push the build folder.

I'm also confused about how to set up a submodule in general, so an example/explanation for that as well would be appreciated.

~ Ayush


Solution

  • I'm not entirely sure that you want a submodule here, but submodules will let you do what you are describing. Submodules are tricky, though. There's a reason people call them sob-modules. 😀

    Long

    First, it will help a great deal if you get your definitions—actors and actions—straight:

    The act of adding a new commit to a branch consists of the following steps:

    1. You check out the branch (with git checkout or git switch), which checks out its latest commit and makes that branch the current branch. This action fills in both Git's index, which holds your proposed next commit, and your working tree, where Git copies out all the files into a usable form. The internal, de-duplicated form is generally unusable to everything except Git itself.

    2. You do some stuff in your working tree. Git has zero control or influence over this part, a lot of the time, since you'll be using your own editor or compiler or whatever. You can use Git commands here and then Git will be able to see what you did, but mostly, Git doesn't have to care, because we move on to step 3:

    3. You run git add. This instructs Git to take a look at the updated working tree files. Git will copy these updated files back into Git's index (aka the staging area), in their updated form, re-compressing and de-duplicating them and generally making them ready for the next commit.

    4. You run git commit. This packages up new metadata—your name, the current date and time, a log message, and so on—and adds the current commit's hash ID to make up the metadata for the new commit. The new commit's parent will thus be the current commit. Git then snapshots everything in the index at this time (which is why git checkout filled it in, in step 1, and then git add updated it in step 3), along with the metadata, to make the new commit. This gives the new commit its new hash ID, which is actually just a cryptographic checksum of the entire data set here.

      It's at this point that the magic happens: git commit writes the new commit's hash ID into the current branch name. So now, the last commit on the branch is your new commit. This is how a branch grows, one commit at a time. No existing commit changes—none can change—but the new commit points back to what was the last commit, and is now the second-to-last commit. The branch name moves.
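
    The four steps above can be tried in a throwaway repository. This is only a sketch: the scratch directory, the file name file.txt, and the "Example User" identity are placeholders I made up, not anything from your project.

    ```shell
    set -e
    repo=$(mktemp -d)                        # hypothetical scratch location
    cd "$repo"
    git init -q
    git config user.name "Example User"      # placeholder identity
    git config user.email "user@example.com"

    # One starting commit, so that a branch and a current commit exist.
    echo "first" > file.txt
    git add file.txt
    git commit -q -m "first commit"
    branch=$(git symbolic-ref --short HEAD)  # name of the current branch
    before=$(git rev-parse HEAD)             # hash ID the branch name holds now

    git checkout -q "$branch"                # step 1: fills the index and working tree
    echo "second" >> file.txt                # step 2: work in the working tree
    git add file.txt                         # step 3: copy the update into the index
    git commit -q -m "second commit"         # step 4: snapshot the index plus metadata

    after=$(git rev-parse "$branch")         # the branch name now holds the new hash ID
    parent=$(git rev-parse "$branch^")       # parent of the new commit
    echo "branch moved from $before to $after"
    git cat-file -p "$after"                 # shows the tree, parent, author, committer, message
    ```

    The parent line printed by git cat-file should show the hash ID saved in before: the new commit points back to what used to be the last commit, and only the branch name moved.
    
    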

    You really need to have all of these down pretty cold to make submodules work, because submodules actually use all of this stuff, but then violate some rules. Now it starts to get tricky. We also need to look more closely at git push, just for a moment.

    git push: cross-connecting one Git repository with another

    Making a new Git commit, in some Git repository, just makes a new snapshot-plus-metadata. The next trick is to get that commit into some other Git repository.

    If we start with two otherwise-identical Git repositories, each has some set of commits and some branch names identifying the same last commit:

    ... <-F <-G <-H   <--branch-name   [in Repo A]
    

    and the same in Repo B. But then, over in Repo A, we do:

    git checkout branch-name
    <do stuff>
    git commit
    

    which causes repo A to contain:

    ...--F--G--H--I   <-- branch-name
    

    (I get lazy and don't bother drawing the commit-to-commit arrows correctly here). New commit I—I, like H and G and F, stands in for some big ugly random-looking hash ID—points back to existing commit H. You might even make more than one new commit:

    ...--F--G--H--I--J   <-- branch-name
    

    Now you run git push origin branch-name, to send your new commits, in your repository, back to the "origin" repo (which we were calling "repo B" before, but let's call it origin now).

    Your Git software suite ("your Git") calls up theirs. Your Git lists out the hash ID of your latest commit, i.e., commit J. Their Git checks in their repository, to see if they have J, by hash ID. They don't (because you just made it). So their Git tells your Git: OK, gimme! Your Git is now obligated to offer J's parent I. They check and don't have I either, so they ask for that one too. Your Git is now obligated to offer commit H. They check and—hey!—this time they do have commit H already, so they say: no thanks, I have that one already.

    Your Git now knows not only that you must send commits J and I, but also which files they already have. They have commit H, so they must have commit G too, and commit F, and so on. They have all the de-duplicated files that go with those commits. So your Git software suite can now compute a minimal set of stuff to send them so that they can reconstruct commits I-J.

    Your Git does so; that's the "counting" and "compressing" and so on that you see. Their Git receives this stuff, unpacks it, and adds the new commits to their repository. They now have:

    ...--F--G--H   <-- branch-name
                \
                 I--J
    

    in their Git repository. Now we hit a really tricky bit: How does a Git, in general, find a commit? The answer is always, ultimately, by its hash ID—but that just brings another question, which is: how does a Git find a hash ID? They look random.

    We already said this earlier though: a Git (the software suite) often finds some specific commit in some specific repository through the use of a branch name. The branch name branch-name, in your repository, finds the last commit, which is now J. We'd like the same name in their repository to find the same last commit.

    So, your Git software now asks their Git to set their repository's branch name branch-name to identify commit J. They will do this if you are allowed to do this. The "allowed" part can get arbitrarily complicated—sites like GitHub and Bitbucket add all kinds of permissions and rules here—but if we assume that it's OK, and that they'll do that, then they will end up with:

    ...--F--G--H--I--J   <-- branch-name
    

    in their repository, and your Git repository and their Git repository will be in sync again, at least for this particular branch name.

    So that's how git push normally works: you make new commits, adding them on to the end of your branch, and then you send your new commits to some other Git, and ask their software to add the same commits to the end of a branch of the same name in their repository. (Whew!)
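
    To watch this happen without a hosting site, here is a sketch that uses a local bare repository as a stand-in for origin. The paths and the placeholder identity are assumptions for illustration only; the commit messages reuse the letters from the drawings above.

    ```shell
    set -e
    work=$(mktemp -d)                              # hypothetical scratch location
    git init -q --bare "$work/origin.git"          # plays the role of "their" repository
    git clone -q "$work/origin.git" "$work/mine" 2>/dev/null   # hides the empty-repository warning
    cd "$work/mine"
    git config user.name "Example User"            # placeholder identity
    git config user.email "user@example.com"

    # Commit H exists on both sides once it has been pushed.
    echo H > file.txt
    git add file.txt
    git commit -q -m "commit H"
    branch=$(git symbolic-ref --short HEAD)
    git push -q origin "$branch"

    # Commits I and J exist only in "my" repository for now.
    echo I >> file.txt
    git commit -q -a -m "commit I"
    echo J >> file.txt
    git commit -q -a -m "commit J"

    # Commits reachable from my branch but not from what origin was last known to have.
    to_send=$(git rev-list --count "origin/$branch..$branch")
    echo "commits to send: $to_send"

    git push -q origin "$branch"                   # send I and J, then ask them to move their name
    mine=$(git rev-parse "$branch")
    theirs=$(git --git-dir="$work/origin.git" rev-parse "refs/heads/$branch")
    echo "mine:   $mine"
    echo "theirs: $theirs"
    ```

    The count should come out as two (I and J), and after the push both names should identify the same hash ID.
    
    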

    Submodules

    A submodule, in Git, is little more than two separate, mostly-independent Git repositories. This of course needs a lot of explanation:

    First, like any repository, a submodule repository is a collection of commits, each with a unique hash ID. We (or Git at least) like to refer to one of the two repositories as the superproject and the other as the submodule. Both of these start with the letter S, which is annoying, and both words are long and clunky, so here I'll use R for the superproject Repository, and S for the Submodule.

    (Side note: the hash IDs in R and S are independent from each other. Git tries pretty hard—and usually succeeds—at making hash IDs globally unique across every Git repository everywhere in the universe. So there's no need to worry about "contaminating" R with S IDs or vice versa. In any case we can just treat every commit hash ID as if it's totally unique. Normally, with a normal non-R non-S repository, we don't even have to care about IDs, as we just use names. But submodules make you have to be more aware of the IDs.)

    What makes R a superproject in the first place is that it lists raw hash IDs from S. It also has to list instructions: if we've done a git clone of R, we don't even have a clone of S yet. So R needs to contain the instructions so that your Git software can make a clone of S.

    The instructions you give to git clone are pretty simple:

    git clone <url> <path>
    

    (where the path part is optional for git clone in general, but here, R will always specify a path, written with forward slashes). This set of instructions goes into a file named .gitmodules. The git submodule add command will set up this file in R for you. It's important to use it, to set up the .gitmodules file. Git will still make a submodule even if you don't set this up, but without the cloning instructions, the submodule won't actually work.

    Note that there's no proper place to put authentication (user names and passwords) in here. That's a generic submodule issue. (You can put them in as plaintext in the .gitmodules file, but don't: it's a very bad idea, since they're not encrypted or protected.) As long as you have open access to cloning the submodule, it doesn't normally present any real problem. If you don't, you'll have to solve this problem somehow.

    In any case, you will need, just once, to run:

    git submodule add ...
    

    (filling in the ... part) in what will thus become superproject R, so as to create the .gitmodules file. You then need to commit the resulting .gitmodules file, so that people who clone R and check out a commit that contains that file, get that file, so that their Git software can run the git clone command to create S on their system.
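
    To see what git submodule add records, here is a sketch in which a local repository stands in for the hosted S. The paths and identity are placeholders. The -c protocol.file.allow=always part is there only because this example uses a local path as the URL, which recent Git versions refuse for submodules by default; you should not need it with a real https or ssh URL.

    ```shell
    set -e
    work=$(mktemp -d)                               # hypothetical scratch location

    # S: a repository with one commit, standing in for the hosted submodule.
    git init -q "$work/S"
    git -C "$work/S" config user.name "Example User"     # placeholder identity
    git -C "$work/S" config user.email "user@example.com"
    echo "build output goes here" > "$work/S/README.md"
    git -C "$work/S" add README.md
    git -C "$work/S" commit -q -m "initial commit in S"

    # R: the superproject.
    git init -q "$work/R"
    cd "$work/R"
    git config user.name "Example User"
    git config user.email "user@example.com"
    echo "# app" > README.md
    git add README.md
    git commit -q -m "initial commit in R"

    # Clone S into build/ and write the cloning instructions into .gitmodules.
    git -c protocol.file.allow=always submodule --quiet add "$work/S" build
    cat .gitmodules                                 # the path and url that other clones of R will use
    git commit -q -m "add build submodule"          # commits .gitmodules and the gitlink together
    ```

    The .gitmodules file should contain one [submodule "build"] section holding a path and a url, which are exactly the two arguments a later git clone needs.
    
    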

    You'll also need to put S somewhere they can clone it. This, of course, means that first you need to create a Git repository to hold S. You do this the way you make any Git repository:

    git init
    

    or:

    git clone
    

    (locally, on your machine) along with whatever you do on whatever hosting site that creates the repository there.

    Now that you have a local repository S, you need to put some commit(s) into it. What goes into these commits?

    Well, you already said that you'd like your R to have a build/ directory (folder) in it, but not actually store any of the built files in any of the commits made in R. This is where submodules actually work. A submodule, in R, for S, works by saying: create me a folder here, then clone the submodule into the folder. Or, if the submodule repository already exists—as it will when you're setting all this up in the first place, with you just now having created S—you simply put that entire repository into your working tree for R, under the name build.

    Note that build/.git will exist in R's working tree at this point. That's because a Git repository hides all the Git files in the .git directory (folder) at the top level of the working tree. So your new, empty S repository consists of just a .git/ containing Git files.

    You can now run that git submodule add command in R, because now you have the submodule in place:

    git submodule add <url> build
    

    (You might want to wait just a little bit, but you can definitely do it at this point—and this is the earliest point at which you can do it, since up until now, S didn't exist or was not in the right place yet.)

    You can now fill the build/ directory that lives in R's working tree with files, e.g., by running npm run build, or whatever it is that populates the build/ directory. Then you can:

    (cd build; git add .)
    

    or equivalent, so as to add the build output in S. You can now create the first commit in S (or the second, if you like to have a README.md, a LICENSE, and such as your initial commit). You can now have branches in S as well, since you now have at least one commit in S.

    Now that you're back in R though, it's time to git add build—or, if you chose to delay it, run that first git submodule add. In the future you'll use git add build. This directs the Git that is manipulating the index / staging-area for R to enter the repository S and run:

    git rev-parse HEAD
    

    to find the raw hash ID of the current commit in S.

    The superproject's Git repository's index now acquires a new gitlink entry. A gitlink entry is like a regular file, except that instead of git checkout checking it out as a file, it provides a raw hash ID. That's basically all it is: a pathname—in this case, build/—and a raw hash ID.

    This gitlink is like one of those read-only, compressed, and de-duplicated files that goes in a commit. It's just that instead of storing file data, it stores a commit hash ID. That hash ID is that of some commit in S, not some commit in R itself. But now that you've updated the index (or staging area) for R, you will need to make a new commit in R. The new commit will contain any updated files, plus the right hash ID for S, as found just now by the git add you ran (or that git submodule add ran for you).

    The next commit you make in R (not in S) will list the hash ID of the current commit in S. So once you've committed the built files in S, you can run git add build in R and then git commit in R.
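
    Put together, the sequence might look like the sketch below. Everything named here is an assumption for illustration: the scratch paths, the local bare repository that stands in for the hosted S, the placeholder identity, and the echo that stands in for npm run build. Since build/ already holds a repository when git submodule add runs, Git stages it as-is rather than cloning.

    ```shell
    set -e
    work=$(mktemp -d)                           # hypothetical scratch location
    git init -q --bare "$work/S.git"            # stand-in for wherever S is hosted

    git init -q "$work/R"
    cd "$work/R"
    git config user.name "Example User"         # placeholder identity
    git config user.email "user@example.com"
    echo "# app" > README.md
    git add README.md
    git commit -q -m "initial commit in R"

    # S lives inside R's working tree, under build/.
    git init -q build
    git -C build config user.name "Example User"
    git -C build config user.email "user@example.com"
    echo "<html></html>" > build/index.html     # stand-in for the output of: npm run build
    (cd build && git add . && git commit -q -m "first build")

    # Record S in R: writes .gitmodules and stages the gitlink for build.
    git submodule --quiet add "$work/S.git" build >/dev/null
    git ls-files --stage build                  # mode 160000 marks a gitlink
    git commit -q -m "record build submodule"

    recorded=$(git rev-parse HEAD:build)        # hash ID stored in R's commit
    actual=$(git -C build rev-parse HEAD)       # current commit in S
    echo "R records $recorded"
    echo "S is at   $actual"
    ```

    The two hash IDs printed at the end should match. None of the built files appear in R's commit; it holds only .gitmodules, README.md, and the gitlink.
    
    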

    The last and trickiest part

    Now comes the last part, which—if you thought all of the above was complicated and tricky—is the trickiest:

    You can see how this can get pretty messy. It's very easy for the various separate commits to get de-synchronized in various ways. Once you have the procedures down, and have scripts around everything that make sure that all the steps happen at the right times, it can work pretty well. But there are many ways for things to go wrong.
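
    As one possible shape for such a script, here is a sketch that keeps the steps in a safe order: commit and push in S first, then record the new hash ID in R and push R. The function name publish_build, the local bare repositories standing in for GitHub and Glitch, the placeholder identity, and the echo standing in for npm run build are all assumptions of mine, not part of your setup.

    ```shell
    set -e
    work=$(mktemp -d)                           # hypothetical scratch location
    git init -q --bare "$work/R.git"            # stand-in for the GitHub repository
    git init -q --bare "$work/S.git"            # stand-in for the Glitch repository

    git clone -q "$work/R.git" "$work/R" 2>/dev/null
    cd "$work/R"
    git config user.name "Example User"         # placeholder identity
    git config user.email "user@example.com"
    echo "# app" > README.md
    git add README.md
    git commit -q -m "initial commit in R"

    git clone -q "$work/S.git" build 2>/dev/null
    git -C build config user.name "Example User"
    git -C build config user.email "user@example.com"

    # publish_build <label>: hypothetical helper that runs every step in order.
    publish_build() {
        echo "built: $1" > build/index.html     # stand-in for: npm run build
        git -C build add .
        git -C build commit -q -m "build $1"
        git -C build push -q origin HEAD        # S first, so the hash ID R records exists remotely
        if [ -f .gitmodules ]; then
            git add build                       # later runs: just update the gitlink
        else
            git submodule --quiet add "$work/S.git" build >/dev/null   # first run only
        fi
        git commit -q -m "record build $1"
        git push -q origin HEAD
    }

    publish_build 1
    publish_build 2

    rbranch=$(git symbolic-ref --short HEAD)
    sbranch=$(git -C build symbolic-ref --short HEAD)
    recorded=$(git --git-dir="$work/R.git" rev-parse "refs/heads/$rbranch:build")
    pushed=$(git --git-dir="$work/S.git" rev-parse "refs/heads/$sbranch")
    echo "hosted R records $recorded"
    echo "hosted S is at   $pushed"
    ```

    Pushing S before R matters because anyone who clones R needs to be able to fetch the commit that R's gitlink names; if R is pushed first and the S push fails, R points at a commit nobody else can get.
    
    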