I am new to git and now face a git repo 'housekeeping' challenge after deciding to clean up the code structure. There are two dimensions to my challenge:
1. Rename the suboptimally titled REPO, and some SUBFOLDERS and FILES, to clean Python standard notation (from dashes to shorter names with underscores etc.).
2. Split out test code into .py files saved in a dedicated \tests folder.
I discovered that doing this code and file structure clean-up in git is hard while preserving the change history. The other answers on the topic appear to cover only part of this effort. I attempted renaming files via git online but, while the history is formally kept, it only records the bulk deletion of a big chunk of test code that was moved to the \tests folder. The newly created \tests\basic_test.py and \tests\advanced_test.py files are apparently treated as new by git, i.e. they have no prior history of changes.
In short, I need to split out the test code into new files stored in a new \tests subfolder, then rename the root code folder, and finish by renaming the repo. Can this be done without the git command line? If not, I guess it's time to learn it, and I would appreciate guidance on implementing exactly what I need above, so that I can jump in the water without getting bogged down in git command line tutorials, i.e. effect the change I need with minimum theory acquisition.
Thanks a lot for sharing the wisdom!
- mt code structure 1.0
\money-tracker          # local dir and git repo name
    money_tracker_v01_9.4
- mt code structure 2.0
\money_tracker          # app root_dir (local dir and git repo name)
    \mt                 # code_dir (shared code base named after main mod)
        mt.py
    \tests
        test_basic.py
        test_advanced.py
    \data_in            (private, local)
        coa.csv
        trxn_data_x.csv
    \data_out           (private, local)
        cf_report_x.txt
* each mt_dir may contain aux files (e.g. __init__.py, context.py)
The minimum theoretical part that you must learn is this: Git doesn't have file history. Git has commits, and the commits are the history. Each commit has a full snapshot of every file.[1]
Git can, at any time, compare any two existing commits. If there is a file named F in the old commit, and a file named F in the new commit, we generally assume that this is the same file. But suppose that the old commit has a file named old/path/to/name1.py and the new commit has a file named new/name/of/name2.py.[2] Then maybe those should be considered "the same file", even though they have different names.
If some commit renames some file, Git can try to detect that rename. This rename detection depends on the files being similar enough in terms of content. A 100% match on content guarantees that Git can find the rename pretty easily. So when you have a commit that just renames the files, telling Git "tell me what changed in this one commit, and by the way, detect renames while you're doing that"[3] will make Git compare the "before" snapshot to the "after" one, and it will find all the renames.
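To make the idea of content-based rename detection concrete, here is a toy sketch in Python. It is not Git's actual algorithm: it assumes each snapshot is just a dict of path to content, it scores similarity with difflib rather than Git's own measure, and the function name detect_renames is made up for this illustration. The 0.5 threshold mirrors Git's documented default of 50% similarity.

```python
from difflib import SequenceMatcher


def detect_renames(before, after, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `before` and `after` map path -> file content (one snapshot each).
    Returns {old_path: (new_path, similarity)} for pairs scoring at or
    above `threshold`. Toy model only: Git's real scoring differs.
    """
    deleted = [p for p in before if p not in after]
    added = [p for p in after if p not in before]
    renames = {}
    for old in deleted:
        best_path, best_score = None, 0.0
        for new in added:
            score = SequenceMatcher(None, before[old], after[new]).ratio()
            if score > best_score:
                best_path, best_score = new, score
        if best_path is not None and best_score >= threshold:
            renames[old] = (best_path, best_score)
            added.remove(best_path)  # each new path is claimed at most once
    return renames


source = "def add(a, b):\n    return a + b\n"

# A commit that only renames: the content is identical on both sides.
before = {"old/path/to/name1.py": source}
after = {"new/name/of/name2.py": source}
pure_rename = detect_renames(before, after)

# A commit that renames and rewrites everything: nothing left to match.
rewritten = {"new/name/of/name2.py": "VERSION = 1\n"}
no_match = detect_renames(before, rewritten)
```

In this sketch the pure rename scores a similarity of 1.0 and is always paired, while the fully rewritten file falls below the threshold and shows up as one deletion plus one unrelated new file, which is the situation described in the question.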
In order to show you a pretend "file history" with git log --follow -- path, Git simply looks at each commit. Git starts at the end and works backwards (it always does this), comparing the before-and-after snapshots, with rename detection enabled. If path is in the "after" commit, and Git finds that it's renamed from some previous path in the "before" commit, Git tells you about that, and then starts looking for the old path name.
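The backwards walk can be sketched the same way. This is a simplified model, not what git log --follow really executes: history is assumed to be a plain list of (message, snapshot) pairs, a rename is recognised only when the content is byte-for-byte identical, and the file names and the follow function are hypothetical.

```python
def follow(history, path):
    """Walk `history` (oldest commit first) backwards, tracking `path`.

    Each entry is (message, snapshot), where snapshot maps path -> content.
    Only exact-content renames are detected, the simplest possible case.
    Returns the messages of commits that touched the file, newest first.
    """
    touched = []
    for index in range(len(history) - 1, -1, -1):
        message, after = history[index]
        before = history[index - 1][1] if index > 0 else {}
        if path not in after:
            break  # the file does not exist this far back
        if before.get(path) == after[path]:
            continue  # this commit did not change the file
        touched.append(message)
        if path not in before:
            # The path appears here: look for a vanished path whose
            # content is identical, and keep following under that name.
            for old_path, content in before.items():
                if old_path not in after and content == after[path]:
                    path = old_path
                    break
    return touched


# Hypothetical four-commit history with one rename-only commit.
history = [
    ("Add tracker", {"money_tracker.py": "v1\n"}),
    ("Fix bug", {"money_tracker.py": "v2\n"}),
    ("Rename only", {"mt/mt.py": "v2\n"}),
    ("Add feature", {"mt/mt.py": "v3\n"}),
]
log = follow(history, "mt/mt.py")
```

Because the "Rename only" snapshot carries exactly the same content under the new name, the walk switches to the old name there and keeps going, so the log also includes the two commits made before the rename.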
That's essentially all you get. Your best bet when renaming a file or restructuring a project, then, is to commit just the renaming, as one commit, then commit any other changes required. You do not have to do this, as the rename detector can often detect a renamed-and-changed file as renamed, but you get a better rename-detection guarantee when you have the rename committed separately, so that each file 100%-matches the previous one.
Note that whether any particular GUI turns on rename-detection, and if so, how, is up to that GUI. All Git provides are the commits.
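If you do end up at the command line, the rename-first advice might look roughly like the following. This is a sketch under assumptions, not a recipe for your exact repository: it drives git from Python via subprocess in a throwaway temporary directory, it needs a git executable on your PATH, and the file names, commit messages, and the helper names git and rename_then_edit_demo are invented for the demonstration. The git subcommands themselves (init, add, mv, commit, log --follow) are the standard ones.

```python
import pathlib
import shutil
import subprocess
import tempfile


def git(repo, *args):
    """Run one git command inside `repo` and return its stdout as text."""
    command = [
        "git",
        "-c", "user.name=Demo",               # throwaway identity for the demo
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(
        command, cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def rename_then_edit_demo():
    """Commit a rename alone, then an edit; return the --follow subjects."""
    if shutil.which("git") is None:
        raise RuntimeError("git executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init", "-q")
        old = repo / "money_tracker_old.py"   # hypothetical original name
        old.write_text("print('tracker')\n")
        git(repo, "add", ".")
        git(repo, "commit", "-q", "-m", "Initial commit")

        # Commit 1: the rename and nothing else, so content matches 100%.
        (repo / "mt").mkdir()
        git(repo, "mv", "money_tracker_old.py", "mt/mt.py")
        git(repo, "commit", "-q", "-m", "Rename only")

        # Commit 2: content changes, made only after the rename is recorded.
        (repo / "mt" / "mt.py").write_text("print('tracker v2')\n")
        git(repo, "commit", "-q", "-a", "-m", "Edit after rename")

        log = git(repo, "log", "--follow", "--format=%s", "--", "mt/mt.py")
        return log.splitlines()
```

Calling rename_then_edit_demo() should list all three commit subjects for mt/mt.py, newest first, including the one made under the old name. The same three steps (git mv, commit, then edit and commit again) can be typed directly in a terminal inside your real repository.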
[1] The files inside a commit are stored in a special, read-only, Git-only, compressed and de-duplicated format. This means that if you make a thousand commits in a row, and only change README.md once, you have, say, 998 shared copies of the old one and 2 shared copies of the new one, or 400 shared copies of the old one and 600 shared copies of the new one, so that either way, it's really only in the repository twice, rather than a thousand times.
This also, however, means that the files you see and work on, when you work with a Git repository, are not in the Git repository. The files you see and work with are copies that were extracted from the repository, and turned back into usable files in the process. This explains a lot about why Git behaves the way it does.
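The de-duplication falls out of how Git names file contents: in a default SHA-1 repository, a file's stored object (a "blob") is identified by a hash of a short header plus the content, so identical content always gets the same ID. A small sketch follows, in which the README texts and the count_stored_blobs helper are made up for illustration; compression and packing are left out.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def count_stored_blobs(snapshots):
    """Count the distinct blobs needed for every file of every snapshot."""
    return len({blob_id(data) for snap in snapshots for data in snap.values()})


# Hypothetical README contents: one thousand snapshots, a single change.
old_readme = b"# Money tracker\n"
new_readme = b"# Money tracker\n\nNow with tests.\n"
snapshots = [{"README.md": old_readme}] * 400 + [{"README.md": new_readme}] * 600
stored = count_stored_blobs(snapshots)
```

Here stored comes out as 2: a thousand snapshots, but only two distinct README blobs to keep.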
[2] Note that the slashes (which go forwards, though you can use backslashes on Windows) are part of each file's name: the name is old/path/to/name1.py, for instance. That's not a folder named old containing a folder named path and so on, that's just a file whose name is old/path/to/name1.py.
[3] From the command line, use git diff --find-renames or git show --find-renames to enable the rename detector, or set diff.renames to true. In Git version 2.9 and later, diff.renames is set to true by default; in earlier versions, it is set to false by default.