My goal is to access existing Git repos from Python. I want to get the repo history and on-demand diffs.
In order to do that I started with dulwich. So I tried:
from dulwich.repo import Repo
Repo.init('/home/umpirsky/Projects/my-exising-git-repo')
and got OSError: [Errno 17] File exists: '/home/umpirsky/Projects/my-exising-git-repo/.git'
The doc says "You can open an existing repository or you can create a new one."
Any idea how to do that? Can I fetch history and diffs with dulwich? Can you recommend any other library for Git access? I am developing an Ubuntu app, so an Ubuntu package would be appreciated for easier deployment.
I will also check the repo periodically to detect new changes, so I would rather work with the remote, so that I can detect changes that have not been pulled to the local repo yet. I'm not sure how this should work, so any help will be appreciated.
Thanks in advance.
Most of Dulwich's documentation assumes a fair bit of knowledge of the Git file formats/protocols.
You should be able to open an existing repository with Repo:
from dulwich.repo import Repo
x = Repo("/path/to/git/repo")
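or create a new one (this is what Repo.init does, which is why it fails with "File exists" when the path already contains a .git directory):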
x = Repo.init("/path/to/new/repo")
To get the diff for a particular commit (that is, the diff against its first parent):
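import sys
from dulwich.patch import write_tree_diff
commit = x[commit_id]  # commit_id is the SHA of the commit to inspect
parent_commit = x[commit.parents[0]]
# write_tree_diff emits bytes, so on Python 3 use the binary stdout stream
write_tree_diff(sys.stdout.buffer, x.object_store, parent_commit.tree, commit.tree)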
The Git protocol only allows fetching and sending packs; it doesn't allow direct access to specific objects in the database. This means that to inspect a remote repository you first have to fetch the relevant commits from the remote repo, and only then can you view them:
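from dulwich.client import get_transport_and_path
# remote_url is the URL of the remote repository
client, path = get_transport_and_path(remote_url)
# Recent Dulwich versions return a result object; .refs maps ref names (bytes) to SHAs
remote_refs = client.fetch(path, x).refs
print(x[remote_refs[b"refs/heads/master"]])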
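For the periodic check, one option is to remember the refs returned by the previous fetch and compare them with the latest ones. A minimal sketch, assuming both snapshots are plain dicts mapping ref names to commit SHAs (the shape of remote_refs above); changed_refs is a hypothetical helper, not part of Dulwich:

```python
def changed_refs(old_refs, new_refs):
    """Map each ref that differs between two snapshots to (old_sha, new_sha).

    A ref that exists in only one snapshot gets None on the other side.
    """
    changes = {}
    for name in set(old_refs) | set(new_refs):
        old_sha = old_refs.get(name)
        new_sha = new_refs.get(name)
        if old_sha != new_sha:
            changes[name] = (old_sha, new_sha)
    return changes
```

Calling it with the refs from two successive fetches gives an empty dict when nothing changed on the remote; any entry means a branch or tag was added, removed, or moved since the last check.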