
Getting started with Git Python


My goal is to access existing Git repos from Python. I want to read the repo history and produce diffs on demand.

To do that, I started with Dulwich and tried:

from dulwich.repo import Repo
Repo.init('/home/umpirsky/Projects/my-exising-git-repo')

and got OSError: [Errno 17] File exists: '/home/umpirsky/Projects/my-exising-git-repo/.git'

The docs say "You can open an existing repository or you can create a new one."

Any idea how to do that? Can I fetch history and diffs with Dulwich? Can you recommend any other library for Git access? I am developing an Ubuntu app, so an Ubuntu package would be appreciated for easier deployment.

I will also check periodically for new changes in the repo, so I would rather work with the remote, so that I can detect changes that have not been pulled locally yet. I'm not sure how this should work, so any help will be appreciated.

Thanks in advance.


Solution

  • Most of Dulwich's documentation assumes a fair bit of knowledge of the Git file formats/protocols.

    You should be able to open an existing repository with Repo:

    from dulwich.repo import Repo
    x = Repo("/path/to/git/repo")
    

    or create a new one:

    x = Repo.init("/path/to/new/repo")
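
The `File exists` error in the question comes from calling `Repo.init` on a directory that already holds a `.git` folder. Below is a minimal sketch of a guard, assuming a non-bare working copy where `.git` is a directory. `has_git_dir` and `open_or_init` are hypothetical helper names, not part of Dulwich:

```python
import os


def has_git_dir(path):
    """Return True if `path` contains a .git directory (non-bare working copy)."""
    return os.path.isdir(os.path.join(path, ".git"))


def open_or_init(path):
    """Open the repository at `path` if one exists, otherwise create it."""
    # Imported here so has_git_dir stays usable without Dulwich installed.
    from dulwich.repo import Repo

    if has_git_dir(path):
        return Repo(path)  # existing repository: just open it
    return Repo.init(path)  # no .git yet: initialise (directory must already exist)
```

For the asker's case, `open_or_init('/home/umpirsky/Projects/my-exising-git-repo')` would take the first branch and simply open the repository.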
    

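    To get the diff for a particular commit (the diff against its first parent):

    import sys
    from dulwich.patch import write_tree_diff
    # commit_id is the commit's SHA as a bytes hex string
    commit = x[commit_id]
    parent_commit = x[commit.parents[0]]
    # Dulwich writes bytes, so on Python 3 pass the binary stdout stream
    write_tree_diff(sys.stdout.buffer, x.object_store, parent_commit.tree, commit.tree)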
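
The same two lookups used above (`x[commit_id]` and `commit.parents`) are enough to walk the history along first parents, which covers the "repo history" part of the question. This is a minimal sketch; `first_parent_history` is a hypothetical helper, and `repo` is assumed to be anything that maps a commit ID to an object with a `parents` list, such as the `Repo` opened above:

```python
def first_parent_history(repo, start_id, limit=None):
    """Yield (commit_id, commit) pairs from start_id back along first parents."""
    commit_id = start_id
    count = 0
    while commit_id is not None and (limit is None or count < limit):
        commit = repo[commit_id]
        yield commit_id, commit
        count += 1
        # A root commit has no parents; stop there.
        commit_id = commit.parents[0] if commit.parents else None
```

With Dulwich this might be called as `for cid, c in first_parent_history(x, x.head()): print(cid, c.message)`. It ignores the other parents of merge commits; Dulwich also ships its own walker, `Repo.get_walker()`, which is likely the better choice when merges matter.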
    

    The Git protocol only allows fetching and sending packs; it doesn't allow direct access to specific objects in the database. This means that to inspect a remote repository, you first have to fetch the relevant commits from the remote repo, and then you can view them locally:

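    from dulwich.client import get_transport_and_path
    client, path = get_transport_and_path(remote_url)
    # Recent Dulwich releases return a result object with a .refs dict
    # (bytes ref name -> SHA); older releases returned the dict itself.
    remote_refs = client.fetch(path, x).refs
    print(x[remote_refs[b"refs/heads/master"]])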
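
For the periodic check described in the question, one approach is to keep the refs returned by the previous fetch and compare them with the latest ones. This is a minimal sketch in plain Python; `changed_refs` is a hypothetical helper, and both arguments are assumed to be dicts mapping ref names to commit IDs, like `remote_refs` above:

```python
def changed_refs(old_refs, new_refs):
    """Return {ref: (old_id, new_id)} for refs that were added, removed, or moved."""
    changes = {}
    for ref in set(old_refs) | set(new_refs):
        old_id = old_refs.get(ref)
        new_id = new_refs.get(ref)
        if old_id != new_id:
            changes[ref] = (old_id, new_id)
    return changes
```

A `None` in the first slot means the ref is new on the remote; a `None` in the second means it was deleted there. An empty result means nothing changed since the last poll.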