python windows git-p4

os.path.split seems to be returning the wrong result


I can't understand what os.path.split is doing. I'm debugging a program (specifically git's interface with Perforce: git-p4) and seeing that os.path.split is splitting the incoming path in ways the script isn't expecting, and also seems inconsistent with the documentation. I made some simpler tests and can't figure out what it's doing myself.

The path I want to split is //a/b (it is actually a Perforce path, not a local filesystem path), and I need b in the second half of the returned pair. I'm running on Windows, and suspect the issue has something to do with the path not looking very Windows-esque. When I ran my test code in an online sandbox, it worked as expected, unlike on my Windows machine.

I've read the documentation:

os.path.split(path)

Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename().

My test code:

import os
print os.path.split("//a")
print os.path.split("//a/b")
print os.path.split("//a/b/c")

What I'd expect:

('//', 'a')
('//a', 'b')
('//a/b', 'c')

What I actually get on a couple online sandboxes:

('//', 'a')
('//a', 'b')
('//a/b', 'c')

What I actually get on my PC:

('//', 'a')
('//a/b', '')
('//a/b/', 'c')

I'm using Python 2 because the git-p4 code is written for Python 2.

So my first question is just for my own understanding. What's going wrong here? An OS difference?

And then beyond my own curiosity, I need a fix. I've been able to modify git-p4, but I'd of course prefer to edit it as little as possible, as I'm not trying to understand it! I'm not a Python expert. Is there a comparable method that returns ('//a', 'b')?


Solution

  • You are using the wrong tool to handle these paths. On Windows, paths that start with //foo/bar or \\foo\bar are seen as UNC network paths, and os.path.split() will first use os.path.splitdrive() to make sure the UNC portion is not split. The UNC or drive portion is then re-attached after splitting the remainder.

    You can use the posixpath module instead, to get the POSIX behaviour:

    import posixpath
    
    posixpath.split(yourpath)
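As a quick check, here is a minimal sketch that runs the three test paths from the question through posixpath.split. The names paths and results are only for illustration. Because posixpath has no notion of drives or UNC shares, the outcome should not depend on the operating system.

```python
import posixpath

# The three test paths from the question (Perforce depot-style paths).
paths = ["//a", "//a/b", "//a/b/c"]

# posixpath.split only treats "/" as a separator, so "//a/b" is not
# interpreted as a UNC share, whatever platform this runs on.
results = [posixpath.split(p) for p in paths]

for path, (head, tail) in zip(paths, results):
    print("%r -> (%r, %r)" % (path, head, tail))
```

This prints the pairs the question expected: ('//', 'a'), ('//a', 'b') and ('//a/b', 'c').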
    

    Quoting from the top of the os.path module documentation:

    Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:

    • posixpath for UNIX-style paths
    • ntpath for Windows paths
    • [...]

    On Windows, os.path is the same module as ntpath. The online sandboxes must all have been POSIX systems, where os.path is posixpath.
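Since ntpath can be imported on any platform, the Windows behaviour can be reproduced without a Windows machine. The sketch below assumes a Python version whose ntpath.splitdrive recognises UNC prefixes (very old releases may behave differently); the variable names are illustrative only.

```python
import ntpath
import os
import posixpath

# os.path is an alias for one of the platform-specific modules:
# "ntpath" on Windows, "posixpath" on POSIX systems.
os_path_flavour = os.path.__name__
print(os_path_flavour)

# ntpath sees "//a/b" as a UNC share (\\server\share) and keeps it whole.
unc_drive, unc_rest = ntpath.splitdrive("//a/b")
print((unc_drive, unc_rest))

# split() works on the part after the drive and then re-attaches the drive,
# which is why "b" never ends up in the tail.
nt_split = ntpath.split("//a/b")
nt_split_deeper = ntpath.split("//a/b/c")
print(nt_split)
print(nt_split_deeper)

# posixpath has no drive concept, so the same string splits at the last "/".
posix_split = posixpath.split("//a/b")
print(posix_split)
```

The ntpath results match what the question reports for the Windows PC, and the posixpath result matches the sandboxes.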

    Treating your Perforce paths as POSIX paths is fine, provided you always use forward slashes as path separators.
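If you want to make the forward-slash assumption explicit, one option is a small wrapper. This is only a sketch: split_depot_path is a hypothetical helper, not something git-p4 provides, and rejecting backslashes outright is an assumption about what your paths look like.

```python
import posixpath


def split_depot_path(depot_path):
    """Split a forward-slash depot path into (head, tail).

    Hypothetical helper for illustration; not part of git-p4.
    """
    if "\\" in depot_path:
        # posixpath treats a backslash as an ordinary character, so a path
        # containing one would not be split at that position.
        raise ValueError("expected forward slashes only: %r" % (depot_path,))
    return posixpath.split(depot_path)


print(split_depot_path("//a/b"))
```

Calling it with "//a/b" gives ('//a', 'b') on any platform, and a path containing a backslash raises ValueError instead of being split in a surprising way.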