
Correct Unzip of Excel file in Common Lisp


This is a follow-up to my earlier question, "Creating File with Square Brackets in the name in Common Lisp in a MacBook creates problems. How can I do it?", which I answered myself after ChatGPT gave me the answer.

But while trying to unzip an Excel file (which contains a "[Content_Types].xml" file, whose square-bracketed name is what causes the problem here), I realized that the zip library's unzip function also fails on this name.

The source code of the unzip function is at https://github.com/mcna/zip/blob/master/zip.lisp and reads:

(defun unzip (pathname target-directory &key (if-exists :error) verbose)
  ;; <Xof> "When reading[1] the value of any pathname component, conforming
  ;;       programs should be prepared for the value to be :unspecific."
  (when (set-difference (list (pathname-name target-directory)
                              (pathname-type target-directory))
                        '(nil :unspecific))
    (error "pathname not a directory, lacks trailing slash?"))
  (with-zipfile (zip pathname)
    (do-zipfile-entries (name entry zip)
      (let ((filename (merge-pathnames name target-directory)))
        (ensure-directories-exist filename)
        (unless (char= (elt name (1- (length name))) #\/)
          (ecase verbose
            ((nil))
            ((t) (write-string name) (terpri))
            (:dots (write-char #\.)))
          (force-output)
          (with-open-file
              (s filename :direction :output :if-exists if-exists
               :element-type '(unsigned-byte 8))
            (zipfile-entry-contents entry s)))))))

Let's take an Excel file and unzip it using this function. I created an Excel file by hand and then specified the path to it:

(defparameter *xlsx* "/Users/<user-name>/Downloads/test.xlsx")
(ql:quickload :zip)
(zip:unzip *xlsx* "/Users/<user-name>/Downloads/test_zip/")

It fails with exactly the same problem I encountered in my previous post:

bad place for a wild pathname
   [Condition of type SB-INT:SIMPLE-FILE-ERROR]

Restarts:
 0: [RETRY] Retry SLY mREPL evaluation request.
 1: [*ABORT] Return to SLY's top level.
 2: [ABORT] abort thread (#<THREAD tid=5123 "sly-channel-1-mrepl-remote-1" RUNNING {70051404E3}>)

Backtrace:
 0: (SB-KERNEL::%FILE-ERROR #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml" "bad place for a wild pathname")
      Locals:
        ARGUMENTS = NIL
        DATUM = "bad place for a wild pathname"
        PATHNAME = #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml"
 1: (ENSURE-DIRECTORIES-EXIST #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml" :VERBOSE NIL :MODE 511)
      Locals:
        #:.DEFAULTING-TEMP. = NIL
        #:.DEFAULTING-TEMP.#1 = 511
        SB-IMPL::CREATED-P = NIL
        PATHNAME = #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml"
        SB-IMPL::PATHSPEC = #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml"
 2: (ZIP:UNZIP "/Users/josephus/Downloads/test.xlsx" "/Users/<user-name>/Downloads/test_zip/" :IF-EXISTS :ERROR :VERBOSE NIL :FORCE-UTF-8 NIL)
      Locals:
        #:.DEFAULTING-TEMP. = NIL
        FILENAME = #P"/Users/<user-name>/Downloads/test_zip/[Content_Types].xml"
        FORCE-UTF-8 = NIL
        IF-EXISTS = :ERROR
        PATHNAME = "/Users/<user-name>/Downloads/test.xlsx"
        TARGET-DIRECTORY = "/Users/josephus/Downloads/test_zip/"
        ZIP = #S(ZIP:ZIPFILE :STREAM #<SB-SYS:FD-STREAM for "file /Users/<user-name>/Downloads/test.xlsx" {700888DB43}> :ENTRIES #<HASH-TABLE :TEST EQUAL :COUNT 11 {70088A23A3}>)
 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ZIP:UNZIP *XLSX* "/Users/<user-name>/Downloads/test_zip/") #<NULL-LEXENV>)
 4: (EVAL (ZIP:UNZIP *XLSX* "/Users/<user-name>/Downloads/test_zip/"))
 5: ((LAMBDA NIL :IN SLYNK-MREPL::MREPL-EVAL-1))

Does anybody know how to handle this?

I asked ChatGPT, and it suggested:

(defun better-unzip (pathname target-directory &key (if-exists :error) verbose)
  "Unzip function that handles square brackets in filenames correctly."
  (zip:with-zipfile (zip pathname)
    (zip:do-zipfile-entries (name entry zip)
      (let ((filename (make-pathname :directory target-directory
                                     :name (pathname-name (parse-namestring name))
                                     :type (pathname-type (parse-namestring name)))))
        (ensure-directories-exist filename)
        (unless (char= (elt name (1- (length name))) #\/)
          (ecase verbose
            ((nil))
            ((t) (write-string name) (terpri))
            (:dots (write-char #\.)))
          (force-output)
          (with-open-file (s filename :direction :output :if-exists if-exists
                            :element-type '(unsigned-byte 8))
            (zip:zipfile-entry-contents entry s)))))))

I then did:

(better-unzip *xlsx* "/Users/<your-user-name>/Downloads/test_zip/")

But I encountered again:

bad place for a wild pathname
   [Condition of type SB-INT:SIMPLE-FILE-ERROR]

Restarts:
 0: [RETRY] Retry SLY mREPL evaluation request.
 1: [*ABORT] Return to SLY's top level.
 2: [ABORT] abort thread (#<THREAD tid=5123 "sly-channel-1-mrepl-remote-1" RUNNING {70051404E3}>)

Backtrace:
 0: (SB-KERNEL::%FILE-ERROR #P"//Users/<user-name>/Downloads/test_zip//[Content_Types].xml" "bad place for a wild pathname")
 1: (ENSURE-DIRECTORIES-EXIST #P"//Users/<user-name>/Downloads/test_zip//[Content_Types].xml" :VERBOSE NIL :MODE 511)
 2: (BETTER-UNZIP "/Users/josephus/Downloads/test.xlsx" "/Users/<user-name>/Downloads/test_zip/" :IF-EXISTS :ERROR :VERBOSE NIL)
 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (BETTER-UNZIP *XLSX* "/Users/<user-name>/Downloads/test_zip/") #<NULL-LEXENV>)
 4: (EVAL (BETTER-UNZIP *XLSX* "/Users/<user-name>/Downloads/test_zip/"))
 5: ((LAMBDA NIL :IN SLYNK-MREPL::MREPL-EVAL-1))

Solution

  • The underlying problem here is that you've crashed into the edges of the CL pathname specification. Those edges are unfortunately never very far away.

    In this case the problem is SBCL-specific, but it's not because SBCL has bugs: it's because SBCL does something it's allowed to do.

    The very brief answer is that you need to find and use something which turns strings into pathnames in a well-defined way for common platforms. The practical answer is probably to use ASDF's UIOP, and in particular to use uiop:ensure-pathname whenever you wish to turn a string into a pathname. So if you modify the unzip function to be like this:

    (defun unzip (pathname target-directory &key (if-exists :error) verbose)
      ;; <Xof> "When reading[1] the value of any pathname component, conforming
      ;;       programs should be prepared for the value to be :unspecific."
      (when (set-difference (list (pathname-name target-directory)
                                  (pathname-type target-directory))
                            '(nil :unspecific))
        (error "pathname not a directory, lacks trailing slash?"))
      (with-zipfile (zip pathname)
        (do-zipfile-entries (name entry zip)
          (let ((filename (merge-pathnames (uiop:ensure-pathname name)
                                           target-directory)))
            (ensure-directories-exist filename)
            (unless (char= (elt name (1- (length name))) #\/)
              (ecase verbose
                ((nil))
                ((t) (write-string name) (terpri))
                (:dots (write-char #\.)))
              (force-output)
              (with-open-file
                  (s filename :direction :output :if-exists if-exists
                   :element-type '(unsigned-byte 8))
                (zipfile-entry-contents entry s)))))))
    

    Then things will work.

    Here is some debugging output from a version of what appears to be the current zip:

    > (unzip "x.zip" #P"./")
    name "x/", parsed name #P"./x/", not wild
    name "x/a.a", parsed name #P"./x/a.a", not wild
    name "x/[b].b", parsed name #P"./x/\\[b].b", not wild
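
If you would rather not patch the library, the same idea can be pulled out into a small helper. This is only a sketch: the name entry-pathname is made up for illustration, and it assumes UIOP is available (it ships with ASDF). It calls uiop:parse-unix-namestring directly, which is what uiop:ensure-pathname uses for strings by default.

```lisp
;; Make sure UIOP is present before the reader meets the uiop: prefix.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf"))

;; ENTRY-PATHNAME is a hypothetical helper, not part of the zip library.
(defun entry-pathname (entry-name target-directory)
  "Resolve ENTRY-NAME, a '/'-separated zip entry name, against
TARGET-DIRECTORY without treating [ ] * ? as wildcard syntax."
  (merge-pathnames (uiop:parse-unix-namestring entry-name)
                   target-directory))
```

For example, (entry-pathname "[Content_Types].xml" #P"/tmp/out/") should give a pathname that ensure-directories-exist and open accept on SBCL, since none of its components are wild.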
    

    Also, don't rely on ChatGPT: it's just not being helpful here, if it ever is.


    Below are some details which I think are worth recording.

    The specific problem is that SBCL (and I think SBCL only among current implementations) has an extended notion of what a wild pathname is. It is allowed to do this. So in SBCL the result of (parse-namestring "[foo].xml") is a wild pathname: the brackets denote a character class, so this particular pathname matches "f.xml" and "o.xml". The same is true when this syntax is used in other pathname components. It should be familiar from the globbing supported by many Unix shells, and indeed from regular expressions.
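
You can check this on whatever implementation you have with two tiny predicates. This is just a sketch, and both function names are made up for illustration. The results are implementation-dependent: the behaviour noted in the comments is what SBCL does according to the description above, and other implementations may well answer differently.

```lisp
;; Hypothetical helpers for poking at how namestrings get parsed.
(defun wild-after-parsing-p (namestring)
  "True if parsing NAMESTRING the standard way yields a wild pathname."
  (and (wild-pathname-p (parse-namestring namestring)) t))

(defun pattern-matches-p (candidate pattern)
  "True if the pathname parsed from CANDIDATE matches the one parsed
from PATTERN."
  (and (pathname-match-p (parse-namestring candidate)
                         (parse-namestring pattern))
       t))

;; On SBCL, (wild-after-parsing-p "[foo].xml") is true, and
;; (pattern-matches-p "f.xml" "[foo].xml") should be true as well.
```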

    Now, in SBCL, if you construct a pathname with make-pathname and use only strings as components, you will not get a wild pathname. So, for instance

    > (wild-pathname-p (pathname "[foo].xml"))
    t
    > (wild-pathname-p (make-pathname :name "[foo]" :type "xml"))
    nil
    > (wild-pathname-p (pathname "foo.*"))
    (:wild :wild-inferiors)
    > (wild-pathname-p (make-pathname :name "foo" :type "*"))
    nil

    So you can always construct a pathname this way which is not wild. In order to construct a wild pathname you need to use some non-string value (like :wild). I do not know if there is a way to construct the thing corresponding to [abc] other than by grabbing it from another pathname: I can't find any mention of this in the SBCL manual.
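
To make that concrete, here is a minimal sketch of turning a zip entry name into a pathname using make-pathname and string components only. Both function names are made up for illustration. It assumes entry names are '/'-separated, it makes no attempt to handle ".." or absolute names, and it relies on the SBCL behaviour just described, so treat it as an experiment rather than a portable solution.

```lisp
;; Hypothetical helper: split "a/b/c" into ("a" "b" "c").
(defun split-on-slash (string)
  (loop with start = 0
        for pos = (position #\/ string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

;; Hypothetical helper: build a relative pathname from string pieces.
(defun entry-name-to-pathname (entry-name)
  "Build a relative pathname for ENTRY-NAME from string components only."
  (let* ((parts (split-on-slash entry-name))
         (file (car (last parts)))      ; "" for directory entries
         (dirs (remove "" (butlast parts) :test #'string=))
         ;; A dot at position 0 (as in ".hidden") is not a type separator.
         (dot (position #\. file :from-end t))
         (has-type (and dot (plusp dot))))
    (make-pathname
     :directory (and dirs (cons :relative dirs))
     :name (cond ((string= file "") nil)
                 (has-type (subseq file 0 dot))
                 (t file))
     :type (and has-type (subseq file (1+ dot))))))
```

The result can then be merged with the target directory, as in (merge-pathnames (entry-name-to-pathname name) target-directory).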

    However, this is not portable at all. The CLHS says in 19.2.2.3:

    [...] conforming programs must be prepared to encounter any of the following additional values in any component or any element of a list that is the directory component:

    • The symbol :wild, which matches anything.
    • A string containing implementation-dependent special wildcard characters.
    • Any object, representing an implementation-dependent wildcard pattern.

    [My emphasis]

    The second case means that yes, a pathname is allowed to have components which are strings but still be wild.

    It is, perhaps, possible to argue that if you provide a string as a pathname component to make-pathname then that pathname is not allowed to be wild in that component. I don't think the spec really says this, and I think it probably can't, because I would expect that if some pathname p has a wild name then (make-pathname :name (pathname-name p) :type "foo") also has a wild name. Yet, from above, the wild name can be a string.

    SBCL on the other hand appears to take the sensible approach that if you provide a string for a component in make-pathname, that component is never wild. This is the right thing to do, I think.

    However none of this is going to help you much: zip files (and many other sorts of files) contain pathnames represented as strings. Something at the Lisp level needs to parse those pathnames. And if those pathnames look wild, it will parse them as wild pathnames. There may be specific workarounds, as I think Rainer has given you, but in general I think the only solution to this problem is either to implement your own pathname parsing (horrible) or to rely on someone else to have done that for you. And in this case the ASDF people have done it for you.