php ms-word docx webdav sabredav

Why is my docx, xlsx, pptx file corrupted?


PROBLEM :

I need the files on my server to be encrypted. This works perfectly fine for .txt, .doc, .xls and .ppt, but not for .docx, .xlsx and .pptx.

When I try to edit a docx (or xlsx, pptx), the file gets corrupted by the way I encrypt/decrypt it, since that is not a proper way to edit a docx. Microsoft Word then reports that the file is corrupted and opens it as 'Document1.docx' instead of 'MyFileName.docx'. When saving, I have to enter the name again, and with pptx I even have to enter the path to the WebDAV folder the document is in.

QUESTION :

Is there any way to get it to save in the right place without having to type the path ?

CODE :

Here is the code I use to encrypt the files :

// file_get_contents() is binary-safe, so .doc/.docx need no special case.
// (The earlier fread($handle, filesize(...)) branch read the same bytes.)
$data_file = file_get_contents("$davPath/$path");

$encrypt_data_file = $encryption->encrypt($data_file);

if (file_put_contents("$davPath/encrypt_" . basename($path),$encrypt_data_file)) {
    unlink("$davPath/" . basename($path));
    rename("$davPath/encrypt_" . basename($path),"$davPath/" . basename($path));
    return true;
} else {
    return false;
}

And here is the code I use to decrypt them :

$data_file = false; // stays false when the file does not exist
if (is_file("$davPath/$uri")) {
    // file_get_contents() is binary-safe, so .doc/.docx need no special case.
    $data_file = file_get_contents("$davPath/$uri");
}
// Strict comparison: only a failed read is false, an empty file is not.
if ($data_file !== false) {
    $decrypt_data_file = $encryption->decrypt($data_file);

    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    // Quote the filename so names containing spaces are not truncated.
    header('Content-Disposition: attachment; filename="'.basename($uri).'"');
    header('Content-Location: '.$_SERVER['SCRIPT_URI']);
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    ob_clean();
    flush();
    echo $decrypt_data_file;
    return false;
}

PS : I did find a workaround, which is to keep the file decrypted on the server while it is being modified, but I would really like to avoid that.


Solution

  • Your issue has been solved, but I'd like to add an answer to it.

    When you have a corrupted docx, here are some steps to find out what's wrong :

    First, try to unzip the file (a docx is a zip archive). If that works, the problem lies in the content of the docx. If it doesn't, the zip itself is corrupted.
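
The unzip test can also be run from PHP. The following is a minimal sketch, assuming the zip extension is installed; `isReadableZip` is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: returns true when the file opens as a zip archive.
function isReadableZip(string $path): bool
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($path) !== true) {
        return false;
    }
    $zip->close();
    return true;
}
```

If this returns false for the file you get after decryption, the corruption is at the zip level rather than inside the XML parts. Passing the `ZipArchive::CHECKCONS` flag to `open()` requests stricter consistency checks.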

    Problems with the content of the docx

    If the zip is not corrupted, Word will probably tell you where the problem lies when you open the docx.

    It will tell you for example: Parse error on line 213 of document.xml
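
If Word's message is not available, a similar line number can be obtained by parsing the XML part yourself. This is a sketch under assumptions: it needs the zip and dom extensions, and `docxXmlErrors` is a hypothetical helper. It only checks that the XML is well-formed, not that it is valid WordprocessingML.

```php
<?php
// Hypothetical helper: lists XML parse errors for one part of a docx.
function docxXmlErrors(string $docxPath, string $part = 'word/document.xml'): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return ['could not open the file as a zip archive'];
    }
    $xml = $zip->getFromName($part); // false when the part does not exist
    $zip->close();
    if ($xml === false || $xml === '') {
        return ["$part is missing or empty"];
    }

    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

An empty array means the part parsed cleanly. Otherwise each entry carries the line number reported by libxml.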

    Here's the "normal" structure of a docx once it has been unzipped.

    +--docProps
    |  +  app.xml
    |  \  core.xml
    +  res.log
    +--word //This folder contains most of the files that control the content of the document
    |  +  document.xml //The actual content of the document
    |  +  endnotes.xml
    |  +  fontTable.xml
    |  +  footer1.xml //Contains the elements in the footer of the document
    |  +  footnotes.xml
    |  +--media //This folder contains all images embedded in the Word document
    |  |  \  image1.jpeg
    |  +  settings.xml
    |  +  styles.xml
    |  +  stylesWithEffects.xml
    |  +--theme
    |  |  \  theme1.xml
    |  +  webSettings.xml
    |  \--_rels
    |     \  document.xml.rels //This file tells Word where the images are located
    +  [Content_Types].xml
    \--_rels
       \  .rels
    

    As shown in the docx tag wiki.

    Corrupted zip

    If the zip is corrupted, in most cases there are some characters at the beginning or at the end of the file that shouldn't be there (or some that should be there and are missing).
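
Extra or missing bytes at either end can be estimated from the zip signatures alone. The sketch below assumes a single, non-empty archive. It looks for the first local file header (`PK\x03\x04`) and the last end-of-central-directory record (`PK\x05\x06`, 22 bytes plus an optional comment). `inspectZipEdges` is a hypothetical name.

```php
<?php
// Hypothetical helper: counts the bytes before the first local file header
// and after the end-of-central-directory (EOCD) record.
function inspectZipEdges(string $bytes): array
{
    $start = strpos($bytes, "PK\x03\x04");
    $eocd = strrpos($bytes, "PK\x05\x06");

    $trailing = null;
    if ($eocd !== false && strlen($bytes) >= $eocd + 22) {
        // The comment length is a little-endian uint16 at offset 20 of the EOCD.
        $commentLength = unpack('v', substr($bytes, $eocd + 20, 2))[1];
        // A negative value means the file ends before the record is complete.
        $trailing = strlen($bytes) - ($eocd + 22 + $commentLength);
    }

    return [
        'leading_bytes' => $start === false ? null : $start,
        'trailing_bytes' => $trailing,
    ];
}
```

For a healthy docx both values are expected to be 0. A positive `leading_bytes` usually points at output (whitespace, a warning message, a BOM) that was sent before the file content.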

    The best approach is to take a valid docx of the same document and compare the hexadecimal representations of the two files to see where they differ.

    I usually use the hexdiff tool for this (apt-get install hexdiff).

    This will usually show you where the extra characters are situated.
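
Where hexdiff is not at hand, the same comparison can be approximated in plain PHP. This is a rough sketch, not a replacement for a real hex diff tool: it finds only the first differing offset. Both function names are hypothetical.

```php
<?php
// Returns the offset of the first differing byte, or null if both are identical.
function firstDifference(string $a, string $b): ?int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Same prefix: they differ only if one of them is longer.
    return strlen($a) === strlen($b) ? null : $limit;
}

// Formats up to $length bytes starting at $offset as space-separated hex pairs.
function hexWindow(string $bytes, int $offset, int $length = 16): string
{
    $slice = (string) substr($bytes, $offset, $length);
    return $slice === '' ? '' : implode(' ', str_split(bin2hex($slice), 2));
}
```

A typical use is to load the valid and the corrupted file with `file_get_contents()`, call `firstDifference()` on the two strings, and print `hexWindow()` of each file around the returned offset.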

    Quite often, the problem is that you have the wrong headers.
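
If "headers" here refers to the HTTP response headers sent with the decrypted file, a sketch like the following may help. It is an assumption about the setup rather than a confirmed fix: it sends the registered Office Open XML content type and a `Content-Length` that matches the decrypted data, and it discards any buffered output first. `buildDownloadHeaders` and `sendDecryptedFile` are hypothetical names.

```php
<?php
// Hypothetical helper: builds the response headers for a decrypted Office file.
function buildDownloadHeaders(string $filename, int $byteLength): array
{
    // Registered MIME types for the Office Open XML formats.
    $types = [
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        'pptx' => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
    ];
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    $type = $types[$ext] ?? 'application/octet-stream';

    return [
        'Content-Type: ' . $type,
        'Content-Disposition: attachment; filename="' . basename($filename) . '"',
        'Content-Length: ' . $byteLength,
    ];
}

// Hypothetical helper: sends the headers followed by the decrypted bytes.
function sendDecryptedFile(string $filename, string $data): void
{
    // Discard anything already buffered so no stray bytes precede the zip.
    while (ob_get_level() > 0) {
        ob_end_clean();
    }
    foreach (buildDownloadHeaders($filename, strlen($data)) as $line) {
        header($line);
    }
    echo $data;
}
```

In the decrypt code from the question, `sendDecryptedFile(basename($uri), $decrypt_data_file)` would take the place of the individual `header()` calls and the final `echo`.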