Tags: python-3.x, com, ole, structured-storage

Python: Writing a bytestream to overwrite an existing Microsoft Structured Storage OLE Stream


Some background on what I am doing:

I am writing a program in Python 3 in the hope of developing a process to read and write Microsoft OLE Structured Storage files. Using tkinter and PySimpleGUI, I have created a simple GUI that lets the user choose which storages and streams to read from and write to. I am using the olefile, pandas, and numpy packages to do most of my program's legwork, but I have run into a known limitation of olefile:

The bytestream being written must be exactly the same size as the existing stream in the OLE file. This became a problem for me soon after I began debugging my program.

What do I need to do?

After extensive research on the main programming sites, and after buying the book Python Programming on Win32 (specifically Chapter 12 on COM storage), I have run into a dead end. These are the most relevant resources I found:

https://github.com/joxeankoret/nightmare/blob/master/mutators/OleFileIO_PL.py

https://github.com/decalage2/olefile/issues/6

https://github.com/decalage2/olefile/issues/95

https://github.com/decalage2/olefile/issues/99

Here is a simplified version of the code I am using:

import olefile
import pandas as pd

# values, index and tabNames come from the PySimpleGUI window (not shown)
file_path = values[0]
xl_path = values[1]
data = olefile.OleFileIO(file_path)
storages = data.listdir(streams=False, storages=True)
streams = data.listdir(streams=True, storages=False)
readData = data.openstream(streams[index]).read()
# df is built from readData (not shown).
# Send the data into Excel to be manipulated by the user. mode='a' appends to the
# existing workbook (newer pandas no longer allows assigning ew.book; if_sheet_exists needs pandas >= 1.3)
with pd.ExcelWriter(xl_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as ew:
    df.to_excel(ew, sheet_name=tabNames)

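After the user has manipulated the data, I read it back.

I use pandas to read the data into a DataFrame:

df1 = pd.read_excel(xls, x, header=None)     # xls: workbook path or pd.ExcelFile, x: sheet name
newDF = df1[0].str.encode(encoding='utf-8')  # column 0 as a Series of bytes
byteString = newDF[0]                        # bytes from the first row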
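Editing the values in Excel can change how many bytes the text encodes to, which is why the byte string that comes back often no longer matches the original stream size. A stdlib-only sketch of that round trip, assuming the stream holds UTF-8 text with CRLF line endings (both are assumptions; a binary stream would need a different conversion, and the sample data is made up). `stream_to_rows` and `rows_to_stream` are hypothetical helper names:

```python
from __future__ import annotations


def stream_to_rows(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Split a text stream into one string per line (e.g. one Excel row each)."""
    return raw.decode(encoding).splitlines()


def rows_to_stream(rows: list[str], encoding: str = "utf-8",
                   newline: str = "\r\n") -> bytes:
    """Join edited rows back into the bytes to write to the stream.

    The CRLF default is an assumption; match whatever the original stream used.
    """
    return newline.join(rows).encode(encoding)


# Made-up sample stream contents
original = b"RELAY=1\r\nPICKUP=5.0"
rows = stream_to_rows(original)
rows[1] = "PICKUP=12.5"  # an edit that makes the line one character longer
updated = rows_to_stream(rows)
print(len(original), len(updated))  # prints 19 20
```

An unedited round trip gives back the same bytes, but any edit that changes the text length also changes the stream length, and that is exactly the case olefile refuses to write.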

The following statement only accepts a byte string that is exactly the same size as the existing stream:

data.write_stream(streams[setIndex], byteString)

Any other size fails with:

ValueError: write_stream: data must be the same size as the existing stream
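Since write_stream raises as soon as the sizes differ, one option is to check first and use olefile only when the new data fits exactly. A minimal sketch, assuming `ole` is an olefile.OleFileIO opened with write_mode=True (the code above opens the file read-only, and write_stream needs a writable file). `try_write_same_size` is a hypothetical helper name; get_size and write_stream are olefile methods:

```python
def try_write_same_size(ole, stream_path, new_bytes: bytes) -> bool:
    """Write with olefile only when the size matches; return False otherwise.

    `ole` is assumed to be an olefile.OleFileIO opened with write_mode=True.
    """
    old_size = ole.get_size(stream_path)
    if len(new_bytes) != old_size:
        # olefile would raise ValueError here; let the caller choose another writer
        return False
    ole.write_stream(stream_path, new_bytes)
    return True


# Usage (requires olefile and an existing OLE file; the file name is hypothetical):
# import olefile
# ole = olefile.OleFileIO("example.ole", write_mode=True)
# if not try_write_same_size(ole, streams[setIndex], byteString):
#     ...  # sizes differ: fall back to pythoncom, as in the answer below
# ole.close()
```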

EDIT:

This question was answered by decalage in the comments below. Here is the code I used to solve my problem:

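import pythoncom
from win32com.storagecon import STGM_READWRITE, STGM_SHARE_EXCLUSIVE, STGFMT_STORAGE

mode = STGM_READWRITE | STGM_SHARE_EXCLUSIVE
# stgRelays, storage_choice and set_compArr come from the rest of my program (not shown)
istorage = pythoncom.StgOpenStorageEx(file_path, mode, STGFMT_STORAGE, 0, pythoncom.IID_IStorage)
istorage1 = istorage.OpenStorage(stgRelays, None, mode, None, 0)
istorage2 = istorage1.OpenStorage(storage_choice, None, mode, None, 0)

for x in set_compArr:
    set_STM = x + '.TXT'
    istream = istorage2.OpenStream(set_STM, None, mode, 0)
    # Resize first: Write() alone does not truncate, so shorter data would leave stale bytes
    istream.SetSize(len(byteString))
    istream.Write(byteString)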
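The two nested OpenStorage calls above generalize to any depth. A minimal sketch, assuming `root` is the PyIStorage returned by pythoncom.StgOpenStorageEx and `mode` is the same flag combination; `open_storage_path` is a hypothetical helper name. It keeps every opened storage in a list, the way the code above keeps istorage1 and istorage2 referenced, which is a cautious choice rather than a documented requirement:

```python
def open_storage_path(root, storage_names, mode):
    """Open each storage in storage_names below root, in order.

    Returns [root, child, grandchild, ...]; the last entry is the storage
    whose streams you want. Keep the list referenced while using the streams.
    """
    chain = [root]
    for name in storage_names:
        # pywin32: PyIStorage.OpenStorage(name, priority, mode, snbExclude, reserved)
        chain.append(chain[-1].OpenStorage(name, None, mode, None, 0))
    return chain


# Usage on Windows, mirroring the loop above:
# chain = open_storage_path(istorage, [stgRelays, storage_choice], mode)
# for x in set_compArr:
#     istream = chain[-1].OpenStream(x + '.TXT', None, mode, 0)
#     istream.SetSize(len(byteString))
#     istream.Write(byteString)
```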

Solution

  • One way to modify OLE/CFB files is to use pythoncom, from the pywin32 extensions, on Windows (and possibly on Linux with Wine): https://github.com/mhammond/pywin32

    First, open the OLE file using pythoncom.StgOpenStorageEx: http://timgolden.me.uk/pywin32-docs/pythoncom__StgOpenStorageEx_meth.html

    Example:

    import pythoncom
    from win32com.storagecon import *
    
    mode = STGM_READWRITE|STGM_SHARE_EXCLUSIVE
    istorage = pythoncom.StgOpenStorageEx(filename, mode, STGFMT_STORAGE, 0, pythoncom.IID_IStorage)
    

    Then use the methods of the PyIStorage object: http://timgolden.me.uk/pywin32-docs/PyIStorage.html

    OpenStream returns a PyIStream object: http://timgolden.me.uk/pywin32-docs/PyIStorage__OpenStream_meth.html

    You can use its methods to read, write and change the size of a stream: http://timgolden.me.uk/pywin32-docs/PyIStream.html
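Putting those PyIStream methods together, replacing a stream's contents with data of a different length comes down to resizing, rewinding and writing. A minimal sketch, assuming `istream` is a PyIStream returned by OpenStream with STGM_READWRITE; `overwrite_stream` is a hypothetical helper name, and STREAM_SEEK_SET is defined locally with its Windows SDK value rather than imported:

```python
STREAM_SEEK_SET = 0  # value of STREAM_SEEK_SET in the Windows SDK (objidl.h)


def overwrite_stream(istream, data: bytes) -> None:
    """Replace the whole contents of an open stream with `data`.

    `istream` is assumed to be a pywin32 PyIStream opened for writing.
    """
    istream.SetSize(len(data))        # grow or truncate to the new length
    istream.Seek(0, STREAM_SEEK_SET)  # start writing at offset 0
    istream.Write(data)


# Usage on Windows (file and stream names are hypothetical):
# import pythoncom
# from win32com.storagecon import STGM_READWRITE, STGM_SHARE_EXCLUSIVE, STGFMT_STORAGE
# mode = STGM_READWRITE | STGM_SHARE_EXCLUSIVE
# root = pythoncom.StgOpenStorageEx('example.ole', mode, STGFMT_STORAGE, 0, pythoncom.IID_IStorage)
# overwrite_stream(root.OpenStream('Contents', None, mode, 0), b'new data')
```

SetSize comes first because Write only overwrites bytes from the current position onward: writing a shorter byte string without it leaves the old tail of the stream in place, while a longer one grows the stream anyway.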