windowswinapihookwm-paint

How to read content of WM_PAINT message?


My goal is to screen-scrape a portion of a program which constantly updates with new text. I have tried OCR with Tesseract but I believe it would be much more efficient to somehow intercept the text if possible. I have attempted using the GetWindowText() function, but it only returns the window title. Using Window Detective I have determined that whenever the window updates in the way I wish to capture, a WM_PAINT message is reliably sent to the window.

I have looked a bit into Windows API Hooks, but it seems that most of these techniques involving DLL injection are intended at sending new messages, not accessing the content of already sent messages.

How should I approach this problem?


Solution

  • When you say 'screen-scrape', is that what you really mean? Reading your post, it sounds like you actually want to get at the text in the child window or control in question - as text, and not just as a bitmap. To do that, you will need to:

    1. Determine which child window or control actually contains the text you want to get at. It sounds like you may have already done that but if not, the tool of choice is generally Spy++. (Please note: the version of Spy that you use must match the 'bitness' of your application.)

    2. Then, firstly, try to figure out whether the text in that window can be retrieved somehow. If it's a standard Windows control (specifically EDIT or RICHEDIT) then there are documented ways to do that, see MSDN.

    3. If that doesn't pan out, you might have some success hooking calls to ExtTextOut(), although that's not a pleasant proposition and I think you might struggle to achieve it. That said, I believe the accepted way (in some sense of the word 'accepted') is here.

    4. With reference to point 3, even if you achieve it, how would you know whether any particular call to ExtTextOut() was drawing to the window you're interested in? Answer, most likely, HWND WindowFromDC().

    I hope that helps a little. Please don't come back at me with a bunch of detailed questions about how this might apply to your particular use-case. I'm not really interested in that, these are just intended as a few signposts.