c++ webbrowser-control c++builder twebbrowser mshtml

Load from IPersistMoniker takes a long time to load an unresolvable URL


I am loading a local _test.htm file from disk through the IPersistMoniker Load method. As I understand it, this should use the file's path as the base path for relative URLs. The problem is that it does not. Instead, it spends a very long time (about 20-30 seconds) trying to resolve the path from the Internet before giving up. What I want is for it to give up instantly, as soon as the unresolvable path is detected (since it is a local disk file anyway).

This is an example HTML I am loading:

<html>
  <head>
    <script src="//test/test.js"></script>
  </head>
  <body>
    <img src="image.jpg">
    <img src="/image.jpg">
    <img src="//image.jpg">
  </body>
</html>

Simplified code (C++ Builder) with no error checking:

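// Build a file:/// URL for _test.htm located next to the executable
WideString      URL = "file:///" + StringReplace(ExtractFilePath(Application->ExeName), "\\", "/", TReplaceFlags() << rfReplaceAll) + "_test.htm";
TCppWebBrowser* WB  = CppWebBrowser1;

// Create a URL moniker for the local file
DelphiInterface<IMoniker> pMoniker;
OleCheck(CreateURLMonikerEx(NULL, URL.c_bstr(), &pMoniker, URL_MK_UNIFORM));

// Get IPersistMoniker from the document currently hosted by the browser
DelphiInterface<IHTMLDocument2> diDoc2 = WB->Document;
DelphiInterface<IPersistMoniker> pPrstMnkr;
OleCheck(diDoc2->QueryInterface(IID_IPersistMoniker, (LPVOID*)&pPrstMnkr));

// Bind context used for the load
DelphiInterface<IBindCtx> pBCtx;
OleCheck(CreateBindCtx(0, &pBCtx));

// Load the document through the moniker (first argument is fFullyAvailable)
pPrstMnkr->Load(0, pMoniker, pBCtx, STGM_READWRITE);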

The problem: image.jpg loads fine, but the paths //test/test.js, /image.jpg and //image.jpg take a very long time to resolve/load. From what I understand, CreateURLMonikerEx is supposed to take file:///path/to/executable/ and prepend it automatically to these paths, in which case they would fail instantly (file:///path/to/executable//test/test.js, for example). That does not happen.
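
For what it's worth, generic URL resolution (RFC 3986) does not prepend the base path to every relative reference, which may explain the behaviour above. A reference starting with // replaces everything after the scheme, so its first segment is read as a host name. A reference starting with / replaces the whole path. Only a plain relative reference such as image.jpg keeps the base directory. Below is a minimal sketch of those rules in portable C++. It is not MSHTML's actual code: Classify, Resolve and the C:/app path in the comments are hypothetical, and the resolver skips dot-segment, query and fragment handling.

```cpp
#include <string>

// Kinds of URL reference, following RFC 3986 section 4.2.
enum class RefKind { Absolute, NetworkPath, AbsolutePath, RelativePath };

// Classify a reference by its leading characters only (simplified).
RefKind Classify(const std::wstring& ref)
{
    if (ref.compare(0, 2, L"//") == 0)
        return RefKind::NetworkPath;             // "//host/path"
    if (!ref.empty() && ref[0] == L'/')
        return RefKind::AbsolutePath;            // "/path"

    // A ':' that appears before any '/' is treated as the end of a scheme.
    const std::wstring::size_type colon = ref.find(L':');
    const std::wstring::size_type slash = ref.find(L'/');
    if (colon != std::wstring::npos && colon > 0 &&
        (slash == std::wstring::npos || colon < slash))
        return RefKind::Absolute;                // "http://...", "file:///..."

    return RefKind::RelativePath;                // "image.jpg"
}

// Resolve ref against base. Assumes base looks like scheme://authority/path,
// e.g. L"file:///C:/app/_test.htm" (empty authority). Otherwise ref is returned as-is.
std::wstring Resolve(const std::wstring& base, const std::wstring& ref)
{
    const std::wstring::size_type schemeEnd = base.find(L"://");
    if (schemeEnd == std::wstring::npos)
        return ref;
    const std::wstring::size_type pathStart = base.find(L'/', schemeEnd + 3);
    if (pathStart == std::wstring::npos)
        return ref;

    switch (Classify(ref))
    {
    case RefKind::NetworkPath:   // keeps only "scheme:"; the base path is dropped
        return base.substr(0, schemeEnd + 1) + ref;
    case RefKind::AbsolutePath:  // keeps scheme and authority; replaces the whole path
        return base.substr(0, pathStart) + ref;
    case RefKind::RelativePath:  // replaces only the last path segment
        return base.substr(0, base.rfind(L'/') + 1) + ref;
    default:                     // already absolute
        return ref;
    }
}
```

With the base file:///C:/app/_test.htm, this sketch maps image.jpg to file:///C:/app/image.jpg, /image.jpg to file:///image.jpg, //image.jpg to file://image.jpg and //test/test.js to file://test/test.js. If MSHTML resolves references the same way, the last two name hosts (image.jpg and test) rather than local files, which would fit a lookup that eventually times out.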

I also tried moving image.jpg to a subfolder and creating a custom IMoniker implementation whose GetDisplayName and BindToStorage load the image from a custom path. That works for image.jpg, but not for paths that start with // or /, even though GetDisplayName returns file:///path/to/executable/ through the *ppszDisplayName parameter.

How can I avoid the long wait on such unusable links (discard them), or redirect them to a local path as above?

I found a partial solution: returning about:blank in *ppszDisplayName. But then images with the valid path image.jpg no longer load, because they are resolved as about:image.jpg, which is again an invalid path.

Additionally, I tried adding an IDocHostUIHandler interface with an implementation of the Invoke method (DISPID_AMBIENT_DLCONTROL) that sets pVarResult->lVal = DLCTL_NO_SCRIPTS | DLCTL_NO_JAVA | DLCTL_NO_RUNACTIVEXCTLS | DLCTL_NO_DLACTIVEXCTLS | DLCTL_NO_FRAMEDOWNLOAD | DLCTL_FORCEOFFLINE;. It blocks the download of images entirely, but still spends 20-30 seconds on the links starting with // or /.


Solution

  • Update - this doesn't work well!

    The code below doesn't work well! The problem is that it loses the <BODY> tag attributes: the BODY tag ends up entirely empty after loading. I ended up loading the message with the IHTMLDocument2 write method instead.

    See: Assigning IHTMLDocument2 instance to a TWebBrowser instance

    After spending a lot of time on this, with no guidance of any kind here, I believe it is not possible to avoid the 20-30 second wait when the links are invalid. I found another solution; if someone wants to supplement it, feel free to do so.

    What I had to do instead is create an instance of CLSID_HTMLDocument (through the IHTMLDocument3 or IHTMLDocument2 interface), load the document into that container, and parse the links before doing anything with them. This is described at:

    https://learn.microsoft.com/en-us/previous-versions/aa703592(v=vs.85)

    This also helped:

    How to load html contents from stream and then how to create style sheet to display the html file in preview pane (like HTML preview handler)

    After parsing the document URLs and fixing the invalid ones, it can be saved/displayed in the actual TWebBrowser.

    Rough solution (C++ Builder):

    try
        {
        DelphiInterface<IHTMLDocument2> diDoc2;
        OleCheck(CoCreateInstance(CLSID_HTMLDocument, NULL, CLSCTX_INPROC_SERVER, IID_IHTMLDocument2, (void**)&diDoc2));
    
        DelphiInterface<IPersistStreamInit> diPersist;
        OleCheck(diDoc2->QueryInterface(IID_IPersistStreamInit, (void**)&diPersist));
        OleCheck(diPersist->InitNew());
    
        DelphiInterface<IMarkupServices> diMS;
        OleCheck(diDoc2->QueryInterface(IID_IMarkupServices, (void**)&diMS));
    
        DelphiInterface<IMarkupPointer> diMkStart;
        DelphiInterface<IMarkupPointer> diMkFinish;
    
        OleCheck(diMS->CreateMarkupPointer(&diMkStart));
        OleCheck(diMS->CreateMarkupPointer(&diMkFinish));
    
        // ...Load from file or memory stream into your WideString here...
    
        DelphiInterface<IMarkupContainer> diMC;
        OleCheck(diMS->ParseString(WideString(MsgHTMLSrc).c_bstr(), 0, &diMC, diMkStart, diMkFinish));
    
        DelphiInterface<IHTMLDocument2> diDoc;
        OleCheck(diMC->QueryInterface(IID_PPV_ARGS(&diDoc)));
    
        DelphiInterface<IHTMLElementCollection> diCol;
        OleCheck(diDoc->get_all(&diCol));
    
        long ColLen = 0;
        OleCheck(diCol->get_length(&ColLen));
    
        for (int i = 0; i < ColLen; ++i)
            {
            DelphiInterface<IDispatch> diItem;
            OleCheck(diCol->item(OleVariant(i), OleVariant(i), &diItem));
            if (!diItem)
                continue;  // item() can succeed and still return no object
    
            DelphiInterface<IHTMLElement> diElem;
            OleCheck(diItem->QueryInterface(IID_IHTMLElement, (void**)&diElem));
    
            WideString wTagName;
            OleCheck(diElem->get_tagName(&wTagName));
    
            if (StartsText("img", wTagName))
                {
                OleVariant vSrc;
                OleCheck(diElem->getAttribute(OleVariant("src"), 4, vSrc));
    
                // Make changes to vSrc here....
    
                // And save it back to src
                OleCheck(diElem->setAttribute(OleVariant("src"), vSrc, 0));
                }
            else if (StartsText("script", wTagName)) 
                {
                // More parsing here...
                }
            }
        }
    catch (EOleSysError& e)
        {
        // Process exception as needed
        }
    catch (Exception& e)
        {
        // Process exception as needed
        }
    

    After all required elements (img/src, script/src, base/href etc.) have been parsed, save the result and load it into TWebBrowser.

    I still have to check whether the parsed IHTMLDocument2 can be assigned directly to TWebBrowser without loading it again, but that is another question (see: Assigning IHTMLDocument2 instance to a TWebBrowser instance).
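
The "Make changes to vSrc here" step in the loop above could look roughly like the following. This is only a sketch in portable C++ on plain std::wstring values. FixLocalRef, its parameters and the C:/app path are hypothetical names for illustration, not part of MSHTML or the VCL. It assumes the raw attribute text (for example //test/test.js) is what reaches the helper; the flag passed to getAttribute may affect that. It either discards a reference that starts with / or // or redirects it under a local base directory.

```cpp
#include <string>

// Hypothetical helper, not part of MSHTML or the VCL.
// src      - raw attribute value taken from the document
// baseDir  - local base URL ending in '/', e.g. L"file:///C:/app/" (assumed path)
// redirect - true: map the reference under baseDir; false: discard it
std::wstring FixLocalRef(const std::wstring& src, const std::wstring& baseDir, bool redirect)
{
    // Only references starting with '/' (this covers "//" too) are touched
    if (src.empty() || src[0] != L'/')
        return src;

    if (!redirect)
        return std::wstring();   // empty value: nothing left to resolve

    // Drop every leading slash so the rest becomes relative to baseDir
    const std::wstring::size_type firstNonSlash = src.find_first_not_of(L'/');
    if (firstNonSlash == std::wstring::npos)
        return baseDir;          // src consisted of slashes only
    return baseDir + src.substr(firstNonSlash);
}
```

The returned string would then be converted back to a variant and written with setAttribute, as in the loop. Stripping the leading slashes is a design choice of this sketch. The question's own example (file:///path/to/executable//test/test.js) keeps them. Whether an empty src is enough to stop the browser from trying to fetch anything is untested here.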