I have recently moved from Excel VBA automation to AutoHotkey, following the tutorial at http://the-automator.com/web-scraping-intro-with-autohotkey/, but I don't fully understand the code. Could someone please point me in the right direction?
I am trying to make my F1 key scrape some data from the currently active page.
F1::
pwb := ComObjCreate("InternetExplorer.Application") ;create IE Object
pwb.visible:=true ; Set the IE object to visible
pwb := WBGet()
;************Pointer to Open IE Window******************
WBGet(WinTitle="ahk_class IEFrame", Svr#=1) { ;// based on ComObjQuery docs
static msg := DllCall("RegisterWindowMessage", "str", "WM_HTML_GETOBJECT")
, IID := "{0002DF05-0000-0000-C000-000000000046}" ;// IID_IWebBrowserApp
;// , IID := "{332C4427-26CB-11D0-B483-00C04FD90119}" ;// IID_IHTMLWindow2
SendMessage msg, 0, 0, Internet Explorer_Server%Svr#%, %WinTitle%
if (ErrorLevel != "FAIL") {
lResult:=ErrorLevel, VarSetCapacity(GUID,16,0)
if DllCall("ole32\CLSIDFromString", "wstr","{332C4425-26CB-11D0-B483-00C04FD90119}", "ptr",&GUID) >= 0 {
DllCall("oleacc\ObjectFromLresult", "ptr",lResult, "ptr",&GUID, "ptr",0, "ptr*",pdoc)
return ComObj(9,ComObjQuery(pdoc,IID,IID),1), ObjRelease(pdoc)
}
}
}
I understand that this code creates a new IE application, but what if I don't want to create one and just want to use the currently active window? I have seen a few snippets that get the URL of the currently active browser window, but I can't work out how to get at the elements on that page.
So far I have tried the following. Can someone tell me how to get it to point to the active page and read some of its data?
F1::
wb := WBGet()
if !instr(wb.LocationURL, "https://www.google.com/")
{
wb := ""
return
}
doc := wb.document
h2name := rows[0].getElementsByTagName("h2")
FileAppend, %h2name%, Somefile.txt
Run Somefile.txt
return
WBGet(WinTitle="ahk_class IEFrame", Svr#=1) { ;// based on ComObjQuery docs
static msg := DllCall("RegisterWindowMessage", "str", "WM_HTML_GETOBJECT")
, IID := "{0002DF05-0000-0000-C000-000000000046}" ;// IID_IWebBrowserApp
;// , IID := "{332C4427-26CB-11D0-B483-00C04FD90119}" ;// IID_IHTMLWindow2
SendMessage msg, 0, 0, Internet Explorer_Server%Svr#%, %WinTitle%
if (ErrorLevel != "FAIL") {
lResult:=ErrorLevel, VarSetCapacity(GUID,16,0)
if DllCall("ole32\CLSIDFromString", "wstr","{332C4425-26CB-11D0-B483-00C04FD90119}", "ptr",&GUID) >= 0 {
DllCall("oleacc\ObjectFromLresult", "ptr",lResult, "ptr",&GUID, "ptr",0, "ptr*",pdoc)
return ComObj(9,ComObjQuery(pdoc,IID,IID),1), ObjRelease(pdoc)
}
}
}
I tried to test whether the variable would be written to Somefile.txt, because I'm not sure how to test it with MsgBox. It kept writing the whole script instead of showing the result.
To work on the active window's active tab (if it's an Internet Explorer window):
q::
WinGet, hWnd, ID, A
WinGetClass, vWinClass, ahk_id %hWnd%
if !(vWinClass = "IEFrame")
Return
wb := WBGet("ahk_id " hWnd)
MsgBox % wb.document.activeElement.tagName "`r`n" wb.document.activeElement.innerText
wb := ""
Return
To work on the first found Internet Explorer window's active tab:
w::
WinGet, hWnd, ID, ahk_class IEFrame
wb := WBGet()
;wb := WBGet("ahk_class IEFrame") ;this line is equivalent to the one above
MsgBox % wb.document.activeElement.tagName "`r`n" wb.document.activeElement.innerText
wb := ""
Return
Regarding h2name, I don't believe that this line will do anything, because 'rows' is not defined anywhere in the script:
h2name := rows[0].getElementsByTagName("h2")
The following might work:
h2name := ""
try h2name := wb.document.getElementsByTagName("h2").item[0].name
MsgBox % h2name
MsgBox % wb.document.getElementsByTagName("h2").item[0].tagName
MsgBox % wb.document.getElementsByTagName("h2").item[0].innerText
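To get the h2 text into Somefile.txt, as in your original attempt, something along these lines might work. It is only a sketch: it assumes the WBGet() function from your question is defined in the same script, and the variable names h2s and text are made up for illustration. It loops over every h2 element on the page, collects each one's innerText, and writes the result to the file.

```autohotkey
; Sketch only: assumes WBGet() from the question is defined elsewhere in this script.
F1::
    wb := WBGet()                          ; first Internet Explorer window found
    if !IsObject(wb)                       ; WBGet() returns nothing if its SendMessage call fails
    {
        MsgBox, No Internet Explorer window was found.
        return
    }
    h2s := wb.document.getElementsByTagName("h2")   ; a collection object, not a string
    text := ""
    Loop % h2s.length                      ; A_Index starts at 1, the collection at 0
        text .= h2s.item[A_Index - 1].innerText "`r`n"
    FileDelete, Somefile.txt               ; start from an empty file on each run
    FileAppend, %text%, Somefile.txt       ; write the collected text, not the object
    Run, Somefile.txt
    wb := ""                               ; release the COM reference
return
```

The key point is that getElementsByTagName returns a collection, so you need to read a property such as innerText from each item before passing it to FileAppend or MsgBox.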
In your link I think by 'name' they are referring to LocationName (the tab's title):
MsgBox % wb.LocationName
MsgBox % wb.document.title ;more reliable
For the entire page's innerText:
MsgBox % wb.document.documentElement.innerText
HTH