python html selenium web-scripting

Grab information from inside the form using web scraping - python 3


I'm building a web scraping project and I'm having trouble extracting information from a form inside an iframe. Specifically, I want to extract the values of the name, position and company fields.

Code I'm testing:

replay = browser.switch_to.frame(browser.find_element(By.XPATH, "/html/body/div[1]/text()[1]")).get_text().strip()

It gives the following error:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "/html/body/div[1]/text()[1]" is: [object Text]. It should be an element.

I've attached an image of the form showing the information I'm trying to get. Can anyone give me some tips?

<iframe id="content210835787_ifr" frameborder="0" allowtransparency="true" title="Área de texto formatado.Pressione ALT-F9 para exibir o menu, ALT-F10 para exibir a barra de ferramentas ou ALT-0 para exibir a ajuda" style="width: 100%; height: 465px; display: block;" data-mce-style="width: 100%; height: 465px; display: block;" class="selectorgadget_selected"></iframe>



<body id="tinymce" class="mce-content-body " data-id="content210835787" contenteditable="true" style="overflow-y: hidden; padding-left: 1px; padding-right: 1px; padding-bottom: 50px;" data-mce-style="overflow-y: hidden; padding-left: 1px; padding-right: 1px; padding-bottom: 50px;"><h2><strong>Formulário - Confecção de usuário de acesso</strong></h2><div>Nome Completo: &nbsp; Solicitação aberta para teste<br><br>Matrícula:&nbsp; 2354<br><br>Centro de Custo:&nbsp; VS | 123 </div><div>&nbsp; <br><br>Cargo: &nbsp; Analista de Teste</div><div><br></div><div><br><br>&nbsp; <br><br>&nbsp; <br><br>Tipo de Acesso: &nbsp; Rede<br><br>Empresa que o colaborador foi cadastrado pelo RH? &nbsp; VS EMpresarial</div></body>

The values highlighted in yellow in the image are the information I'm trying to get.


Solution

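  • Switch into the iframe first (required if the editor body lives inside it, as your HTML suggests). Note that switch_to.frame takes a single frame reference such as the iframe's id, not a By locator pair:

    browser.switch_to.frame("content210835787_ifr")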
    

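    Mandatory: collect the divs inside the editor body. find_element would return only the first div, while Cargo and Empresa sit in later ones, so use find_elements:

    elems = browser.find_elements(By.XPATH, "//body[@id='tinymce']/div")
    # Print the visible text of each div (Nome Completo, Cargo, Empresa, ...)
    for elem in elems:
        print(elem.text)
    

    Without a URL to test against I can't be sure, but the idea is to switch to your iframe, look for the body with that id, and read its divs. Then print each one's .text or .get_attribute('innerHTML').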
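Once the text is out of the iframe, the individual fields still have to be separated. The following is a minimal sketch of one way to do that, not a tested solution for your page. The helper names `read_editor_text` and `parse_fields` are hypothetical, and the sample string is only an assumption about what `.text` returns for the HTML in the question (non-breaking spaces may come back as `\xa0` or as plain spaces).

```python
import re

# Matches "Label: value" or "Label? value" on a single line.
FIELD_RE = re.compile(r"^(?P<label>[^:?]+)[:?]\s*(?P<value>.*)$")


def parse_fields(text):
    """Turn the editor's visible text into a {label: value} dict."""
    fields = {}
    # Normalise non-breaking spaces so that strip() and \s behave predictably.
    for line in text.replace("\xa0", " ").splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("label").strip()] = match.group("value").strip()
    return fields


def read_editor_text(browser, frame_id):
    """Hypothetical helper: switch into the iframe, read the body text, switch back."""
    # Imported lazily so parse_fields can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    browser.switch_to.frame(frame_id)
    try:
        return browser.find_element(By.ID, "tinymce").text
    finally:
        # Always return to the top-level document for later lookups.
        browser.switch_to.default_content()


# Assumed sample, approximating elem.text for the HTML shown in the question.
sample = (
    "Formulário - Confecção de usuário de acesso\n"
    "Nome Completo: \xa0 Solicitação aberta para teste\n"
    "\n"
    "Matrícula:\xa0 2354\n"
    "\n"
    "Centro de Custo:\xa0 VS | 123\n"
    "Cargo: \xa0 Analista de Teste\n"
    "Tipo de Acesso: \xa0 Rede\n"
    "Empresa que o colaborador foi cadastrado pelo RH? \xa0 VS EMpresarial\n"
)

fields = parse_fields(sample)
print(fields["Nome Completo"])
print(fields["Cargo"])
print(fields["Empresa que o colaborador foi cadastrado pelo RH"])
```

With a live browser, the two helpers would be combined as `parse_fields(read_editor_text(browser, "content210835787_ifr"))`. Splitting on the first `:` or `?` is a deliberate simplification: it works for the labels shown here but would need adjusting if a label itself contained one of those characters.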