
How to get the last table from this site?


I'm trying to get the last table from this site with Python. My current attempt is below.

The table is titled "Dados Colocação, nos Termos do Anexo VII da Instrução CVM nº 400, de 2003".

    import requests

    lin_cvm_oferta = 'http://web.cvm.gov.br/app/esforcosrestritos/#/enviarFormularioEncerramento?type=dmlldw%3D%3D&ofertaId=MTE3NDE%3D&state=eyJhbm8iOiJNakF4T0E9PSIsInZhbG9yIjoiTVRVPSIsImNvbXVuaWNhZG8iOiJNUT09Iiwic2l0dWFjYW8iOiJNZz09In0%3D'
    html = requests.get(lin_cvm_oferta).text
    print(html)

When I print the HTML, it contains none of the table data.

I already got the first part of the table as JSON, with help from @JackFleeting in this other question (here). PS: I know there is a similar solution here, but I don't want to use Selenium.


Solution

  • This one is different from your previous question: the page loads this data with a POST request, not a GET. Use the Network/XHR tab of your browser's developer tools to find the request URL and the payload, and then post it like this:

    import requests      
    import json  
    
    url = 'http://web.cvm.gov.br/app/esforcosrestritos/comunicado/getUltimoComunicado'
    
    payload = {"id":931,"dataInclusao":"2016-05-20T09:26:00Z", "dataInicio":"2016-05-18T00:00:00Z","dataEnceramento":"2016-07-05T00:00:00Z", "numeroEmissao":1,"quantidadeSerie":140,"valorMobiliario":{"id":11,
        "dataInclusao":"2015-12-01T00:00:00Z",
        "descricao":"CERTIFICADOS DE RECEBÍVEIS IMOBILIÁRIOS - CRI",
        "relacionadoFundoInvestimento":False,"situacao":"ATIVO"},
        "tipoEspecie":{"id":3,"descricao":"Sem Preferência"},
        "tipoClasse":{"id":4,"descricao":"Não Aplicável"},
        "tipoOferta":{"id":1,"descricao":"Primária"},"tipoForma":{"id":3,"descricao":"Nominativa e Escritural"},"ofertante":{"id":1860,"nomeResponsavel":"RB CAPITAL COMPANHIA DE SECURITIZAÇÃO","cnpj":2773542000122,"paginaWeb":"http://www.rbcapital.com/","tipoSocietario":{"id":4,"descricao":"Sociedade Anônima de Capital Aberto"}},"emissor":{"id":1859,"nomeResponsavel":"RB CAPITAL COMPANHIA DE SECURITIZAÇÃO","cnpj":2773542000122,"paginaWeb":"http://www.rbcapital.com/","tipoSocietario":{"id":4,"descricao":"Sociedade Anônima de Capital Aberto"}},"lider":{"id":931,"nrPfPj":17298092000130,"dataRegistro":"1998-10-15T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12},"instituicoesIntermediarias":[{"id":1089,"nrPfPj":59588111000103,"dataRegistro":"1991-08-12T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12,"denominacaoSocial":"BANCO VOTORANTIM SA"},{"id":1090,"nrPfPj":90400888000142,"dataRegistro":"1990-12-20T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12,"denominacaoSocial":"BANCO SANTANDER (BRASIL) S.A."}],
                   "valorPrecoUnitario":"1.000,00","inativo":False,
                   "qtdValoresMobiliarios":0,"valorTotalOferta":0,"variasSeries":True}
    
    
    headers = {'content-type': 'application/json'}

    # Send the payload as a JSON body and stop on an HTTP error status
    resp = requests.post(url, data=json.dumps(payload), headers=headers)
    resp.raise_for_status()
    data = json.loads(resp.content)
    print(data)
    

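    Note that when you paste the payload from the browser into Python source, you have to change the JSON booleans true and false to Python's True and False (capitalized, as I did above). The site's own POST request uses lowercase, and json.dumps converts the values back to lowercase when the request is sent.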
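
If you would rather not edit the booleans by hand, one option is to keep the copied payload as a string and let the json module parse it. This is only a minimal sketch: `raw_payload` is a hypothetical name, and the string is a shortened stand-in that reuses a few fields from the payload above, not the full request body.

```python
import json

# Payload text as copied from the browser's Network tab
# (hypothetical shortened version with a few fields from the payload above).
raw_payload = '{"id": 931, "numeroEmissao": 1, "inativo": false, "variasSeries": true}'

# json.loads maps JSON true/false to Python True/False
payload = json.loads(raw_payload)
print(payload["inativo"], payload["variasSeries"])

# json.dumps writes them back in lowercase JSON form for the request body
body = json.dumps(payload)
print(body)
```

The resulting `payload` dict can then be passed to `json.dumps` in the POST call exactly as in the code above.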