Tags: javascript, android, web-scraping, jsoup

Scrape full HTML data from a weather website


I'm trying to get weather data from this website:

https://www.ilmeteo.it/meteo/Magenta/previsioni-orarie?refresh_ce

with the code:

    // googlefirst3, html and array3b are declared elsewhere (not shown)
    try {
        int i = 0;
        if (googlefirst3.startsWith("http")) {
            Document document = Jsoup.connect("https://www.ilmeteo.it/meteo/Magenta/previsioni-orarie?refresh_ce")
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11 Firefox/19.0")
                    .timeout(0) // 0 means no timeout at all
                    .get();
            Elements rows = document.select("tr");

            String verifica = document.html(); // full HTML, kept to inspect what was downloaded
            for (Element row : rows) {
                Element firstCell = row.getElementsByTag("td").first();
                if (firstCell == null) {
                    continue; // rows without <td> (e.g. header rows) would otherwise throw a NullPointerException
                }
                i++;
                html = i + "|||" + firstCell.html();
                array3b[i] = html;
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

I'm trying to get the table rows with temperature, wind and time data:

[Screenshot: the data I'm trying to get]

but I'm unable to get it. The document I get doesn't contain this data and seems to be incomplete. I thought this was due to JavaScript-generated HTML, but even with the method from this question:

How do I get the web page contents from a WebView?

I was unable to get it, so I'm not sure JavaScript is the issue. Can anybody at least help me identify the nature of the problem?

Many thanks in advance.


Solution

  • The page you're trying to parse loads the data you want inside an iframe:

    <iframe name="frmprevi" id="frmprevi" 
    src="https://www.ilmeteo.it/portale/meteo/previsioni1.php?citta=Magenta&amp;c=3749&amp;gm=25" 
    width="660" height="600" marginheight="0" marginwidth="0" scrolling="no"
    frameborder="0" style="margin:0px;padding:0px"></iframe>
    

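    That's why Jsoup can't see it: Jsoup downloads and parses only the HTML of the URL you request and does not load iframe contents, so the table never appears in your Document. To get the data, parse the URL from the iframe's src attribute directly (the &amp; in the HTML attribute is just an escaped &): https://www.ilmeteo.it/portale/meteo/previsioni1.php?citta=Magenta&c=3749&gm=25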

    Now it should be easy, but be aware that the parameter gm=25 in the URL probably represents the 25th day of the month, so you'll have to change it accordingly to get the data for a different day.
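To avoid hardcoding the day, the iframe URL can be built from the current date. This is a minimal sketch under assumptions: that gm is the day of the month (as suggested above) and that c is the site's internal code for the city (3749 for Magenta). Neither parameter is documented by ilmeteo.it, and the class and method names (IlMeteoFrameUrl, frameUrl, frameUrlForToday) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Calendar;

// Hypothetical helper: builds the URL of the forecast iframe for a given day.
final class IlMeteoFrameUrl {

    // Base address copied from the iframe's src attribute on the outer page.
    private static final String BASE =
            "https://www.ilmeteo.it/portale/meteo/previsioni1.php";

    // ASSUMPTIONS (inferred from the URL, not documented by the site):
    // "c" is ilmeteo.it's internal city code (3749 for Magenta) and
    // "gm" is the day of the month the forecast refers to.
    static String frameUrl(String city, int cityCode, int dayOfMonth) {
        if (dayOfMonth < 1 || dayOfMonth > 31) {
            throw new IllegalArgumentException("dayOfMonth out of range: " + dayOfMonth);
        }
        try {
            // Form-encode the city name so spaces and accents are safe in the query string.
            String encodedCity = URLEncoder.encode(city, "UTF-8");
            return BASE + "?citta=" + encodedCity + "&c=" + cityCode + "&gm=" + dayOfMonth;
        } catch (UnsupportedEncodingException e) {
            // Every JVM must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Convenience variant for today's date in the device's default time zone.
    static String frameUrlForToday(String city, int cityCode) {
        int today = Calendar.getInstance().get(Calendar.DAY_OF_MONTH);
        return frameUrl(city, cityCode, today);
    }

    public static void main(String[] args) {
        // Prints https://www.ilmeteo.it/portale/meteo/previsioni1.php?citta=Magenta&c=3749&gm=25
        System.out.println(frameUrl("Magenta", 3749, 25));
        System.out.println(frameUrlForToday("Magenta", 3749));
    }
}
```

In the question's code, the result of frameUrlForToday("Magenta", 3749) would replace the original page URL passed to Jsoup.connect(...). If you'd rather not rely on the guessed meaning of gm, another option is to fetch the outer page as before and read the address with document.selectFirst("iframe#frmprevi") followed by absUrl("src") on the returned element (checking it for null first).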