java android xpath htmlcleaner

Using XPath is causing problems


So I'm learning how to use XPath and HtmlCleaner to parse HTML but I have a problem. This is the code:

public class ScheudeleWithDesign extends Activity {

static final String urlToParse = "https://www.easistent.com/urniki/263/razredi/18221";
static final String xpathTableContents = "//div[@id='text11']";
TextView tw1;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_scheudele_with_design);

    tw1 = (TextView) findViewById(R.id.urnikText);

    String value = "";
    value = new getScheudele().execute().toString();
    tw1.setText(value);

}//End of onCreate

private class getScheudele extends AsyncTask<Void, Void, String> {

    @Override
    protected String doInBackground(Void... params) {
        String stats = null;

        //cleaner properties
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        props.setAllowHtmlInsideAttributes(false);
        props.setAllowMultiWordAttributes(false);
        props.setRecognizeUnicodeChars(true);
        props.setOmitComments(true);

        URL url;
        try {
            url = new URL(urlToParse);
            TagNode root = cleaner.clean(url);
            Object[] node = root.evaluateXPath(xpathTableContents);
            // Take the data if the element is found
            if (node.length > 0) {
                TagNode resultNode = (TagNode)node[10];
                stats = resultNode.getText().toString();
            }
        } catch (MalformedURLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (XPatherException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        return stats;
    }

}

So I'm trying to parse some data and display it in a TextView. This isn't working, though; the result is completely wrong:

com.whizzapps.stpsurniki.ScheudeleWithDesign$getScheudele@421a7d90

My guess is that the problem is in the XPath here:

static final String xpathTableContents = "//div[@id='text11']";

I've never worked with XPath before, so I'm almost sure I got that part wrong. This is the site from which I'm trying to parse data, by the way. For now this code should fetch only one table element; once I know how to do that, I'll parse the whole table.


Solution

  • My guess is that the problem is in the XPath here:

    static final String xpathTableContents = "//div[@id='text11']";
    

    As I mentioned in my comment, the //div[@id='text11'] XPath selects any div element whose id attribute equals text11. However, there are no such div elements in the referenced HTML page, so the expression matches nothing.
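
Separately from the XPath, the string shown in the question has the shape of Java's default Object.toString() (class name, then @, then a hex hash). That suggests it comes from calling toString() on the task object that execute() returns, not from the parsed text. On Android, the value returned by doInBackground is handed to onPostExecute. The sketch below is plain Java, not Android code: FakeTask is a hypothetical stand-in that copies only those two traits, so the difference can be seen without an emulator.

```java
import java.util.function.Consumer;

class TaskResultSketch {

    // Hypothetical stand-in for an AsyncTask-like class (NOT the Android API).
    // It copies only two traits: execute() returns the task object itself,
    // and the computed result is delivered through a callback.
    static class FakeTask {
        FakeTask execute(Consumer<String> onResult) {
            String result = "2. ura";  // placeholder for the parsed text
            onResult.accept(result);   // plays the role of onPostExecute
            return this;
        }
    }

    // Mirrors the pattern in the question: toString() on the returned task.
    static String wrongWay() {
        return new FakeTask().execute(ignored -> { }).toString();
    }

    // Reads the value inside the callback instead.
    static String rightWay() {
        StringBuilder shown = new StringBuilder();
        new FakeTask().execute(shown::append);
        return shown.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrongWay()); // class name, '@', hex hash
        System.out.println(rightWay()); // the delivered result
    }
}
```

In the real activity, the equivalent change would presumably be to override onPostExecute(String) in getScheudele and call tw1.setText(...) there.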

    Could you please show me an example on how to select ANY table content? Just show me an XPath code for any table content you want so that I somehow "get the structure".

    Using the HTML page you referenced, to select the div containing "2. ura", for example:

    //*[@id="seznam_ur_teden"]/table/tbody/tr[3]/td[1]/div[1]
    

    To select just the text there:

    //*[@id="seznam_ur_teden"]/table/tbody/tr[3]/td[1]/div[1]/text()
    

    To select the entire enclosing table:

    //*[@id="seznam_ur_teden"]/table
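
To get a feel for how these expressions behave, here is a minimal sketch that runs them against a tiny, hypothetical stand-in for the timetable markup. SAMPLE, count and text are made up for illustration, and the real page is far larger. It uses the JDK's javax.xml.xpath instead of HtmlCleaner's evaluateXPath so that it runs without extra libraries; HtmlCleaner's own XPath support may be narrower, and whether a tbody element exists depends on the tree the parser builds. The XPath strings use single quotes so they need no escaping inside Java string literals.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTableSketch {

    // Hypothetical, well-formed stand-in for the timetable markup (not the real page).
    static final String SAMPLE =
          "<html><body><div id='seznam_ur_teden'><table><tbody>"
        + "<tr><th>Ura</th><th>Pon</th></tr>"
        + "<tr><td><div>1. ura</div></td><td><div>MAT</div></td></tr>"
        + "<tr><td><div>2. ura</div></td><td><div>SLO</div></td></tr>"
        + "</tbody></table></div></body></html>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Returns the string value of the first selected node, or "" if nothing matches.
    static String text(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(SAMPLE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String cell = "//*[@id='seznam_ur_teden']/table/tbody/tr[3]/td[1]/div[1]";
        System.out.println(count("//div[@id='text11']"));               // 0: no such div
        System.out.println(text(cell + "/text()"));                     // 2. ura
        System.out.println(count("//*[@id='seznam_ur_teden']/table"));  // 1: the table
    }
}
```

Note that tr[3] and td[1] are 1-based positions among sibling elements of the same name, so in this sample tr[3] is the third row (the header row counts as tr[1]).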