pythonhtmlperlrenderingrendering-engine

Finding rendered HTML element positions using WebKit (or Gecko)


I would like to get the dimensions (coordinates) for all the HTML elements of a webpage as they are rendered by a browser, that is the positions they are rendered at. For example, (top-left,top-right,bottom-left,bottom-right)

Could not find this in lxml. So, is there any library in Python that does this? I had also looked at Mechanize::Mozilla in Perl but, that seems difficult to configure/set-up.

I think the best way to do this for my requirement is to use a rendering engine - like WebKit or Gecko.

Are there any perl/python bindings available for the above two rendering engines? Google searches for tutorials on how to "plug-in" to the WebKit rendering engine is not very helpful.


Solution

  • I was not able to find any easy solution (ie. Java/Perl/Python :) to hook onto Webkit/Gecko to solve the above rendering problem. The best I could find was the Lobo rendering engine written in Java which has a very clear API that does exactly what I want - access to both DOM and the rendering attributes of HTML elements.

    JRex is a Java wrapper to Gecko rendering engine.