javascript node.js web-scraping web-crawler x-ray

Selecting multiple tags with the same className?


Using this syntax:

var Xray = require('x-ray');
var x = Xray();

x('http://www.viadeo.com/fr/company/unicef', '.page-content', [{
  img: 'img@src',
  bio: '.pan-desc-description',
  org: '.pan-desc-footer-element @element-value',
  link: '.element-value a@href',
  twitter: '.element-value a@href' // returns the same URL as `link`, not the Twitter one
}]).write('result.json');

Several elements on the page share that class name, but only the first one is returned. Is there a way to grab all of them, and perhaps apply a .limit to the result? I apologize if this is in the documentation; I've read through it twice and it doesn't seem to be stated explicitly anywhere.


Solution

  • You can use the Chrome inspector to find a more specific selector.

    This code worked for me:

    var Xray = require('x-ray');
    var x = Xray();

    x('http://www.viadeo.com/fr/company/unicef', '.page-content', [{
      img: 'img@src',
      bio: '.pan-desc-description',
      org: '.pan-desc-footer-element @element-value',
      link: '.element-value a@href',
      // Match the .mbs element that is the fourth child of its parent.
      // Alternative: 'div.element-value.gu.gu-last a@href'
      twitter: '.mbs:nth-child(4) a@href'
    }]).write('result.json');
    

    It produces this result:

    [
      {
        "img": "http://static8.viadeo-static.com/fzv6VNzGukb7mt5oV0Nl-wQxCDI=/fit-in/200x200/filters:fill(white)/7766b960b98f4e85affdab7ffa9863c7/1434471183.jpeg",
        "bio": "Le Fonds des Nations unies pour l'enfance (abrégé en UNICEF ou Unicef pour United Nations International Children's Emergency Fund en anglais) est une agence de l'ONU consacrée à l'amélioration et à la promotion de la condition des enfants. Son nom était originellement United Nations International Children's Emergency Fund, dont elle a conservé l'acronyme. Elle a activement participé à la rédaction, la conception et la promotion de la convention relative aux droits de l'enfant (CIDE), adoptée suite au sommet de New York en 1989. Son revenu total en 2006 a été de 2 781 millions Dollar US.\r\n          L'UNICEF a reçu le prix Nobel de la paix en 1965.",
        "link": "http://www.unicef.org/",
        "twitter": "http://www.twitter.com/UNICEF "
      }
    ]
    

    Here is how to get a suitable selector in Chrome:

    First, right-click the element and click Inspect.

    Then click Copy selector and use the result.

    The copied selector will look something like this:

    #pan-desc > div.pan-desc-grey > div > div:nth-child(4) > div.element-value.gu.gu-last > a
    

    You can use it directly (with @href appended to read the link's URL), or shorten it to something more readable.
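The answer above picks out one specific link. If the goal is to collect every element that shares the class, as the question asks, a possible approach is sketched below. The x-ray README shows that wrapping a selector in an array, as in x(url, ['a@href']), returns every match rather than the first; this sketch assumes the same array form is accepted for a field inside the object schema. As far as I know, x-ray's .limit() caps the number of pages fetched by .paginate() rather than the number of matched elements, so the sketch trims the list with a plain slice instead. The names buildSchema, limitLinks, and the links field are hypothetical and not part of x-ray.

```javascript
// Hypothetical helper: the schema from the answer, with one field whose
// selector is wrapped in an array so that every match in scope is collected
// (assumption: x-ray accepts the array form inside an object schema).
function buildSchema() {
  return [{
    img: 'img@src',
    bio: '.pan-desc-description',
    links: ['.element-value a@href']
  }];
}

// Hypothetical helper: return copies of the scraped items with at most
// `max` entries in each item's `links` array.
function limitLinks(items, max) {
  return items.map(function (item) {
    return Object.assign({}, item, {
      links: (item.links || []).slice(0, max)
    });
  });
}

// Usage (needs `npm install x-ray` and network access, so it is commented out):
// var Xray = require('x-ray');
// var x = Xray();
// x('http://www.viadeo.com/fr/company/unicef', '.page-content', buildSchema())(
//   function (err, items) {
//     if (err) throw err;
//     console.log(limitLinks(items, 2));
//   }
// );
```

Trimming after the scrape keeps the selector simple; if only one particular link is needed, the :nth-child selector from the answer is the more direct choice.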