javascript regex ruby web-scraping

RegEx problem or maybe another solution altogether?


The problem I'm having is that I have a block of JavaScript that I've successfully scraped out of a website's source, and now I have to sift through it to get the specific values I'm looking for.

I need to find flvFileName and get all the file names listed. In this case it's 'trailer1,trailer2,trailer3'.

At first I started using regex to match the start and end tags and then match the file names and extract them to an array, but the problem is that there aren't always three videos in the list. There could be zero or more, so matching a fixed number of names doesn't work. Any thoughts on a way to approach this that won't make me continue to abuse my laptop?

... ,flashvars: {flvFileName: 'trailer1,trailer2,trailer3', age: 'no', isForced: 'true'} }); });

Solution

  • Assuming it's a string (or you can get it to be a string)

    p str.split(/flvFileName: '|', age/)[1].split(',')
    #=> ["trailer1", "trailer2", "trailer3"]
    

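    This splits the string into three parts: the text before flvFileName: ', the file names between the two delimiters, and the text after ', age.

    Index [1] picks the middle part, which is then split on commas.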
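
The split above depends on age following the file list, and it raises NoMethodError when flvFileName is missing, because [1] is then nil. A minimal sketch of an alternative, assuming the value is always wrapped in single quotes as in the sample: capture whatever sits between the quotes and split that. The helper name extract_flv_names is made up for illustration.

```ruby
# Hypothetical helper (the name is not from the original post).
# Assumes the flvFileName value is always single-quoted.
def extract_flv_names(js)
  # String#[] with a regex and a group index returns the captured text,
  # or nil when the pattern does not match at all.
  list = js[/flvFileName:\s*'([^']*)'/, 1]
  # nil.to_s is "", and "".split(',') is [], so both the "key missing"
  # and "empty list" cases fall through to an empty array.
  list.to_s.split(',').map(&:strip)
end

sample = "... ,flashvars: {flvFileName: 'trailer1,trailer2,trailer3', " \
         "age: 'no', isForced: 'true'} }); });"

p extract_flv_names(sample)
#=> ["trailer1", "trailer2", "trailer3"]
p extract_flv_names("flashvars: {flvFileName: '', age: 'no'}")
#=> []
p extract_flv_names("no videos here")
#=> []
```

This way the result is always an array, whether the page lists zero, one, or many videos, and it does not matter which key comes after flvFileName.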