javascript string typescript parsing ifttt

Parsing a song title from /r/listenToThis for an IFTTT applet


I have an array of song titles from the /r/listenToThis subreddit that look like this:

[
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
  "Julia Jacklin - Pressure to Party [Rock] (2019)",
  "The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
  "Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
  "$uicideboy$ - Death",
  "SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
  "Blab - afdosafhsd (2000)",
  "Something strange and badly formatted without any artist [Classical]",
  "シロとクロ「ミッドナイトにグッドナイト」(Goodnight to Midnight - Shirotokuro) - (Official Music Video) [Indie/Alternative]",
  "Victor Love - Irrationality (feat. Spiritual Front) [Industrial Rock/Cyberpunk]"
  ...
]

I am trying to parse the title and artist from them but am really struggling with regex.

I tried splitting it using "-" but it's really annoying to only get the artist afterwards.

I tried using regex too, but I can't get anything to work properly. This is what I had for the title: /(?<= -{1,2} )[\S ]*(?= \[|\( )/i and this for the artist: /[\S ]*(?= -{1,2} )/i.

Every entry is a song title. The title may be preceded by the song's artist, followed by one or two (or maybe three?) dashes. After the title, the genres may be given in square brackets and/or the release date in parentheses. I do not expect perfect accuracy. Some formats might be weird, and in those cases I would rather have artist be undefined than get some strange parsing.

For example:

[
  { title: "MYTCH", artist: "Lophelia" },
  { title: "Pressure to Party", artist: "Julia Jacklin" },
  { title: "I'm Going Home", artist: "The Homeless Gospel Choir" },
  { title: "The last night of the world", artist: "Lea Salonga and Simon Bowman" },
  { title: "Death", artist: "$uicideboy$" },
  { title: "Joni Mitchell Tapes", artist: "SNFU" },
  { title: "afdosafhsd", artist: "Blab" },
  { title: "Something strange and badly formatted without any artist" },
  { title: "Goodnight to Midnight", artist: "Shirotokuro" }, // Probably impossible without some kind of AI
  { title: "Irrationality", artist: "Victor Love" }
]

Solution

  • You can use this regex. Group 1 captures the text before the dashes, and group 2 captures the text after them, stopping at the first bracket or parenthesis.

    ^([^-[\]()\n]+)-* *([^[\]()\n]*)
    

    Regex Demo (deliberately shown in the PCRE flavor to preserve the group colors, but it works in the JavaScript flavor too)

    JS code demo:

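    const songs = [
      "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
      "Julia Jacklin - Pressure to Party [Rock] (2019)",
      "The Homeless Gospel Choir - I'm Going Home [Folk-Punk] (2019) cover of Pat the Bunny | A Fistful of Vinyl",
      "Lea Salonga and Simon Bowman - The last night of the world [musical] (1990)",
      "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz]",
      "$uicideboy$ - Death",
      "SNFU -- Joni Mitchell Tapes [Punk/Alternative] (1993)",
      "Artist - Title (2000)",
      "Something strange and badly formatted without any artist [Classical]"
    ];

    // Group 1: text before the dashes. Group 2: text after them, up to the first bracket.
    const pattern = /^([^-[\]()\n]+)-* *([^[\]()\n]*)/;

    songs.forEach(song => {
      const m = pattern.exec(song);
      if (!m) return; // the entry starts with a dash or a bracket
      const before = m[1].trim();
      const after = m[2].trim();
      // With no dash separator, group 2 is empty: the whole text is the title.
      const title = after || before;
      const artist = after ? before : undefined;
      console.log("Title: " + title + ", Artist: " + artist);
    });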
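The question asks for objects of the form { title, artist }, with artist left undefined when the format is unclear. A minimal sketch of that, as a variation on the regex above rather than the answer's own method: the helper name parseSong is hypothetical, and it assumes the separator is one to three dashes with whitespace on both sides, and that everything from the first "[" or "(" onward can be discarded.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumption: artist and title are separated by 1-3 dashes surrounded by whitespace.
function parseSong(raw) {
  // Keep only the text before the first "[" or "(" (genre, year, trailing notes).
  const head = raw.split(/[\[(]/, 1)[0].trim();
  // Lazy artist, then the dash separator, then the title.
  const m = /^(.+?)\s+-{1,3}\s+(.+)$/.exec(head);
  return m
    ? { title: m[2].trim(), artist: m[1].trim() }
    : { title: head, artist: undefined }; // no separator: treat everything as the title
}

const parsed = [
  "Lophelia -- MYTCH [Acoustic Prog-Rock/Jazz] (2019)",
  "$uicideboy$ - Death",
  "Something strange and badly formatted without any artist [Classical]",
  "Victor Love - Irrationality (feat. Spiritual Front) [Industrial Rock/Cyberpunk]",
].map(parseSong);

console.log(parsed);
```

Because the separator must have whitespace around it, a hyphen inside a name is not treated as a split point, unlike with the character-class regex above. An entry such as the シロとクロ one, where the first bracket comes before any dash, ends up with the leading text as title and artist undefined.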