C++ Builder AnsiString delete all except < ... >


I have: Memo2->Text = IdHTTP1->Get("http://www.twitch.tv/starladder1");

After that call, Memo2 contains:

`<!DOCTYPE html>
<html lang='en' style='overflow: hidden;' xml:lang='en' xmlns:fb='http://www.facebook.com/2008/fbml' xmlns:og='http://opengraphprotocol.org/schema/' xmlns='http://www.w3.org/1999/xhtml'>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<title>Twitch</title>
<meta content='IE=edge,chrome=1' http-equiv='X-UA-Compatible'>
<meta content='app-id=460177396, app-argument=twitch://open' name='apple-itunes-app'>
<meta content='Twitch' name='description'>
<link href='/favicon.ico' rel='shortcut icon' type='image/x-icon'>
<meta content='general' name='rating'>
<meta content='NIH9y45AePyUB62Ur2myinvJOvH77ufgjd6wKiQB6sA' name='google-site-verification'>
<a href='https://plus.google.com/115463106831870703431' rel='publisher'></a>
<meta content='Twitch' property='og:site_name'>
<meta content='161273083968709' property='fb:app_id'>
<meta content='streamname' property='og:title'>
<meta content='STREAM NAME STREAM NAME' property='og:description'>
<meta content='http://static-cdn.jtvnw.net/jtv_user_pictures/starladder1-profile_image-557367f831a49ebb-600x600.png' property='og:image'>
<meta property='og:url'>
<meta content='video.other' property='og:type'>
<meta content='http://www-cdn.jtvnw.net/swflibs/TwitchPlayer.swf?channel=starladder1&playerType=facebook' property='og:video'>
<meta content='https://www-cdn.jtvnw.net/swflibs/TwitchPlayer.swf?channel=starladder1&playerType=facebook' property='og:video:secure_url'>
<meta content='application/x-shockwave-flash' property='og:video:type'>
<meta content='378' property='og:video:height'>
<meta content='620' property='og:video:width'>`

How can I delete everything except 'STREAM NAME STREAM NAME'? I need Label1->Caption = 'STREAM NAME STREAM NAME'.


Solution

  • The page in question is served as XHTML, which is XML-compatible, so in principle any XML parser can extract the value, such as Embarcadero's TXMLDocument component or one of the many third-party parsers (I prefer libXML2 myself). The value you want is the content attribute of the meta element whose property attribute is og:description. Once the XHTML is parsed, you can either loop through the elements and check each property attribute until you find the right one, or use an XPath query to select that element directly, e.g.: /html/head/meta[@property='og:description'] (with a namespace-aware parser, the element names may need a prefix bound to the XHTML namespace).

    Update: it turns out that the page's XHTML is malformed (for example, most of its meta tags are never closed), so an XML parser will not process it correctly. Either use a third-party HTML parser that tolerates such markup, or do a simple substring search manually, e.g.:

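    // PosEx() is declared in StrUtils.hpp; RPos() is assumed here to be
    // Indy's helper from IdGlobalProtocols.hpp.
    String Resp = IdHTTP1->Get("http://www.twitch.tv/starladder1");
    String StreamName;

    // Locate the property attribute of the og:description <meta> tag
    // (indexes are 1-based; 0 means "not found").
    int i = Resp.Pos("property='og:description'");
    if (i != 0)
    {
        // Back up to the start of that tag, then find its content attribute;
        // + 9 skips over the 9 characters of "content='".
        i = PosEx("content='", Resp, RPos("<meta ", Resp, i)) + 9;
        // Copy everything up to the closing quote.
        StreamName = Resp.SubString(i, PosEx("'", Resp, i) - i);
    }
    Label1->Caption = StreamName;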
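
For readers without the VCL and Indy string helpers, the same substring search can be sketched in standard C++ with std::string. This is only an illustration, not part of the original answer: the function name extractMetaContent is hypothetical, and it assumes single-quoted attributes and no '>' character inside attribute values, as in the page shown in the question. Unlike the snippet above, it returns an empty string when the matching tag has no content attribute instead of reading into the next tag.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): returns the value of
// content='...' in the <meta> tag whose property attribute equals `property`,
// or an empty string if no such tag or attribute is found.
// Assumes single-quoted attributes and no '>' inside attribute values.
std::string extractMetaContent(const std::string& html, const std::string& property)
{
    const std::string needle = "property='" + property + "'";
    const std::size_t propPos = html.find(needle);
    if (propPos == std::string::npos)
        return "";

    // Find the boundaries of the tag that contains the property attribute.
    const std::size_t tagStart = html.rfind("<meta ", propPos);
    const std::size_t tagEnd = html.find('>', propPos);
    if (tagStart == std::string::npos || tagEnd == std::string::npos)
        return "";

    // The content attribute must lie inside the same tag.
    const std::string key = "content='";
    const std::size_t keyPos = html.find(key, tagStart);
    if (keyPos == std::string::npos || keyPos > tagEnd)
        return "";

    // Copy everything between the opening and closing quote.
    const std::size_t valueStart = keyPos + key.size();
    const std::size_t valueEnd = html.find('\'', valueStart);
    if (valueEnd == std::string::npos || valueEnd > tagEnd)
        return "";

    return html.substr(valueStart, valueEnd - valueStart);
}
```

With the response body held in a std::string (for example, after converting the String returned by IdHTTP1->Get), calling extractMetaContent(body, "og:description") would yield the text to assign to Label1->Caption.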