progress-4gl openedge webspeed

OpenEdge: how to remove HTML tags from a string?


I have tried doing this:

REPLACE(string, "<*>", "").

but it doesn't seem to work.


Solution

  • REPLACE doesn't work like that. It matches the search string literally; there's no wildcard matching in it.

    I've included a simple way of doing this below. However, there are lots of cases where it won't work, such as HTML that isn't well formed. But perhaps you can start here and move forward by yourself.

    What I do is look for < and > in the text and replace each tag, brackets included, with pipes (|). You could pick any character, preferably one that isn't present in the text, because once that's done all pipes are removed.

    Again, this is a quick and dirty solution and not safe for production...

    PROCEDURE cleanHtml:
        DEFINE INPUT  PARAMETER pcString  AS CHARACTER   NO-UNDO.
        DEFINE OUTPUT PARAMETER pcCleaned AS CHARACTER   NO-UNDO.
    
        DEFINE VARIABLE iHtmlTagBegins AS INTEGER     NO-UNDO.
        DEFINE VARIABLE iHtmlTagEnds   AS INTEGER     NO-UNDO.
        DEFINE VARIABLE lHtmlTagActive AS LOGICAL     NO-UNDO.
    
        DEFINE VARIABLE i AS INTEGER     NO-UNDO.
    
        DO i = 1 TO LENGTH(pcString):
            IF lHtmlTagActive = FALSE AND SUBSTRING(pcString, i, 1) = "<" THEN DO:
                iHtmlTagBegins = i.
                lHtmlTagActive = TRUE.
            END.
    
            IF lHtmlTagActive AND SUBSTRING(pcString, i, 1) = ">" THEN DO:
                iHtmlTagEnds = i.
                lHtmlTagActive = FALSE.
    
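                /* Overwrite the whole tag with the same number of pipes so the
                   string keeps its length and i stays aligned (otherwise the
                   character right after each tag is skipped, e.g. in "<b><i>"). */
                SUBSTRING(pcString, iHtmlTagBegins, iHtmlTagEnds - iHtmlTagBegins + 1) = FILL("|", iHtmlTagEnds - iHtmlTagBegins + 1).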
            END.
        END.
    
        pcCleaned = REPLACE(pcString, "|", "").
    
    END PROCEDURE.
    
    DEFINE VARIABLE c AS CHARACTER   NO-UNDO.
    
    RUN cleanHtml("This is a <b>text</b> with a <i>little</i> bit of <strong>html</strong> in it!", OUTPUT c).
    
    MESSAGE c VIEW-AS ALERT-BOX.
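
    If the text itself might contain pipes, a variation on the same idea is to build the result character by character and simply skip everything between < and >, so no placeholder is needed. The following is only a sketch: stripTags is a name made up for this example, and it is just as naive as the procedure above (comments, scripts, and a stray < with no closing > are not handled; in the last case the rest of the text is dropped).

    /* Hypothetical helper, not a built-in: copies only the characters outside tags. */
    FUNCTION stripTags RETURNS CHARACTER (INPUT pcString AS CHARACTER):
        DEFINE VARIABLE cResult AS CHARACTER NO-UNDO.
        DEFINE VARIABLE cChar   AS CHARACTER NO-UNDO.
        DEFINE VARIABLE lInTag  AS LOGICAL   NO-UNDO.
        DEFINE VARIABLE i       AS INTEGER   NO-UNDO.

        DO i = 1 TO LENGTH(pcString):
            cChar = SUBSTRING(pcString, i, 1).

            IF cChar = "<" THEN
                lInTag = TRUE.              /* a tag starts: stop copying */
            ELSE IF cChar = ">" AND lInTag THEN
                lInTag = FALSE.             /* the tag ends: resume copying */
            ELSE IF NOT lInTag THEN
                cResult = cResult + cChar.  /* ordinary text, pipes included */
        END.

        RETURN cResult.
    END FUNCTION.

    MESSAGE stripTags("A <b><i>nested</i></b> example | with a pipe") VIEW-AS ALERT-BOX.

    The pipe in the sample text is kept here because nothing is used as a marker. Building the string one character at a time is fine for short values but may be slow on very large ones.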