Tags: parsing, rebol, rebol3

Parsing string input for keywords followed by content


I'm trying to parse some string input, but I'm struggling to see the solution. This must be a well-known pattern; it's just one I don't encounter frequently.

Background: I have a short list of string keywords ("HEAD", "GET", "POST", "PUT"), each of which is followed by additional string data. The keyword-plus-data sequence can occur multiple times, in any order ("KEYWORD blah blah blah KEYWORD blah blah blah"). There are no terminating characters or closing keywords as XML would have: a clause ends at either the next occurrence of a keyword or the end of the input. Sample:

    str: {HEAD stuff here GET more stuff here POST other stuff here GET even more stuff here PUT still more stuff here POST random stuff}

The output I'd like to achieve:

    results: [
        "HEAD" ["stuff here"] 
        "GET"  ["more stuff here" "even more stuff here"] 
        "POST" ["other stuff here" "random stuff"] 
        "PUT"  ["still more stuff here"]
    ]

My poor attempt at this is:

    results: ["head" [] "get" [] "post" [] "put" []]
    rule1: ["HEAD" (r: "head") | "GET" (r: "get") | "POST" (r: "post") | "PUT" (r: "put")]
    rule2: [to "HEAD" | to "GET" | to "POST" | to "PUT" | to end]

    parse/all str [
        some [
            start: rule1 rule2 ending: 
            (offs: offset? start ending 
            append select results r trim copy/part start offs
            ) :ending 
        | skip]
    ]

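I know that rule2 is the clunker: chaining the "to" operators as alternatives is not the right way to think about this pattern. Because the alternatives are tried in order, the rule jumps to the next occurrence of the first keyword in the rule block that appears anywhere ahead, when what I want is for it to stop at the nearest occurrence of any of the keywords.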
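A minimal sketch of that ordering behaviour, using a shortened, made-up input rather than the real str (the names demo and pos are hypothetical, and the printed value assumes Rebol 3 semantics):

```rebol
;; hypothetical input: "GET" is the nearest keyword, "HEAD" comes later
demo: "aaa GET bbb HEAD ccc"

;; alternatives are tried in order, so `to "HEAD"` is attempted first
;; and succeeds, jumping straight past the nearer "GET"
parse demo [[to "HEAD" | to "GET"] pos: to end]

probe pos   ; "HEAD ccc"
```

The first alternative that can succeed anywhere ahead wins, even though a different keyword sits closer to the current position.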

Any tips would be appreciated.


Solution

  • How about this...

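    ;; parse rules
    keyword: [{HEAD} | {GET} | {POST} | {PUT}]
    ;; one character of content: anything that does not start a keyword
    content: [not keyword skip]
    
    ;; prep results block... ["HEAD" [] "GET" [] "POST" [] "PUT" []]
    ;; (forskip steps by 2 so the | separators in keyword are skipped)
    results: []
    forskip keyword 2 [append results reduce [keyword/1 make block! 0]]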
    
    parse/case str [
        any [
            copy k keyword copy c some content (
                append results/:k trim c
            )
        ]
    ]
    

    With your str, results then holds what you wanted:

    ["HEAD" ["stuff here"] "GET" ["more stuff here" "even more stuff here"] "POST" ["other stuff here" "random stuff"] "PUT" ["still more stuff here"]]
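To see why the content rule stops in the right place, here is a small sketch that runs it in isolation. It assumes Rebol 3 (for the `not` parse keyword); the input string and the names c and rest are made up for illustration.

```rebol
keyword: ["HEAD" | "GET" | "POST" | "PUT"]
content: [not keyword skip]   ; advance one char unless a keyword starts here

;; `some content` keeps consuming characters until a keyword is next
parse/case "stuff here GET more" [copy c some content rest: to end]

probe c      ; "stuff here "
probe rest   ; "GET more"
```

The trailing space in c is why the answer calls trim before appending. Note that the check is made character by character, so with this rule an uppercase keyword embedded in a longer word (for example "GETTING") would also end the content; if that can occur in the real input, the keyword rule would need some extra delimiter check.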