tidy

How to pass tag names to the tidy custom-tags option


I am trying to indent an HTML file with tidy. The file contains "non-HTML" tags whose names start with pin-.

You can get it with:

$ URL="https://policy.pinterest.com/en/digital-services-act-transparency-report"
$ curl -s $URL > dsatr.html

The tags starting with pin-:

$ grep "pin-" dsatr.html -c
300
$ grep "pin-" dsatr.html |head -2
  <pin-header
</pin-header>

tidy in HTML mode (the default) cannot clean or indent this file because of these pin- tags (the output recommends using custom-tags):

$ tidy -o out.html dsatr.html
line 46 column 3 - Error: <pin-header> is not recognized! Did you mean to enable the custom-tags option?
line 66 column 7 - Error: <pin-page-header> is not recognized! Did you mean to enable the custom-tags option?
line 74 column 7 - Error: <pin-paragraph> is not recognized! Did you mean to enable the custom-tags option?
line 84 column 5 - Error: <pin-grid> is not recognized! Did you mean to enable the custom-tags option?
line 95 column 15 - Error: <pin-heading> is not recognized! Did you mean to enable the custom-tags option?
$

With custom-tags I get no better results:

$ cat ct.txt
custom-tags=pin-header,pin-page-header,pin-paragraph,pin-grid,pin-heading
$ tidy --config ct.txt -o out.html dsatr.html
custom-tags=pin-header,pin-page-header,pin-paragraph,pin-grid,pin-heading
line 46 column 3 - Error: <pin-header> is not recognized! Did you mean to enable the custom-tags option?
line 66 column 7 - Error: <pin-page-header> is not recognized! Did you mean to enable the custom-tags option?
line 74 column 7 - Error: <pin-paragraph> is not recognized! Did you mean to enable the custom-tags option?
line 84 column 5 - Error: <pin-grid> is not recognized! Did you mean to enable the custom-tags option?
line 95 column 15 - Error: <pin-heading> is not recognized! Did you mean to enable the custom-tags option?
$

Nor with new-blocklevel-tags:

$ cat nbt.txt 
new-blocklevel-tag=pin-header,pin-page-header,pin-paragraph,pin-grid,pin-heading
$ tidy --config nbt.txt -o out.html dsatr.html 
new-blocklevel-tag=pin-header,pin-page-header,pin-paragraph,pin-grid,pin-heading
line 46 column 3 - Error: <pin-header> is not recognized! Did you mean to enable the custom-tags option?
line 66 column 7 - Error: <pin-page-header> is not recognized! Did you mean to enable the custom-tags option?
line 74 column 7 - Error: <pin-paragraph> is not recognized! Did you mean to enable the custom-tags option?
line 84 column 5 - Error: <pin-grid> is not recognized! Did you mean to enable the custom-tags option?
line 95 column 15 - Error: <pin-heading> is not recognized! Did you mean to enable the custom-tags option?
$ 

I could not find a working example of custom-tags in the official documentation.

$ tidy -help-option custom-tags

`--custom-tags`

This option enables the use of tags for autonomous custom elements, e.g.
<flag-icon> with Tidy. Custom tags are disabled if this value is no.
Other settings - blocklevel, empty, inline, and pre will treat all
detected custom tags accordingly.

The use of new-blocklevel-tags, new-empty-tags,
new-inline-tags, or new-pre-tags will override the treatment
of custom tags by this configuration option. This may be useful if you have
different types of custom tags.

When enabled these tags are determined during the processing of your document
using opening tags; matching closing tags will be recognized accordingly, and
unknown closing tags will be discarded.
$

Solution

  • Use one of the values suggested in the help text (blocklevel, empty, inline, or pre) instead of a list of tag names. I'm not sure whether other values are supported, at least on v5.7.45, which is what I get from my package manager.

    For example:

    $ URL="https://policy.pinterest.com/en/digital-services-act-transparency-report"
    $ curl -s $URL > dsatr.html
    $ tidy --custom-tags blocklevel dsatr.html
    

    At the head of the output you'll see that it's interpreting the tags as block-level elements:

    line 46 column 3 - Info: detected autonomous custom tag <pin-header>; will treat as block level
    line 66 column 7 - Info: detected autonomous custom tag <pin-page-header>; will treat as block level
    line 74 column 7 - Info: detected autonomous custom tag <pin-paragraph>; will treat as block level
    line 84 column 5 - Info: detected autonomous custom tag <pin-grid>; will treat as block level
    line 95 column 15 - Info: detected autonomous custom tag <pin-heading>; will treat as block level
    line 158 column 141 - Info: detected autonomous custom tag <pin-link>; will treat as block level
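
To indent the file and write it out, the same option can be combined with -indent and -o, or moved into a config file. The following is only a sketch under a few assumptions: sample.html, tags.conf, out.html, out2.html and the log file names are made up for illustration, the small stand-in document replaces the real download so it runs offline, and pin-link is assumed to be an inline element. The config file uses the "name: value" syntax and is loaded with -config, as described in the tidy documentation.

```shell
# Stand-in document so the sketch runs offline; sample.html and its
# contents are hypothetical and only mimic the pin- tags of the real page.
cat > sample.html <<'EOF'
<!DOCTYPE html>
<html>
<head><title>custom tags</title></head>
<body>
<pin-header><pin-heading>Report</pin-heading></pin-header>
<pin-paragraph>Text with a <pin-link>link</pin-link>.</pin-paragraph>
</body>
</html>
EOF

# Command-line form: treat every detected custom tag as block level,
# indent the result, and write it to out.html instead of stdout.
tidy --custom-tags blocklevel -indent -o out.html sample.html 2> tidy.log
cli_status=$?

# Config-file form, using "name: value" lines.
# new-inline-tags overrides custom-tags for pin-link only (assumed to be
# an inline element here; adjust to your markup).
cat > tags.conf <<'EOF'
custom-tags: blocklevel
new-inline-tags: pin-link
indent: auto
EOF
tidy -config tags.conf -o out2.html sample.html 2> tidy2.log
conf_status=$?

# tidy exits with 0 when clean, 1 on warnings, 2 on errors.
echo "command line: $cli_status, config file: $conf_status"
```

If both runs finish with a status below 2, out.html and out2.html should contain the indented markup with the pin- elements kept, and the Info lines about detected custom tags end up in the log files.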