I am stuck configuring HTML Purifier so that it does not remove the href attribute from anchor tags.
Current output:
Expected output: (with href attr)
Below is my HTML Purifier function:
function html_purify($content)
{
    if (hooks()->apply_filters('html_purify_content', true) === false) {
        return $content;
    }
    $CI = &get_instance();
    $CI->load->config('migration');
    $config = HTMLPurifier_HTML5Config::create(
        HTMLPurifier_HTML5Config::createDefault()
    );
    $config->set('HTML.DefinitionID', 'CustomHTML5');
    $config->set('HTML.DefinitionRev', $CI->config->item('migration_version'));
    // Disables cache
    // $config->set('Cache.DefinitionImpl', null);
    $config->set('HTML.SafeIframe', true);
    $config->set('Attr.AllowedFrameTargets', ['_blank']);
    $config->set('Core.EscapeNonASCIICharacters', true);
    $config->set('CSS.AllowTricky', true);
    // These config options disable the pixel checks and allow
    // specifying e.g. width="auto" or height="auto", for example on images
    $config->set('HTML.MaxImgLength', null);
    $config->set('CSS.MaxImgLength', null);
    // Customize - Allow image data
    $config->set('URI.AllowedSchemes', array('data' => true));
    // Allow YouTube and Vimeo
    $regex = hooks()->apply_filters('html_purify_safe_iframe_regexp', '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%');
    $config->set('URI.SafeIframeRegexp', $regex);
    hooks()->apply_filters('html_purifier_config', $config);
    $def = $config->maybeGetRawHTMLDefinition();
    if ($def) {
        $def->addAttribute('p', 'pagebreak', 'Text');
        $def->addAttribute('div', 'align', 'Enum#left,right,center');
        $def->addElement(
            'iframe',
            'Inline',
            'Flow',
            'Common',
            [
                'src' => 'URI#embedded',
                'width' => 'Length',
                'height' => 'Length',
                'name' => 'ID',
                'scrolling' => 'Enum#yes,no,auto',
                'frameborder' => 'Enum#0,1',
                'allow' => 'Text',
                'allowfullscreen' => 'Bool',
                'webkitallowfullscreen' => 'Bool',
                'mozallowfullscreen' => 'Bool',
                'longdesc' => 'URI',
                'marginheight' => 'Pixels',
                'marginwidth' => 'Pixels',
            ]
        );
    }
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($content);
}
What configuration do I need to add so that the href attribute is kept on all anchor tags?
URI.AllowedSchemes is a whitelist, so the setting you're plugging into it allows only data URLs and excludes every other scheme. Because https is not on that list, the URL https://google.com is treated as a disallowed value for href, so the href value is emptied, and the empty href is then stripped.
If you want to extend the default whitelist rather than replace it, here it is for reference:
array (
'http' => true,
'https' => true,
'mailto' => true,
'ftp' => true,
'nntp' => true,
'news' => true,
'tel' => true,
)
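A minimal sketch of the fix inside html_purify(), assuming you still want data URIs (which the original "Allow image data" line was after) in addition to the default schemes: replace the single URI.AllowedSchemes line with the default list above plus 'data'. Whether you keep every default entry (nntp, news, and so on) is your call; this fragment is not a complete script and relies on the $config object already built in the function.

```php
// Replaces: $config->set('URI.AllowedSchemes', array('data' => true));
// Assumption: the default scheme list shown above, plus 'data' so that
// inline image data URIs keep working as the original config intended.
$config->set('URI.AllowedSchemes', array(
    'http'   => true,
    'https'  => true, // needed for href="https://google.com" to survive
    'mailto' => true,
    'ftp'    => true,
    'nntp'   => true,
    'news'   => true,
    'tel'    => true,
    'data'   => true, // kept from the original configuration
));
```

Listing the schemes explicitly, rather than reading the current value with $config->get() and merging 'data' into it, avoids the fact that get() auto-finalizes the configuration by default, after which further set() calls are refused.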