In my ASP.NET Core library, I'm implementing a `TagHelper` that targets `<form>` elements. To operate, it needs to know the effective value of the `name` attribute of all `<input>` elements inside its `<form>` element. How can that be achieved?
For a little more context about what I'm trying to achieve with this, I'm implementing a reusable honeypot for detecting spambots. I want the consumers of my library to just set an attribute on their form (like `<form use-honeypot="true">`) and have the corresponding tag helper append some bait (visually hidden) input fields to the form. Bots would then be tricked into filling in those fields, while humans won't even see them.
For the bait input fields to be effective, they should have attractive common names (like "email", "phone", or "message") to encourage bots to fill them. The tag helper will choose the appropriate names from an internal list of common names, but it needs to make sure that it doesn't pick a name that is already in use by a legit input field in the form. That's why I need a way to obtain the names of each descendant input/control in the form.
I'm aware that `TagHelper`s can get the child content of the element they target, through members of `TagHelperOutput`. However, as far as I know, these members only return the content as a `string`. This would require parsing the `string`, which doesn't seem ideal. It also seems counterproductive, and potentially performance heavy.
I thought of creating another `TagHelper` that targets the `<input>` elements (and some other form controls) and has an `Order` of `int.MaxValue`, that only reads the `name` attribute of its element and stores it in the `HttpContext`. This way, the `<form>` tag helper could then get those names from the `HttpContext`.
However, I believe the problem with this approach is that I have no guarantee that this "reader" tag helper will ever be executed, because it depends on how/if the consumer uses `@addTagHelper` to import the tag helper, and it can be opted out of using the `!` tag helper opt-out operator. Also, there could be dynamically generated form fields that are not targeted by tag helpers.
ASP.NET does not implement a feature that matches all of your criteria. Here's the list that I parsed from your question:

- Read the `name` attribute on descendant elements of the `form` element, not just its immediate children.
- Trigger the tag helper with an attribute on the form, as in `<form use-honeypot="true">`.

The solution that would've worked is described in the following archived posts from 2015 in the Razor subsection of the aspnet project on GitHub.
Post Title | Comments |
---|---|
Different TagHelpers with the same element name, depending on scope #474 | Note the comment: "I'm thinking as long as there's a parent (no matter how far up) that fulfills the parent tag then the TagHelper should apply. Thoughts?" And then the follow-up comment: "That doesn't sound like it's the feature people want. I would think it has to be immediate parent only." |
Hierarchical Parent Tag restriction #4981 | This post fits your situation: "We're working on a custom Tag Helpers package, and one feature we really miss is a parent tag restriction based on tag hierarchy (not only immediate parent). ... We have made a research and found that it is fairly easy to implement hierarchical parent restriction." The resolution to the post: "We're closing this issue as there was not much community interest in this ask for quite a while now." |
My own program management spec assessment would've been to change the name of the `ParentTag` property in the `HtmlTargetElement` attribute to `DescendentOfTag` and the implementation to be applied to all nested elements. The following example (which is hypothetical and does not exist in ASP.NET Core) would be executed for any nested element in the tag `<form-x>` that had a `name` attribute.

```csharp
[HtmlTargetElement(DescendentOfTag = "form-x", Attributes = "name")]
```

Solution | Pros | Failed to meet criteria |
---|---|---|
ParentTag | Uses built-in methods to exchange data between parent and child tags. | Only works for elements that are direct children of a parent. It does not work for nested elements. |
RegEx | Simple RegEx pattern to extract attribute value. | Requires string parsing that can be brittle for unexpected variations of inner HTML strings. |
HTML Parser | Well-established with years of development; active on GitHub; "tolerant of malformed HTML"; once HTML is parsed, more values are easily extracted. | Creates a project dependency. |
Sample summary
Solution | Sample |
---|---|
ParentTag | Parent tag helper: `[HtmlTargetElement("form-x")]` Child tag helper: `[HtmlTargetElement(ParentTag = "form-x")]` |
RegEx | `Regex regex = new(@"(?<=\bname="")[^""]*"); IEnumerable<string> existingNames = regex.Matches(childContentHtml).Cast<Match>().Select(match => match.Value);` |
HTML Parser | HTML Agility Pack: `IEnumerable<string> existingNames = htmlDocument.DocumentNode.SelectNodes("//*/@name").Select(node => node.GetAttributeValue("name", "")).ToList();` AngleSharp: `IEnumerable<string?> existingNames = document.QuerySelectorAll("*").Select(m => m.GetAttribute("name")).Where(s => s != null);` |
(Note: The "hidden" controls in the class below use a custom attribute `data-type="hidden"` so that they can be displayed for testing purposes. See the image after the 'Razor Page view' heading. Replace the custom `data-type` attribute with `type="hidden"`.)
Add the following NuGet packages to your project: AngleSharp and HtmlAgilityPack.
```csharp
using AngleSharp;
using AngleSharp.Dom;
using HtmlAgilityPack;
using Microsoft.AspNetCore.Razor.TagHelpers;
using System.Text.RegularExpressions;

namespace WebApplication1.Helpers
{
    /* --------------------------------------------------
     * RegEx
     */
    [HtmlTargetElement(Attributes = "use-honeypot-regex")]
    public class HoneyPotTagHelper : TagHelper
    {
        public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
        {
            // Remove the attribute that triggered this Tag Helper
            output.Attributes.RemoveAll("use-honeypot-regex");

            TagHelperContent childContent = await output.GetChildContentAsync();
            string childContentHtml = childContent.GetContent();

            // Matches the value of every double-quoted name="..." attribute
            // https://stackoverflow.com/questions/5526094/regex-to-extract-attribute-values
            Regex regex = new(@"(?<=\bname="")[^""]*");

            // https://stackoverflow.com/questions/12730251/convert-result-of-matches-from-regex-into-list-of-string/12730562#12730562
            IEnumerable<string> existingNames = regex.Matches(childContentHtml).Cast<Match>().Select(match => match.Value);

            IEnumerable<string> knownNames = ["email", "phone", "message"];

            // var result = list1.Except(list2);
            // "will give you all items in list1 that are not in list2."
            // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
            IEnumerable<string> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);

            // Set a default value if there are no unused entries from the known names list
            string nameStr = unusedNames.FirstOrDefault() ?? "default";

            // 'data-type' attribute is used for testing purposes. Change to 'type'.
            output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
        }
    }

    /* --------------------------------------------------
     * AngleSharp
     * https://github.com/AngleSharp/AngleSharp
     * https://anglesharp.github.io
     * https://github.com/AngleSharp/AngleSharp/issues/199#issuecomment-164123291
     */
    [HtmlTargetElement(Attributes = "use-honeypot-angle")]
    public class HoneyPotAngleTagHelper : TagHelper
    {
        public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
        {
            // Remove the attribute that triggered this Tag Helper
            output.Attributes.RemoveAll("use-honeypot-angle");

            TagHelperContent childContent = await output.GetChildContentAsync();
            string childContentHtml = childContent.GetContent();

            // Create a new context for evaluating web pages with the default config
            IBrowsingContext browsingContext = BrowsingContext.New(Configuration.Default);

            // Create a document from a virtual request / response pattern
            IDocument document = await browsingContext.OpenAsync(req => req.Content(childContentHtml));

            IEnumerable<string?> existingNames = document.QuerySelectorAll("*")
                .Select(m => m.GetAttribute("name"))
                .Where(s => s != null);

            IEnumerable<string> knownNames = ["email", "phone", "message"];

            // var result = list1.Except(list2);
            // "will give you all items in list1 that are not in list2."
            // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
            IEnumerable<string?> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);

            // Set a default value if there are no unused entries from the known names list
            string nameStr = unusedNames.FirstOrDefault() ?? "default";

            // 'data-type' attribute is used for testing purposes. Change to 'type'.
            output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
        }
    }

    /* --------------------------------------------------
     * HTML Agility Pack
     * https://github.com/zzzprojects/html-agility-pack
     * https://html-agility-pack.net
     * https://stackoverflow.com/questions/tagged/html-agility-pack
     * https://stackoverflow.com/questions/11526554/get-all-the-divs-ids-on-a-html-page-using-html-agility-pack
     */
    [HtmlTargetElement(Attributes = "use-honeypot-agility")]
    public class HoneyPotAgilityTagHelper : TagHelper
    {
        public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
        {
            // Remove the attribute that triggered this Tag Helper
            output.Attributes.RemoveAll("use-honeypot-agility");

            TagHelperContent childContent = await output.GetChildContentAsync();
            string childContentHtml = childContent.GetContent();

            HtmlDocument htmlDocument = new();
            htmlDocument.LoadHtml(childContentHtml);

            // SelectNodes returns null (not an empty collection) when nothing matches
            IEnumerable<string> existingNames = htmlDocument.DocumentNode
                .SelectNodes("//*/@name")
                ?.Select(node => node.GetAttributeValue("name", ""))
                ?? Enumerable.Empty<string>();

            IEnumerable<string> knownNames = ["email", "phone", "message"];

            // var result = list1.Except(list2);
            // "will give you all items in list1 that are not in list2."
            // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
            IEnumerable<string> unusedNames = knownNames.Except(existingNames, StringComparer.OrdinalIgnoreCase);

            // Set a default value if there are no unused entries from the known names list
            string nameStr = unusedNames.FirstOrDefault() ?? "default";

            // 'data-type' attribute is used for testing purposes. Change to 'type'.
            output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");
        }
    }

    /* --------------------------------------------------
     * Parent/Child Tag Helper
     */
    [HtmlTargetElement(ParentTag = "form-x")]
    public class GetDescendantNamesTagHelper : TagHelper
    {
        public override void Init(TagHelperContext context)
        {
            string? valueOfNameAttr = context.AllAttributes.FirstOrDefault(t => t.Name == "name")?.Value.ToString();

            if (string.IsNullOrEmpty(valueOfNameAttr)) { return; }

            NameContext.ExistingNames.Add(valueOfNameAttr);
        }
    }

    [HtmlTargetElement("form-x")]
    public class GetDescendantNamesParentTagHelper : TagHelper
    {
        public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
        {
            output.TagName = "form";

            // "Only if I called await output.GetChildContentAsync(); inside
            // the parent ProcessAsync method then it awaits the children
            // Init to be run before process method of the parent"
            // (comment by OvadyaShachar, Nov 6, 2019)
            // https://stackoverflow.com/questions/56625896/tag-helper-order-of-execution/58722955#58722955
            await output.GetChildContentAsync();

            IEnumerable<string> knownNames = ["email", "phone", "message"];

            // var result = list1.Except(list2);
            // "will give you all items in list1 that are not in list2."
            // https://stackoverflow.com/questions/11418942/linq-find-all-items-in-one-list-that-arent-in-another-list
            IEnumerable<string> unusedNames = knownNames.Except(NameContext.ExistingNames, StringComparer.OrdinalIgnoreCase);

            // Set a default value if there are no unused entries from the known names list
            string nameStr = unusedNames.FirstOrDefault() ?? "default";

            // 'data-type' attribute is used for testing purposes. Change to 'type'.
            output.PostContent.SetHtmlContent($"<input data-type=\"hidden\" name=\"{nameStr}\" placeholder=\"{nameStr}\" />");

            // Clear the existing names list for processing the next `form-x` Tag Helper
            NameContext.ExistingNames = [];
        }
    }

    // NOTE: static state is shared by every request in the process,
    // so this simple store is not safe under concurrent requests.
    public static class NameContext
    {
        public static List<string> ExistingNames { get; set; } = [];
    }
}
```
Razor Page view
```cshtml
@page
@model WebApplication1.Pages.IndexModel
@addTagHelper WebApplication1.Helpers.HoneyPotTagHelper, WebApplication1
@addTagHelper WebApplication1.Helpers.HoneyPotAgilityTagHelper, WebApplication1
@addTagHelper WebApplication1.Helpers.HoneyPotAngleTagHelper, WebApplication1
@addTagHelper WebApplication1.Helpers.GetDescendantNamesParentTagHelper, WebApplication1
@addTagHelper WebApplication1.Helpers.GetDescendantNamesTagHelper, WebApplication1

<style>
    form {
        display: flex;
        gap: 1rem;
        align-items: start;
    }

    input[data-type="hidden"] {
        border: 1px solid red;
    }

    .textarea-container {
        padding: 1rem;
        border: 1px solid #e1e1e1;
    }
</style>

<h3>RegEx</h3>
<form use-honeypot-regex name="myForm">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <div>
        <textarea name="message" placeholder="Message">Message</textarea>
    </div>
</form>

<h3>HTML Agility Pack - HTML Parser</h3>
<form use-honeypot-agility name="myForm">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <div>
        <textarea name="message" placeholder="Message">Message</textarea>
    </div>
</form>

<h3>Angle Sharp - HTML Parser</h3>
<form use-honeypot-angle name="myForm">
    <input type="text" name="name" placeholder="Name" />
    <input type="tel" name="phone" placeholder="Phone" />
    <div>
        <textarea name="message" placeholder="Message">Message</textarea>
    </div>
</form>

<h3>ParentTag - Form with direct child controls</h3>
<form-x name="myForm-2">
    <input type="text" name="name" placeholder="Name" />
    <input type="tel" name="phone" placeholder="Phone" />
    <textarea name="message" placeholder="Message">Message</textarea>
</form-x>

<h3>ParentTag - Form with nested child controls</h3>
<form-x name="myForm">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <div class="textarea-container">
        <textarea name="message">The name of this text area is not retrieved because this control is within a div, and therefore, is not a direct child of the form.</textarea>
    </div>
</form-x>
```
Output sample
```html
<form name="myForm">
    <input type="text" name="name" placeholder="Name">
    <input type="tel" name="phone" placeholder="Phone">
    <div>
        <textarea name="message" placeholder="Message">A text area.</textarea>
    </div>
    <input type="hidden" name="email">
</form>
```
("Hidden" elements are displayed with a red border in the following image.)
Using RegEx to extract all values of `name` attributes
The RegEx solution uses a RegEx pattern to parse the child HTML output. This SO answer provided the RegEx pattern `(?<=\btitle=")[^"]*` (modified to search for the `name` attribute instead of `title`).
The RegEx pattern handles nested child elements, but it only matches attribute values wrapped in double quotes.
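To make the behavior of the pattern concrete, here is a minimal console sketch, assuming .NET 8 / C# 12 with top-level statements. The sample markup and the variable names are made up for illustration. It shows that a nested `<textarea>` is picked up, while a single-quoted `name='email'` is missed, so the bait name chosen afterwards collides with a real field.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical child content: one nested control and one single-quoted attribute.
string childContentHtml = """
    <input type="text" name="name" />
    <div><textarea name="message"></textarea></div>
    <input type="email" name='email' />
    """;

// Same pattern as in the answer: the value of every double-quoted name="..." attribute.
Regex regex = new(@"(?<=\bname="")[^""]*");

string[] existingNames = regex.Matches(childContentHtml)
    .Select(match => match.Value)
    .ToArray();

string[] knownNames = ["email", "phone", "message"];

// First known name that the pattern did not find in the form.
string bait = knownNames
    .Except(existingNames, StringComparer.OrdinalIgnoreCase)
    .FirstOrDefault() ?? "default";

Console.WriteLine(string.Join(", ", existingNames)); // name, message
Console.WriteLine(bait);                             // email (the single-quoted field was missed)
```

This is the kind of brittleness listed in the comparison table: the nested element is not a problem, but quoting variations are.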
ParentTag
The `ParentTag` solution works only for elements that are direct children of the `form` tag. It does not work for nested control elements, as in the following sample, where the `<textarea>` inside the `<div>` is missed:

```html
<form-x name="myForm">
    <input type="text" name="name" placeholder="Name" />
    <input type="email" name="email" placeholder="Email" />
    <div>
        <textarea name="message">A text area.</textarea>
    </div>
</form-x>
```

The `ParentTag` approach uses a unique tag name for the `form` to trigger the Tag Helpers, rather than triggering the Tag Helper with a custom attribute.
A unique tag name can trigger a Tag Helper for the parent `form` element and for each child element. This is enabled by setting the `ParentTag` property of the `HtmlTargetElement` attribute.
`[HtmlTargetElement(ParentTag = "form-x")]` targets all child elements, regardless of the tag type (i.e. `<input>`, `<textarea>`, etc.).
`[HtmlTargetElement("form-x")]` targets the parent `form` element.
Parent/child Tag Helpers execution order
Note the use of `await output.GetChildContentAsync();`. This is important because it forces the required order of execution of the Tag Helpers. The child Tag Helpers need to execute first so that a list of existing `name` attributes can be built. Once the Tag Helpers for all child elements have executed, the rest of the `ProcessAsync` method on the parent Tag Helper runs.
From this source: "we call `output.GetChildContentAsync()` which will cause the child tag helpers to execute and set the properties."
This SO post has a diagram showing the order of execution if `output.GetChildContentAsync()` is used.
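I used a simple static list property to store existing `name` attribute values instead of using the `Items` dictionary off the `context` object. See this link for more information. The static list made it simple to use LINQ to compare existing names to the list of known names. Keep in mind that a static list is shared by every request in the process, so concurrent requests can mix up each other's names; the `Items` dictionary is scoped to the tag helpers being rendered and avoids that.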
3rd-Party HTML Parsers
The HTML parser from the HTML Agility Pack is "very tolerant with real world malformed HTML" per their description on NuGet.org or on GitHub.com.
"AngleSharp is a .NET library ... standard DOM features such as querySelector or querySelectorAll work for tree traversal."
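As a small standalone illustration of both parsers, the following console sketch extracts the `name` values from a made-up fragment. It assumes the AngleSharp and HtmlAgilityPack NuGet packages are referenced and uses fully qualified type names to avoid clashes between the two libraries. Unlike the tag helpers above, it uses AngleSharp's synchronous `HtmlParser` and the `[name]` / `//*[@name]` selectors, which are alternatives rather than what this answer's code does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fragment with a nested control and a single-quoted attribute.
string childContentHtml = """
    <input type="text" name="name" />
    <div><textarea name='message'></textarea></div>
    """;

// AngleSharp: parse the fragment and select every element that has a name attribute.
var angleDocument = new AngleSharp.Html.Parser.HtmlParser().ParseDocument(childContentHtml);
List<string> angleNames = angleDocument.QuerySelectorAll("[name]")
    .Select(element => element.GetAttribute("name"))
    .OfType<string>() // drops nulls and narrows string? to string
    .ToList();

// HTML Agility Pack: SelectNodes returns null when nothing matches, so guard for it.
var agilityDocument = new HtmlAgilityPack.HtmlDocument();
agilityDocument.LoadHtml(childContentHtml);
List<string> agilityNames = agilityDocument.DocumentNode
    .SelectNodes("//*[@name]")
    ?.Select(node => node.GetAttributeValue("name", ""))
    .ToList() ?? new List<string>();

Console.WriteLine(string.Join(", ", angleNames));
Console.WriteLine(string.Join(", ", agilityNames));
```

Both lists should contain `name` and `message` for this input, including the single-quoted value that the RegEx pattern would miss.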