html regex validation unicode html-input

How to use HTML5 input pattern attribute to validate Latin and extended Latin characters only


I am working on a web form where users are required to enter text that should only contain Latin characters, including extended Latin characters (such as ñ, é, ü, etc.), but should exclude any non-Latin characters (like Cyrillic, Chinese, Arabic, etc.). I want to use the HTML5 pattern attribute on an <input> element to enforce this validation client-side.

So far, I have tried using a regular expression pattern like ^[A-Za-z]+$ to match Latin characters, but this does not include the extended Latin characters. Here is the code I currently have:

<input type="text" pattern="^[A-Za-z]+$" title="Please enter Latin characters only">

This works well for basic Latin letters but fails to validate extended Latin characters. I am looking for a way to modify this pattern to include all Latin characters, including extended ones.
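To see why the pattern above rejects extended letters, the browser's check can be approximated in plain JavaScript. Browsers wrap the pattern attribute value as `^(?:…)$` and compile it in Unicode mode, so the explicit `^` and `$` in the question are redundant but harmless. `matchesPattern` is a hypothetical helper written for this sketch, and it uses the `u` flag (newer browsers use the `v` flag, which should behave the same for this pattern).

```javascript
// Hypothetical helper: approximates how a browser tests the pattern attribute.
// The attribute value is anchored as ^(?:pattern)$ and compiled in Unicode mode.
function matchesPattern(pattern, value) {
  const re = new RegExp("^(?:" + pattern + ")$", "u");
  return re.test(value);
}

// The pattern from the question.
const asciiOnly = "^[A-Za-z]+$";

console.log(matchesPattern(asciiOnly, "Jose"));        // true
console.log(matchesPattern(asciiOnly, "Jos\u00e9"));   // false: é (U+00E9) is outside A-Z/a-z
console.log(matchesPattern(asciiOnly, "M\u00fcller")); // false: ü (U+00FC) is outside A-Z/a-z
```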

Could someone help me with the correct regex pattern for this purpose? Also, are there any potential pitfalls or considerations I should be aware of when using the pattern attribute for this type of validation?


Solution

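  • Instead of listing character ranges, use a Unicode property escape: `\p{sc=Latin}+` (long form `\p{Script=Latin}+`) matches one or more characters whose Unicode Script property is Latin. That covers basic and extended Latin letters such as ñ, é and ü while rejecting Cyrillic, Chinese, Arabic and other scripts. It works in the pattern attribute because browsers compile the pattern in Unicode mode, and since the pattern must match the entire value, the `^` and `$` anchors can be dropped: `<input type="text" pattern="\p{sc=Latin}+" title="Please enter Latin characters only">`. Note that an empty field is never checked against the pattern, so add `required` if input is mandatory.
  • The Script property is assigned to each character by Unicode (there is a wiki article on Unicode scripts), so make sure the Latin set covers everything you need. Spaces, digits, hyphens and apostrophes belong to the Common script, and accents typed as a separate combining character (for example "e" followed by U+0301) belong to the Inherited script, so none of these match `\p{sc=Latin}`. Add them explicitly if required, e.g. `[\p{sc=Latin}\p{M} '\-]+`.
  • The pattern attribute is client-side only and can be bypassed, so repeat the same check on the server.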
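A quick way to check what `\p{sc=Latin}` accepts is to run the same kind of test in JavaScript. This is a sketch under stated assumptions: `matchesPattern` is a hypothetical helper that approximates the browser's check (anchoring the value as `^(?:…)$` and compiling with the `u` flag; newer browsers use `v`, which should behave the same here), and `latinName` is a hypothetical widened pattern, not part of the original answer.

```javascript
// Hypothetical helper: approximates the browser's pattern-attribute check.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

// Equivalent to writing pattern="\p{sc=Latin}+" in the HTML.
const latinOnly = "\\p{sc=Latin}+";

console.log(matchesPattern(latinOnly, "Jos\u00e9"));   // true: é is Latin script
console.log(matchesPattern(latinOnly, "M\u00fcller")); // true: ü is Latin script
console.log(matchesPattern(latinOnly, "Иван"));        // false: Cyrillic script
console.log(matchesPattern(latinOnly, "李"));          // false: Han script
console.log(matchesPattern(latinOnly, "Ana Mar\u00eda")); // false: the space is Common script

// "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form of é).
const decomposed = "Jose\u0301";
console.log(matchesPattern(latinOnly, decomposed));                  // false: U+0301 is Inherited script
console.log(matchesPattern(latinOnly, decomposed.normalize("NFC"))); // true after composing to é

// Hypothetical widened pattern: Latin letters, combining marks, space, apostrophe, hyphen.
const latinName = "[\\p{sc=Latin}\\p{M} '\\-]+";
console.log(matchesPattern(latinName, "Ana Mar\u00eda"));  // true
console.log(matchesPattern(latinName, decomposed));        // true
console.log(matchesPattern(latinName, "O'Brien-Smith"));   // true
```

Normalizing to NFC on the server before validating is one way to avoid rejecting decomposed input without widening the pattern.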