I'm working on an app that adapts text to braille specifications, which have some tricky rules for handling uppercase letters, and I'd like some help. The rules, shown by example, are:
:This is an :Example
:This is ::ANOTHER ex::AMple, ::ALRIGHT
:This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example
:This is my fin:A;l ::EXAM;ple
Working with regex, I was able to handle the simple rules but not all of them.
// adds : before any uppercase
var firstChange = text.replace(/[A-Z]+/g,':$&');
// adds : to double+ uppercase
var secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');
// adds ; to upper-lower change
var thirdChange = secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');
I tried building up from simple to complex, then the other way around, then merging some rules. Either way, the replacements conflict with each other. I'm new to regex and could use any insight on how to solve this. Thank you.
Edit: To make it clearer, I made a final example that combines all the rules.
This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.
Should become:
:This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.
SOLVED: With the help of @ChrisMaurer and @SaSkY, here is the code to solve the above problem:
(edit: fixed the fourth change thanks to @SaSkY)
var original = document.getElementById("area1");
var another = document.getElementById("area2");
function MyFunction() {
  // include : before every uppercase
  var firstChange = original.value.replace(/[A-Z]+/g, ':$&');
  // add one more : before multiple uppercase letters and before all-uppercase words (including single letters)
  var secondChange = firstChange.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');
  // add - to beginning of long uppercase sequence
  var thirdChange = secondChange.replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&');
  // removes extra :: before words within long uppercase sequence
  var fourthChange = thirdChange.replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '');
  // add a lowercase symbol when it changes from uppercase to lowercase mid word
  // (a lookbehind replaces \B here: by now the capital can follow a colon, as in fin:Al, and \B never matches between : and a letter)
  var fifthChange = fourthChange.replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, '$&;');
  // update
  another.value = fifthChange;
}
<html>
<body>
<textarea id="area1" rows="4" cols="40" onkeyup="MyFunction()">
</textarea>
<textarea id="area2" rows="4" cols="40"></textarea>
</body>
</html>
So I think your approach is good, and the first replace seems to get the single colons into the right place. The second one screws up on single-letter words like A and I, because {2,} needs at least two capitals. I would fix that with an added alternation:
/([A-Z]{2,}|\b[A-Z]+\b)/g
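To see what the alternation changes, here is a minimal sketch that runs both variants on a made-up sample (the sample string and variable names are assumptions, and the character class is written as a plain `[A-Z]`):

```javascript
// Text as it looks after the first replacement: one ':' before each run of capitals.
const afterFirst = ':This is :A :VERY :LONG sentence';

// Without the alternation, the single-letter word "A" keeps only one colon.
const withoutAlternation = afterFirst.replace(/[A-Z]{2,}/g, ':$&');

// With the alternation, all-caps words of any length get the second colon,
// while "This" is skipped because there is no word boundary after "T".
const withAlternation = afterFirst.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

console.log(withoutAlternation); // :This is :A ::VERY ::LONG sentence
console.log(withAlternation);    // :This is ::A ::VERY ::LONG sentence
```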
Now you need to add two more replacements: one to add the hyphen, and the other to remove the double colons.
For the hyphen you just search for three or more ::ALLCAPS whitespace combos like this:
/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g
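The \B still matches when the caps are at the very beginning of the string, since the string then starts with a colon rather than a word character. I replaced the match with a hyphen followed by $1.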
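A small sketch of the hyphen step, assuming the double colons are already in place (the sample strings and variable names are made up for illustration):

```javascript
// Matches a ::WORD followed by at least two more ::WORDs, i.e. three or more in a row.
const hyphenPattern = /\B(::[A-Z]+(\s+::[A-Z]+){2,})/g;

// A run of four all-caps words gets the hyphen; a run of two does not.
const afterSecond = ':This is ::A ::VERY ::LONG ::RUN and ::TWO ::WORDS only';
const hyphenated = afterSecond.replace(hyphenPattern, '-$1');
console.log(hyphenated); // :This is -::A ::VERY ::LONG ::RUN and ::TWO ::WORDS only

// \B also matches at index 0 here, because the string starts with ':'.
const atStart = '::ALL ::CAPS ::START here'.replace(hyphenPattern, '-$1');
console.log(atStart); // -::ALL ::CAPS ::START here
```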
To remove the double colons, I got a little trickier with a lookbehind and a lookahead:
/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g
This one is just replaced with $1. Luckily, JavaScript supports variable-length lookbehinds.
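As a rough illustration of the removal step, here it is on a made-up string that already has the hyphen (sample and variable names are assumptions). Only words with a ::WORD on both sides lose their colons, so the first and last word of a run keep theirs:

```javascript
const afterHyphen = ':This is -::A ::VERY ::LONG ::RUN and ::TWO ::WORDS only';

// The lookbehind needs a preceding ::WORD, the lookahead a following one.
// Both are checked against the original string, so stripping "::VERY"
// does not stop "::LONG" from matching.
const trimmed = afterHyphen.replace(
  /(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g,
  '$1'
);

console.log(trimmed); // :This is -::A VERY LONG ::RUN and ::TWO ::WORDS only
```

Note that this pattern does not look for the hyphen itself, so it relies on the hyphen step using the same threshold (three or more words in a row).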
I have it working on Regex101.
I did not look at your last replacement. Superficially it seemed to be OK.
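One thing worth checking in the last replacement: once the colons are inserted, a mid-word capital such as the A in fin:Al sits right after a colon, and \B cannot match between a colon and a letter. Below is a sketch of the whole chain with a lookbehind swapped in for that step. The function name `markCapitals` is hypothetical, the lookbehind is my assumption rather than something from the question, the other patterns are the ones from this answer written with a plain `[A-Z]` class, and the sample uses "exAMple" so that it agrees with the expected output.

```javascript
// Hypothetical helper: applies the five replacements in order.
function markCapitals(text) {
  return text
    // 1. one ':' before every run of capitals
    .replace(/[A-Z]+/g, ':$&')
    // 2. a second ':' before runs of two or more capitals and all-caps words
    .replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&')
    // 3. '-' before three or more consecutive ::WORDs
    .replace(/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g, '-$1')
    // 4. drop '::' from words that have a ::WORD on both sides
    .replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1')
    // 5. ';' after a capital followed by lowercase, but only mid-word:
    //    the capital must follow a letter, optionally with colons in between
    .replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, '$&;');
}

const sample =
  'This is an Example. This is ANOTHER exAMple, ALRIGHT? ' +
  'This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS ' +
  'to serve AS AN Example. This is my finAl EXAMple.';

const expected =
  ':This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? ' +
  ':This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS ' +
  'to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.';

const result = markCapitals(sample);
console.log(result === expected); // true

// For comparison, the \B version leaves the single mid-word capital untouched.
const withWordBoundary = 'fin:Al'.replace(/\B[A-Z](?=[a-z])/g, '$&;');
console.log(withWordBoundary); // fin:Al
```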