Regex 101 Exercise I7 - Make sure all characters inside <> are uppercase

02/06/2006

Comments

Anonymous
February 06, 2006
Hmm... you mean, replace each lowercase character inside <> with its uppercase equivalent? Probably best done with a MatchEvaluator...

Have the regex look like this: (<.*?>)
Have the MatchEvaluator return the String.ToUpper of the captured string

That should do it!

Of course, a cheap way to do it is just ToUpper()-ify the whole darn string... meets the requirements ;)
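A minimal sketch of the MatchEvaluator idea above, next to the cheap whole-string version, for comparison. It assumes C# 3.0 or later so a lambda can stand in for the MatchEvaluator delegate; the class name TagCase and the sample HTML are illustrative, not from the original post. The first method uppercases only the text from a '<' to the nearest following '>'; the second also uppercases the text between tags.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagCase
{
    // Uppercase only what the lazy pattern (<.*?>) captures:
    // everything from a '<' up to the nearest following '>'.
    public static string UpperInsideTags(string input)
    {
        return Regex.Replace(input, "(<.*?>)",
            m => m.Groups[1].Value.ToUpperInvariant());
    }

    // The "cheap" approach: uppercase the whole string, tags and text alike.
    public static string UpperEverything(string input)
    {
        return input.ToUpperInvariant();
    }

    static void Main()
    {
        string html = "<b>bold</b> and <i>italic</i>";
        Console.WriteLine(UpperInsideTags(html)); // <B>bold</B> and <I>italic</I>
        Console.WriteLine(UpperEverything(html)); // <B>BOLD</B> AND <I>ITALIC</I>
    }
}
```

Both technically meet the requirement; only the first leaves the text content alone.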
Anonymous
February 07, 2006
Good call, Maurits, but if you actually implement it in code, you will find it's really quite tricky:
// inputHtml is the HTML string being processed
Regex regex = new Regex(@"<(?<slash>/?)(?<tag>[^>]+)>", RegexOptions.Compiled | RegexOptions.IgnoreCase);
String resultHtml = regex.Replace(inputHtml, delegate(Match match)
{
    // "slash" is "/" for a closing tag; "tag" is everything else inside the <>
    String slash = match.Groups["slash"].Value;
    String tag = match.Groups["tag"].Value.ToUpperInvariant();
    return String.Equals(slash, "/") ? String.Format("</{0}>", tag) : String.Format("<{0}>", tag);
});
Anonymous
February 07, 2006
Wait a minute, why not ensure all HTML tag names are lowercase? After all, lowercase tag names are what the XHTML spec requires.

Sheva
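If the goal were lowercase tag names instead, one possible sketch lowercases only the element name that follows '<' or '</', so attribute values (which can be case-sensitive, such as URLs) are left alone. LowerTagNames is a hypothetical name, and the pattern is a simplification: it does not lowercase attribute names, which XHTML also requires to be lowercase, and it does not skip comments or script content.

```csharp
using System;
using System.Text.RegularExpressions;

static class LowerTagNames
{
    // '<', an optional '/', then the element name. Attributes and their
    // values are not part of the match, so they are left untouched.
    static readonly Regex TagName = new Regex(@"</?[A-Za-z][A-Za-z0-9]*");

    public static string Apply(string html)
    {
        return TagName.Replace(html, m => m.Value.ToLowerInvariant());
    }

    static void Main()
    {
        Console.WriteLine(Apply("<P CLASS=\"Intro\">Hello<BR/></P>"));
        // <p CLASS="Intro">Hello<br/></p>
    }
}
```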
Anonymous
February 07, 2006
I was thinking more like this. After checking the docs, I realized I don't even need the parentheses.

using System;
using System.Text;
using System.Text.RegularExpressions;

class RegExSample
{
    static string CapText(Match m)
    {
        return m.Value.ToUpper();
    }

    static void Main()
    {
        string text = "example";
        string pattern = "<.*?>";
        System.Console.WriteLine("text=[" + text + "]");
        string result = Regex.Replace(text, pattern,
            new MatchEvaluator(RegExSample.CapText));
        System.Console.WriteLine("result=[" + result + "]");
    }
}
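Why the parentheses are unnecessary: CapText reads m.Value, which is always group 0 (the entire match), whether or not the pattern declares a capturing group. The small check below compares the two patterns; the class and method names are illustrative. Regex.GetGroupNumbers includes group 0, so the pattern without parentheses reports one group and the one with parentheses reports two, while both produce the same match text.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupZeroDemo
{
    // Text of the whole match (group 0) for a pattern.
    public static string WholeMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    // Number of groups the pattern defines, counting the implicit group 0.
    public static int GroupCount(string pattern)
    {
        return new Regex(pattern).GetGroupNumbers().Length;
    }

    static void Main()
    {
        foreach (string pattern in new[] { "(<.*?>)", "<.*?>" })
        {
            Console.WriteLine(pattern + ": value=" + WholeMatch("x <b> y", pattern)
                + ", groups=" + GroupCount(pattern));
        }
        // (<.*?>): value=<b>, groups=2
        // <.*?>: value=<b>, groups=1
    }
}
```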
Anonymous
February 07, 2006
Sheva, how did you get your lines to indent in the comment?
Anonymous
February 07, 2006
Great, Maurits, you've told me something important: using captures here doesn't actually make any sense.
Regex regex = new Regex(@"<[^>]+>", RegexOptions.IgnoreCase);
String resultHtml = regex.Replace(inputHtml, delegate(Match match)
{
    return match.Value.ToUpperInvariant();
});

Console.WriteLine(resultHtml);

As for your question, I just write the code in VS and copy it from there to here :)

Sheva
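One practical difference between <.*?> and <[^>]+>: in .NET, '.' does not match '\n' unless RegexOptions.Singleline is set, so <.*?> skips a tag whose attributes wrap onto a second line, while [^>] matches the newline. <.*?> also matches an empty "<>", which <[^>]+> does not. The sketch below (hypothetical PatternCompare class, C# 3.0+ lambda) shows the newline effect. Neither pattern contains letters, so RegexOptions.IgnoreCase makes no difference to either.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternCompare
{
    // Uppercase every match of the given pattern.
    public static string Upper(string input, string pattern)
    {
        return Regex.Replace(input, pattern, m => m.Value.ToUpperInvariant());
    }

    static void Main()
    {
        // An opening tag whose attribute wraps onto a second line.
        string wrapped = "<a\nhref='x'>link</a>";
        Console.WriteLine(Upper(wrapped, "<.*?>"));   // only </a> is uppercased
        Console.WriteLine(Upper(wrapped, "<[^>]+>")); // both tags are uppercased
    }
}
```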
Anonymous
February 08, 2006
I too am confused about what Eric is trying to do with "Make sure all characters inside <> are uppercase". We seem to be missing some context: make sure by what action? Should we replace them with uppercase, or do we just want to reject tags that have lowercase in them for some reason? What is the point of this exercise?

Since Maurits and Sheva have shown ways to match and ToUpper, I'll go the second route. A match with the following pattern is a reject:
(?<=<[^>]*)[a-z]
Anonymous
February 08, 2006
Kbiel, your regex pattern can only match a single-character tag name, for instance etc. Use this instead: (?<=<[^>]*)[a-zA-Z]+(?=[^>]*>)
Anonymous
February 09, 2006
Actually, kbiel's pattern does work and correctly rejects <aBC>, <AbC>, <ABc>, etc. while permitting <ABC>.
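A quick check of kbiel's reject approach, reading the pattern as (?<=<[^>]*)[a-z]: a lowercase ASCII letter preceded, within the same tag, by a '<'. .NET supports the variable-length lookbehind this needs. The class name RejectLowercase is illustrative. Text between tags is not examined, because the lookbehind cannot reach back past a '>'.

```csharp
using System;
using System.Text.RegularExpressions;

static class RejectLowercase
{
    // A match means some lowercase letter sits after a '<' with no '>'
    // in between, i.e. inside a tag.
    static readonly Regex LowerInTag = new Regex(@"(?<=<[^>]*)[a-z]");

    public static bool IsRejected(string text)
    {
        return LowerInTag.IsMatch(text);
    }

    static void Main()
    {
        foreach (string tag in new[] { "<aBC>", "<AbC>", "<ABc>", "<ABC>" })
        {
            // True for the first three, False for <ABC>
            Console.WriteLine(tag + " rejected: " + IsRejected(tag));
        }
    }
}
```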
Anonymous
February 09, 2006
Here's another way, inspired by a simplified version of kbiel's regex:

< # the start of a tag
[^>]* # any amount of stuff INSIDE THE TAG
[\p{Ll}] # EGADS! A HORRIBLE LOWER-CASE CHARACTER! GET THE TORCHES AND PITCHFORKS!
.*? # the rest of the tag
> # the end of the tag
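The commented layout above only works when the regex is built with RegexOptions.IgnorePatternWhitespace, which tells .NET to ignore unescaped whitespace in the pattern and treat '#' through end of line as a comment. A minimal sketch, assuming .NET: the lowercase test is written \p{Ll}, the Unicode lowercase-letter category, which also catches non-ASCII letters such as 'é'. The class name FreeSpacingReject is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class FreeSpacingReject
{
    // IgnorePatternWhitespace lets the pattern carry its own layout and
    // '#' comments, as in the listing above.
    static readonly Regex LowerInTag = new Regex(@"
        <        # the start of a tag
        [^>]*    # any amount of stuff inside the tag
        \p{Ll}   # a lowercase letter (Unicode category Ll)
        .*?      # the rest of the tag
        >        # the end of the tag
        ", RegexOptions.IgnorePatternWhitespace);

    public static bool IsRejected(string text)
    {
        return LowerInTag.IsMatch(text);
    }

    static void Main()
    {
        Console.WriteLine(IsRejected("<\u00C9COLE>")); // False: É is uppercase
        Console.WriteLine(IsRejected("<\u00E9cole>")); // True: é is lowercase
    }
}
```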