Regular Expressions in C# – Part 5 – Groups

In regular expressions we can use groups to capture the substrings of our subject that match parts of our pattern. These groups are also available on our Match object, so we can retrieve the matched value for each group. Groups are expressed by using parentheses '()'. Groups are numbered from left to right by their opening parenthesis, starting at 1; group 0 always holds the match for the whole pattern.
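As a minimal sketch of the numbering rule, the snippet below runs a made-up pattern with a nested group against a made-up subject. The class name GroupNumberingDemo, the method GroupValues, the pattern and the subject are all illustrative assumptions and are not part of the tests in this article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Returns the value of every group in the first match, indexed by group number.
    public static string[] GroupValues(string pattern, string subject)
    {
        Match match = Regex.Match(subject, pattern);
        var values = new string[match.Groups.Count];

        for (var i = 0; i < match.Groups.Count; i++)
        {
            values[i] = match.Groups[i].Value;
        }

        return values;
    }

    public static void Main()
    {
        // Opening parentheses from left to right:
        // (a) is group 1, (b(c)) is group 2, the nested (c) is group 3.
        string[] values = GroupValues(@"(a)(b(c))", "abc");

        for (var i = 0; i < values.Length; i++)
        {
            Console.WriteLine("Group {0}: {1}", i, values[i]);
        }

        // Group 0: abc
        // Group 1: a
        // Group 2: bc
        // Group 3: c
    }
}
```

The nested group shows why it is the opening parenthesis that counts: group 2 starts before group 3, even though group 3 closes first.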

Access to group match values

We will test some real-life examples of regular expression patterns with groups. But before we do that, we need a little helper to print out the actual group matches. We do this by printing the Groups collection of the Match object.

using System.Diagnostics;
using System.Text.RegularExpressions;
 
namespace RegularExpressions.Tests.Helpers
{
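    public static class DebugWriter
    {
        // Writes every group of the match to the debug trace,
        // starting with group 0 (the whole match).
        public static void WriteGroups(Match match)
        {
            var index = 0;

            foreach (Group group in match.Groups)
            {
                Debug.WriteLine("Group {0}: {1}", index, group.Value);
                index++;
            }
        }
    }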
}
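Printing the whole collection is handy for debugging, but in real code we usually want one specific group. The sketch below shows one possible way to read a group by its number and to check whether it took part in the match. The class GroupReader, the method TryGetGroupValue and the sample pattern are hypothetical names chosen for this illustration; only Match.Groups, Group.Success and Group.Value come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReader
{
    // Hypothetical helper: reads one numbered group from a match.
    // Returns false when the match failed or the group did not participate.
    public static bool TryGetGroupValue(Match match, int groupNumber, out string value)
    {
        Group group = match.Groups[groupNumber];
        value = group.Value; // empty string for a group that did not participate
        return match.Success && group.Success;
    }

    public static void Main()
    {
        // Group 2 is optional and does not participate for the subject "abc".
        Match match = Regex.Match("abc", @"(a)(x)?(bc)");

        for (var number = 1; number <= 3; number++)
        {
            string value;
            bool found = TryGetGroupValue(match, number, out value);
            Console.WriteLine("Group {0}: found={1}, value='{2}'", number, found, value);
        }

        // Group 1: found=True, value='a'
        // Group 2: found=False, value=''
        // Group 3: found=True, value='bc'
    }
}
```

Checking Group.Success matters for optional groups: the group still exists in the collection, but its value is empty when it matched nothing.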

Match a Postal Code

In the Netherlands we use a postal code format of four digits followed by two uppercase alphabetic characters (1234 AB). Sometimes there is a space between the numeric and alphabetic characters, sometimes not, but both are valid postal codes. See the tests below for a regular expression that matches both forms and returns the numeric and alphabetic parts as separate groups.

using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using RegularExpressions.Tests.Helpers;
 
namespace RegularExpressions.Tests.Part05
{
    [TestClass]
    public class Groups
    {
        [TestMethod]
        public void Match_PostalCode_With_Space_Character()
        {
            const string pattern = @"^([0-9]{4}) ?([A-Z]{2})";
            const string subject = "4841 AB";
            var regEx = new Regex(pattern);
            MatchCollection matches = regEx.Matches(subject);
 
            foreach (Match match in matches)
            {
                DebugWriter.WriteGroups(match);
            }
 
            Assert.AreEqual(1, matches.Count);
 
            // Debug Trace:
            // Group 0: 4841 AB
            // Group 1: 4841
            // Group 2: AB
        }
 
        [TestMethod]
        public void Match_PostalCode_Without_Space_Character()
        {
            const string pattern = @"^([0-9]{4}) ?([A-Z]{2})";
            const string subject = "4841AB";
            var regEx = new Regex(pattern);
            MatchCollection matches = regEx.Matches(subject);
 
            foreach (Match match in matches)
            {
                DebugWriter.WriteGroups(match);
            }
 
            Assert.AreEqual(1, matches.Count);
 
            // Debug Trace:
            // Group 0: 4841AB
            // Group 1: 4841
            // Group 2: AB
        }
    }
}

The first group is the sequence of four numeric characters ([0-9]{4}). The second group is a pair of uppercase alphabetic characters ([A-Z]{2}). Between these groups we have a space character followed by a question mark ' ?' to express that the space is optional. Three groups are returned: the whole match (group 0), the first group and the second group. Both tests return the same values for groups 1 and 2; only group 0 differs, because it includes the space when the subject contains one.
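One thing to be aware of: the pattern is anchored at the start with '^' but not at the end, so a subject with extra characters after the two letters still matches. The sketch below compares the pattern from the tests with a stricter variant that adds '$'. The class name PostalCodeAnchoring, the constant names and the subject "4841 ABC" are assumptions made for this illustration, and the stricter pattern is a suggestion rather than part of the original tests.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostalCodeAnchoring
{
    // Pattern from the tests: anchored at the start only.
    public const string StartAnchored = @"^([0-9]{4}) ?([A-Z]{2})";

    // Assumed stricter variant: '$' also anchors the end of the subject.
    public const string FullyAnchored = @"^([0-9]{4}) ?([A-Z]{2})$";

    public static void Main()
    {
        const string subject = "4841 ABC"; // one letter too many

        // True: the pattern matches the prefix "4841 AB" and ignores the rest.
        Console.WriteLine(Regex.IsMatch(subject, StartAnchored));

        // False: nothing may follow the two letters.
        Console.WriteLine(Regex.IsMatch(subject, FullyAnchored));

        // True: valid postal codes still match, with or without the space.
        Console.WriteLine(Regex.IsMatch("4841 AB", FullyAnchored));
        Console.WriteLine(Regex.IsMatch("4841AB", FullyAnchored));
    }
}
```

Whether the end anchor is wanted depends on the use case: for validating a single input field it probably is, for finding a postal code at the start of a longer line it is not.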
