Iterating Strings with Regex in JavaScript
Demonstrating matchAll by reformatting Markdown with named capture groups

Regex iteration is a key tool for reformatting content for modern apps. Combining regular expressions with iteration lets application builders transform content from one format to another, which ultimately benefits the end user experience. So much so that it is now built into JavaScript through the relatively new `matchAll` function, which this article explores in detail.
This article builds upon a previous introductory piece on regular expressions in JavaScript. If you’re new to Regex, read that article first to pick up the foundations this piece relies on: Regular Expressions in JavaScript: An Introduction.
So why are the iteration abilities of `matchAll`, in conjunction with Regex, a big deal? Let’s take a look at some high-level use cases before delving into the code in more detail.
Why is iteration with Regex important — and useful?
If you need to reformat text content from one format to another — such as taking Markdown or HTML and transforming it into readable text, or even mobile components — then Regex iteration is the solution you need to process that transformation.
Take some scenarios that have become common in modern app and web development:
- Reformatting legacy content. Think of taking an old archive of HTML-formatted articles and rendering it inside a React Native application with native components. As React Native is a library of components such as `<View>` and `<Text>`, and not tags, the HTML tags within the article need to be replaced with these components. This requires a total transformation of that content, from a string to an array of components. Sure, you could simply render the HTML content from a webpage in a `WebView` component, but this is a far inferior solution (in all aspects) to native integration.
- Expanding metadata. Being able to tap or click certain text content to bring up more metadata about the text in question is a powerful concept. Stock trading publications do this by wrapping stock indices with API endpoints for fetching the live price. Dictionary apps do this by wrapping certain words with definitions and other meta-characteristics. This can be done with HTML, Markdown, or even in-house formatting rules designed to be transformed via Regex.
- Formatting character-based writing systems such as Chinese characters or Japanese Kanji. For example, it is common for written content in Chinese to display Pinyin above each character in educational settings, or even just the tone of the character.
- Mathematical equations. It is hard to display complex equations with simple text input. Iterating through a bulk of text and reformatting the equations into the standard scientific format is a great use case of iterative regex.
The above scenarios entail quite dramatic changes from the original content to the transformed content, oftentimes being a totally different format, such as the case with transforming the string into an array of components for mobile apps.
Let’s next look at how this is achieved with the `matchAll` JavaScript API.
Introduction to Iteration with matchAll
`matchAll` is a Regex-centric function that matches a string against a regular expression and returns the matches as an iterable. It requires a regular expression as its only argument, and that expression must carry the global (`g`) flag; otherwise a `TypeError` is thrown.
An “iterable” is a piece of data that can be iterated through. Arrays, maps, sets, and strings are built-in iterables of JavaScript. Unlike `matchAll`, `match` with the `g` flag simply returns the complete matches in a conventionally indexed array, without capture groups or match positions.
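To make the difference concrete, here is a small sketch comparing the two calls. The sample string and pattern are made up for illustration.

```javascript
// Hypothetical sample data, used only for illustration.
const sample = "a1 b2 c3";
const pattern = /([a-z])(\d)/g;

// match() with the g flag returns only the full matches, as a plain array.
const viaMatch = sample.match(pattern);
console.log(viaMatch); // [ 'a1', 'b2', 'c3' ]

// matchAll() returns an iterator; each item also carries the capture
// groups and the index at which the match was found.
const viaMatchAll = Array.from(sample.matchAll(pattern), (m) => ({
  full: m[0],
  letter: m[1],
  digit: m[2],
  index: m.index,
}));
console.log(viaMatchAll);
```

Each entry of `viaMatchAll` keeps the letter, the digit and the starting position, which is exactly the information `match` discards when the `g` flag is set.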
`matchAll` was standardised in the 2020 edition of the ECMAScript standard (ES2020), and has seen modest browser support since its inception. At the time of writing, Edge, IE and Safari on iOS still lag behind in supporting `matchAll`.
If you require universal browser support, however, polyfills are available. Node.js has also supported it natively since version 12. With the API here to stay, now is a great time to explore its capabilities.
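Where `matchAll` is unavailable and a full polyfill is not an option, the same loop can be approximated with `RegExp.prototype.exec`. The helper below, `collectMatches`, is a hypothetical name and a minimal sketch rather than a spec-compliant polyfill; it assumes the caller passes a regular expression with the `g` flag.

```javascript
// A minimal sketch, not a spec-compliant polyfill.
// `collectMatches` is a hypothetical helper name.
function collectMatches(str, regex) {
  if (!regex.global) {
    throw new TypeError("collectMatches requires a regex with the g flag");
  }
  // Use the native method when the runtime provides it.
  if (typeof String.prototype.matchAll === "function") {
    return Array.from(str.matchAll(regex));
  }
  // Otherwise fall back to repeated exec() calls on a copy of the regex,
  // so the caller's lastIndex is left untouched.
  const copy = new RegExp(regex.source, regex.flags);
  const results = [];
  let match;
  while ((match = copy.exec(str)) !== null) {
    results.push(match);
    // Step past zero-length matches to avoid an infinite loop.
    if (match[0] === "") {
      copy.lastIndex += 1;
    }
  }
  return results;
}

const found = collectMatches("x=1, y=22", /(\w)=(\d+)/g);
console.log(found.map((m) => [m[1], m[2], m.index]));
```

Returning an array rather than an iterator keeps the sketch short; for very large inputs a generator function would avoid building the whole list up front.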
Once `matchAll` is executed on a string, an iterator is returned that allows us to loop through the resulting matching items:
const matches = myString.matchAll(regex);
for (const match of matches) {
  ...
}
Concretely, `matchAll` returns an iterable containing the resulting matches against the supplied regular expression. The contents of each `match`, however, depend on the regular expression. Let’s explore this further.
How a Match is Structured
Each `match` is an array whose number of elements depends on the Regex supplied to `matchAll`:
- The first element, at index 0, contains the entire matching text.
- The subsequent elements contain the individual matches of each capture group of the regular expression, if capture groups exist.
- A `match.index` property is also given, providing the position in the string where the match began.
- Where named capture groups are present in a regular expression, `match` groups those, too, in the `match.groups` property.
We’ll use named capture groups further down, as they require more syntax within the regular expression itself. The following explanation uses the default numbered capture groups, where each group is simply indexed as an array element.
To illustrate `match` with a minimal example, let’s take a regular expression with two capture groups that matches the text within two pairs of square brackets:
// a regular expression testing two capture groups
// (note the space between the two bracket pairs, as in the string)
const regexp = /match \[(.*?)\] \[(.*?)\]/ig;
const myString = "Testing a match [group 1] [group 2]";
The `match [group 1] [group 2]` portion of the string will be the only match of this example, so we can expect a single match in the resulting iterable. Let’s run `matchAll` on this string now, and extract each piece of data from the resulting `match`:
const matches = myString.matchAll(regexp);
for (const match of matches) {
  // the complete match
  const fullMatch = match[0];
  // isolating each capture group match
  const group1 = match[1];
  const group2 = match[2];
  // index of where the match starts
  const cursorPos = match.index;
}
I’ve assigned the `index` property of the match to `cursorPos`, so there is no confusion between an array index and the index at which the match was found. You can think of `match.index` as the position a cursor would be at before the matched result is typed out. `cursorPos` will become important further down when we reformat an entire bulk of text from Markdown into HTML.
Logging each value gives the corresponding output:
console.log(match[0]);
> match [group 1] [group 2]
console.log(match[1]);
> group 1
console.log(match[2]);
> group 2
console.log(cursorPos);
> 10
More data can now be derived from these values. For example, `match[0].length` and `cursorPos` can be used together to calculate the string position where the match ends:
const matchEndPos = cursorPos + match[0].length;
This is vital for reformatting entire bulks of text where you also need the content that isn’t matched —the “in-between” content that exists between matches. This will be demonstrated later in the article.
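The pieces of this section can be run together as a short sketch. The pattern assumes a single space between the two bracket pairs, as in the sample string; the `details` array is just an illustrative container.

```javascript
const myString = "Testing a match [group 1] [group 2]";
// The pattern expects one space between the two bracket pairs.
const regexp = /match \[(.*?)\] \[(.*?)\]/gi;

// `details` is an illustrative container for the data of each match.
const details = [];
for (const match of myString.matchAll(regexp)) {
  const cursorPos = match.index; // where the match starts
  details.push({
    fullMatch: match[0],
    group1: match[1],
    group2: match[2],
    cursorPos,
    // where the match ends (exclusive)
    matchEndPos: cursorPos + match[0].length,
  });
}
console.log(details);
```

Slicing the original string from `cursorPos` to `matchEndPos` gives back the full match, which is a handy sanity check when the positions are used later to stitch content together.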
Advantages of the iterator
Beyond the additional data supplied by each `match`, and the simple syntax of calling `matchAll`, iterators bring other fundamental advantages:
- Iterables do not need to be indexed, and can be any data type that conforms to the iterable protocol, such as strings, arrays, sets and maps.
- Iterators work great with the `for...of` loop, the newest member of the for loop family of JavaScript. This makes for minimal syntax to loop through potentially complex objects. The more capture groups your Regex contains, the more complex each matching element will be.
- The ability to refer to named capture groups.
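One practical consequence is worth a short sketch: the iterator returned by `matchAll` can only be consumed once. If the matches are needed more than once, or array methods such as `map` are wanted, spreading the iterator into an array first is a common approach. The sample text here is made up.

```javascript
const text = "one two three"; // hypothetical sample text
const words = /\w+/g;

const iterator = text.matchAll(words);
const firstPass = [...iterator].map((m) => m[0]);
// The iterator is now exhausted, so a second pass yields nothing.
const secondPass = [...iterator].map((m) => m[0]);

console.log(firstPass);  // [ 'one', 'two', 'three' ]
console.log(secondPass); // []
```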
Let’s touch on the last point next, where the power of `matchAll` really becomes apparent: working with named capture groups within `match`.
Named Capture Groups within `match`
The previous example highlighted how `match` automatically indexes each capture group within its resulting array. This is very useful, but with large regular expressions containing many capture groups, working with many array elements becomes confusing.
Let’s go one step further by utilising named capture groups.
Fundamentally, `match` also collects each named capture group in a separate property: `match.groups`. Let’s modify the previous regular expression slightly to name the two capture groups. Let’s call them `mygroup` and `anothergroup`:
const myString = "Testing a match [group 1] [group 2]";
const regexp = /match \[(?<mygroup>.*?)\] \[(?<anothergroup>.*?)\]/ig;
The `?<mygroup>` and `?<anothergroup>` syntax above “tags” or “names” a capture group. To name a capture group, the question mark `?` immediately follows the opening parenthesis of the group, followed by the name of the group in angle brackets. The pattern for the group follows immediately after the closing angle bracket.
The `?` in this case does not mean “optional”. Indeed, there are no characters before it to make optional, as it appears directly after the opening of the capture group.
With names now hardcoded in the Regex itself, we can access them when iterating through each `match`:
const matches = myString.matchAll(regexp);
for (const match of matches) {
  // accessing groups via destructuring `match`
  const { groups: { mygroup, anothergroup }, index } = match;
  console.log(mygroup);
  // > group 1
  console.log(anothergroup);
  // > group 2
}
Notice the destructuring syntax used here to get each group match, plus `match.index`, in one line of code. `mygroup` and `anothergroup` become separate constants that can now be used for further processing within the current iteration.
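As a runnable sketch of the named-group version, the snippet below collects the destructured values into an array. The `named` array is an illustrative addition, and the pattern assumes one space between the two bracket pairs, matching the sample string.

```javascript
const myString = "Testing a match [group 1] [group 2]";
// One space is expected between the two bracket pairs.
const regexp = /match \[(?<mygroup>.*?)\] \[(?<anothergroup>.*?)\]/gi;

// `named` is an illustrative container for the destructured values.
const named = [];
for (const match of myString.matchAll(regexp)) {
  const { groups: { mygroup, anothergroup }, index } = match;
  named.push({ mygroup, anothergroup, index });
  // Named groups remain available by position as well.
  console.log(match[1] === mygroup, match[2] === anothergroup);
}
console.log(named);
```

Naming a group does not remove it from the numbered elements, so existing code that reads `match[1]` keeps working while names are introduced gradually.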
Let’s now take a look at a real-world example of using `matchAll` to reformat a bulk of Markdown text.
Matching Markdown Rules with matchAll
In this section we will take `matchAll` to the next level and test multiple Markdown rules in one regular expression, with each of those rules having named capture groups. We will then test which rule was matched within the current iteration, and format the text accordingly.
Combining markdown rules with |
In order to test multiple Markdown rules in a single Regex, the vertical bar (`|`), also known as the alternation operator, can be used to act as an “or” operator. It can be used on individual characters, character classes, and capture groups.
Take the following regular expression, which tests for bold text, italic text, and links, with fully configured named capture groups. For simplicity, I have omitted testing asterisk characters, which also represent bold and italic text in Markdown in addition to the underscore:
// either match bold, italic or link formatted markdown
const regex = /(__(?<bold>.*?)__)|(_(?<italic>.*?)_)|(?<link>\[(?<text>[\w\s\d]+)\]\((?<url>https?:\/\/([a-z0-9@#/.-]+))\))/ig;
The Regex testing a Markdown link has been taken from my introductory article on Regex. Note that the `i` and `g` flags are very commonly used for testing bulks of text where more than one case-insensitive match needs to be found.
Here are the three rules broken down further:
// bold text - __text__
(__(?<bold>.*?)__)|
// italic text - _text_
(_(?<italic>.*?)_)|
// link - [link text](url...)
(?<link>\[(?<text>[\w\s\d]+)\]\((?<url>https?:\/\/([a-z0-9@#/.-]+))\))
Each markdown rule is wrapped within its own capture group, with an additional named capture group surrounding the content we are interested in reformatting.
All these groups become accessible in each `match` result, with the unnamed groups accessible through `match` elements, and named groups accessible via `match.groups`. Where a group is unmatched, a value of `undefined` is assigned.
Notice too that the `<link>` group also has two other groups within it, `<text>` and `<url>`. These will also be present in `match.groups`.
Let’s go ahead and test a string now that contains one of each Markdown rule we are testing for:
// matching a string with three Markdown rules
const str = "Testing some __bold text__ and _italic text_ with my Medium link: [Here](https://medium.com/@rossbulat)";
const matches = str.matchAll(regex);
for (const match of matches) {
  console.log(match.groups);
}
Running this demonstrates that each of the rules is successfully matched. Let’s take a look at `match.groups` when the link is matched:
// `match.groups` of a matched Markdown link
console.log(match.groups);
>
{
  bold: undefined,
  italic: undefined,
  link: "[Here](https://medium.com/@rossbulat)",
  text: "Here",
  url: "https://medium.com/@rossbulat"
}
Notably, all capture groups are listed in a one-dimensional object: groups embedded in other groups, such as `<url>` and `<text>` within the `<link>` capture group, are not treated differently in `match.groups`.
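Because unmatched groups are `undefined`, the rule behind each match can be identified by checking which top-level named group has a value. The sketch below does this for the sample string; `ruleOf` is a hypothetical helper name introduced for illustration.

```javascript
const regex = /(__(?<bold>.*?)__)|(_(?<italic>.*?)_)|(?<link>\[(?<text>[\w\s\d]+)\]\((?<url>https?:\/\/([a-z0-9@#/.-]+))\))/gi;
const str = "Testing some __bold text__ and _italic text_ with my Medium link: [Here](https://medium.com/@rossbulat)";

// `ruleOf` is a hypothetical helper: it names the alternative that matched
// by checking which top-level named group is defined.
function ruleOf(groups) {
  if (groups.bold !== undefined) return "bold";
  if (groups.italic !== undefined) return "italic";
  if (groups.link !== undefined) return "link";
  return "unknown";
}

const rules = Array.from(str.matchAll(regex), (m) => ({
  rule: ruleOf(m.groups),
  index: m.index,
  source: m[0],
}));
console.log(rules);
```

The order of the alternatives matters: the bold rule is listed before the italic rule so that `__text__` is not consumed as italic text with underscores inside it.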
We now have the ability to reformat a bulk of Markdown text. Let’s demonstrate this in the final section of this piece.
Reformatting Markdown into HTML
The implementation walked through below takes the three Markdown rules from above and reformats a Markdown string into HTML. `reformattedStr` is initialised as an empty string, with reformatted matches as well as non-matched content being appended to it as the iteration continues.
Some key points about this implementation:
- Destructuring syntax has again been used to extract each group from `match.groups`, a handy shortcut that simplifies syntax. We can now reference each group further down when we determine which rule was matched:
// destructuring `match.groups`
const { groups: { bold, italic, link, text, url }, index } = match;
- `cursorPos` keeps a record of where each match ends. Within the next iteration, `match.index` will supply the match starting position. With both these data points, we can fetch the substring of the content between matches. This is the first thing we do upon each iteration, appending the content between matches:
// append string content since the last match
reformattedStr += myStr.slice(cursorPos, index);
- We have leveraged both `match.groups` to fetch particular Markdown content, as well as indexed capture groups, such as `match[0]`, to fetch the entire match content. The length of `match[0]` is needed to calculate where the match ends in the original string.
- The HTML reformatting and appending to `reformattedStr` is simply implemented as a conditional statement, testing whether each group is `undefined`, and reformatting accordingly when a group value exists:
// test each rule and append to reformatted string
if (bold !== undefined) {
  reformattedStr += '<b>' + bold + '</b>';
}
else if (italic !== undefined) {
  reformattedStr += '<i>' + italic + '</i>';
}
else if (link !== undefined) {
  reformattedStr += '<a href="' + url + '">' + text + '</a>';
}
- Once complete, an additional check is made to append any further content after the last match. If `cursorPos` is less than the length of the original string, we know there is more content to be appended:
// appending content after last match
if (cursorPos < myStr.length) {
  reformattedStr += myStr.slice(cursorPos);
}
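Assembled from the steps above, the whole routine might look like the following sketch. It is a reconstruction from the description rather than the original gist: the function name `markdownToHtml` is an assumption, the update of `cursorPos` at the end of each iteration is inferred from the text, and no HTML escaping is attempted.

```javascript
const regex = /(__(?<bold>.*?)__)|(_(?<italic>.*?)_)|(?<link>\[(?<text>[\w\s\d]+)\]\((?<url>https?:\/\/([a-z0-9@#/.-]+))\))/gi;

// `markdownToHtml` is a hypothetical name; a sketch, not the original gist.
function markdownToHtml(myStr) {
  let reformattedStr = "";
  let cursorPos = 0; // end position of the previous match

  for (const match of myStr.matchAll(regex)) {
    const { groups: { bold, italic, link, text, url }, index } = match;

    // append the unmatched content since the last match
    reformattedStr += myStr.slice(cursorPos, index);

    // test each rule and append the reformatted content
    if (bold !== undefined) {
      reformattedStr += "<b>" + bold + "</b>";
    } else if (italic !== undefined) {
      reformattedStr += "<i>" + italic + "</i>";
    } else if (link !== undefined) {
      reformattedStr += '<a href="' + url + '">' + text + "</a>";
    }

    // move the cursor to the end of this match
    cursorPos = index + match[0].length;
  }

  // append any content after the last match
  if (cursorPos < myStr.length) {
    reformattedStr += myStr.slice(cursorPos);
  }
  return reformattedStr;
}

const html = markdownToHtml(
  "Testing some __bold text__ and _italic text_ with my Medium link: [Here](https://medium.com/@rossbulat)"
);
console.log(html);
```

In real use the captured text and URL would need escaping before being placed into HTML; that step is left out here to keep the focus on the iteration itself.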
Although we are still constructing a string from another string, this example demonstrates the power and comprehensiveness of `matchAll`, making our lives as developers easier by referencing clearly grouped match content.
In Summary
This article has built upon my introduction to Regex, demonstrating how to use the newly introduced JavaScript API `matchAll` to iterate through the matches of a Regex test on a string. This is very useful for reformatting data for a range of use cases, some of which were mentioned at the top of this article.
Capture groups are well supported by `matchAll`, which supplies every group match, as well as a separate object of named capture groups, with every match. In addition, the full matching content and the `index` of where the match began are also given.
The final example of this piece reformats Markdown into HTML, but we are not limited to reconstructing a string: you could, for example, reformat a match into a component, or even just an object for further processing, and push each onto an array. Further articles will be linked here for more advanced use cases of `matchAll` and Regex iteration.