Nested/Recursive Regex and .NET Balancing Groups - Detect a function with a Regex

Problem Overview

Regex is a powerful tool for extracting content from a string based on a pattern. When the string you need to identify contains nested elements, it is less obvious how to write a Regex that solves the problem.

Let’s say you want to extract a specific function from a JavaScript file: how do you handle the nested braces of its conditions and loops?

This post will explain how to perform this operation with a simple Regex (in .NET).

Here is the input string:

 

 exampleChars*&^%(((!£)!)_*$)({}
function myFunc()
{
   var piloupe = true
   while (piloupe)
  {
      if (piloupe)
     {
          alert('Hello Nested Regex');
         piloupe = false;
     }
  }
}}
exampleChars*&^%(((!£)!)_*$))

 

We want the output string to be:

 function myFunc()
{
    var piloupe = true
   while (piloupe)
  {
      if (piloupe)
     {
          alert('Hello Nested Regex');
         piloupe = false;
     }
  }
}

 

The Regex!

1. Identify the first line with a classic Regex

                function myFunc\(\)\s*

 

2. Identify the code inside the function

Basic example without nested braces:

 function myFunc()
{
    var piloupe = true;
}
}

Regex:

 

function myFunc\(\)\s*\{(?:[^{}])*\}

 

function myFunc\(\)\s* identifies the beginning of the function.

\{ and \} match the opening and closing braces of the function.

The parentheses (?: ... ) create a non-capturing group.

[^{}] matches any character inside the braces that is not itself a brace.
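
The flat case can be tried outside .NET as well, because it uses no .NET-specific syntax. The following is a minimal sketch in Python, assuming the non-capturing group is followed by a * quantifier so that it can repeat over the whole body. The names FLAT_FUNCTION, source and nested are only illustrative.

```python
import re

# Flat case only: the body may contain anything except braces.
# The * quantifier lets the non-capturing group repeat over the whole body.
FLAT_FUNCTION = re.compile(r"function myFunc\(\)\s*\{(?:[^{}])*\}")

source = """function myFunc()
{
    var piloupe = true;
}
}"""

match = FLAT_FUNCTION.search(source)
flat_result = match.group(0) if match else None
print(flat_result)  # the function up to its own closing brace, without the extra "}"

# As soon as the body contains a nested brace, this pattern no longer matches.
nested = "function myFunc() { if (x) { y(); } }"
nested_result = FLAT_FUNCTION.search(nested)
print(nested_result)  # None
```

The second call shows why step 3 is needed: [^{}] stops at the first nested opening brace, and the pattern then expects a closing brace that is not there.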

 

3. Identify Nested braces

If our function has nested braces we need to use Balancing Groups:

Regex:

 

function myFunc\(\)\s*\{(?:[^{}]|(?<counter>\{)|(?<-counter>\}))+(?(counter)(?!))\}

 

function myFunc\(\)\s* identifies the beginning of the function.

The outer \{ and \} match the opening and closing braces of the function.

The parentheses (?: ... ) create a non-capturing group that is used as a “loop”.

[^{}] matches any character inside the braces that is not a (nested) brace.

(?<counter>\{) and (?<-counter>\}) match the opening and closing nested braces. The counter is the number of captures currently held by the group named counter, and it starts at 0. [^{}] consumes all the characters before any brace, and when the first nested opening brace is found the counter increases to 1.

  • If the next character is a closing brace, the regex checks the current value of the counter. If the value is 0, (?<-counter>\}) cannot match, so the regex leaves the “loop”. If the value is not 0, (?<-counter>\}) matches and decrements the counter by one.

 

  • If the next character is an opening brace, (?<counter>\{) matches and increments the counter by one.

 

(?(counter)(?!)) makes the match fail if the counter is not 0 at that point, meaning that there are fewer closing braces than opening braces.
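
To make the counter logic concrete, here is a simplified sketch that spells it out as an explicit loop. Balancing groups are specific to .NET and Python's re module has no equivalent, so this is an emulation of the idea and not the Regex itself. The function name extract_function is hypothetical, and only the first occurrence of the function header is considered.

```python
import re

# Header of the function, up to and including its opening brace.
HEADER = re.compile(r"function myFunc\(\)\s*\{")

def extract_function(source):
    """Return myFunc with its balanced braces, or None if they never balance."""
    header = HEADER.search(source)
    if header is None:
        return None
    counter = 0  # number of nested braces currently open, initially 0
    for index in range(header.end(), len(source)):
        char = source[index]
        if char == "{":
            counter += 1  # plays the role of (?<counter>\{)
        elif char == "}":
            if counter == 0:
                # Nothing left to close: this is the function's own closing brace.
                return source[header.start():index + 1]
            counter -= 1  # plays the role of (?<-counter>\})
    return None  # input ended before the function's closing brace was found

sample = "x({} function myFunc() { while (a) { if (b) { c(); } } }} tail"
print(extract_function(sample))
# prints: function myFunc() { while (a) { if (b) { c(); } } }
```

Unlike the Regex, this sketch also accepts an empty function body, because the Regex uses + and so requires at least one character between the outer braces.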

 

Hope you looped the Regex.

Linvi