Tore Lervik

How to regex anything except a given word

One feature I wanted for my blog was a bbcode system where I can wrap [tag]tags[/tag] around text to format it according to my presets. However, writing a regex to match this proved to be somewhat of a challenge.

The first solution

"\[tag\]((.|\n)*)\[\/tag\]"

This regex works, but if the text contains multiple occurrences, it matches everything from the first [tag] to the last [/tag].
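To see the greedy behavior concretely, here is a minimal C# sketch that runs the first pattern with Regex.Match and returns the captured group. The class and method names (GreedyTagDemo, CaptureGreedy) are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class GreedyTagDemo
{
    // The "first solution" pattern. (.|\n)* is greedy: it first runs to the
    // end of the input, then backtracks only until \[\/tag\] can match,
    // which means it stops at the LAST [/tag].
    private const string GreedyPattern = @"\[tag\]((.|\n)*)\[\/tag\]";

    // Returns the text captured by group 1, or null when nothing matches.
    public static string? CaptureGreedy(string input)
    {
        Match m = Regex.Match(input, GreedyPattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the two validation inputs below, CaptureGreedy returns "some text" and "some text[/tag][tag]more" respectively.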

Regex validation

[tag]some text[/tag] = "some text"
[tag]some text[/tag][tag]more[/tag] = "some text[/tag][tag]more"

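The regex has to match "A(anything except B)B". The (.|\n)* syntax above matches anything, including line breaks. But regex quantifiers are greedy by default: * consumes as much text as it can and only backtracks far enough to find a [/tag], so with multiple occurrences the match runs all the way to the last [/tag], as shown above.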
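"Anything except B" can also be written literally. The sketch below swaps in a different technique from the one this post uses, often called a tempered greedy token: a negative lookahead (?!\[/tag\]) is checked before each character, so the repeated group can never step over a closing tag. It is a minimal C# illustration with hypothetical names (TemperedTokenDemo, Contents), not the approach used in the rest of the post.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemperedTokenDemo
{
    // "A(anything except B)B" spelled out: before consuming each character,
    // (?!\[/tag\]) asserts that a closing tag does not start at this position.
    // [\s\S] matches any character, including \n, like (.|\n) in the post.
    private const string Pattern = @"\[tag\]((?:(?!\[/tag\])[\s\S])*)\[/tag\]";

    // Returns the contents of every [tag]...[/tag] pair, left to right.
    public static List<string> Contents(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Here the * stays greedy; it simply cannot pass a [/tag], so each match still ends at the first closing tag.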

The final solution

"\[tag\]((.|\n)*?)\[\/tag\]"

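Adding ? after the * makes the quantifier lazy (non-greedy): ((.|\n)*?) still matches anything, but as little as possible, so each match stops at the first [/tag] after its [tag]. With multiple occurrences, each pair is now matched separately.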
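A minimal C# sketch that collects every capture of the lazy pattern with Regex.Matches. The names LazyTagDemo and Contents are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyTagDemo
{
    // The "final solution" pattern. *? is lazy: the engine first tries an
    // empty capture and extends it one character at a time until
    // \[\/tag\] can match, so it stops at the FIRST [/tag].
    private const string LazyPattern = @"\[tag\]((.|\n)*?)\[\/tag\]";

    // Returns group 1 of every match, left to right.
    public static List<string> Contents(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, LazyPattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

One limitation worth knowing: lazy matching does not balance nested tags. For "[tag]a[tag]b[/tag]c[/tag]", Contents returns a single capture, "a[tag]b".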

Regex validation

[tag]some text[/tag] = "some text"
[tag]some text[/tag][tag]more[/tag] = "some text" and "more"

C# example

using System.Text.RegularExpressions;
string outputText = Regex.Replace(inputText, @"\[b\]((.|\n)*?)\[\/b\]", "<strong>$1</strong>", RegexOptions.IgnoreCase);

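This replaces "[b]tags[/b]" with "<strong>tags</strong>". RegexOptions.IgnoreCase makes "[B]tags[/B]" work too. RegexOptions.Multiline only changes how ^ and $ behave, so it is not needed here; the (.|\n) group is what lets a match span line breaks.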
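For the bbcode system described at the top, one possible way to handle several tags with a single pattern is a backreference plus a MatchEvaluator. This is a hedged sketch, not the blog's actual implementation: the preset table (b to strong, i to em, u to u) and the names BbCodeSketch and ToHtml are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BbCodeSketch
{
    // Hypothetical presets: bbcode tag name -> HTML element name.
    private static readonly Dictionary<string, string> Presets = new Dictionary<string, string>
    {
        ["b"] = "strong",
        ["i"] = "em",
        ["u"] = "u",
    };

    // Group 1 captures the tag name; \1 requires the closing tag to use the
    // same name. Group 2 is the lazy (.|\n)*? from the [b] example above.
    private static readonly Regex TagPattern = new Regex(
        @"\[(b|i|u)\]((.|\n)*?)\[/\1\]",
        RegexOptions.IgnoreCase);

    public static string ToHtml(string input)
    {
        return TagPattern.Replace(input, m =>
        {
            // IgnoreCase lets [B] match, so normalize the name before lookup.
            string element = Presets[m.Groups[1].Value.ToLowerInvariant()];
            // Recurse so that differently named inner tags are converted too.
            string inner = ToHtml(m.Groups[2].Value);
            return "<" + element + ">" + inner + "</" + element + ">";
        });
    }
}
```

With these presets, ToHtml("[b]bold [i]both[/i][/b]") returns "<strong>bold <em>both</em></strong>". Nesting the same tag inside itself still hits the lazy-matching limitation described earlier.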

PS: regexlib.com is the place to look if you use regex often, especially its cheat sheet and regex tester.