Explaining Regular Expressions


Unless you get to work with regular expressions everyday, the details can easily blur. You remember enough to know when a regex could be useful, but not enough to write it without reaching for a reference manual.

There are a few tools out there to help you write and understand regular expressions, including some IDEs that can provide assistance (like Komodo).

Sometimes you just want a quick explanation of a regular expression you might have seen in some code. The YAPE::Regex::Explain perl module is one tool that helps you do just that. Turning the module into a simple command-line tool is a simple one-liner:
#!/usr/bin/perl -w
print YAPE::Regex::Explain->new($ARGV[0])->explain;
I've also turned it into a simple CGI utility for those times when my regex memory fails me. You can use it here: regexplainr (sorry, you may find that site offline but here's the source code).

In a previous post, On Parsing CSV and other Delimited/Quoted Formats, I used the following regular expression to parse a whitespace-delimited string:
"([^"]+?)"\s?|([^\s]+)\s?|\s
Regexplainr produces the following commentary:
The regular expression:

(?-imsx:"([^"]+?)"\s?|([^\s]+)\s?|\s)

matches as follows:

NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
" '"'

----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^"]+? any character except: '"' (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[^\s]+ any character except: whitespace (\n,
\r, \t, \f, and " ") (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
(Permalink to this regexplanation)

Book tip: (thanks to Tony) O'Reilly's Mastering Regular Expressions. Available on google books, and also from Amazon.