Regular expressions: “Now you have two problems”


I’ve used Zsh as my primary command-line and scripting shell for the past seven years, and before that I used the Korn shell for over a decade. Recently, someone on the zsh-users mailing list asked for help, and the discussion ended with a recommendation to use a negative look-ahead regular expression.

Mikael Magnusson correctly pointed out:

As a sidenote, (^foo)* is always useless to write,
since (^foo) will expand to the empty string, and then
the * will consume anything else. A useful way to think
of (^foo) is a * that will exclude any matches that
don't match the pattern foo.

I replied that people should Google “regular expression negative lookahead”, a search that turns up numerous articles citing Jamie Zawinski’s observation:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

I wholeheartedly agree with that sentiment, even though I still use regular expressions every single day. What matters is that I avoid them outside of ad hoc interactive searches unless I have thought carefully about their correctness and about how they fail when handed malformed input.
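To make that kind of failure mode concrete, here is a minimal sketch of the regular-expression counterpart to Mikael's point about (^foo)*. It uses Python's re module rather than zsh, purely for illustration, and assumes the goal is to accept only strings that do not contain "foo". The pattern names and helper functions are hypothetical. The naive pattern puts a negative look-ahead in front of .*, but the look-ahead can always be satisfied at some position, if only the empty position at the end of the string. So it excludes nothing, much as (^foo) can match the empty string and leave the * to consume everything.

```python
import re

# Assumed goal for this sketch: accept strings that do NOT contain "foo".

# Naive attempt: an unanchored negative look-ahead followed by .*
# re.search may start matching at any position, and at the end of the
# string "(?!foo)" always succeeds, so every input matches, "foo" included.
NAIVE = re.compile(r"(?!foo).*")

# Anchored look-ahead: at the start of the string only, assert that no
# "foo" appears anywhere ahead. DOTALL lets .* cross newlines.
ANCHORED = re.compile(r"^(?!.*foo)", re.DOTALL)

# "Tempered" form: each character consumed must not begin a "foo", and
# fullmatch forces the whole string to be consumed this way.
TEMPERED = re.compile(r"(?:(?!foo).)*", re.DOTALL)


def naive_ok(s: str) -> bool:
    """Hypothetical helper: True if the naive pattern accepts s."""
    return NAIVE.search(s) is not None


def anchored_ok(s: str) -> bool:
    """Hypothetical helper: True if s contains no "foo" (anchored form)."""
    return ANCHORED.search(s) is not None


def tempered_ok(s: str) -> bool:
    """Hypothetical helper: True if s contains no "foo" (tempered form)."""
    return TEMPERED.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["bar", "foo", "barfoo", ""]:
        print(f"{s!r:10} naive={naive_ok(s)} "
              f"anchored={anchored_ok(s)} tempered={tempered_ok(s)}")
```

Running it, the naive pattern accepts every input, while the anchored and tempered forms both reject "foo" and "barfoo". The anchored form states the intent in a single assertion; the tempered form checks every character it consumes, which makes it easier to embed inside a larger pattern. Either way, the lesson is the one above: test the negative cases, not just the ones you expect to match.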