Oct 08

Solving File Filtering Problems with this One Weird Trick

I was asked an interesting question: how to create a regular expression that lets a user specify an arbitrary series of case-insensitive strings, all of which must appear on a line, along with a series of strings that must not appear on that line, in order to filter logs. In either case, the strings could be found in any order, anywhere on the line. The person asking was using Perl to process the arguments and run the filter. Generally I would have piped together several greps, but it sounded like an interesting challenge...

After some playing, I discovered that a recipe of chained lookaheads, one per criterion, would work. Just add as many criteria to the regex as needed.

Breaking it down:

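  • .* lets the pattern that follows it be found anywhere in the line.

  • (?=PATTERN) is a positive lookahead: it verifies that a pattern is in the line. Lookaheads do not consume any of the line, which is why several can be chained.

  • (?!PATTERN) is a negative lookahead: it verifies that a pattern is not in the line.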
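Putting the three pieces together, a filter of this kind might be assembled along the lines of the sketch below. It is written in Python rather than the Perl used for the original filter, since the lookahead syntax is the same in both. The function name `build_filter` and the sample log lines are made up for illustration, and the search strings are escaped and treated as literal text rather than as sub-patterns, which is an assumption.

```python
import re

def build_filter(must_have, must_not_have):
    """Compile one regex that matches lines containing every string in
    must_have and none of the strings in must_not_have, in any order."""
    # One positive lookahead per required string: (?=.*STRING)
    required = "".join(f"(?=.*{re.escape(s)})" for s in must_have)
    # One negative lookahead per forbidden string: (?!.*STRING)
    forbidden = "".join(f"(?!.*{re.escape(s)})" for s in must_not_have)
    # Anchor at the start so every lookahead scans the whole line;
    # IGNORECASE makes the comparison case insensitive.
    return re.compile("^" + required + forbidden, re.IGNORECASE)

# Hypothetical log lines, for illustration only.
lines = [
    "ERROR disk full on host alpha",
    "error: DISK check passed on host beta (debug)",
    "warning: disk nearly full on host alpha",
]

log_filter = build_filter(["error", "disk"], ["debug"])
matches = [line for line in lines if log_filter.search(line)]
```

Here only the first sample line should end up in `matches`: the second contains the forbidden string and the third is missing a required one.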

Note: The more patterns added, the longer it will take to run. Each lookahead has to start again from the beginning of the line, so the work grows with the number of patterns times the length of the line, and this approaches O(n²) as both grow.
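To see where that cost comes from, the same check can be written without a regex, so that each criterion is visibly one separate scan of the line. This is only a rough sketch: the helper name `line_passes` is hypothetical, it handles literal strings only, and lowercasing is assumed to be a good enough stand-in for case-insensitive matching on ASCII log text.

```python
def line_passes(line, must_have, must_not_have):
    """Plain-string equivalent of the lookahead filter (literal strings only)."""
    lowered = line.lower()  # compare everything in lowercase
    # Each `in` test rescans the line from the start, one scan per criterion.
    has_all = all(s.lower() in lowered for s in must_have)
    has_none = not any(s.lower() in lowered for s in must_not_have)
    return has_all and has_none
```

Like a chain of greps, this version stops at the first criterion that fails, so it is a reasonable fallback when the list of patterns gets long.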

He who claims to be an expert at Regex is a liar — Unknown

As for the title... regex is full of ‘weird tricks’!
