grep and regular expression revisited

Yesterday, while helping Laiq figure out the pattern he wanted to match against his data set, I bumped into grep’s odd behaviour with respect to regular expressions. Later that day, I went over the man page for grep at length to find out why grep behaved differently than I expected it to.

In the man page for grep there is a separate section on how regular expressions are interpreted within grep. The information provided in that section is brief. It is hard to say whether it documents only a subset of the semantics of regular expressions in grep or the complete set. That notwithstanding, I decided to write an (ironically) brief post on grep and its interpretation(s) of regular expressions.

According to the man page, grep understands two different versions of regular expression syntax. One is basic regular expression, and the other is extended regular expression. You would think that these two versions would differ in more than one way. However, in terms of functionality provided, basic regular expression is no different from extended regular expression.

The man page lists only one difference between the two groups of regular expression syntax. With basic regular expression, the meta-characters ?, +, {, |, (, and ) lose their special meaning. To use them in their special capacities, they need to be back-slashed (escaped). In contrast, with extended regular expression, these meta-characters have to be back-slashed only when they are required to be used literally.

If you are trying to, for example, match the string “[000]”, you could work with grep in one of the following ways:

echo '[000]' | grep '\[0\+\]'
echo '[000]' | egrep '\[0+\]'

In the first example, note that the meta-character + is escaped because we want to use it in its special sense, that is, as a repetition operator. In the second example, though, the + is already being used in its special sense. Also note the use of egrep in the second example as opposed to grep. egrep is essentially nothing more than grep -E (thanks to Zohair for pointing out the error in an earlier version of this post, which said grep -e), which triggers the extended regular expression syntax.
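The same escaping rule applies to the other meta-characters in the list. A minimal sketch, assuming GNU grep (the back-slashed forms of + and | in basic syntax are not guaranteed on every grep implementation); the sample strings and variable names are only illustrative:

```shell
# Alternation: back-slashed in basic syntax, bare in extended syntax.
basic_alt=$(echo 'cat' | grep 'cat\|dog')
extended_alt=$(echo 'cat' | grep -E 'cat|dog')

# Grouping and interval: "abab" as exactly two repetitions of "ab".
basic_group=$(echo 'abab' | grep '^\(ab\)\{2\}$')
extended_group=$(echo 'abab' | grep -E '^(ab){2}$')

echo "$basic_alt"       # cat
echo "$extended_alt"    # cat
echo "$basic_group"     # abab
echo "$extended_group"  # abab
```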

Another thing that tripped me up is the set of named classes of characters, which includes (but is not limited to) [:alnum:], [:alpha:] and [:space:]. Written bare like that, they do not work with basic regular expression, nor for that matter with extended regular expression. A named class is only valid inside a bracket expression, so the digit class, for instance, has to be written as [[:digit:]]. In that form, both syntaxes accept it.
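For what it is worth, a named class placed inside an outer pair of brackets does match under basic syntax as well. A minimal sketch, assuming GNU grep; the variable names are only illustrative:

```shell
# Basic syntax: the class name sits inside an outer bracket expression,
# and the + still needs its backslash.
basic_match=$(echo '[000]' | grep '\[[[:digit:]]\+\]')

# Extended syntax: same bracket expression, unescaped +.
extended_match=$(echo '[000]' | grep -E '\[[[:digit:]]+\]')

echo "$basic_match"     # [000]
echo "$extended_match"  # [000]
```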

Additionally, I noticed that a limited subset of Perl-style character classes is understood by grep, in both basic and extended interpretations. For example, the \w and \W character classes are understood, but \d and \D are not. If there is a need to match against a numeric pattern, either the named character class [:digit:] (written inside a bracket expression, as [[:digit:]]) or the expanded [0-9] form can be used. As I noted earlier, what is and what is not supported in terms of character classes, and much else besides, is not extensively documented in the man page. It is hard to take the absence of something from the man page to mean that that something is not supported within grep.

echo '[000]' | egrep '\[[0-9]+\]'
echo '[000]' | egrep '\[[[:digit:]]+\]'

That mostly sums up grep’s limited interpretation of regular expressions, as I see it. If you are a Perl guy and, not least, a regular expression one, you would feel at home with the -P switch to grep, which allows regular expression patterns to be specified and interpreted in the same way as Perl interprets them. So, while the character class \d is not available with grep’s basic and extended regular expression syntax, it can be used readily with the -P option.

echo '[000]' | grep -P '\[\d+\]'

Regular expressions aside, there are a couple of very useful switches to grep that are commonly ignored or are generally never known of despite being clearly documented in the man page. One such gem is the -o switch. The default behaviour of grep is to display the entire line which matches the given pattern. Often it is desirable to display only the part which is matched, and not the entire line. The -o switch does just that.
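A minimal sketch of the difference, assuming a grep that supports -o (GNU and BSD grep do); the sample line and variable names are made up for illustration:

```shell
# Default behaviour: the whole matching line is printed.
whole_line=$(echo 'id=[000] status=ok' | grep -E '\[0+\]')

# With -o: only the part of the line that matched the pattern.
matched_part=$(echo 'id=[000] status=ok' | grep -E -o '\[0+\]')

echo "$whole_line"    # id=[000] status=ok
echo "$matched_part"  # [000]
```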

There is also the -n switch. It prefixes each matched line or part of matched line with the line number from the file where it is found. This can be really helpful in many situations.
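A small sketch of -n, using made-up input piped in through printf instead of a real file:

```shell
# Four input lines; the pattern matches the second and the fourth.
numbered=$(printf 'alpha\n[000]\nbeta\n[00]\n' | grep -n '\[0*\]')

echo "$numbered"
# 2:[000]
# 4:[00]
```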

The -x switch forces the pattern to match exactly the whole line and not part of the line. This is a useful functionality which is unknown to many.
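A small sketch of the effect of -x on made-up input. It also uses the -c switch, which is not covered in this post and only prints the number of matching lines:

```shell
# Without -x, any line containing the pattern counts as a match.
partial_count=$(printf '[000]\nid=[000]\n' | grep -c '\[0*\]')

# With -x, only lines consisting of nothing but the pattern match.
exact_count=$(printf '[000]\nid=[000]\n' | grep -c -x '\[0*\]')

echo "$partial_count"  # 2
echo "$exact_count"    # 1
```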

Like these, there are a handful more switches that can come in real handy. The man page lists all those.

When it is not fun, move on!

The following is an excerpt from Richard Branson’s “Screw It, Let’s Do It: Lessons In Life”.

    As soon as something stops being fun, I think it is time to move on. Life is too short to be unhappy. Waking up stressed and miserable is not a good way to live. I found this out years ago in my working relationship with my oldest friend, Nik Powell.

    Nik was with me from the very start of Virgin. I was the ideas person and Nik kept the books in order and handled the money. His main job was to run the Virgin record stores. They did very well. When we started the airline, we wanted it to be the best. We sank millions of pounds into it. Our main rivals, British Airways, tried to stop us. As the war between us heated up, we needed more and more money. It seemed an endless pit. Virgin Music was wealthy but the airline was eating up the cash. Nik didn’t enjoy taking such huge risks. That was when we both knew it was time for him to move on. I bought his shares in Virgin from him.

    Nik’s first love had always been films. He used his profit from Virgin to start Palace Pictures. He made great films, like The Company of Wolves, Mona Lisa and The Crying Game, which won an Oscar. He is still in the film business, still having fun and we are still friends. After a struggle, the airline finally went into profit. If Nik had stayed with Virgin he might have made more money, but he would not have been happy. If we had gone on working together even after the fun had gone, we might not have stayed friends. He made the right choice. This is why I say, never just try to make money. Long-term success will never come if profit is the only aim.

I have been meaning for a long time to say something along those lines. Richard Branson, however, has aptly put it into perspective. On an altogether different note, this book is an inspiring read.

Venting

Venting. Junaid earnestly thinks that it accurately describes me. I think in some ways (or a lot of ways, depending on where you’re coming from) it does!

Would you like to step out of the car, please?

Imagine driving friends to a far-off diner and finding a safe spot to park at what may be little more than a five-minute walk from it, hoping to keep the traffic police from towing the car away (which would leave you with no option but to bribe the police officials or go to the trouble of getting and paying the ticket with money pulled out of your own pocket). Then imagine the friends working up the cheek to whine about how far away you are parking and how they will have to walk all the way down, and to rudely quip that they be dropped off in front of the diner while you go park the car. It stomps on the nerves real bad and hard, even when you know that they are merely having fun. Driving and finding a place to park in Karachi are trouble enough without having to gulp down crap like that from friends riding along.

You know who you are. Do be careful the next time you’re riding along in my car. I will have no qualms about screeching the car to a halt, asking you nicely (barring that, rudely) to step outside, and roaring away without you.

VMware, 64-bit processors, and Virtualisation Technology

Attempts to set up a 64-bit guest operating system using VMware Workstation on a 64-bit Intel processor machine running Windows 2003 R2 Standard 64-bit Edition failed miserably this past week at work. As Chaz6 and larstr subsequently pointed out in #vmware on irc.freenode.net, the particular Intel processor in the machine being used does not support Virtualisation Technology (VT). It came as a thudding surprise: admittedly I had not known anything about VT, and I had suspected that VMware would never need to rely on a special [hardware] feature of 64-bit processors to be able to emulate 64-bit guest operating systems. As things stand today, though, VMware cannot do without it. Soon thereafter I had a discussion with Talha about the slightly shocking and disappointing discovery, which mostly descended into a heated debate, as all serious discussions with Talha do (he apparently can’t seem to take rigorous positive criticism in stride). However, he pointed me to a knowledge base article from VMware that explains sufficiently clearly why VMware has to confront this particular requirement.

I have my reservations about Xen being able to overcome this restriction in order to emulate 64-bit operating systems on 64-bit hardware without VT support (or the AMD equivalent). However, a rather pissed-off soul in ##xen on irc.freenode.net did, when I inquired about the issue, briefly comment that Xen should not have any problems. Beyond that, further inquiries did not yield so much as a response from anyone. I will probably end up putting a 64-bit version of some Linux distribution on that machine and trying to set up Xen on it.

Additionally, if you are playing with VMware on 64-bit hardware, there are tools that will check for you beforehand whether the processor supports VT (or the AMD equivalent).
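On a Linux machine, one rough way to do such a check by hand is to look for the vmx (Intel VT) or svm (AMD equivalent) flag in /proc/cpuinfo. The sketch below assumes that layout of /proc/cpuinfo; has_hw_virt is a hypothetical helper name, not part of any tool. It does not apply to the Windows host described above, and a listed flag does not by itself mean the feature is enabled in the BIOS.

```shell
# Hypothetical helper: print "yes" if a cpuinfo-style file lists the
# vmx (Intel VT) or svm (AMD) flag on a "flags" line, "no" otherwise.
# Reads /proc/cpuinfo unless another file is given as the first argument.
has_hw_virt() {
    cpuinfo_file=${1:-/proc/cpuinfo}
    if grep -E -q '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo_file"; then
        echo "yes"
    else
        echo "no"
    fi
}
```

Calling has_hw_virt with no argument checks the machine it runs on; passing a file name checks a saved copy of another machine’s /proc/cpuinfo.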