Wednesday, August 17, 2005

Removing newlines from a file using sed

I kept forgetting this because I never really registered that sed is a line editor.
So the newline at the end of a line isn't available in the pattern space by default, unless we build a multiline pattern space using N.

Anyway, here is the sed command that does the magic.

$ sed ':a;N;$!ba;s/\n//g' filename

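Among other things it uses a label and the N command: ':a' defines the label, 'N' appends the next line to the pattern space, '$!ba' branches back to the label unless we are on the last line, and 's/\n//g' then deletes every newline collected so far. :)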

Friday, August 12, 2005

Python regex to match a floating point number

Here is the expression to match a float with up to two decimal places.
((?<!\.|\d)\d+(?:\.\d{1,2})?)
So let me explain it in steps:

'(?<!..)' is a negative lookbehind, i.e. at that position the preceding text must not match the regex enclosed in '(?<! )'. In this case that is '\.|\d', which stands for a literal dot or a digit. The lookbehind is not part of the match; the part that actually matches starts with '\d+'.
So what we are saying is: match one or more digits, but make sure those digits are not preceded by a literal dot or a digit. It is easy to see why the digits should not be preceded by a literal '.' (dot): for example, we don't want .99 to match. But why do we need the digit part? The reason is quite subtle: without it, part of .99 would still be matched by the regex.

This is because when the regex engine tries to match .99 with only the dot in the lookbehind, the first attempt fails: \d+ could match 99, but the 99 is preceded by a literal '.', so the negative lookbehind rejects it. The engine then shifts to the next character and makes \d+ match only the rightmost 9 in 99.
As the rightmost 9 is preceded by a 9 and not by a dot, the negative lookbehind is satisfied, so \d+ ends up matching just that 9.
This is a false positive that we definitely don't want, so we make the negative lookbehind include a digit as well to rule out this result.
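The difference between the two lookbehinds can be checked with a small Python sketch. The names FLOAT_RE, DOT_ONLY_RE, find_floats and find_floats_dot_only are made up for this illustration, and the dot-only pattern is simply the expression above with the '\d' alternative removed.

```python
import re

# The expression from this post: digits not preceded by a dot or a digit,
# optionally followed by a dot and one or two decimals.
FLOAT_RE = re.compile(r'((?<!\.|\d)\d+(?:\.\d{1,2})?)')

# Illustrative variant whose lookbehind only rules out a preceding dot.
DOT_ONLY_RE = re.compile(r'((?<!\.)\d+(?:\.\d{1,2})?)')


def find_floats(text):
    """Return every match of the full expression in text."""
    return FLOAT_RE.findall(text)


def find_floats_dot_only(text):
    """Return every match of the dot-only variant in text."""
    return DOT_ONLY_RE.findall(text)


if __name__ == "__main__":
    print(find_floats(".99"))           # full expression
    print(find_floats_dot_only(".99"))  # variant with the false positive
    print(find_floats("3.14"))
```

Both alternatives inside the lookbehind are exactly one character wide, which is what lets Python accept it as a fixed-width lookbehind.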

Bash arrays

First off, inside bash arrays the elements are separated by spaces.
Have a look at this.

$ cat new.sh
length=1
positions=(0 1 2)
positions[length+1]=4
echo "${positions[*]}"

$ ./new.sh
0 1 4

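The point to note is that 'length' as well as '$length' are both valid inside the index, because bash evaluates the subscript of an indexed array as an arithmetic expression.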

Thursday, August 11, 2005

bre,ere and pcre

It's been a while since I had a look at "Mastering Regular Expressions" by Jeff Friedl, but I am glad I went through it once.

So the input looks like this:

filename="http://really.big.url/test?foo=bar"

My first attempt at extracting the URL was:

$ cat output | sed 's/^.*filename=//'


Someone then pointed out that the URL itself could contain 'filename=' (yes, that's quite possible) and proposed a solution with back-references, since sed doesn't have the non-greedy quantifiers that are common in PCRE (Perl Compatible Regular Expressions). Sed only supports BRE (basic regular expressions) and ERE (extended regular expressions). Don't ask me what they are :).



$ cat output | sed -r 's/^[[:space:]]*filename="([^"]*)"[[:space:]]*$/\1/g'


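In addition to making space for "spaces" :), this pins "filename=" to the start of the line instead of letting a greedy ".*" run up to the last "filename=". Any extra "filename=" inside the URL is now simply swallowed by the greedy "[^"]*" between the quotes, and the back-reference \1 keeps it.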

In Perl the first attempt can be fixed simply by making the quantifier non-greedy:

$ cat output | perl -wpe 's/^.*?filename=//;'
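To see the greedy and non-greedy behaviour side by side, here is a rough Python sketch using the re module. The sample line is a made-up variation of the input above in which the URL itself contains "filename=", and the three helper names are just for illustration.

```python
import re

# Hypothetical input: the URL itself contains "filename=".
SAMPLE = 'filename="http://really.big.url/test?filename=bar"'


def strip_greedy(text):
    """Like the first sed attempt: greedy .* runs up to the LAST filename=."""
    return re.sub(r'^.*filename=', '', text)


def strip_lazy(text):
    """Like the Perl one-liner: lazy .*? stops at the FIRST filename=."""
    return re.sub(r'^.*?filename=', '', text)


def extract_quoted(text):
    """Like the sed -r version: anchor filename= and capture the quoted part."""
    return re.sub(r'^\s*filename="([^"]*)"\s*$', r'\1', text)


if __name__ == "__main__":
    print(strip_greedy(SAMPLE))
    print(strip_lazy(SAMPLE))
    print(extract_quoted(SAMPLE))
```

Note that the first two variants leave the surrounding double quotes in place, just as the corresponding sed and Perl commands do; only the capture-group version drops them.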


Wednesday, August 10, 2005

Limits of sed

Well, a user had a huge ASCII file, 1.5 gigs, and all of that on a single line. Wow!

So the job was to make it span many lines by putting a newline after every 310 characters.

So the suggested sed expression was:

$ sed 's/\(.\{310\}\)/\1\n/g' filename


But it appears that sed crapped out with the following error:

sed: Couldn't re-allocate memory


Maybe something like C is better suited for this, as sed is basically a line editor, so it probably tries to read the whole line into memory and operate on it in a single go :).
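The fixed-size-chunk idea doesn't strictly need C. Below is a minimal Python sketch of it, assuming the file is plain text on one line; wrap_stream is a made-up helper name, and it only ever holds one chunk of 310 characters in memory.

```python
import io


def wrap_stream(src, dst, width=310):
    """Copy src to dst, writing a newline after every `width` characters.

    Assumes src holds a single long line; a trailing newline in the
    input is not duplicated.
    """
    while True:
        chunk = src.read(width)          # read at most `width` characters
        if not chunk:
            break                        # end of input
        chunk = chunk.rstrip("\n")       # drop the file's own final newline
        if chunk:
            dst.write(chunk + "\n")


if __name__ == "__main__":
    # Tiny in-memory demonstration with width=3 instead of 310.
    out = io.StringIO()
    wrap_stream(io.StringIO("abcdefg\n"), out, width=3)
    print(out.getvalue(), end="")
```

For a real file one would pass two objects returned by open(), one opened for reading and one for writing, in place of the StringIO objects.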

A bit of sed and awk magic

The task is to find the total CPU utilization of the httpd processes.
Instead of doing all kinds of acrobatics with grep, bc and bash loops, one can simply use awk.

$ ps aux | awk '/httpd/{s+=$3}END{print s}'
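For anyone who finds the awk one-liner too terse, the same logic can be spelled out in Python. This is only a sketch of what the awk program does: total_cpu is a made-up name, and it assumes the text passed in has the usual "ps aux" layout with %CPU in the third column.

```python
def total_cpu(ps_output, needle="httpd"):
    """Sum the third whitespace-separated field of lines containing needle."""
    total = 0.0
    for line in ps_output.splitlines():
        if needle not in line:
            continue                     # same role as the /httpd/ pattern
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            total += float(fields[2])    # same role as s += $3
        except ValueError:
            pass                         # non-numeric field counts as 0
    return total
```

The accumulator plays the role of awk's s, and returning it at the end corresponds to the END block.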


On a side note, GNU sed can change a file in place. This is done by using the "-i" option.

$ sed -i 's/fun/bun/' test.txt

Thursday, August 04, 2005

Disable enter key in bash

Well, this is very straightforward: we just need to remove the key binding for
"\r". That's easily done using bind.

bind -r '\r'

To get the binding back I used:

bind '"\r":accept-line'

Obviously I had to hit Ctrl-J instead of "Enter" to run that command :).