Wednesday, December 13, 2006

Sed and No DuPlication

I needed to remove duplicate names from a file, so I thought sed would be a good choice for it.
uniq is too easy, so I found out how sed does it. Here's how:

$ sed '$!N; /^\(.*\)\n\1$/!P; D' filename

So now time for an explanation.

1. $!N - sed reads one line at a time into the pattern space (the buffer sed works on), and it leaves out the newline at the end of the line. The N command appends a newline and then the next input line to the pattern space, so the pattern space now holds two lines.
$ denotes the last line and ! means NOT, so $!N means "run N on every line except the last". There is nothing to read after the last line anyway, and some versions of sed exit without printing the pattern space if N is run there.

2. /^\(.*\)\n\1$/!P - The pattern matches when the pattern space is some text, then a newline, then exactly that same text again, i.e. two identical lines. The ! negates the match, so P ("print the first part of pattern space till the newline") runs only when the two lines differ. If they are identical, nothing is printed.

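3. D - "delete the first part of pattern space up to and including the newline, and restart the command cycle without reading a new line", i.e. go back to N with the second line still in the pattern space. If there is no newline left (as on the last line), D deletes the whole pattern space and sed moves on as usual.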
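
To see the three steps working together, here is a small sketch you can paste into a shell. The sample names and the helper name dedup_adjacent are made up for illustration; only the sed script itself comes from this post. Note that "bob" shows up twice in the input but not on neighbouring lines, so both copies should survive, the same way they do with uniq.

```shell
# Hypothetical sample data: "alice" and "carol" repeat on adjacent lines,
# "bob" repeats too, but not adjacently.
names=$(printf 'alice\nalice\nbob\ncarol\ncarol\ncarol\nbob\n')

# Illustrative wrapper around the one-liner from this post.
dedup_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

# Run the same input through the sed script and through uniq.
sed_out=$(printf '%s\n' "$names" | dedup_adjacent)
uniq_out=$(printf '%s\n' "$names" | uniq)

printf '%s\n' "$sed_out"
# prints:
# alice
# bob
# carol
# bob
```

Both copies of "bob" are kept because the script only ever compares a line with the one right after it. If the duplicates in your file are scattered around, one option is to sort the file first so that equal lines end up next to each other.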
