Systems Programming - Learning Journal

2002.03.08

First lab session. All the questions were straightforward, given that I'm very comfortable with the UNIX shell, and have done a fair bit of perl programming before.

The last question was probably the most interesting, and I had to consult man perlvar to find out what $. refers to. Following is my revised code:

#!/usr/bin/perl -w

# Reads /etc/passwd and outputs each line preceded by its line number

open (INFILE, "/etc/passwd") or die "Canna open file: $!\n";
print "$. \t $_" while (<INFILE>);

For reference, here is the relevant section of the manpage:

$. The current input line number for the last file handle from which you read (or performed a seek or tell on). The value may be different from the actual physical line number in the file, depending on what notion of "line" is in effect--see the section on $/ on how to affect that. An explicit close on a filehandle resets the line number. Because "<>" never does an explicit close, line numbers increase across ARGV files (but see examples under eof()). Localizing $. has the effect of also localizing Perl's notion of "the last read filehandle". (Mnemonic: many programs use "." to mean the current line number.)
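awk has a direct equivalent of $.: NR, the number of input records read so far. Here is a minimal sketch of the same line-numbering job, wrapped in a hypothetical number_lines function that reads the files named as arguments, or standard input if there are none. Like $. with <>, NR keeps counting across several input files. awk's FNR, by contrast, restarts at 1 for each file.

```shell
#!/bin/sh
# Hypothetical helper: prefix each input line with its line number,
# using awk's NR (records read so far across all input files).
number_lines() {
    awk '{ print NR "\t" $0 }' "$@"
}

# Made-up input; prints "1<TAB>alpha" then "2<TAB>beta"
printf 'alpha\nbeta\n' | number_lines
```

Running number_lines /etc/passwd would do the same job as the perl script above, minus the extra spaces around the tab.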

2002.03.15

Second lab session was held today. None of the questions posed any real difficulty; they were just a quick refresher for me in shell scripting.

I spent most of the lab session helping my friends in the class get familiar with scripting. It was an interesting angle to look at the problem from, because I was trying to relate everything to what they'd learnt already, but the questions seemed quite a grade above what was taught in the lecture. For instance, the 'adder' script was very tricky to write without knowing about $@, $# or shift, none of which had been described.
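For anyone else who hadn't met them, here is a minimal sketch of what those three features do. It uses a hypothetical show_args function. Inside a function, the positional parameters are the function's own arguments, so you can try it without writing a separate script file.

```shell
#!/bin/sh
# Hypothetical helper illustrating $#, "$@" and shift.
show_args() {
    # $# is the number of positional parameters
    echo "count: $#"
    # "$@" expands to every argument, each kept as a separate word
    for arg in "$@"; do
        echo "arg: $arg"
    done
    # shift discards $1 and renumbers the rest ($2 becomes $1, ...);
    # guarded because shifting with no arguments left is an error
    if [ $# -gt 0 ]; then
        shift
    fi
    echo "after shift: $# left, first is now $1"
}

show_args 3 "4 5" 6
# count: 3
# arg: 3
# arg: 4 5
# arg: 6
# after shift: 2 left, first is now 4 5
```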

I think there could have been more emphasis on the rationale for the spacing in the commands as well, since this seemed to be the thing most of the others had trouble with. It would help to explain that an assignment such as total=0 is interpreted directly by the shell, so there must be no spaces around the =. By contrast, everything inside test's square brackets is passed as separate arguments to a command, so each operand and operator needs spaces around it. That would account for the apparent inconsistencies in spacing.
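Here is a short illustration of the two spacing rules. The variable name greeting is made up for the example.

```shell
#!/bin/sh
# Assignment: the shell itself parses NAME=value, so no spaces around '='
greeting=hello

# test: '[' is a command (another name for test), so '[', each operand,
# the '=' operator and the closing ']' must all be separate arguments
if [ "$greeting" = "hello" ]; then
    echo "match"
fi

# With spaces, 'greeting = hello' would instead try to run a command
# named 'greeting' with the two arguments '=' and 'hello'.
```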

I suppose that, coming from a somewhat different point of view to most students, it seems to me that the fundamentals are being glossed over in order to cover a great deal of material. For example, I didn't think it was made clear that every program has a return value (its exit status), and that this is how boolean values, etc. work in the shell.
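Here is a minimal sketch of exit statuses acting as booleans. is_even is a hypothetical helper, not something from the lab sheet.

```shell
#!/bin/sh
# true and false are ordinary commands that just set an exit status.
true;  echo "true returned $?"     # 0 means success
false; echo "false returned $?"    # non-zero (usually 1) means failure

# 'if' runs a command and branches on its exit status, so any program
# can act as a condition; grep -q prints nothing, it only sets the status.
if echo "hello world" | grep -q world; then
    echo "grep succeeded"
fi

# Hypothetical helper: a function's status is that of its last command,
# here the test command [ ... ].
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

if is_even 4; then echo "4 is even"; fi
if ! is_even 7; then echo "7 is odd"; fi
```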

The basics of perl were covered quite well. I had no problem with the exercises using the material covered so far. I came up with a bit of a nifty solution for question 7 that I thought I might post:

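#!/usr/bin/perl -w

# Reads numbers from STDIN, sums negative and positive separately.
# The sign is taken with <=>, which gives 1, 0 or -1, rather than with
# $_ / abs, which dies with "Illegal division by zero" on a 0 input;
# zeros now just accumulate harmlessly under key 0.

my %sum = (1 => 0, -1 => 0);    # start both sums at 0 to avoid undef warnings
$sum{$_ <=> 0} += $_ while (<>);

print "Positive sum: $sum{1}\n";
print "Negative sum: $sum{-1}\n";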

Obviously one-liner potential, there... ;-)

2002.03.19

I noticed that I hadn't included any examples of shell scripting yet, and this is the area where I hope to improve most in this subject. Here's the adder script I wrote in last week's lab session:

#!/bin/sh

# Reads a string of numbers as arguments, sums them

total=0
for num in "$@"
do
	total=`expr $total + $num`
done
echo $total
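The loop above starts a separate expr process for every number. POSIX shells also have arithmetic expansion, $(( ... )), which does the sum inside the shell itself. Here is a sketch of the same adder using it, written as a hypothetical adder function so it can be tried in place. Like the expr version, it assumes every argument is an integer.

```shell
#!/bin/sh
# Hypothetical function: sums its integer arguments with the shell's
# built-in arithmetic, avoiding one expr process per argument.
adder() {
    total=0
    for num in "$@"; do
        # inside $(( )) variables can be named without a leading $
        total=$(( total + num ))
    done
    echo "$total"
}

adder 1 2 3 -4    # prints 2
```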

During the past week I also had a bit of a play around with awk to see what it was about. It seems somewhat useful for extracting stuff from reports. In fact, I used it today to count the number of hits from each IP address in my Apache log with the following command:

% awk '{ print $1 }' /var/log/httpd/access_log | sort | uniq -c | sort -rn

A little bit shorter than perl, perhaps.
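awk can also do the counting itself with an associative array, which replaces the first sort and the uniq -c. Here is a rough sketch, wrapped in a hypothetical count_hits function that reads the files named as arguments, or standard input if there are none.

```shell
#!/bin/sh
# Hypothetical function: count lines per value of the first field
# (the client IP in an access log) using an awk associative array,
# then sort by count, highest first.
count_hits() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$@" |
        sort -rn
}
```

For example, count_hits /var/log/httpd/access_log. The output differs slightly from uniq -c, which pads the counts with leading spaces. Whether this is any clearer than the original pipeline is debatable.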

2002.03.22

David James pointed out a shorter way to do the above shell command, using cut:

% cut -f1 -d' ' /var/log/httpd/access_log | sort ...

It saves five characters! ;-) I reckon it looks a bit less cryptic as well... I don't like cut as much as awk, however. cut treats every single delimiter character as a field boundary, and there doesn't seem to be an option to ignore leading or repeated whitespace, which you encounter, for example, in the ls output in the question below.
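Here is a small demonstration of the difference, using a made-up line with leading and repeated spaces like those in ls -l output.

```shell
#!/bin/sh
# Made-up line with leading and repeated spaces
line='  42 foo'

# cut treats each single space as a delimiter, so field 1 is the empty
# string before the first space
printf '%s\n' "$line" | cut -d' ' -f1       # prints an empty line

# awk's default field splitting skips leading blanks and treats a run
# of blanks as one separator
printf '%s\n' "$line" | awk '{ print $1 }'  # prints 42
```

Squeezing the spaces with tr -s ' ' before cut takes care of the repeated ones, but a single leading space remains, so field 1 is still empty.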

The third lab session was on today as well (you can guess the URL now ;)). The lecture covered regular expressions, and I always think it's funny how many different "standards" there are for regexps. It seems completely random which features a given program has. Some support alternation (e.g. /abc|def/, which doesn't work in plain grep) and some don't. Some require backslashes before the parentheses used for grouping and the braces used for repetition counts (e.g. /\(.*\)foo\{1,\}/ in sed vs /(.*)foo{1,}/ in perl). Support for the multipliers +, * and ? also varies: POSIX basic regexps, as used by sed and plain grep, only define *. It makes learning regular expressions rather hard, I think. Personally, I learnt most of what I know about regexps in perl, and then expanded this to include the other shell utilities.
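A quick way to see the difference between basic and extended regexps is to feed the same input to plain grep and to grep -E. This sketch assumes a POSIX-conforming grep. GNU grep, for instance, also accepts \| and \+ in basic mode as extensions.

```shell
#!/bin/sh
# Basic regexps (plain grep): '|' and '+' are ordinary characters, so
# these search for the literal text and find nothing.
echo def | grep -q 'abc|def' || echo "BRE: | is literal, no match"
echo aaa | grep -q 'a+'      || echo "BRE: + is literal, no match"

# Extended regexps (grep -E) give '|' and '+' their special meanings.
echo def | grep -q -E 'abc|def' && echo "ERE: alternation matched"
echo aaa | grep -q -E 'a+'      && echo "ERE: one-or-more matched"

# Repetition counts: backslashed braces in a basic regexp, bare braces
# in an extended one (and in perl).
echo aa | grep -q 'a\{2\}'  && echo "BRE: interval matched"
echo aa | grep -q -E 'a{2}' && echo "ERE: interval matched"
```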

I'm sure someone will see a shorter way, but this is my answer to question 5:

% ls -sal /etc | awk '{print $10,$2"\t"$7,$8,$9}' | sed 's/ ...x.\{6\}/*/; s/ [a-z-]\{10\}//'

Looking at it, it seems like a reasonable solution, but I'm sure there are cases where those final regular expressions could accidentally mangle a filename (one starting with a space, maybe), so it's not perfect. It also messes up the first input line, which happens to be the total number of blocks used by all the files. A bit of explanation might be in order for people reading this...

First, the ls command's output is piped into awk, which is used to reorder the columns of input. By default awk splits each line into fields on runs of spaces and tabs, ignoring leading whitespace, which works fine for the output of ls. $1, $2, etc. represent the split fields from each line of input, and my script orders them like this: "field10 field2[tab]field7 field8 field9". This works because the default output separator, which print inserts between items separated by commas, is also a space. Thus it places a space between fields 10 and 2, and between 7, 8 and 9. By contrast, $2"\t"$7 simply concatenates the strings, putting just a tab between field2 and field7. $10 is the filename, $2 the permissions and $7, $8 and $9 make up the modification time.
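The field splitting, the comma and the concatenation described above can be checked on a made-up line.

```shell
#!/bin/sh
# Leading and repeated blanks are skipped when splitting into fields;
# the comma inserts OFS (a space by default), while $1 "\t" $2 is plain
# string concatenation.
printf '  one two   three\n' | awk '{ print $3, $1 "\t" $2 }'
# output: three one<TAB>two
```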

The next step is two regular expressions run through sed. The first one matches a space followed by 10 characters with an x in the fourth position, and replaces all of this with an asterisk (*). This matches the permissions string (field $2 above), since the filename comes first and has no space before it, and the fourth character in the modification date string is always the space between the month and day. So this regexp marks with an asterisk the files that are executable by their owner.

The second regular expression matches a space, followed by 10 characters that are either lowercase letters or dashes. This is replaced by nothing. This matches the permission strings of those files that don't have the requisite executable bit set, and removes them from the string. Now we have the output we need!
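To check the two substitutions in isolation, here they are applied to two made-up lines in the intermediate "name perms[tab]date" format that the awk step produces. The wrapper name mark_exec is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the two sed substitutions from question 5:
# 1) " " + 10 chars with x in 4th place (owner-executable) -> "*"
# 2) any remaining " " + 10 lowercase letters/dashes -> deleted
mark_exec() {
    sed 's/ ...x.\{6\}/*/; s/ [a-z-]\{10\}//'
}

printf 'script -rwxr-xr-x\tMar 8 12:00\n' | mark_exec   # script*<TAB>Mar 8 12:00
printf 'motd -rw-r--r--\tMar 8 12:00\n'   | mark_exec   # motd<TAB>Mar 8 12:00
```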

Question 8 had a long and complex regular expression, and at first I thought the question wasn't a very suitable example, since split would be much better for separating space-delimited input. Unfortunately, $group is allowed to contain spaces, so split isn't as good as I thought.

Instead, a good idea is to put comments throughout the regexp using the /x modifier, which makes perl ignore unescaped whitespace in the pattern and treat # as the start of a comment. For example:

/^                        # Start of line
Networker\s+Savegroup:\s+ # Initial prefix which we ignore
\((\w+)\)\s               # Alphabetic level is taken from inside brackets
(.*)\s                    # Group name can contain any or no characters
(\w+),\s+                 # Alphabetic status
\d+\sclient\(s\)\s        # nnn client(s) (ignored)
\((.*)\)                  # Failed with any or no characters inside brackets
\n$/x;                    # End of line (\n strictly not required)

That also explains fairly well how the expression works.

I'm still working on nice solutions for some of the other problems. Will post when they're done.

2002.??.??

[Insert comment here!]

Last updated 2002.03.22 by Matt Ryall