Practical awk (for Apache logs)

Who is hotlinking?
[code lang=”bash”]awk -F\” ‘($2 ~ /\.(jpg|gif)/ && $4 !~ /^http:\/\/www\.yourdomain\.com/){print $4}’ access_log.processed \
| sort | uniq -c | sort[/code]

Blank referrers (usually indicates direct hits, such as a user typing in, or a script):
[code lang=”bash”]awk -F\” ‘($6 ~ /^-?$/)’ access_log.processed | awk ‘{print $1}’ | sort | uniq[/code]

How many different IPs visited on a specific day (and how often they visited):
[code lang=”bash”]grep ’12/Dec/2008′ access_log.processed | \
awk ‘{cnt[$1]++;} END{for (ip in cnt){printf(“%-15s visited: %04d time(s).\n”, ip, cnt[ip])}}'[/code]

Amount of data transferred for a specific date:
[code lang=”bash”]grep ’12/Dec/2008’ access_log.processed | awk ‘{ SUM += $10} END { print SUM/1024/1024 }'[/code]

Find the 50 largest files

What’s eating up all your disk space?
[code lang=”bash”]find / -path /dev -prune -o -path /sys -prune -o -path /proc -prune -o -type f \
-size ‘+1024k’ -printf “%s %h/%f\n” | sort -rn -k1 | head -n50 | \
awk ‘{ printf(“%5dMB\t%s\n”, $1/1048576, substr($0, index($0, ” “)+1, length($0))) }'[/code]

Extract a single table from a sql dump

The only caveat is that you have to know the table that comes after the one you’re trying to extract.  It’s alphabetical, if you can get a list of tables, otherwise a quick search of the SQL file will get that info for you.
[code lang=”bash”]awk ‘/Table structure for table .table1./,/Table structure for table .table2./{print}’ bigassdatabase.sql > table1.sql[/code]

Perl module infos

Check to see if a perl module is installed:
[code lang=”perl”]perl -MMODULENAME -e1[/code]
(no output == success)

Check perl module versions:
[code lang=”perl”]perl -MMODULENAME -e’print “$MODULENAME::VERSION\n”;'[/code]

Find out where a perl module is installed (docs, sources, etc):
[code lang=”perl”]perl -MExtUtils::Installed -e’$,=”\n”;print ExtUtils::Installed->new()->directories(“MODULENAME”),” “‘[/code]

Silly RPM tricks

Find all non-Red Hat-supplied packages:
[code lang=”bash”]rpm -qa –qf ‘%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH} %{VENDOR}\n’ | grep -v ‘Red Hat, Inc\.’ | sort[/code]

Handy for diagnosing issues where things seem a little “off”.

CPU affinity-aware `ps’

[code lang=”bash”]ps -eo pid,tid,class,rtprio,ni,pri,pcpu,stat,wchan:14,comm,psr[/code]

The last number is the CPU the process is currently waiting on.  Quite useful when used in conjunction with `top’, as hitting the number 1 while in interactive mode will display the per-CPU usage.  Helpful to find iowait.