Who’s connecting to Apache?

Spot DDoSes and the like quickly:

[code lang="bash"]netstat -plan | grep :80 | awk '{print $5}' | sed 's/:.*$//' | sort | uniq -c | sort -rn | head[/code]
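To see what each stage of the pipeline contributes, you can feed the awk/sed/sort tail end some canned netstat-style lines (the addresses below are made-up documentation IPs, not real traffic):

```bash
# Field 5 of netstat output is the remote address:port; strip the port,
# then count and rank the remote IPs
counts=$(printf '%s\n' \
  'tcp 0 0 10.0.0.1:80 192.0.2.7:52100 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:80 192.0.2.7:52101 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:80 198.51.100.9:40000 ESTABLISHED' \
  | awk '{print $5}' | sed 's/:.*$//' | sort | uniq -c | sort -rn | head)
echo "$counts"
```

The top line of the output is the IP with the most concurrent connections, which is exactly what you scan for during a flood.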

Change all of Plesk’s FTP passwords to random

[code lang="bash"]for i in $(mysql -NB psa -uadmin -p`cat /etc/psa/.psa.shadow` -e 'select login from sys_users;'); do export PSA_PASSWD="$(openssl rand -base64 6)"; /usr/local/psa/admin/bin/usermng --set-user-passwd --user=$i; echo "$i: $PSA_PASSWD" >> ftp_passwords; done[/code]
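If you just need one throwaway password on the command line, the same openssl call works standalone. Six random bytes base64-encode to exactly eight characters, with no `=' padding:

```bash
# 6 random bytes -> 48 bits -> exactly 8 base64 characters
pw=$(openssl rand -base64 6)
echo "$pw"
```

Bump the byte count if your password policy wants something longer.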

Thanks Geoff!

Make sure your crons run on time

If you add an entry to crontab that is an interval, such as */3 (every 3 minutes), you can verify that it runs at the specified interval with a bit of awk:

[code lang="bash"]grep cron-script /var/log/cron | awk -F: '{ if ($2 % 3 == 0) print $0 }'[/code]

This prints the entries whose minute field is divisible by three (the interval). Runs at :00 past the hour show up too, since zero divides evenly by three; those are expected.

Cron can “run late” at times due to high load situations, so if there are any irregularities in your intervals, you may wish to investigate deeper, looking for expensive processes that are chewing up precious cron time.
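To sanity-check the minute-field arithmetic, you can run the same awk test over canned log lines (the hostnames and PIDs here are invented):

```bash
# Split on ':' -- the second field is the minute of the syslog timestamp.
# Only the :03 and :06 runs should survive the modulo test.
matched=$(printf '%s\n' \
  'Mar 10 12:03:01 web1 CROND[111]: (root) CMD (cron-script)' \
  'Mar 10 12:04:01 web1 CROND[112]: (root) CMD (cron-script)' \
  'Mar 10 12:06:01 web1 CROND[113]: (root) CMD (cron-script)' \
  | awk -F: '$2 % 3 == 0')
echo "$matched"
```

The 12:04 line (a "late" run) drops out, which is your cue to start hunting for what delayed it.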

The 10 Golden Rules for Troubleshooting Linux

1. Man pages exist and should be used. Seriously, everything’s there, from application docs to syscall docs to syntax and formatting of log files.

2. Don’t reinvent the wheel. 99% of problems you’re experiencing or ever will experience, somebody’s already gone through it and figured it out. Google is your friend.

3. If you don’t know what something’s doing, or why it’s not working, strace it!

4. Logs exist for a reason. Read them.

5. Applications crash, servers don’t. If your server crashes, it’s either bad hardware or a kernel bug (fairly rare on popular distros).

6. Always make backups. Always.

7. Always mount NFS mounts with the ‘intr’ option. Having to reboot because of a network blip is uncool.

8. Learn to use `grep’, `sed’ and `awk’. Learning to manipulate text is surprisingly important for a text-based interface.

9. Load average does not mean CPU usage. 100% memory usage does not mean you don’t have any more available for new applications. You can run out of inodes before you run out of disk space.

10. TCP wrappers suck. If you’ve been hacking at an issue for over 3 hours, look to your TCP wrappers. /etc/hosts, /etc/hosts.allow and /etc/hosts.deny will hold the answer.
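Rule 8 deserves a tiny illustration. The log lines here are made up, but the filter-strip-extract chaining is the everyday shape of it:

```bash
# grep filters the lines, sed strips the prefix, awk extracts a field
result=$(printf 'ERROR disk full\nWARN slow io\nERROR net down\n' \
  | grep ERROR | sed 's/^ERROR //' | awk '{print $1}')
echo "$result"
```

Three tools, one pipeline; once that pattern is in your fingers, most log triage is a one-liner.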

ext3, lots of files, and you

Don’t do it.

While there’s no technical file limit on a per-directory basis with ext3 (there is on a filesystem level — see inodes), there is *significant* performance degradation as you add more and more files to a single directory. I think the most I’ve seen without any user-noticeable sluggishness was about 300,000. Note that this is well beyond the point where you can’t `ls’ anymore and you have to resort to `find’ and `xargs’. This should be your first warning sign.

Approaching 5 million files in one directory, things start to get weird. Creating new files in that directory generates significant load, though resource usage is low. However, statting a specific file (as opposed to a lookup with a glob) is decently fast. As long as you know what you want, it’ll work acceptably fast.

The more files you add, the slower lookup-based operations (new file creation, for example) will go — we’re talking seconds and tens of seconds here, not a few milliseconds more. As long as you give it an exact filename, though, it will be of an acceptable speed.

The filesystem option dir_index will help, though not hugely once you start getting into millions of files. Compared to no dir_index, it’s faster, but it doesn’t make it magically work. Converting to ext2 is a terrible idea and should not be considered — journals are good things and well worth the extremely slight (comparatively) performance hit endured.

The real solution, however, is to not put that many files into a single directory. Subdirectories are always a good idea (though keep in mind the subdirectory limit — 32k subdirs per dir!). Heck, most code can almost trivially be modified to build the path from a hash of the filename, such as /var/www/images/y/o/yourmom.jpg and /var/www/images/y/i/yipee.jpg. When designing an application, one should be mindful of the limitations of the underlying OS (and in this case, the filesystem being used).
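A minimal sketch of that two-level filename hash (the function name and base path are mine, purely illustrative):

```bash
# Derive a two-level subdirectory from the first two characters of the filename
hashpath() {
  d1=$(printf '%s' "$2" | cut -c1)   # first character of the filename
  d2=$(printf '%s' "$2" | cut -c2)   # second character
  echo "$1/$d1/$d2/$2"
}
hashpath /var/www/images yourmom.jpg   # /var/www/images/y/o/yourmom.jpg
```

With even name distribution, two levels of 36 buckets each cut a 5-million-file directory down to a few thousand files per leaf, well inside ext3's comfort zone.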

Mass IP changing in Plesk

Moving a Plesk server behind a firewall is always a pain, since the IPs are associated with domains within the Plesk database. I used to hack the database every time I had to update IPs, but doing this for 50 IPs is… not so good.

Luckily, I stumbled upon this Parallels knowledge base article, which introduces reconfigurator.pl — it reads a mapping of IPs and updates system interfaces as well as all the internal Plesky goodness.

Manually mounting a USB drive in Linux

Most modern distros are quite smart and will recognize the variety of USB devices plugged in automagically. Today, for whatever reason, RHEL5 refused to do so, and I had to do it the hard way.

First, make sure you’ve got USB modules loaded:

[code lang=”bash”]modprobe uhci_hcd
modprobe ohci_hcd
modprobe ehci_hcd
modprobe usb-storage[/code]

With the last one, wait a few moments, then run `dmesg’. You should see some useful information:

[code]Initializing USB Mass Storage driver...
scsi1 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 8
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usb-storage: waiting for device to settle before scanning
usb 1-3.3: reset high speed USB device using ehci_hcd and address 8
Vendor: Seagate Model: FreeAgent Go Rev: 102D
Type: Direct-Access ANSI SCSI revision: 04
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 1c 00 00 00
sda: assuming drive cache: write through
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
sda: Mode Sense: 1c 00 00 00
sda: assuming drive cache: write through
sda: sda1
sd 1:0:0:0: Attached scsi disk sda
sd 1:0:0:0: Attached scsi generic sg0 type 0
usb-storage: device scan complete[/code]

If you’re lucky, you can simply do an `fdisk -l’ and see the drive and its partitions at the stated point (sda). I wasn’t so lucky, as this server didn’t have device nodes for sda. These, however, are easily created:

[code lang=”bash”]/dev/MAKEDEV sda[/code]

Now you should be able to mount it:

[code lang=”bash”]mount /dev/sda1 /mnt/usb[/code]

What is Apache doing?

Ever wish you knew what Apache was working on at any given moment, but kicking yourself because you forgot to enable a server-status directive? This snippet will help you diagnose timeouts and long-running scripts (for bad coders like myself):

[code lang="bash"]for i in `ps -elf | grep "[h]ttp" | awk '{print $4}' | sort | uniq`; do ls -la /proc/$i/cwd; done | awk '{print $11}' | sort | uniq -c | sort -nr[/code]
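The trick hinges on /proc/&lt;pid&gt;/cwd being a symlink to each process's current working directory; `readlink' resolves it without any `ls' output parsing:

```bash
# /proc/self resolves to the calling process, which inherits the shell's cwd,
# so this prints the directory the shell is sitting in
cd /
readlink /proc/self/cwd
```

For a busy Apache, the directory a worker is sitting in is usually the vhost docroot of the request it's serving, which is what makes the tally above useful.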

Segmentation faults with up2date/rpm

Had a nasty one tonight that I had to turn over to Paul. up2date was segfaulting on a RHEL3 server, along with some rpm queries, such as `rpm -qa’. I was able to narrow it down to a specific package, though that did not help at all. The rpm database had some severe corruption.

I removed the __db.00* files from /var/lib/rpm and rebuilt the database with `rpm --rebuilddb’; however, this did not resolve the issue. I attempted to recreate the database by doing the following:

[code lang="bash"]rpm -qa > rpm-list ; rpm --initdb ; cat rpm-list | while read line ; do rpm --nodeps --noscripts --notriggers --excludepath / $line ; done[/code]

However, the `rpm -qa’ hung, as noted earlier. Tough spot.

I had to move on, but Paul dove in. After a few hours of fitful hacking, he declared himself the winner — he had solved it. What did he do?

He removed all of the files in /var/lib/rpm except for Packages, then ran `rpm –rebuilddb’. The __db.00* files are just lock files, and while removing them can help, rpm transactions and queries still read all the other files and databases, thus rereading the corruption. Removing all the other databases (except the base Packages database) and then running a –rebuilddb operation actually rebuilds the databases… and the corruption cleared.

I also found this neat snippet to see if anything has a lock on the rpm databases:

[code lang="bash"]cd /var/lib/rpm
/usr/lib/rpm/rpmdb_stat -Cl[/code]

Intercepting “command not found” in bash

On Debian, bash is patched with an interesting new function: command_not_found_handle. This hook runs whenever a command lookup fails (the condition that normally produces exit code 127) and allows you to do neat things. Debian uses it to check the failed command against the apt database, letting you know if a command you tried to invoke is not available, but can be found in the apt repos, and how to install it. Pretty spiffy.

This, of course, can be modified. Where I work, we use numbers to identify servers. I have a script that grabs login credentials from our internal systems and auto-logs me into servers based on their number. For example, I’d run `connect 12345′ to connect to server 12345.

By adding the following to my .bashrc:

[code lang="bash"]function command_not_found_handle {
    /home/kale/bin/command-not-found $1
}[/code]

And creating the following script, with regex in place to only care about numbers, placed in /home/kale/bin/command-not-found:

[code lang="bash"]#!/bin/bash

MYBOOL=`echo $1 | awk '$1 ~ /^[0-9]+-*[0-9]*$/ {print $0}' | wc -l | awk '{print $1}'`

if [ "$MYBOOL" == "1" ]; then
    /home/kale/bin/connect $1
    exit 127
fi[/code]

(where `connect’ is the path to my connect script, previously written)

This now allows me to do this awesome deal:

[code lang="bash"]kale@bastion:~$ 12345
Connecting to server 12345...[/code]

If you didn’t catch it, I don’t need to specify a command — just the argument. As there’s no application in my $PATH named `12345′, it falls through to the command_not_found_handle function, which then launches my connect script.

Who needs commands? I just saved hundreds of wasted seconds per night on typing “connect”!
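If you want to play with the idea without the helper scripts, here's a self-contained sketch of the same dispatch; the real connect call is stubbed out with an echo, and the number test is done with a case pattern instead of the awk regex:

```bash
# Stubbed dispatch: an all-digit "command" is treated as a server number
command_not_found_handle() {
  case $1 in
    *[!0-9]*|'') echo "bash: $1: command not found" >&2; return 127 ;;
    *) echo "Connecting to server $1..." ;;   # would exec the real connect script
  esac
}
command_not_found_handle 12345   # Connecting to server 12345...
```

Under a patched bash, you never call the function yourself; the shell invokes it for you whenever command lookup fails.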

find with spaces in filenames

Set bash’s internal field separator to a newline instead of a space, so grep (or whatever) doesn’t freak out:

[code lang="bash"]IFS='
' ; for i in `find . -name "*.php"` ; do grep foo $i ; done[/code]
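An alternative that avoids touching IFS at all is to NUL-delimit the filenames; assuming GNU find and xargs, this survives spaces (and even newlines) in names:

```bash
# Create a file with a space in its name, then grep it via -print0 / xargs -0
dir=$(mktemp -d)
printf 'foo\n' > "$dir/has space.php"
find "$dir" -name '*.php' -print0 | xargs -0 grep -l foo
```

Unlike the IFS trick, -print0 doesn't leak a changed IFS into the rest of your shell session.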

Removing 1-liner malware code in webpages

More and more client workstations are being infected with keyloggers and trojans. In addition to stealing your WoW username and password (oh noes, my purpz!), they also have been stealing FTP logins.

This has manifested itself in the linux server world by seemingly legit users logging in over FTP, downloading a file, then uploading it a few seconds later with 100-ish bytes appended. A look at xferlog reveals this behavior, usually against a regex of pages (index.*, default.*, etc), and the connecting IP will often be foreign. A look at the secure log will reveal that the password was not brute-forced; rather, it was known.

The real solution is to change all passwords and force the end user to reformat their computer, since they’re infected and do not realize it. Alas, this is not quite practical (though if someone could invent a remote formatter, I’ll give you $10 for it). Rather, advise the end user of the situation and suggest reformatting — or, at the very least, using a collection of anti-spyware, anti-virus, and anti-everything software on their workstation. Change the affected user’s password.

To clean up the leftover malicious code that was appended, find out the exact string (usually a `tail index.php’ will reveal it); it’s often a &lt;script&gt; line or an &lt;iframe&gt;. Copy the string completely and we’ll just sed it out:

[code lang=”bash”]sed -i “s#