
ext3, lots of files, and you

Don’t do it.

While there’s no technical file limit on a per-directory basis with ext3 (there is on a filesystem level — see inodes), there is *significant* performance degradation as you add more and more files to a single directory. I think the most I’ve seen without any user-noticeable sluggishness was about 300,000. Note that this is well beyond the point where you can’t `ls’ anymore and you have to resort to `find’ and `xargs’. This should be your first warning sign.

Approaching 5 million files in one directory, things start to get weird. Creating new files in that directory drives up the load average, even though CPU and memory usage stay low. However, statting a specific file (as opposed to a lookup with a glob) is decently fast. As long as you know what you want, it’ll work acceptably fast.

The more files you add, the slower lookup-based operations (new file creation, for example) will go — we’re talking seconds and tens of seconds here, not a few milliseconds more. As long as you give it an exact filename, though, it will be of an acceptable speed.

The filesystem option dir_index will help, though not hugely once you start getting into millions of files. Compared to running without dir_index, it’s faster, but it doesn’t make it magically work. Converting to ext2 is a terrible idea and should not be considered: journals are good things and well worth the extremely slight (comparatively) performance hit endured.

The real solution, however, is to not put that many files into a single directory. Subdirectories are always a good idea (though keep in mind the subdirectory limit: 32k subdirs per dir!). Heck, most code can almost trivially be modified to pull content from a path derived from the filename, such as /var/www/images/y/o/yourmom.jpg and /var/www/images/y/i/yipee.jpg. When designing an application, one should be mindful of the limitations of the underlying OS (and in this case, the filesystem being used).
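As a rough sketch of the subdirectory trick, here is one way to build that kind of path from the first two characters of the filename. The function name `shard_path` and the `_` fallback for one-character names are made up for illustration; hashing the name first (with md5sum, say) would spread files more evenly, but this matches the example paths above.

```shell
# shard_path BASE FILENAME  (hypothetical helper, not from any library)
# Prints BASE/<1st char>/<2nd char>/FILENAME.
shard_path() {
    local base="$1"
    local name="$2"
    local first second
    first=$(printf '%s' "$name" | cut -c1)
    second=$(printf '%s' "$name" | cut -c2)
    # Assumption: one-character names get "_" as the second level.
    printf '%s/%s/%s/%s\n' "$base" "$first" "${second:-_}" "$name"
}

shard_path /var/www/images yourmom.jpg
shard_path /var/www/images yipee.jpg
```

With two levels keyed on single characters, each directory holds only a small slice of the total, and the subdirectory count stays far below the 32k limit.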

Mass IP changing in Plesk

Moving a Plesk server behind a firewall is always a pain, since the IPs are associated with domains within the Plesk database. I used to hack the database every time I had to update IPs, but doing this for 50 IPs is… not so good.

Luckily, I stumbled upon this Parallels knowledge base article, which introduces a utility that reads a mapping of IPs and updates system interfaces as well as all the internal Plesky goodness.

Segmentation faults with up2date/rpm

Had a nasty one tonight that I had to turn over to Paul. up2date was segfaulting on a RHEL3 server, along with some rpm queries, such as `rpm -qa’. I was able to narrow it down to a specific package, though that did not help at all. The rpm database had some severe corruption.

I removed the __db.00* files from /var/lib/rpm and rebuilt the database with rpm --rebuilddb; however, this did not resolve the issue. I attempted to recreate the database by doing the following:

[code lang=”bash”]rpm -qa > rpm-list ; rpm --initdb ; cat rpm-list | while read line ; do rpm --nodeps --noscripts --notriggers --excludepath / $line ; done[/code]

However, the `rpm -qa’ hung, as noted earlier. Tough spot.

I had to move on, but Paul dove in. After a few hours of fitful hacking, he declared himself the winner — he had solved it. What did he do?

He removed all of the files in /var/lib/rpm except for Packages, then ran `rpm --rebuilddb’. The __db.00* files are just lock files, and while removing them can help, rpm transactions and queries still read all the other files and databases, thus rereading the corruption. Removing all the other databases (except the base Packages database) and then running a --rebuilddb operation actually rebuilds the databases… and the corruption cleared.
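Paul’s fix can be sketched roughly as follows. The helper name `prune_rpmdb` and the timestamped backup copy are my own additions, not part of what he actually ran; the only steps taken from the story are “delete everything but Packages” and “rebuild”.

```shell
# prune_rpmdb [DIR]  (hypothetical helper)
# Removes every regular file in the rpm database directory except Packages.
prune_rpmdb() {
    local dbdir="${1:-/var/lib/rpm}"
    # Assumption: keep a full copy alongside the original before deleting anything.
    cp -a "$dbdir" "$dbdir.bak.$(date +%Y%m%d%H%M%S)" || return 1
    find "$dbdir" -maxdepth 1 -type f ! -name Packages -exec rm -f {} +
}

# On a real system, as root:
#   prune_rpmdb /var/lib/rpm && rpm --rebuilddb
```

The rebuild is left as a separate command on purpose, so the pruning step can be checked before rpm touches anything.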

I also found this neat snippet to see if anything has a lock on the rpm databases:

[code lang=”bash”]cd /var/lib/rpm
/usr/lib/rpm/rpmdb_stat -Cl[/code]

find with spaces in filenames

Set bash’s internal field separator to a newline instead of a space, so grep (or whatever) doesn’t freak out:

[code lang=”bash”]IFS='
' ; for i in `find . -name "*.php"` ; do grep foo "$i" ; done[/code]
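An alternative that sidesteps word splitting entirely is to let find hand the filenames straight to grep. This is only a sketch: the wrapper name `grep_php` is made up, and `-H` (always print the filename) is assumed to be supported by your grep, which it is on GNU and BSD.

```shell
# grep_php DIR PATTERN  (hypothetical wrapper)
# Searches every *.php file under DIR for PATTERN; spaces in names are harmless
# because the shell never splits find's output.
grep_php() {
    find "$1" -type f -name '*.php' -exec grep -H -- "$2" {} +
}

# Example:
#   grep_php . foo
```

Unlike the IFS trick, this also copes with filenames that contain newlines, and IFS is left alone for the rest of the shell session.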

Memory management in Linux

This is a prefab for me to paste into tickets whenever a customer is confused about “free” and “used” memory.

Memory in Linux isn’t just black and white, “used” vs “free”. Rather, there are a few states, such as cached and buffered, in addition to used and free. Each of these states has a specific purpose — buffered memory is used for block devices, while cached memory is used for disk objects, to speed up access. Free and used are just as they sound.

The catch, however, is that both cached and buffered memory can be released instantly, should an application or the system require more memory just to run. For all intents and purposes, both cached and buffered memory can be considered “free”, even though they’re actively in use to speed up the running applications by reducing the number of disk accesses.

top breaks down all of this, whereas Webmin abbreviates this information. The memory actually available is, essentially, free plus buffers plus cached (equivalently, the memory truly in use is used minus buffers minus cached). Right now, for example, your server is reporting 2007MB total RAM, 108MB of which is free. 42MB is buffered and the majority, 1612MB, is cached. So while only 108MB is completely free, roughly 1763MB is effectively available. Completely free memory is a waste of the fastest medium available in your server, and Linux makes sure to take advantage of it!
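To check the arithmetic on a given box, something like this sketch adds up the three fields from /proc/meminfo. The function name `mem_available_mb` is made up, and it assumes the usual `MemFree:`, `Buffers:` and `Cached:` lines reported in kB. Newer kernels also report a `MemAvailable:` line, which is a better estimate where present.

```shell
# mem_available_mb [FILE]  (hypothetical helper)
# Sums MemFree + Buffers + Cached from a meminfo-style file and prints whole MB.
mem_available_mb() {
    awk '
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
        END { printf "%d\n", kb / 1024 }
    ' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    mem_available_mb
fi
```

Fed the figures quoted above (108MB free, 42MB buffers, 1612MB cached), this comes to 1762MB; the 1MB gap from the 1763MB quoted is presumably rounding in the original numbers.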


Very handy utility to see exactly what’s going over your intertubes. Requires libpcap (obviously).


Bash portscanner

Well, why not?

[code lang=”bash”]HOST=;for((port=1;port<=65535;++port));do echo -en "$port ";if echo -en "open $HOST $port\nlogout\nquit" | telnet 2>/dev/null | grep 'Connected to' > /dev/null;then echo -en "\n\nport $port/tcp is open\n\n";fi;done[/code]

Recover an ext3 journal

Symptoms: dmesg scrolling with “journal aborted” and the filesystem remounted read-only.

Give this a go (may need a rescue environment):
[code lang=”bash”]tune2fs -f -O ^has_journal /dev/sda1
tune2fs -j /dev/sda1[/code]

Recover ext3 filesystem with missing superblock

[code lang=”bash”]mount: wrong fs type, bad option, bad superblock on /dev/sda1, or too many mounted filesystems[/code]

Usually, this is code for “you’re fucked”. Here’s something you can try, however:

List the proposed superblocks (filesystem must be unmounted):
[code lang=”bash”]mke2fs -n /dev/sda1[/code]

fsck the filesystem using a backup superblock (caution, should try with -n switch to fsck first):
[code lang=”bash”]fsck -b 24577 /dev/sda1[/code]

If it fails, scan for the superblocks and use one of those:
[code lang=”bash”]dumpe2fs /dev/sda1 |grep super[/code]

Then again, if using a backup superblock doesn’t work, you’re probably fucked, as originally thought.
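If you end up trying several backup superblocks, pulling the block numbers out of the `mke2fs -n` output saves some squinting. This is a sketch only: the function name is made up, and it assumes the output contains a “Superblock backups stored on blocks:” line followed by comma-separated block numbers. Keep in mind that `mke2fs -n` reports where backups would land with its current defaults, which may not match how the filesystem was originally created.

```shell
# list_backup_superblocks  (hypothetical helper)
# Reads `mke2fs -n` output on stdin and prints one backup superblock per line.
list_backup_superblocks() {
    sed -n '/Superblock backups stored on blocks:/,/^$/p' |
        tr -s ' ,\t' '\n\n\n' |
        grep -E '^[0-9]+$'
}

# Example (read-only check of each candidate, stopping at the first that passes):
#   mke2fs -n /dev/sda1 | list_backup_superblocks | while read -r sb ; do
#       fsck -n -b "$sb" /dev/sda1 && break
#   done
```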

MegaMon RAID monitoring for MegaRAID-based cards

Cleverly hidden RAID monitoring tool for MegaRAID cards. Creates a log file at /var/log/megaserv.log that spits out all kinds of useful data — patrol reads, battery cycles, SMART status changes, sense key changes, etc. Can be configured to email x address upon errors, such as… well… a failed drive, for example. Also installs MegaCtrl, which is a CLI interface to the RAID card and allows for scriptable actions, such as deleting a logical drive.

Installing MegaMon is easy, as it’s a standard RHEL rpm, contained within the PERC/CERC tools found here. Included in that tgz is the MegaMon rpm. Install and `service raidmon start` and you’re good to go!

Disk labels

See the current label (if any):
[code lang=”bash”]e2label /dev/sda1[/code]

Set a disk label:
[code lang=”bash”]e2label /dev/sda1 /boot[/code]

Can use in fstab as follows:

[code lang=”bash”]LABEL=/boot /boot ext3 defaults 0 0[/code]

Word of warning: disk labels add another layer of indirection, except they don’t really abstract the device at all. Note that some operations, such as imaging a disk with Ghost, will not take disk labels into account, and the labels will not be copied.

Custom drivers in the RHEL rescue environment

If one or more of your devices needs a custom driver not present in the rescue environment, put the driver on external media such as a thumb drive, floppy, CD, etc. Once at the rescue environment prompt, start it with

[code lang=”bash”]linux dd[/code]

Follow the prompts for success.
