ext3, lots of files, and you

Don’t do it.

While there’s no technical file limit on a per-directory basis with ext3 (there is one at the filesystem level; see inodes), there is *significant* performance degradation as you add more and more files to a single directory. I think the most I’ve seen without any user-noticeable sluggishness was about 300,000. Note that this is well beyond the point where you can’t `ls` anymore and have to resort to `find` and `xargs`. That should be your first warning sign.
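For what it’s worth, here’s a quick sketch of that workaround (the path is made up). Once a glob like `ls *.jpg` starts dying with "Argument list too long", or a plain `ls` just takes forever, let `find` walk the directory itself and stream the names to `xargs`:

```bash
# Hypothetical directory. find enumerates entries on its own and hands them
# to xargs in safe-sized batches, so nothing has to fit in one argument list.
find /var/www/images -maxdepth 1 -type f -name '*.jpg' -print0 \
  | xargs -0 -r ls -l
```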

Approaching 5 million files in one directory, things start to get weird. Creating new files in that directory drives the load up noticeably, even though actual CPU and I/O usage stays low (presumably the creating processes just sit blocked on the directory lookup). However, statting a specific file by exact name (as opposed to looking it up with a glob) is still decently fast. As long as you know exactly what you want, it’ll perform acceptably.
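To make that distinction concrete, a rough illustration (filenames made up): an exact-name `stat` is a single directory lookup, while a glob forces the shell to read and match the entire directory before anything else happens.

```bash
# Exact name: one lookup, still quick even with millions of entries.
stat /var/www/images/some-exact-name.jpg

# Glob: the shell must read every entry in the directory to expand the pattern.
ls /var/www/images/some-*.jpg
```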

The more files you add, the slower lookup-based operations (new file creation, for example) become; we’re talking seconds and tens of seconds here, not a few extra milliseconds. Give it an exact filename, though, and it will remain acceptably fast.

The filesystem option `dir_index` will help, though not hugely once you get into the millions of files. Compared to an unindexed directory it’s faster, but it doesn’t magically make the problem go away. Converting to ext2 is a terrible idea and should not be considered: journals are good things and well worth the comparatively slight performance hit.
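For the curious, enabling `dir_index` on an existing ext3 filesystem is a two-step affair. The device name below is just a placeholder, and the rebuild pass has to run on an unmounted filesystem:

```bash
# Turn on hashed (HTree) directory indexing for the filesystem.
tune2fs -O dir_index /dev/sdb1

# Directories that already exist only get indexed after a rebuild pass.
# -f forces the check, -D optimizes (reindexes) directories.
e2fsck -fD /dev/sdb1
```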

The real solution, however, is to not put that many files into a single directory in the first place. Subdirectories are always a good idea (though keep the subdirectory limit in mind: roughly 32,000 subdirs per dir!). Heck, most code can be almost trivially modified to derive a hashed subdirectory path from the filename, such as /var/www/images/y/o/yourmom.jpg and /var/www/images/y/i/yipee.jpg. When designing an application, one should be mindful of the limitations of the underlying OS (and, in this case, of the filesystem being used).
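A minimal sketch of that hashing scheme (paths and filename are just examples); the "hash" here is nothing fancier than the first two characters of the filename:

```bash
# Place a file into a two-level subdirectory derived from its first two characters.
f="yourmom.jpg"
dir="/var/www/images/${f:0:1}/${f:1:1}"   # -> /var/www/images/y/o
mkdir -p "$dir"
mv "$f" "$dir/"
```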
