Browse Tag: apache

Apache request-based throttling

Ok, theoretically my last post about mod_rpaf was supposed to lead to mod_qos working. It did, in the most technical sense… it just made it instantly obvious that mod_qos was not the solution I was looking for! mod_qos performs QoS on a URI but applies it to all connecting clients, not just offenders. It’s best used for resource limiting, not for API throttling to put a stop to abuse, which is my intent.

I grudgingly turned to mod_security. I’ve known all along that mod_security would be the best tool to help me reach my goal; however, mod_security is the least user-friendly piece of software that I’ve ever used, with a highly esoteric language and odd processing rules. Forced to sit down and make it work, however, I’ve come up with a few rules that may help others who wish to perform request-based throttling.

SecAction "phase:2,pass,nolog,initcol:IP=%{REQUEST_HEADERS.X-Forwarded-For}"
SecAction "phase:2,nolog,setvar:IP.hitcount=+1,deprecatevar:IP.hitcount=1/1"
SecRule IP:hitcount "@gt 3" "phase:2,pause:3000,nolog,allow,msg:'API abuser, throttling'"

First, I initialize a collection called “IP”, based on the X-Forwarded-For header. Because I’m using mod_rpaf, I could technically use the remote address, but “just in case” I opted for the X-Forwarded-For, since that’s much more important to me. It also prevents the load balancer from getting blocked… ever.

The second line is where I do the IP increment — and decrement. As you can see, for every hit from that IP I increment the IP.hitcount variable by 1; the ‘deprecatevar:IP.hitcount=1/1’ tells mod_security to decrement the count by one per second. If the user makes one hit per second, they will never hit the limit. If they make 2 hits per second, the net gain will be 1 after the first second, 2 after the next, 3 after the next, etc.
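The leaky-bucket arithmetic is easy to sanity-check with a toy loop (plain shell, just simulating the counters, not mod_security itself):

```shell
# Simulate IP.hitcount at 2 hits/second while deprecatevar leaks 1/second:
# the counter nets +1 each second and crosses the threshold of 3 in second 3.
hitcount=0
for second in 1 2 3 4; do
  hitcount=$((hitcount + 2))   # two hits arrive this second
  hitcount=$((hitcount - 1))   # deprecatevar: leak one per second
  echo "second $second: hitcount=$hitcount"
done
# prints hitcount=1, 2, 3, 4
```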

The last line, of course, is where we do our test. If the hitcount is greater than 3, I’m allowing the request to go through, but adding a 3000ms pause — 3 seconds.

I configured these rules within my VirtualHost definition, and used Location tags to specify the URIs that require throttling. It works like a champ. In each of the rules, I’ve specified ‘nolog’, as it’s pretty spammy, though you’ll want to change that to ‘log’ for testing. Because I’m disabling mod_security’s spammy logging, I’m timing requests with a custom log format:

LogFormat "%h %l %u %t \"%r\" %>s %B \"%{Referer}i\" \"%{User-Agent}i\" %D" combined-time
CustomLog "/var/log/httpd/access_log" combined-time

The %D at the end of the LogFormat spits out the total time taken by Apache to fulfill the request in microseconds, which will include the artificial delay. With this CustomLog definition, you can now easily visualize throttled requests:

tail -f access_log |awk '($NF > 3000000)'
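To illustrate with a fabricated log line (any request whose %D field exceeds three million microseconds is one we throttled), here is a variant that also pulls out the client IP and URI; it assumes the combined-time format above, where the URI lands in field 7:

```shell
# Fabricated combined-time line; the last field is %D in microseconds.
echo '1.2.3.4 - - [01/Jan/2012:00:00:00 +0000] "GET /api/foo HTTP/1.1" 200 512 "-" "curl/7.19" 3400123' \
  | awk '($NF > 3000000){printf "%s %s %.1fs\n", $1, $7, $NF/1000000}'
# prints: 1.2.3.4 /api/foo 3.4s
```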

mod_rpaf and Amazon ELB

Amazon’s ELB service is nice — magical load balancers that just work, sitting in front of your servers, that you can update and modify on a whim. Of course, because it’s a load balancer (a distributed load balancer infrastructure, to be more precise), Apache and other applications sitting behind it see all the incoming traffic as coming from the load balancer — i.e., $REMOTE_ADDR is the load balancer’s address instead of the end client’s public IP.

This is normal behavior when sitting behind a load balancer, and it’s also normal behavior for the load balancer to encapsulate the original client IP in an X-Forwarded-For header. Using Apache, we can, for example, modify LogFormat definitions to account for this, logging %{X-Forwarded-For}i to log the end user’s IP.

Where this falls short, however, is when you want to *do* things with the originating IP beyond logging. The real-world scenario I ran into was using mod_qos to do rate-limiting based on URIs within Apache — mod_qos tests against the remote IP, not the X-Forwarded-For, so using the module as is, I’m unable to apply any QoS rules against anything beyond the load balancer… which of course defeats the purpose.

Luckily, I’m not the only person to have ever run into this issue. The Apache module mod_rpaf is explicitly designed to address this type of situation by translating the X-Forwarded-For header into the remote address as Apache expects, so that other modules can properly run against the originating IP — not the load balancer.

ELB makes implementation of mod_rpaf much more difficult than it should be, however. ELB is architected as a large network of load balancers, such that incoming outside requests bounce around a bit within the ELB infrastructure before being passed to your instance. Each “bounce” adds an additional IP to X-Forwarded-For, essentially chaining proxies. Additionally, there are hundreds of internal IPs within ELB that would need to be accounted for to use mod_rpaf as is, as you must specify the proxy IPs to strip.

So I patched up mod_rpaf to work with ELB. I’ve been running it for a day or so in dev and it appears to be working as expected, passing the original client value to mod_qos (and mod_qos testing and working against that), but of course if you run into issues, please let me know (because your issues will probably show up in my environment as well).

Here is the patch:

--- mod_rpaf-2.0.c	2008-01-01 03:05:40.000000000 +0000
+++ mod_rpaf-2.0.c~	2011-08-25 20:04:39.000000000 +0000
@@ -136,13 +136,25 @@
 static int is_in_array(const char *remote_ip, apr_array_header_t *proxy_ips) {
-    int i;
+    /* int i;
     char **list = (char**)proxy_ips->elts;
     for (i = 0; i < proxy_ips->nelts; i++) {
         if (strcmp(remote_ip, list[i]) == 0)
             return 1;
     }
     return 0;
+    */
+    return 1;
 }
 
+static char* last_not_in_array(apr_array_header_t *forwarded_for,
+			       apr_array_header_t *proxy_ips) {
+    int i;
+    for (i = (forwarded_for->nelts)-1; i > 0; i--) {
+	if (!is_in_array(((char **)forwarded_for->elts)[i], proxy_ips))
+	    break;
+    }
+    return ((char **)forwarded_for->elts)[i];
+}
+
 static apr_status_t rpaf_cleanup(void *data) {
@@ -161,7 +173,7 @@
     if (!cfg->enable)
         return DECLINED;
 
-    if (is_in_array(r->connection->remote_ip, cfg->proxy_ips) == 1) {
+    /* if (is_in_array(r->connection->remote_ip, cfg->proxy_ips) == 1) { */
         /* check if cfg->headername is set and if it is use
            that instead of X-Forwarded-For by default */
         if (cfg->headername && (fwdvalue = apr_table_get(r->headers_in, cfg->headername))) {
@@ -183,7 +195,8 @@
             rcr->old_ip = apr_pstrdup(r->connection->pool, r->connection->remote_ip);
             rcr->r = r;
             apr_pool_cleanup_register(r->pool, (void *)rcr, rpaf_cleanup, apr_pool_cleanup_null);
-            r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[((arr->nelts)-1)]);
+            /* r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[((arr->nelts)-1)]); */
+            r->connection->remote_ip = apr_pstrdup(r->connection->pool, last_not_in_array(arr, cfg->proxy_ips));
             r->connection->remote_addr->sa.sin.sin_addr.s_addr = apr_inet_addr(r->connection->remote_ip);
             if (cfg->sethostname) {
                 const char *hostvalue;
@@ -201,7 +214,7 @@
-    }
+    /* } */
     return DECLINED;
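The gist of the new last_not_in_array() is simply to walk the X-Forwarded-For chain right to left and skip known proxy hops. A toy shell model of that logic (the addresses here are made up; the real module works on the parsed header array):

```shell
# Walk a fabricated X-Forwarded-For chain right-to-left and keep the first
# hop that is not an internal (10.x) ELB address -- the real client.
XFF="203.0.113.7, 10.0.4.12, 10.0.9.33"
CLIENT=""
for ip in $(echo "$XFF" | tr -d ' ' | tr ',' '\n' | tac); do
  case "$ip" in
    10.*) continue ;;            # internal ELB hop, strip it
    *)    CLIENT="$ip"; break ;; # first non-proxy hop wins
  esac
done
echo "$CLIENT"
# prints: 203.0.113.7
```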

Or, if you’d prefer ez-mode, I rolled some RPMs of mod_rpaf that include this patch:


And, for completeness, mod_rpaf.conf:

LoadModule rpaf_module        modules/

RPAFenable On
RPAFsethostname On
RPAFproxy_ips 10.
RPAFheader X-Forwarded-For

Add domains and users

Quick one-liner to take a list of domains and create Apache vhosts from a template, create users, set their home dirs, permissions, etc.

cat domains.out |while read line ; do DOMAIN=$line ; NODOTDOMAIN=`echo $DOMAIN | sed -e 's/\.//g'` ; mkdir -p /var/www/vhosts/$DOMAIN ; sed -e "s/$DOMAIN/g" /etc/httpd/vhost.d/default.vhost > /etc/httpd/vhost.d/$DOMAIN.conf ; useradd -d /var/www/vhosts/$DOMAIN $NODOTDOMAIN ; chown $NODOTDOMAIN:$NODOTDOMAIN /var/www/vhosts/$DOMAIN ; PASSWERD=`head -n 50 /dev/urandom | tr -dc A-Za-z0-9 | head -c8` ; echo $PASSWERD | passwd $NODOTDOMAIN --stdin ; echo "Domain: $DOMAIN" ; echo "User: $NODOTDOMAIN" ; echo "Password: $PASSWERD" ; echo ; done
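Unrolled for readability, the same loop looks like the sketch below. It assumes the vhost template contains a placeholder token (here, hypothetically, TEMPLATE.DOMAIN) that sed substitutes with the real domain, and it must run as root:

```shell
# Read domains from domains.out and provision a vhost + system user for each.
# TEMPLATE.DOMAIN is a hypothetical placeholder inside default.vhost.
while read DOMAIN; do
  NODOTDOMAIN=$(echo "$DOMAIN" | sed -e 's/\.//g')    # dots stripped for the username
  mkdir -p "/var/www/vhosts/$DOMAIN"
  sed -e "s/TEMPLATE.DOMAIN/$DOMAIN/g" /etc/httpd/vhost.d/default.vhost \
      > "/etc/httpd/vhost.d/$DOMAIN.conf"
  useradd -d "/var/www/vhosts/$DOMAIN" "$NODOTDOMAIN"
  chown "$NODOTDOMAIN:$NODOTDOMAIN" "/var/www/vhosts/$DOMAIN"
  PASSWERD=$(head -n 50 /dev/urandom | tr -dc A-Za-z0-9 | head -c8)
  echo "$PASSWERD" | passwd "$NODOTDOMAIN" --stdin
  printf 'Domain: %s\nUser: %s\nPassword: %s\n\n' "$DOMAIN" "$NODOTDOMAIN" "$PASSWERD"
done < domains.out
```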

Log PHP memory usage per-request

You can easily log how much memory each request for a PHP page takes by modifying the LogFormat:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b %{mod_php_memory_usage}n \"%{Referer}i\" \"%{User-Agent}i\"" combined-php

Note the latter definition, which includes %{mod_php_memory_usage}n — this will print out the amount of memory, in bytes, required to execute the script as requested. Big help in finding memory leaks. To use, just change the log definition to the newly-created “combined-php” format:

CustomLog logs/fever-access_log combined-php

Do note that this may (probably will) break Apache log parsers that are expecting the standard combined format. If using for troubleshooting, I recommend logging to an alternate location so as to not screw up log statistics.
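If you then want to rank requests by memory used, note that with the combined-php format above the memory value is the 11th whitespace-separated field (the quoted request string splits into three). A quick awk against a fabricated line:

```shell
# Fabricated combined-php line; field 11 is %{mod_php_memory_usage}n in bytes,
# field 7 is the request path.
echo '1.2.3.4 - - [01/Jan/2012:00:00:00 +0000] "GET /leaky.php HTTP/1.1" 200 512 2097152 "-" "Mozilla"' \
  | awk '{print $11, $7}' | sort -rn | head
# prints: 2097152 /leaky.php
```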

Credit to Brad Ison for this find.

Block POSTs from blank referrers

I found a great article on Secure Computing: Sec-C that includes some excellent, simple Apache configurations and RewriteRules to block various annoyances and compromises. A wonderful example is this bit, designed to stop POST requests that have no referrer set. There’s no legitimate reason for anyone to be POSTing arbitrary data to a script without a referrer, as that would indicate a direct hit — which is bad juju.

# Identify if a Referer is used
SetEnvIf Referer "^$" no_referer=1
<Limit POST>
Order Allow,Deny
Allow from all
Deny from env=no_referer
</Limit>

Lots of other fascinating security and forensics insights on the Sec-C blog as well!

Push email on iPhone and other smartphones… without Exchange

Tonight, I found a clever open-source project entitled Z-Push. This small collection of PHP sits in a web directory and responds to ActiveSync queries — the protocol used for Exchange. It then checks and delivers email.

This is useful because of the limitations of some smartphones — such as the iPhone — wherein Exchange-hosted mail is delivered instantly, while standard POP3 or IMAP mail accounts suffer a long polling delay.

On the server side, configuration is fairly simple:

  1. wget
  2. tar xzvf z-push-1.3RC.tar.gz
  3. mv z-push /var/www/html
  4. yum install php-imap
  5. chown apache:apache /var/www/html/z-push/state
  6. vi /var/www/html/z-push/config.php and configure the following:
    define('IMAP_SERVER', 'localhost');
    define('IMAP_PORT', 143);
    define('IMAP_OPTIONS', '/notls/norsh');

  7. Add the following Alias to an Apache SSL VirtualHost:
    Alias /Microsoft-Server-ActiveSync /var/www/html/z-push/index.php

  8. Restart Apache

On your phone, simply create a new Exchange-type account that points to your server as if it was an Exchange server. Send a test mail and marvel at how fast it appears on your phone! Tested on iPhone and Motorola Droid with excellent success.

Enable WebDAV with Plesk

Configuring WebDAV in Apache is simple, but it’s even easier to configure and manage with Plesk!

1. Create a Protected Directory
Log into Plesk and select the domain that is to receive the DAV repository. Click on “Protected Directories” and create a new one – name it as the DAV share will be named, for they are one and the same.

2. Configure WebDAV Users
Add users who should have access to this DAV repo.

3. Edit vhost.conf and Reconfigure Plesk
On the server, edit the domain’s vhost.conf and enter the following:

[code]<Directory "/var/www/vhosts/">
DAV on
AllowOverride None
</Directory>[/code]

Regenerate Apache’s configuration and you’re golden:

[code]/usr/local/psa/admin/bin/websrvmng -av[/code]

4. Test
You can easily test DAV configuration by using a DAV client such as `cadaver’.

[code][kale@superhappykittymeow ~]$ cadaver
Authentication required for on server `':
Username: kale
dav:/DAVDir/> ls
Listing collection `/DAVDir/': collection is empty.[/code]

Success! You can manage access to the DAV share through the Plesk interface.

mod_auth_mysql and segfaults

Symptom: seemingly random PHP scripts are causing Apache to segfault.

Looking deeper: all the PHP scripts that are causing segfaults make database queries (specifically, MySQL).

Look even closer: the following line is in your Apache configuration:

[code]LoadModule auth_mod_mysql modules/[/code]

Solution: comment that line out of your Apache configuration and restart Apache.

Why: when your PHP code runs through Apache, you’ve essentially got one process making the SQL queries (if your PHP code makes it so). While your code has made the connection and is expecting responses, Apache, with mod_auth_mysql loaded, is also ready and willing to make and take database connections of its own. When a connection made from your PHP code returns a response, Apache attempts to accept the response and handle it itself instead of passing it to PHP. Since Apache is not expecting the data it’s getting, it has no error-handling code for this situation and simply segfaults.

Disable mod_auth_mysql by commenting it out and everything will work without issue.

Apache MultiViews and RewriteRules

Don’t work together.

I think it’s a bug in mod_rewrite, to be honest, though more of a “not thinking these two modules would ever be used together” kind of oversight, rather than a full bug.

Essentially, if you are using MultiViews to make for pretty URLs (say, a request for /bar, where ‘bar’ doesn’t exist on disk but instead loads the content from bar.php), and you attempt to implement RewriteRules to modify the URL, you will see erratic results.

If, for example, you have a RewriteRule as follows:

[code]RewriteCond %{HTTP_HOST} !^www\.foo\.com
RewriteRule (.*) http://www.foo.com/$1 [R=301,L][/code]

which, essentially, takes all non-WWW requests and redirects them to www.foo.com, you will find that MultiView URLs will be redirected to their real resources if the URL matches a rule. For example, http://foo.com/bar

will become

http://www.foo.com/bar.php

after going through the MultiView filter and the RewriteRules. This is due to the way the rules work — essentially, the request is first parsed through mod_rewrite to find a match. If no match is made against the URL, the MultiView is processed to get the real resource, which is then presented to the end user. If a match is made, however, mod_rewrite has mod_negotiation process the MultiView to find the real resource so it can properly do the rewrite — it is never changed back, however, to the pretty MultiView URL. If your goal is pretty URLs without any effort expended, relying on MultiViews, you will find that RewriteRules are your nemesis.

There are a few routes available to get around this odd behavior, but my favorite (and easiest to implement) is to move the RewriteRule logic to the site code. It’s much harder to implement MultiView-esque functionality than it is to re-implement RewriteRules.

To implement the above RewriteRule, redirecting non-www to www, simply add an auto_prepend_file to your .htaccess in lieu of the RewriteRule as such:

[code]php_value auto_prepend_file "/var/www/html/prepend.php"[/code]

This file simply checks the requested host and redirects when it isn’t the www hostname, something along these lines:

[code lang="php"]<?php
// Redirect any non-www request to www.foo.com, preserving the request URI.
if ($_SERVER['HTTP_HOST'] !== 'www.foo.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.foo.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>[/code]

With this code prepended to every PHP script (assuming your site is written in PHP, of course), all non-www requests will be redirected to www — *after* the MultiView is processed and not interfering with its inner workings.

Enable core dumps with apache, RHEL5

From this post on Jared’s tech blog:

[code lang="bash"]echo "ulimit -c unlimited >/dev/null 2>&1" >> /etc/profile
echo "DAEMON_COREFILE_LIMIT='unlimited'" >> /etc/sysconfig/init
echo 1 > /proc/sys/fs/suid_dumpable
echo "core.%p" > /proc/sys/kernel/core_pattern
echo "CoreDumpDirectory /var/apache-core-dumps" > \
mkdir /var/apache-core-dumps
chown apache: /var/apache-core-dumps
source /etc/profile
/etc/init.d/httpd restart[/code]

Now you can test it by sending a SIGSEGV to a random apache child process:

[code lang="bash"]tail -f /var/log/httpd/error_log | grep -i seg &
ps auxwww | grep httpd   # pick a random pid not owned by root
kill -11 2014
[Mon Jul 06 21:05:39 2009] [notice] child pid 2014 exit signal
Segmentation fault (11), possible coredump in /var/apache-core-dumps
cd /var/apache-core-dumps[/code]

You can then get a backtrace using gdb:

[code lang="bash"]gdb /usr/sbin/httpd core.2014
(gdb) bt full[/code]

Brilliant – thanks Jared, I fought Apache for an hour to enable CoreDumps before putting my fist through the monitor!

Who’s connecting to Apache?

Spot DDoS’s and the like quickly:

[code lang="bash"]netstat -plan | grep :80 | awk '{print $5}' | sed 's/:.*$//' | sort | uniq -c | sort -rn | head[/code]

What is Apache doing?

Ever wish you knew what Apache was working on at any given moment, but kicking yourself because you forgot to enable a server-status directive? This snippet will help you diagnose timeouts and long-running scripts (for bad coders like myself):

[code lang="bash"]for i in `ps -elf | grep http | awk '{print $4}' | sort | uniq`; do ls -la /proc/$i/cwd ; done | awk '{print $11}' | sort | uniq -c | sort -nr[/code]

Practical awk (for Apache logs)

Who is hotlinking?
[code lang="bash"]awk -F\" '($2 ~ /\.(jpg|gif)/ && $4 !~ /^http:\/\/www\.yourdomain\.com/){print $4}' access_log.processed \
| sort | uniq -c | sort[/code]

Blank referrers (usually indicates direct hits, such as a user typing the URL in directly, or a script):
[code lang="bash"]awk -F\" '($4 ~ /^-?$/)' access_log.processed | awk '{print $1}' | sort | uniq[/code]

How many different IPs visited on a specific day (and how often they visited):
[code lang="bash"]grep '12/Dec/2008' access_log.processed | \
awk '{cnt[$1]++;} END{for (ip in cnt){printf("%-15s visited: %04d time(s).\n", ip, cnt[ip])}}'[/code]

Amount of data transferred for a specific date:
[code lang="bash"]grep '12/Dec/2008' access_log.processed | awk '{ SUM += $10} END { print SUM/1024/1024 }'[/code]
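The field arithmetic is easy to verify against a fabricated line: with the combined format, $10 is %b, the response size in bytes, so summing it and dividing by 1024 twice yields megabytes:

```shell
# One fabricated 1 MB response; field 10 of the combined format is %b.
echo '1.2.3.4 - - [12/Dec/2008:10:00:00 +0000] "GET /big.iso HTTP/1.1" 200 1048576 "-" "wget"' \
  | awk '{ SUM += $10 } END { print SUM/1024/1024 }'
# prints: 1
```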