Command-line Elasticsearch client

We use the ELK stack extensively at my job, thanks to my evangelizing and endless hard work. With all our servers logging to logstash and being pushed to Elasticsearch, logging into servers via ssh just to check logs is a thing of the past. To help push that ideology along, I’ve hacked up a simple bash script that queries Elasticsearch and returns results in a manner that mimics running `tail` on a server’s logs. It quite literally just runs a query against Elasticsearch’s HTTP API, but I added some niceties so folks can make queries to ES without having to read a novel on how to do so.

Example:
[kstedman@kalembp:~/bin] $ ./es-cli.sh -h 10.x.x.x:9200 -q "+host:xxx6039 +type:syslog" -t 1000 -n 4
2014-07-09T05:05:26.000000+00:00 xxx6039 snmpd[15279]: Connection from UDP: [10.x.x.x]:57258
2014-07-09T05:05:26.000000+00:00 xxx6039 snmpd[15279]: Received SNMP packet(s) from UDP: [10.x.x.x]:57258
2014-07-09T05:05:26.000000+00:00 xxx6039 snmpd[15279]: Connection from UDP: [10.x.x.x]:57258
2014-07-09T05:05:27.000000+00:00 xxx6039 snmpd[15279]: Connection from UDP: [10.x.x.x]:57258

Of course, since this is plain text output on the console, you can feed it into other scripts or sed/grep/awk it to your heart’s content. The script requires Python (it uses the datetime module and json.tool for pretty-printing). Enjoy!
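
For example, to get a rough sense of which programs were noisiest on that host over the last hour, you can pipe the output straight through awk and sort (the host and query below are just the ones from the example above):

./es-cli.sh -h 10.x.x.x:9200 -q "+host:xxx6039 +type:syslog" -t 60 -n 1000 \
  | awk '{print $3}' | sed -e 's/\[[0-9]*\]:$//' | sort | uniq -c | sort -rn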

#!/bin/bash
# 
# es-cli.sh: Search server logs from the comfort of your terminal!
#
# This is a command-line wrapper for Elasticsearch's RESTful API.
# This is super-beta, version .000001-alpha. Questions/comments/hatemail to Kale Stedman,
# I'm so sorry. You should probably pipe the output to less.
# 
# usage: ./es-cli.sh -u $USER -p $PASS -h es-hostname -q "$query" -t $time -n 500
# ex: ./es-cli.sh -u kstedman -p hunter2 -h es.hostname.com -q "program:crond" -t 5 -n 50
# 
# -h host      The Elasticsearch host you're trying to connect to.
# -u username  Optional: If your ES cluster is proxied through apache and you have http auth enabled, username goes here
# -p password  Optional: If your ES cluster is proxied through apache and you have http auth enabled, password goes here
# -q query     Optional: Query to pass to ES. If not given, "*" will be used.
# -t timeframe Optional: How far back to search. Value is in minutes. If not given, defaults to 5.
# -n results   Optional: Number of results to return. If not given, defaults to 500.


# Declare usage fallback/exit
usage() { echo "Usage: $0 -h host [ -u USER ] [ -p PASS ] [ -q \"QUERY\" ] [ -t TIMEFRAME ] [ -n NUMRESULTS ]" 1>&2; exit 1; }

# Parse options
while getopts ":u:p:h:q:t:n:" o; do
    case "${o}" in
        u)
            u=${OPTARG}
            ;;
        p)
            p=${OPTARG}
            ;;
        h)
            h=${OPTARG}
            ;;
        q)
            q="${OPTARG}"
            ;;
        t)
            t=${OPTARG}
            ;;
        n)
            n=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done
shift $((OPTIND-1))

if [ -z "${p}" ] && [ ! -z "${u}" ] ; then
  echo -n "Password: "
  read -s p
  echo
fi

# Check for required variables
if [ -z "${h}" ] ; then
    usage
fi

# Set defaults if not set
if [ -z "${n}" ] ; then
  # default: 500 results returned 
  n=500
fi

if [ -z "${q}" ] ; then
  # default: query "*"
  q="*"
fi

if [ -z "${t}" ] ; then
  # default: 5 minutes ago
  t="5"
fi

# cross-platform time compatibilities
FROMDATE=`python -c "from datetime import date, datetime, time, timedelta; print (datetime.now() - timedelta(minutes=${t})).strftime('%s')"`
NOWDATE=`python -c "from datetime import date, datetime, time, timedelta; print (datetime.now()).strftime('%s')"`
ZEROS="000"
NOW=${NOWDATE}${ZEROS}
FROM=${FROMDATE}${ZEROS}

# Build query
query="{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"${q}\"}}]}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"from\":$FROM,\"to\":$NOW}}}]}}}},\"size\":${n},\"sort\":[{\"syslog_timestamp\":{\"order\":\"asc\"}}]}"

if [ ! -z "${u}" ] ; then
  up="${u}:${p}@"
else
  up=""
fi

# run query and prettify the output
URL="http://${up}${h}/_all/_search?pretty"
curl -s -XGET "${URL}" -d ''"${query}"'' | python -mjson.tool |grep '"message"' | awk -F\: -v OFS=':' '{ $1=""; print $0}' | sed -e 's/^: "//g' | sed -e 's/", $//g' | sed -e 's/\\n/\
  /g'

long-running bash command notifier for osx

I stumbled across this fantastic blog post that offers a clever bash script to notify you of the completion of long-running commands in your bash shell. I made a couple tweaks to make it work for OSX, and gave it a little blacklist (I usually run `less’ or `vim’ for >10 seconds, for example).

Requires growl and growlnotify, bash, and this clever pre-exec hook for bash. Download that pre-exec hook:

mkdir -p ~/src/shell-tools
curl http://www.twistedmatrix.com/users/glyph/preexec.bash.txt > ~/src/shell-tools/preexec.bash

Now copy and paste this into ~/src/shell-tools/long-running.bash:

# Source this, and then run notify_when_long_running_commands_finish_install
#
# Relies on http://www.twistedmatrix.com/users/glyph/preexec.bash.txt
# Full credit to http://code.mumak.net/2012/01/undistract-me.html
# Modified slightly for OSX support and a blacklist (see the egrep check in the
# precmd() function)

if [ -f ~/src/shell-tools/preexec.bash ]; then
    . ~/src/shell-tools/preexec.bash
else
    echo "Could not find preexec.bash"
fi

LONG_RUNNING_COMMAND_TIMEOUT=10

function notify_when_long_running_commands_finish_install() {
    local RUNNING_COMMANDS_DIR=~/.cache/running-commands
    mkdir -p $RUNNING_COMMANDS_DIR
    for pid_file in $RUNNING_COMMANDS_DIR/*; do
        local pid=$(basename $pid_file)
        # If $pid is numeric, then check for a running bash process.
        case $pid in
        ''|*[!0-9]*) local numeric=0 ;;
        *) local numeric=1 ;;
        esac

        if [[ $numeric -eq 1 ]]; then
            local command=$(ps -o command= $pid)
            if [[ $command != $BASH ]]; then
                rm -f $pid_file
            fi
        fi
    done

    _LAST_COMMAND_STARTED_CACHE=$RUNNING_COMMANDS_DIR/$$

    function precmd () {

        if [[ -r $_LAST_COMMAND_STARTED_CACHE ]]; then

            local last_command_started=$(head -1 $_LAST_COMMAND_STARTED_CACHE)
            local last_command=$(tail -n +2 $_LAST_COMMAND_STARTED_CACHE)

            if [[ -n $last_command_started ]]; then
                local now=$(date -u +%s)
                local time_taken=$(( $now - $last_command_started ))
                if [[ $time_taken -gt $LONG_RUNNING_COMMAND_TIMEOUT ]]; then
                  if [ `echo "$last_command" | egrep -c "less|more|vi|vim|man|ssh"` == 1 ] ; then 
                    : # blacklisted command; skip the notification
                  else
                    growlnotify \
                        -m "$last_command completed in $time_taken seconds" \
                        "Command complete:"
                  fi
                fi
            fi
            # No command is running, so clear the cache.
            echo -n > $_LAST_COMMAND_STARTED_CACHE
        fi
    }

    function preexec () {
        date -u +%s > $_LAST_COMMAND_STARTED_CACHE
        echo "$1" >> $_LAST_COMMAND_STARTED_CACHE
    }

    preexec_install
}

Finally, source it by adding the following to your ~/.bash_profile:

. ~/src/shell-tools/preexec.bash
. ~/src/shell-tools/long-running.bash
notify_when_long_running_commands_finish_install

also: site redesign! (read: i installed a new theme from the gallery, go team)

Add fields to a MySQL table without doing an ALTER TABLE

I have a database table that was created about 2 years ago and has been filling up quite quickly over the years. These days, it’s massive. Our database dumps are 68gb uncompressed, and 60gb of that is this table. It’s used quite regularly, as it contains all of the error reports we receive, but to call it “unwieldy” is an understatement.

I was content to just let sleeping dogs lie, but alas — one of my devs needs a couple extra fields added to the table for more data and sorting and whatnot. If this wasn’t a 60gb table in our production database, I’d happily run an ALTER TABLE and call it a day. (In fact, I attempted to do this — and then the site went down because the whole db was locked. oops)

Instead, I discovered a better way to add fields while retaining both uptime and data (!). MySQL’s CREATE TABLE command actually has a lot of interesting functionality that allows me to do this:

CREATE TABLE errors2 (
  keywords VARCHAR(255), 
  errorid VARCHAR(64), 
  stacktrace TEXT, 
  is_silent BOOL, 
  id INT(10) AUTO_INCREMENT, 
  PRIMARY KEY (id), 
  KEY playerid (playerid,datecreate), 
  KEY datecreate (datecreate), 
  KEY hidden (hidden,datecreate), 
  KEY hidden_debug (hidden,is_debug,datecreate)
) 
ENGINE=InnoDB 
AUTO_INCREMENT=2417067 
DEFAULT CHARSET=utf8 
SELECT * from errors; 

What this CREATE TABLE statement does is create a new table with 5 explicitly-specified fields (keywords, errorid, stacktrace, is_silent, and id). Four of these are what I wanted to add; ‘id’ exists in the original table, but I specify it here because I need to make it AUTO_INCREMENT (as this is a table setting, not a bit of data or schema that can be copied). Additional keys are specified verbatim from a SHOW CREATE TABLE errors (the original table), as is the AUTO_INCREMENT value.

After specifying my table creation variables, I perform a SELECT on the original table. MySQL is smart enough to know that if I’m SELECTing during a CREATE TABLE, I probably want any applicable table schema copied as well, so it does exactly that — copies over any columns missing from the schema I specified in my CREATE statement. Even better, because the various keys were specified, the indexes get copied over as well.

The result? An exact copy of the original table — with four additional fields added. All that’s left is to clean up:

DROP TABLE errors;
RENAME TABLE errors2 TO errors;
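
If the brief window between the DROP and the RENAME (during which no errors table exists at all) is a concern, RENAME TABLE can swap both tables in a single atomic statement instead; something along these lines avoids the gap entirely:

RENAME TABLE errors TO errors_old, errors2 TO errors;
DROP TABLE errors_old;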

And that, as they say, is that.

Backup/restore Elasticsearch index

[UPDATED 2017-03-09]
I still get comments/questions regarding this process I hacked together many moons ago. I must request that anybody who’s looking for a way to back up Elasticsearch indices STOP and not follow the process described — it was for ES 0.00000000001, written back in 2011. You should not do what I suggest here! I’m saving this purely for historical purposes.

What you should do instead is save your events in flat text: in Logstash, output each event both to your ES index (for searching via Kibana or whatnot) and to a flat file, rotated periodically (per day, per month, whatever). Back up and archive these text files, since they compress quite well. When you want to restore data from a period, just re-process it through Logstash — CPU is cheap nowadays with cloud instances! The data is the important part — processed or not, if you have the data in an easily stored format, you can re-process it.
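
For instance, the archive side can be a nightly cron job that compresses yesterday’s flat file and pushes it off-box; a minimal sketch, with purely hypothetical paths and bucket names:

#!/bin/bash
# hypothetical nightly archive job: compress yesterday's flat-file Logstash output and ship it to S3
YESTERDAY=`date -d yesterday +"%Y-%m-%d"`
LOGFILE="/var/log/logstash/events-$YESTERDAY.log"
gzip "$LOGFILE"
s3cmd put "$LOGFILE.gz" "s3://my-log-archive/logstash/$YESTERDAY.log.gz"

Restoring later is then just a matter of uncompressing the archive and feeding it back through a Logstash config with a stdin or file input.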

[Original post as follows]

I’ve been spending a lot of time with Elasticsearch recently, as I’ve been implementing logstash for our environment. Logstash, by the way, is a billion times awesome and I can’t recommend it enough for large-scale log management/search. Elasticsearch is pretty awesome too, but considering the sheer amount of data I was putting into it, I don’t feel satisfied with its replication-based redundancy — I need backups that I can save and restore at will. Since logstash creates a new Elasticsearch index for each day’s worth of logs, I want the ability to back up and restore arbitrary indices.

Elasticsearch has the concept of a gateway: you can configure a gateway that maintains metadata and takes snapshots regularly — “regularly” as in every 10 seconds by default. The docs recommend using S3 as the gateway, meaning every 10s it’ll ship data up to S3 for backup purposes, and if a node ever needs to recover data, it can just look to S3, get the metadata, and fill in data from that source. However, this model does not support the “rotation”-style backup and restore I’m looking for, and it can’t keep up with the rate of data I’m sending it (my daily indices are about 15gb apiece, making for about 400k log entries an hour).

So I’ve come up with a pair of scripts that allow me to manage logstash/Elasticsearch index data, allowing for arbitrary restore of an index, as well as rotation so as to keep the amount of data that Elasticsearch keeps track of manageable. As always, I wrote my scripts for my environment, so I take no responsibility if they do not work in yours and instead destroy all your data (a distinct possibility). I include these scripts here because I spent a while trying to figure this out and couldn’t find any information elsewhere on the net.

The following script backs up today’s logstash index. I’m hopeless with timezones, and I managed to somehow ship my logs to logstash in GMT, so my “day” ends at 5pm, when logstash closes its index and opens a new one for the new day. Shortly after logstash closes an index (stops writing to it, not “close” in the Elasticsearch sense), I run the following script in cron, which backs up the index, backs up the metadata, creates a restore script, and sticks it all in S3:

#!/bin/bash
# herein we backup our indexes! this script should run at like 6pm or something, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. we grab metadatas,
# compress the data files, create a restore script, and push it all up to S3.

TODAY=`date +"%Y.%m.%d"`
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=`date +"%Y-%m"`
S3TARGET="s3://backups/elasticsearch/$YEARMONTH/$INDEXNAME"

# create mapping file with index settings. this metadata is required by ES to use index file data
echo -n "Backing up metadata... "
curl -XGET -o /tmp/mapping "http://localhost:9200/$INDEXNAME/_mapping?pretty=true" > /dev/null 2>&1
sed -i '1,2d' /tmp/mapping #strip the first two lines of the metadata
echo '{"settings":{"number_of_shards":5,"number_of_replicas":1},"mappings":{' >> /tmp/mappost 
# prepend hardcoded settings metadata to index-specific metadata
cat /tmp/mapping >> /tmp/mappost
echo "DONE!"

# now lets tar up our data files. these are huge, so lets be nice
echo -n "Backing up data files (this may take some time)... "
mkdir -p $BACKUPDIR
cd $INDEXDIR
nice -n 19 tar czf $BACKUPDIR/$INDEXNAME.tar.gz $INDEXNAME 
echo "DONE!"

echo -n "Creating restore script... "
# time to create our restore script! oh god scripts creating scripts, this never ends well...
cat << EOF >> $BACKUPDIR/$INDEXNAME-restore.sh
#!/bin/bash
# this script requires $INDEXNAME.tar.gz and will restore it into elasticsearch
# it is ESSENTIAL that the index you are restoring does NOT exist in ES. delete it
# if it does BEFORE trying to restore data.

# create index and mapping
echo -n "Creating index and mappings... "
curl -XPUT 'http://localhost:9200/$INDEXNAME/' -d '`cat /tmp/mappost`' > /dev/null 2>&1
echo "DONE!"

# extract our data files into place
echo -n "Restoring index (this may take a while)... "
cd $INDEXDIR
tar xzf $BACKUPDIR/$INDEXNAME.tar.gz
echo "DONE!"

# restart ES to allow it to open the new dir and file data
echo -n "Restarting Elasticsearch... "
/etc/init.d/es restart
echo "DONE!"
EOF
echo "DONE!" # restore script done

# push both tar.gz and restore script to s3
echo -n "Saving to S3 (this may take some time)... "
$BACKUPCMD $BACKUPDIR/$INDEXNAME.tar.gz $S3TARGET.tar.gz
$BACKUPCMD $BACKUPDIR/$INDEXNAME-restore.sh $S3TARGET-restore.sh
echo "DONE!"

# cleanup tmp files
rm /tmp/mappost
rm /tmp/mapping

Restoring from this data is just as you would expect — download the backed up index.tar.gz and the associated restore.sh to the same directory, chmod +x the restore.sh, then run it. It will automagically create the index and put the data in place. This has the benefit of making backed up indices portable — you can “export” them from one ES cluster and import them to another.
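
In practice, pulling a day’s index back down looks something like this (using the same s3cmd config and S3 layout as the backup script above; the date is just an example):

cd /mnt/es-backups/
s3cmd --config=/usr/local/backupTools/s3cfg get s3://backups/elasticsearch/2011-10/logstash-2011.10.14.tar.gz
s3cmd --config=/usr/local/backupTools/s3cfg get s3://backups/elasticsearch/2011-10/logstash-2011.10.14-restore.sh
chmod +x logstash-2011.10.14-restore.sh
./logstash-2011.10.14-restore.sh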

As mentioned, because of logstash, I have daily indices that I back up; I also rotate them to prevent ES from having to search through billions of gigs of data over time. I keep 8 days worth of logs in ES (due to timezone issues) by doing the following:

#!/bin/bash
# Performs 'rotation' of ES indices. Maintains only 8 indices (8 days) of logstash logs; this script
# is to be run at midnight daily and removes the oldest one (as well as any 1970s-era log indices,
# as these are a product of timestamp fail).  Please note the insane amount of error-checking
# in this script, as ES would rather delete everything than nothing...

# Before we do anything, let's get rid of any nasty 1970s-era indices we have floating around
TIMESTAMPFAIL=`curl -s localhost:9200/_status?pretty=true |grep index |grep log |sort |uniq |awk -F\" '{print $4}' |grep 1970 |wc -l`
if [ "$TIMESTAMPFAIL" -gt 0 ]
	then
		curl -s localhost:9200/_status?pretty=true |grep index |grep log |sort |uniq |awk -F\" '{print $4}' |grep 1970 | while read line
			do
				echo "Indices with screwed-up timestamps found; removing"
				echo -n "Deleting index $line: "
				curl -s -XDELETE http://localhost:9200/$line/
				echo "DONE!"
			done
fi


# Get list of indices; should we rotate?
INDEXCOUNT=`curl -s localhost:9200/_status?pretty=true |grep index |grep log |sort |uniq |awk -F\" '{print $4}' |wc -l`
if [ $INDEXCOUNT -lt "9" ] 
	then
		echo "Less than 8 indices, bailing with no action"
		exit 0
	else
		echo "More than 8 indices, time to do some cleaning"
		
		# Let's do some cleaning!
		OLDESTLOG=`curl -s localhost:9200/_status?pretty=true |grep index |grep log |sort |uniq |awk -F\" '{print $4}' |head -n1`
		echo -n "Deleting oldest index, $OLDESTLOG: "
		curl -s -XDELETE http://localhost:9200/$OLDESTLOG/
		echo "DONE!"
fi

Sometimes, due to the way my log entries get to logstash, the timestamp is mangled, and logstash, bless its heart, tries so hard to index it. Since logstash is keyed on timestamps, though, this means every once in a while I get an index dated 1970 with one or two entries. There’s no harm save for any overhead of having an extra index, but it also makes it impossible to back those up or to be able to make any assumptions about the index names. I nuke the 1970s indices from orbit, and then, if there are more than 8 indices in logstash, drop the oldest. I run this script at midnight daily, after index backup. Hugest caveat in the world about the rotation: running `curl -s -XDELETE http://localhost:9200/logstash-10.14.2011/’ will delete index logstash-10.14.2011, as you’d expect. However, if that variable $OLDESTLOG is mangled somehow and this command is run: `curl -s -XDELETE http://localhost:9200//’, you will delete all of your indices. Just a friendly warning!
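
A cheap bit of insurance is to refuse to delete anything when that variable turns up empty; a small guard like this in front of the DELETE call would do it:

# refuse to run the DELETE if $OLDESTLOG is empty -- an empty name turns the URL
# into http://localhost:9200// and deletes every index
if [ -z "$OLDESTLOG" ] ; then
	echo "OLDESTLOG is empty, refusing to delete anything"
	exit 1
fi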

Renaming a node in chef

Too bad there’s no `knife node rename ‘, eh?

Here’s what you gotta do instead:

knife client delete oldname
knife node delete oldname

On the node itself:

rm /etc/chef/client.pem
sed -i 's/oldname/newname/g' /etc/chef/client.rb
ls /etc/chef/validation.pem # ensure it's there!
chef-client -N newname

This will register the new node name with chef. The runlist will be empty, so you’ll have to rebuild it. Voila!
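
Rebuilding the run list is just a matter of adding your roles and recipes back with knife; for example (the role name here is only a placeholder):

knife node run_list add newname 'role[base]'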

mod_rpaf and Amazon ELB

Amazon’s ELB service is nice — magical load balancers that just work, sitting in front of your servers, that you can update and modify on a whim. Of course, because it’s a load balancer (a distributed load balancer infrastructure, to be more precise), Apache and other applications sitting behind it see all the incoming traffic as coming from the load balancer — ie, $REMOTE_ADDR is 10.251.74.17 instead of the end client’s public IP.

This is normal behavior when sitting behind a load balancer, and it’s also normal behavior for the load balancer to encapsulate the original client IP in an X-Forwarded-For header. Using Apache, we can, for example, modify LogFormat definitions to account for this, logging %{X-Forwarded-For}i to log the end user’s IP.

Where this falls short, however, is when you want to *do* things with the originating IP beyond logging. The real-world scenario I ran into was using mod_qos to do rate-limiting based on URIs within Apache — mod_qos tests against the remote IP, not the X-Forwarded-For, so using the module as is, I’m unable to apply any QoS rules against anything beyond the load balancer… which of course defeats the purpose.

Luckily, I’m not the only person to have ever run into this issue. The Apache module mod_rpaf is explicitly designed to address this type of situation by translating the X-Forwarded-For header into the remote address as Apache expects, so that other modules can properly run against the originating IP — not the load balancer.

ELB makes implementation of mod_rpaf much more difficult than it should be, however. ELB is architected as a large network of load balancers, such that incoming outside requests bounce around a bit within the ELB infrastructure before being passed to your instance. Each “bounce” adds an additional IP to X-Forwarded-For, essentially chaining proxies. Additionally, there are hundreds of internal IPs within ELB that would need to be accounted for to use mod_rpaf as is, as you must specify the proxy IPs to strip.
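
By the time a request reaches Apache, the header can look something like this (the client IP and internal hops are made up for illustration), with the leftmost address being the real client and everything after it belonging to ELB:

X-Forwarded-For: 203.0.113.50, 10.251.74.17, 10.251.110.23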

So I patched up mod_rpaf to work with ELB. I’ve been running it for a day or so in dev and it appears to be working as expected, passing the original client value to mod_qos (and mod_qos testing and working against that), but of course if you run into issues, please let me know (because your issues will probably show up in my environment as well).

Here is the patch:

--- mod_rpaf-2.0.c	2008-01-01 03:05:40.000000000 +0000
+++ mod_rpaf-2.0.c~	2011-08-25 20:04:39.000000000 +0000
@@ -136,13 +136,25 @@
 }
 
 static int is_in_array(const char *remote_ip, apr_array_header_t *proxy_ips) {
-    int i;
+   /* int i;
     char **list = (char**)proxy_ips->elts;
     for (i = 0; i < proxy_ips->nelts; i++) {
         if (strcmp(remote_ip, list[i]) == 0)
             return 1;
     }
     return 0;
+    */
+    return 1;
+}
+
+static char* last_not_in_array(apr_array_header_t *forwarded_for,
+			       apr_array_header_t *proxy_ips) {
+    int i;
+    for (i = (forwarded_for->nelts)-1; i > 0; i--) {
+	if (!is_in_array(((char **)forwarded_for->elts)[i], proxy_ips))
+	    break;
+    }
+    return ((char **)forwarded_for->elts)[i];
 }
 
 static apr_status_t rpaf_cleanup(void *data) {
@@ -161,7 +173,7 @@
     if (!cfg->enable)
         return DECLINED;
 
-    if (is_in_array(r->connection->remote_ip, cfg->proxy_ips) == 1) {
+    /* if (is_in_array(r->connection->remote_ip, cfg->proxy_ips) == 1) { */
         /* check if cfg->headername is set and if it is use
            that instead of X-Forwarded-For by default */
         if (cfg->headername && (fwdvalue = apr_table_get(r->headers_in, cfg->headername))) {
@@ -183,7 +195,8 @@
             rcr->old_ip = apr_pstrdup(r->connection->pool, r->connection->remote_ip);
             rcr->r = r;
             apr_pool_cleanup_register(r->pool, (void *)rcr, rpaf_cleanup, apr_pool_cleanup_null);
-            r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[((arr->nelts)-1)]);
+            /* r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[((arr->nelts)-1)]); */
+            r->connection->remote_ip = apr_pstrdup(r->connection->pool, last_not_in_array(arr, cfg->proxy_ips));
             r->connection->remote_addr->sa.sin.sin_addr.s_addr = apr_inet_addr(r->connection->remote_ip);
             if (cfg->sethostname) {
                 const char *hostvalue;
@@ -201,7 +214,7 @@
             }
 
         }
-    }
+    /* } */
     return DECLINED;
 }

Or, if you’d prefer ez-mode, I rolled some RPMs of mod_rpaf that include this patch:

mod_rpaf-0.6-0.7.i386.rpm
mod_rpaf-0.6-0.7.x86_64.rpm

And, for completeness, mod_rpaf.conf:

LoadModule rpaf_module        modules/mod_rpaf-2.0.so

RPAFenable On
RPAFsethostname On
RPAFproxy_ips 10.
RPAFheader X-Forwarded-For

Extra logging wrapper script for SES Postfix transport

I’m using Amazon’s SES service for my servers’ emails. To implement it, instead of re-writing all of our code to hook into the SES API, I simply configured Postfix to use the example script ses-send-email.pl provided by Amazon. It works fine and dandy, with mails happily going out to their intended recipients via SES.

However, that’s not good enough for me. If you send a mail through SES and it bounces, you’ll receive the bounce message at the original From: address, as expected. But a lot of ISPs/ESPs strip the original To: header in their bounce templates to prevent backscatter, and SES mangles the message ID set on the email by Postfix (replacing it with its own), so it’s very possible to get bounce messages that carry no information about the intended recipient. How do you do bounce management when nothing links the bounce to the original email you sent?

While Amazon strips the message ID assigned by Postfix, it adds its own message ID — AWSMessageID. This value is returned by the SES API when you submit an email to the service; the provided example scripts, however, don’t do anything with this ID.

To address this issue in my environment, I wrote the following script, which I set as my Postfix transport (rather than ses-send-email.pl).

#!/bin/bash
# send mail via SES and create a log with returned messageid for bounce processing

MAILFROM=$1
RCPTTOLOG=`echo $* | awk '{$1=""; print $0}' | awk '{sub(/^[ \t]+/, "")};1'`
RCPTTO=`echo $RCPTTOLOG | sed -e 's/\ /,/g'`
SCRIPT=/usr/local/amazon/ses-send-email.pl
SCRIPTOPTS="-r"
TIMESTAMP=`date +"%Y-%m-%d %H:%M:%S"`
ACCESSFILE=/usr/local/amazon/access
THEMAIL=`cat -`
SUBJECT=`echo "$THEMAIL" |awk '($0 ~ /Subject: /) {$1=""; print $0}' |awk '{sub(/^[ \t]+/, "")};1'`

OUTPUT=`echo "$THEMAIL" | $SCRIPT $SCRIPTOPTS --verbose -k $ACCESSFILE -f $MAILFROM $RCPTTO`
if echo "$OUTPUT" |grep -q Error
then
	exit 1 # SES error, postfix should defer this msg
fi

MESSAGEID=`echo $OUTPUT |awk '{print $4}' |awk -F\> '{print $2}' |awk -F\< '{print " AWSMessageID=" $1}'`

# log
echo "$TIMESTAMP from=$MAILFROM to=\"$RCPTTOLOG\" subject=\"$SUBJECT\" $MESSAGEID" >> /var/log/ses_maillog

Set ACCESSFILE to the location of the file containing your AWS key and secret, and of course configure paths as need be. The transport should be configured as such in master.cf:

# AWS-SES
aws-email  unix  -       n       n       -       -       pipe
  flags=R user=mail argv=/usr/local/amazon/ses-log-n-send.sh ${sender} ${recipient}

You’ll get a log file at /var/log/ses_maillog that looks something like this:

2011-08-23 16:26:24 from=bugs@butt.com to="butts@gmail.com" subject="this is my email subject"  AWSMessageID=00000131f8f261e2-75f27db7-b6d2-43ca-9c26-9a4a92ecbfd0-000000
2011-08-23 16:26:23 from=bugs@butt.com to="morebutts@gmail.com" subject="Re: this is my email subject"  AWSMessageID=00000131f8f761b9-acfceec3-73ab-4d5e-8959-f7bb9ee00665-000000
2011-08-23 16:26:25 from=bugs@butt.com to="toomanybutts@gmail.com" subject="another email subject"  AWSMessageID=00000131f8f76669-1540d563-41c0-4ba9-adc0-122ee41f4b28-000000

Now you can grep grep grep away for the AWSMessageID to match the one in the bounce email to find the original recipient and update your lists accordingly.
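
For example, if a bounce comes back referencing one of the AWSMessageIDs above, finding the original recipient is a one-liner:

grep 'AWSMessageID=00000131f8f761b9-acfceec3-73ab-4d5e-8959-f7bb9ee00665-000000' /var/log/ses_maillog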

SOLVED: Macbook Air kernel_task slowness

I love my Macbook Air more than I’ve loved any laptop before (my first experience with a 12″ iBook in 2005 was a thing of beauty, but pales in comparison to my relationship with my Air). However, its CPU throttling to prevent heat drives me batty due to its aggressiveness — play a Flash video, for example, for a few minutes and it will start stuttering as the CPU temperature rises. Let it continue and the entire computer will slow to a crawl. If you open Activity Monitor or look at top, you’ll see a process owned by root called ‘kernel_task’ using 150% CPU usage or so. Kill the Flash video and the kernel_task will slowly scale back and things will return to normal.

What’s going on here is an interesting approach to temperature management. As temperature rises due to load on the CPU, the kernel runs some low-cost operations over and over — think ‘gettimeofday()’-style functions. Since the kernel has top priority, system CPU usage spikes while userland CPU usage is forced down, lowering the actual activity that the CPU is doing and thus lowering the temperature. A decent idea, I guess, but in practice it’s way too aggressive.

Luckily, after a bit of digging in /System/Library/Extensions, I came across an extension called ‘AppleIntelPenrynProfile.kext’ which, judging by its Info.plist, ties into power management and performance monitoring — the IOClass is ‘AppleIntelPenrynPerformanceMonitor’ and the IOProviderClass is ‘AppleACPICPU’.

This kernel extension is loaded at boot, but interestingly, if you boot into safe mode (hold shift during boot), it is not loaded — and the kernel_task CPU spikes don’t occur, even under heat-generating load. You can verify this by running `kextstat’, which lists all loaded kernel extensions — run it in Terminal while booted normally and you should see an extension called ‘com.apple.driver.AppleIntelPenrynProfile’ loaded. Boot into safe mode by holding down shift before the chime and run `kextstat’ again — no com.apple.driver.AppleIntelPenrynProfile, and no kernel_task CPU spike when generating heat (play a Flash video).

So… why load that module at all?

Back in normal OSX, launch Terminal and run the following:

cd /System/Library/Extensions/AppleProfileFamily.kext/Contents/PlugIns
sudo kextunload AppleIntelPenrynProfile.kext

Verify it’s unloaded:

kextstat | grep Penryn

This should return no output if the module was successfully unloaded. Now, go play a Flash video and enjoy a less-crippled Air!

* Disclaimer: I take NO responsibility if you brick your Mac, it catches on fire, never boots again, or otherwise break. I’ve had no problems and it’s been working quite well, but your experience/hardware/whatever may be different.

** Disclaimer 2: I highly recommend running SMCFanControl and pushing your fans to max when running heat-intensive operations. I do not recommend running your Air at 80ºC or hotter for extended periods — the kernel will no longer discourage this activity by slowing things down. It’s unlikely that you will fry your CPU due to extensive hot use, as the CPU’s thermal shutdown is lower than its point of combustion, but this doesn’t mean you should push that threshold.

*** Disclaimer 3: The Penryn profile is for the Rev 2 Macbook Airs. Rev 1 is Merom, with the extension called AppleIntelMeromProfile.kext.

ImageMagick, PDFs, and third-party fonts

If you use ImageMagick to convert PDFs, you’ll know it’s as simple as

convert file.pdf file.jpg

However, if you’re using a third-party, non-GhostScript-sanctioned font, this won’t work terribly well and will fail with a rather cryptic GhostScript error such as:

ERROR: /invalidfileaccess in --file--
Operand stack:
--dict:5/5(L)-- F2 10.0 --dict:6/6(L)-- --dict:6/6(L)-- STSongStd-Light-Acro-UniGB-UCS2-H --dict:10/12(ro)(G)-- --nostringval-- --dict:7/7(L)-- --dict:7/7(L)-- Adobe-GB1 CIDFont Adobe-GB1 Adobe-GB1 --nostringval-- (/usr/share/fonts/chinese/TrueType/uming.ttf) (r)
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1 3 %oparray_pop 1 3 %oparray_pop 1 3 %oparray_pop --nostringval-- --nostringval-- 2 1 6 --nostringval-- %for_pos_int_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- 1 %stopped_push --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- false 1 %stopped_push --nostringval-- %loop_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- %loop_continue --nostringval-- 12 9 %oparray_pop --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1127/1686(ro)(G)-- --dict:0/20(G)-- --dict:107/200(L)-- --dict:107/200(L)-- --dict:104/127(ro)(G)-- --dict:241/347(ro)(G)-- --dict:20/24(L)-- --dict:4/6(L)-- --dict:24/31(L)-- --dict:38/50(ro)(G)--
Current allocation mode is local
Last OS error: 2
ESP Ghostscript 815.02: Unrecoverable error, exit code 1
convert: Postscript delegate failed `/home/kale/poop.pdf'.
convert: missing an image filename `/home/kale/poop.pnm'.

In this instance, the PDF poop.pdf contains a Chinese font which Ghostscript knows about but won’t let me use, since I installed it after the fact from RPM (chinese-fonts.noarch). By default, ImageMagick runs GhostScript with the -dSAFER flag, a mildly paranoid flag that prevents GS from using files outside of its root (/usr/share/ghostscript, usually). To work around this error, rather than dealing with GS’s confusing and arcane font directory configuration, I recommend simply amending ImageMagick’s delegates definition for the PDF filetype:

/usr/lib64/ImageMagick-6.2.8/config/delegates.xml:

<delegate decode="pdf" encode="ps" mode="bi" command='"gs" -q -dBATCH -dSAFER -dMaxBitmap=500000000 -dNOPAUSE -dAlignToPixels=0 -dEPSCrop -sDEVICE="pswrite" -sOutputFile="%o" -f"%i"' />

to

<delegate decode="pdf" encode="ps" mode="bi" command='"gs" -q -dBATCH -dMaxBitmap=500000000 -dNOPAUSE -dAlignToPixels=0 -dEPSCrop -sDEVICE="pswrite" -sOutputFile="%o" -f"%i"' />

(remove the -dSAFER flag)
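
If you’d rather script the change than hand-edit the XML, a quick sed against that delegates.xml does the trick (note that this strips -dSAFER from every delegate entry that uses it, and the path will vary by ImageMagick version):

sudo sed -i 's/ -dSAFER//' /usr/lib64/ImageMagick-6.2.8/config/delegates.xml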

Henceforth, `convert poop.pdf poop.jpg’ will work with third-party fonts without issue.

Enforcing Secure Passwords in Horde

A coworker, Alex, discovered that Horde, in conjunction with Plesk, allows users to change their passwords arbitrarily — but doesn’t enforce any sort of password policy, allowing such passwords as “test” or even “” (null). This, obviously, is a huge security risk as mail compromises can lead to fairly terrible things.

From his article:

If you (or a client you are representing) want to set horde to do the typical “strict password” enforcement, look for the file:

horde/passwd/backends.php

And read the bit about password policy. An example policy set in this file that requires 1 capital, 1 lowercase, 1 special character, and 1 number, with a minimum password length of 8, would look like:

'password policy' => array(
    'minLength' => 8,
    'maxLength' => 64,
    'maxSpace' => 0,
    'minUpper' => 1,
    'minLower' => 1,
    'minNumeric' => 1,
    'minSymbols' => 1
),

Stop snmpd from spamming syslog

Dell’s OpenManage tools come with an SNMP MIB that the system can be queried against and that can be used to set up traps. However, by default, snmpd will not allow arbitrary smux peers, and your logs will be spammed with the following:

Apr 29 19:19:37 server snmpd[4321]: [smux_accept] accepted fd 9 from 127.0.0.1:39622
Apr 29 19:19:37 server snmpd[4321]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, descr Systems Management SNMP MIB Plug-in Manager
Apr 29 19:19:40 server snmpd[4321]: [smux_accept] accepted fd 9 from 127.0.0.1:39693
Apr 29 19:19:40 server snmpd[4321]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, descr Systems Management SNMP MIB Plug-in Manager
Apr 29 19:19:43 server snmpd[4321]: [smux_accept] accepted fd 9 from 127.0.0.1:39790
Apr 29 19:19:43 server snmpd[4321]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, descr Systems Management SNMP MIB Plug-in Manager

Add the following to your /etc/snmp/snmpd.conf:

smuxpeer .1.3.6.1.4.1.674.10892.1

and restart snmpd.
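
On RHEL-style boxes that’s simply:

service snmpd restart

after which the refused smux peer messages should stop showing up.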

Run Urchin on-demand for all profiles at once

There’s no built-in way in Urchin to re-run the processing job for all domains (such as after fixing a problem). This can, however, be done on the command line with a while loop:

ls /usr/local/urchin/data/reports/ | while read line ; do /usr/local/urchin/bin/urchin -p"$line" ; done

HOWTO: Install VNC server on RHEL5

Install a desktop environment and the VNC server:

yum groupinstall "GNOME Desktop Environment"
yum install vnc-server

Change to the user who will be owning the session and run `vncserver’ to set up their password and create the default files

su - kale
vncserver

Edit the xstartup file for that user to point to GNOME:

vi /home/kale/.vnc/xstartup

# Uncomment the following two lines for normal desktop:
 unset SESSION_MANAGER
 exec /etc/X11/xinit/xinitrc

Kill that VNC session

killall Xvnc

Edit vncserver’s config file

vi /etc/sysconfig/vncservers

VNCSERVERS="2:kale"
VNCSERVERARGS[2]="-geometry 800x600 -nohttpd"

Start VNC

service vncserver start

And connect! Since it’s specified in /etc/sysconfig/vncservers that kale’s session is on display 2 (the 2:kale bit), the port for this connection is 5902 (5900 plus the display number). Connect to that port, enter the password you specified earlier, and voila!
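
From another machine with a VNC client installed, connecting looks something like this (the hostname is a placeholder):

vncviewer vnchost.example.com:2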

WHOIS visiting your site?

I’m fond of WHOIS data for getting an idea of who’s visiting a site, though most WHOIS servers return output that’s full of disclaimers and irrelevant detail. I much prefer Team Cymru’s batch WHOIS lookup server, whois.cymru.com.

First, extract your IPs:
F=ips.out ; echo "begin" >> $F ; echo "verbose" >> $F ; awk '{print $1}' tech-access_log | sort | uniq >> $F ; echo "end" >> $F

Now send them to Cymru for processing:
nc whois.cymru.com 43 < $F | sort > whois.out

Review whois.out at your leisure for detailed IP information. It’s well-formatted, which makes it easy to script against (see the example after the sample output below):

91      | 128.113.197.128  | 128.113.0.0/16      | US | arin     | 1986-02-27 | RPI-AS - Rensselaer Polytechnic Institute
91      | 128.113.247.58   | 128.113.0.0/16      | US | arin     | 1986-02-27 | RPI-AS - Rensselaer Polytechnic Institute
9121    | 88.232.9.77      | 88.232.0.0/17       | TR | ripencc  | 2005-10-27 | TTNET TTnet Autonomous System
9       | 128.2.161.88     | 128.2.0.0/16        | US | arin     | 1984-04-17 | CMU-ROUTER - Carnegie Mellon University
9136    | 91.186.50.28     | 91.186.32.0/19      | DE | ripencc  | 2006-11-07 | WOBCOM WOBCOM GmbH - www.wobcom.de
9143    | 212.203.31.1     | 212.203.0.0/19      | NL | ripencc  | 2000-08-08 | ZIGGO Ziggo - tv, internet, telefoon
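
For example, to see which countries (the fourth pipe-delimited field above) are hitting the site hardest:

awk -F'|' '{gsub(/ /, "", $4); print $4}' whois.out | sort | uniq -c | sort -rn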