Blog entries published in 2011

Java SIGBUS - an unclear way of saying /tmp is full

Published: 2011-05-02 19:27 UTC. Tags: linux java

I had the following happen for every new java process on one of my servers the other day:

server:~$ java
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f3e0c5aad9b, pid=17280, tid=139904457242368
#
# JRE version: 6.0_24-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x7ed9b]  memset+0xa5b
#
# An error report file with more information is saved as:
# /home/user/hs_err_pid17280.log
Segmentation fault

It turns out this is Java's way of telling you that the /tmp directory is full. The JVM mmaps a performance/hotspot-related file in /tmp, and the mmap call itself succeeds. The SIGBUS only arrives when the JVM first touches the mapped area (here inside memset) and the filesystem has no space left to back it.
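
A quick way to rule this out before digging through the hs_err log is to check how much space is left on /tmp. The following is only a minimal sketch: the function name and the 64 MB threshold are my own assumptions, not anything the JVM defines.

```python
import shutil


def has_enough_free_space(path="/tmp", min_free_bytes=64 * 1024 * 1024):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` free. The threshold is an arbitrary assumption."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    if not has_enough_free_space():
        print("/tmp is nearly full; new java processes may die with SIGBUS")
```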

More info here


Hadoop Streaming Error Codes

Published: 2011-01-31 08:12 UTC. Tags: hadoop

I use Hadoop Streaming a lot. Its exit codes have been something of a mystery to me, so today I decided to find out what they mean by looking at the source code.

The exit codes are listed in StreamJob.java, and are as follows:

  0. Success
  1. Job not successful, i.e. something went wrong with the MapReduce code.
  2. Bad input path
  3. Invalid jobconf
  4. Output path already exists
  5. Error launching job. Could be any error, for example an HDFS communication error.
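
If you launch streaming jobs from a wrapper script, a small lookup table makes failures easier to read. This is a rough sketch that assumes the codes are zero-based, i.e. the process exits with 0 on success and the other entries follow in the order listed above. The names `describe_exit_code` and `run_streaming_job` are hypothetical helpers, not part of Hadoop.

```python
import subprocess

# Assumed mapping: zero-based exit codes in the order listed in this post.
STREAMING_EXIT_CODES = {
    0: "Success",
    1: "Job not successful",
    2: "Bad input path",
    3: "Invalid jobconf",
    4: "Output path already exists",
    5: "Error launching job",
}


def describe_exit_code(code):
    """Translate an exit code into a short human-readable description."""
    return STREAMING_EXIT_CODES.get(code, "Unknown exit code %d" % code)


def run_streaming_job(argv):
    """Run the given command line and return (exit code, description).

    `argv` is whatever command you already use to start the job."""
    result = subprocess.run(argv)
    return result.returncode, describe_exit_code(result.returncode)
```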

Continuous Integration with Hudson - embarrassingly simple!

Published: 2011-01-27 19:24 UTC. Tags: open source software testing

I'm working on a rather large reporting and analytics application that runs on top of Hadoop at work. It has tests. A whole bunch of them, actually. That's good.

So far, we've been running the tests manually when making new releases. But running them more often is always better, since it gives you an indication of when things went wrong, and also forces you to keep your tests in a state where they pass. Some people call it Continuous Integration.

Now, you can do all the work getting your builds to build and run tests yourself, via cron and scripts and other types of messiness. Or you can try an existing solution. Today I decided to try Hudson.

That turned out to be embarrassingly simple to get started with. Basically, it's a matter of:

  1. Download hudson.war from their site.
  2. Start it by running java -jar hudson.war
  3. Go to http://localhost:8080 with a web browser of your choice. That would be Opera in my case. You have to eat your own dog-food.
  4. Go to the Hudson management screen and enable the git plugin.
  5. Set up a new project. Tell it where the code is and on which branch.
  6. Configure which commands to run to build and test. Make the test command output an xunit xml file.
  7. Tell Hudson where that xml file is.

Result: Hudson will periodically poll git and run my build and test commands, then show a changelog and what tests failed. All this after 30 minutes of setup time. I'm impressed.
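
Most test runners can write the xunit xml file for you, but if yours cannot, the format is simple enough to generate by hand. Here is a minimal sketch of the commonly used JUnit-style layout (a testsuite element containing testcase elements, with a failure child for each failed test). The function name and the shape of the input list are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_xunit_report(suite_name, results):
    """Build a JUnit-style XML report as a string.

    `results` is a list of (test_name, failure_message) tuples, where
    failure_message is None for a passing test."""
    failures = sum(1 for _, message in results if message is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
        errors="0",
    )
    for test_name, message in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=test_name)
        if message is not None:
            # A failed test gets a <failure> child carrying the message.
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = message
    return ET.tostring(suite, encoding="unicode")
```

Write the returned string to a file in the workspace and point the project configuration at that path.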


Slow Puppetmaster? Check your reverse DNS

Published: 2011-01-13 19:26 UTC. Tags: puppet

Yesterday some of the servers I care for at work were moved to a different network. After the move, all puppetd runs started to take a very long time. Where it would usually take 10-15 seconds, it now timed out with errors like:

Jan 12 19:39:16 host1 puppetd[15760]: Calling puppetmaster.getconfig
Jan 12 19:41:16 host1 puppetd[15760]: Configuration retrieval timed out

(Note the two minutes between the informational message about calling puppetmaster.getconfig, and the timeout)

Highly confusing, especially since puppetd was slow not only on hosts which had moved to the new network, but also on hosts which had not moved.

The reason turned out to be slow reverse DNS for the new network range. Puppetmaster, it seems, does lots and lots of DNS lookups for clients, and these seem to be synchronous. I think all hosts slowed down because puppetmaster got busy looking up one of the hosts on the new network, which put requests from hosts that had not moved on hold.

Fixing the DNS issue solved the problem.

This is on puppet 0.24.5. Later versions might behave better.
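
To check whether reverse DNS is slow for an address, you can time a lookup from the puppetmaster host. This is just a sketch using the standard resolver; the function name is made up, and how long a lookup may take before it counts as slow is up to you.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (hostname or None, seconds taken) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # Covers socket.herror and socket.gaierror: no PTR record, resolver error.
        hostname = None
    return hostname, time.monotonic() - start


if __name__ == "__main__":
    name, seconds = time_reverse_lookup("127.0.0.1")
    print("%s resolved in %.3f seconds" % (name, seconds))
```

A lookup that takes whole seconds instead of milliseconds for client addresses would point at the kind of problem described above.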
