Monday, June 25, 2007

More about profiling with SIGPROF

First, read this post about profiling with JVMTI and SIGPROF.

A poster on that entry asked me to elaborate a little on exactly what happens when the timer goes off and the signal is raised. In this post, I'll explain that in a little more detail. First, I'll back up a little.

Edited to add: Okay, I wrote this whole thing, and very, very briefly answered the actual question at the end. So, if you are interested in the answer to the actual question, and not a long discourse on how to use SIGPROF, look at the end of this entry.

What is a Signal?

A signal is a very basic mechanism that UNIX-like systems use to notify a process of an event, either synchronously or asynchronously. If a process divides an integer by zero, for example, a SIGFPE (Floating Point Exception) is synchronously sent to that process. Signals can also be used asynchronously; for example, one process can send another a signal. The receiving process determines how it wants to handle that signal: it can either register a signal handler for it, or let the default action occur.
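To make the handler-versus-default-action distinction concrete, here is a minimal sketch in which a process registers a handler and then sends itself a signal. The names record_signal and send_signal_to_self are made up for illustration, and it uses sigaction, which is covered in the next section.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler. volatile sig_atomic_t is the type C guarantees
   can be stored safely from inside a signal handler. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signum) {
    last_signal = signum;
}

/* Installs record_signal for SIGUSR1, sends SIGUSR1 to this process,
   and returns the signal number the handler saw (0 if it never ran,
   -1 if a call failed). */
int send_signal_to_self(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    last_signal = 0;
    if (raise(SIGUSR1) != 0)   /* the handler runs before raise() returns */
        return -1;
    return last_signal;
}
```

Without the sigaction call, the default action for SIGUSR1 would apply instead, and that default is to terminate the process.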

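On most modern UNIX systems, there are around 32 standard signals (Linux adds a range of real-time signals on top of those). If you want to see the full list on a glibc system, see the header file /usr/include/bits/signum.h. They each map to a number; SIGKILL, for example, is number 9. You can use the command line program "kill" to send a signal to any process you have permission to signal. If you send a SIGTERM to a program using "kill -15" followed by its process ID, for example, it will take the default action of terminating that process, unless it has registered a signal handler. SIGKILL ("kill -9") is a special case: it cannot be caught, blocked, or ignored, so it always terminates the process.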
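One detail worth checking for yourself is that not every signal accepts a handler. The sketch below (noop_handler and try_install_handler are illustrative names, not part of any API) asks the kernel to install a handler for a given signal and reports what happened. On Linux I would expect it to succeed for SIGTERM and fail with EINVAL for SIGKILL.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int signum) {
    (void)signum;   /* deliberately does nothing */
}

/* Tries to install noop_handler for signum. Returns 0 on success,
   or the errno value that sigaction reported on failure. */
int try_install_handler(int signum) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signum, &sa, NULL) != 0)
        return errno;
    return 0;
}
```

Calling try_install_handler(SIGKILL) is a quick way to see that "kill -9" cannot be intercepted by the target process.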

SIGPROF is the one we care about for the purposes of this post. It is used for sample profiling. UNIX systems can be set up to send this signal at a regular interval; the application can register a handler to record profiling information in the handler.

How Do I Register a SIGPROF Signal Handler?

On Linux, there are several steps involved in registering your own signal handler. The basic idea is that a sigaction structure needs to be passed to the sigaction system call. Warning: I am not making a pretense that this is complete. Many man pages should be studied before attempting this. On Linux, a sigaction looks like this:

struct sigaction {
    void (*sa_handler)(int);
    void (*sa_sigaction)(int, siginfo_t *, void *);
    sigset_t sa_mask;
    int sa_flags;
    void (*sa_restorer)(void);
};

The first field, sa_handler, is the handler that you want to define. We're only going to fill in that one — if you are interested in the others, then look at the sigaction man page.

You define the handler by declaring a void function that takes a single integer argument:

void handler(int signal) {
    // printf is fine for a demo, but it is not async-signal-safe; real handlers should avoid it.
    printf("received signal %d\n", signal);
}

You then define your sigaction struct, and pass it to sigaction:

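struct sigaction sa;
memset(&sa, 0, sizeof(sa)); // clear sa_flags and the other fields (needs <string.h>)
sa.sa_handler = &handler;
sigemptyset(&sa.sa_mask); // block no extra signals while the handler runs
struct sigaction old_handler;
sigaction(SIGPROF, &sa, &old_handler);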

If there was already a registered handler, it will be in old_handler. You then set a timer to go off at a specified interval, and send your process a SIGPROF:

static struct itimerval timer;
timer.it_interval.tv_sec = 1; // number of seconds is obviously up to you
timer.it_interval.tv_usec = 0; // as is number of microseconds.
timer.it_value = timer.it_interval;
setitimer(ITIMER_PROF, &timer, NULL);

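setitimer tells the system to deliver a SIGPROF at the interval specified by the timer. With ITIMER_PROF, that interval is measured in CPU time consumed by the process (user plus system time), not wall-clock time, so a process that is asleep gets no samples. The system will wait for the timer to expire, deliver a SIGPROF, and repeat until the process dies or the itimer is replaced.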
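Putting the pieces above together, here is a minimal single-threaded sketch that counts SIGPROF deliveries. The names on_sigprof and collect_samples, the 10 ms period, and the busy loop are my choices for illustration. A real profiler would record a stack trace in the handler rather than bump a counter.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t samples = 0;

static void on_sigprof(int signum) {
    (void)signum;
    samples++;   /* a real profiler would capture a stack trace here */
}

/* Installs the handler, arms ITIMER_PROF with a 10 ms period, burns CPU
   until at least `wanted` samples have arrived, then disarms the timer.
   Returns the number of samples seen, or -1 on error. */
int collect_samples(int wanted) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart system calls the signal interrupts */
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval timer;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 10000;    /* repeat every 10 ms of CPU time */
    timer.it_value = timer.it_interval;   /* first expiry after 10 ms too */
    samples = 0;
    if (setitimer(ITIMER_PROF, &timer, NULL) != 0)
        return -1;

    /* ITIMER_PROF only advances while the process is using CPU,
       so spin instead of sleeping. */
    volatile unsigned long work = 0;
    while (samples < wanted)
        work++;

    struct itimerval off;
    memset(&off, 0, sizeof off);          /* an all-zero timer disarms it */
    setitimer(ITIMER_PROF, &off, NULL);
    return samples;
}
```

Calling collect_samples(3) should return after roughly 30 ms of CPU time. If you replace the busy loop with a sleep, the timer should not fire at all, which is the CPU-time behavior of ITIMER_PROF in action.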

At this point, your signal handler is complete!

Or...?

Well, it turns out that it isn't that simple (and bear in mind that we haven't even gotten to the Java part, yet). There is the issue of threading. The signal is handled in-process. So which thread actually handles it? Well, it turns out that this is not well-specified.

In Linux 2.4 and earlier, each thread was a separate process. Because of this, you needed to set up a separate timer for every thread that you wanted to profile. In other words, it was more-or-less completely unusable unless you had a very tightly controlled environment.

In Linux 2.6, a randomly chosen, currently executing thread is picked to execute the signal handler. This is awfully useful, considering that what we want to do is find out what the currently executing threads are doing.

There are a lot of pitfalls here, so proceed with caution. For example, in C code, you can't really grab an arbitrary lock in a signal handler, because the thread that was interrupted to run the handler might already hold that lock; because most C lock implementations aren't reentrant, this can deadlock.
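As one way of staying out of trouble, a handler can restrict itself to functions on the POSIX async-signal-safe list. The sketch below is a lock-free replacement for the printf handler shown earlier: it formats the message by hand and emits it with write(). The names format_signal, safe_handler, and install_safe_handler are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Writes "signal N\n" into buf using only local arithmetic: no stdio,
   no malloc, no locks. Returns the byte count. buf needs 32 bytes. */
size_t format_signal(int signum, char *buf) {
    static const char prefix[] = "signal ";
    size_t n = 0;
    for (size_t i = 0; prefix[i] != '\0'; i++)
        buf[n++] = prefix[i];

    char digits[12];
    size_t d = 0;
    unsigned v = signum < 0 ? 0u : (unsigned)signum;
    do {                          /* collect digits, least significant first */
        digits[d++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (d > 0)                 /* copy them out in reading order */
        buf[n++] = digits[--d];

    buf[n++] = '\n';
    return n;
}

static void safe_handler(int signum) {
    int saved_errno = errno;      /* write() may change errno under the interrupted code */
    char buf[32];
    size_t n = format_signal(signum, buf);
    /* write() is on the POSIX async-signal-safe list; printf() is not. */
    ssize_t unused = write(STDERR_FILENO, buf, n);
    (void)unused;
    errno = saved_errno;
}

/* Installs safe_handler for SIGPROF. Returns 0 on success, -1 on error. */
int install_safe_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = safe_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}
```

The design choice is simply to do as little as possible in the handler. Anything that might take a lock, including stdio and malloc, is left to code that runs outside of it.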

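There is also the issue of what happens if you only want some threads to be able to handle the signal, and you want to disable it for the rest. pthread_sigmask() exists for this purpose: it blocks a signal in the calling thread only. In a single-threaded process, sigprocmask() does the same job, but its behavior in a multithreaded process is unspecified. If you want the process to ignore a signal altogether, set its handler to SIG_IGN.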
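A minimal sketch of the per-thread case, assuming you call it from each thread that should not be sampled (the function names are mine):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Blocks SIGPROF in the calling thread only. Returns 0 on success,
   or an error number, which is how pthread_sigmask reports failure. */
int block_sigprof_in_this_thread(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPROF);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Returns 1 if SIGPROF is blocked in the calling thread, 0 if it is not,
   and -1 on error. */
int sigprof_is_blocked(void) {
    sigset_t current;
    sigemptyset(&current);
    /* With a NULL new set, pthread_sigmask only reports the current mask. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGPROF);
}
```

A new thread inherits the signal mask of the thread that created it, so a common pattern is to block the signal before spawning the threads you want left alone, then unblock it afterwards with SIG_UNBLOCK. On older glibc versions you may need to compile with -pthread.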

What's the point of all of this?

The original poster asked what exactly happens when the timer goes off and the signal is raised by the OS. He thought that perhaps it raised a Java exception. It doesn't. It just delivers the signal to the process, and whatever handler the process registered runs on one of its threads. So I wrote a C++ handler that calls AsyncGetCallTrace, and installed it from the Agent_OnLoad function of a JVMTI agent.
