Friday, August 27, 2010

JavaOne Talk Cancelled

Because I am a Google employee, my talks are not happening. Apologies to any interested parties.

I'm going to try to give the talks in another venue, or at least to use the blog as a venue for some of the topics I wanted to cover.

Monday, August 2, 2010

Why Many Profilers have Serious Problems (More on Profiling with Signals)

This is a follow-on to a blog post I wrote in 2007 about using signal handling to profile Java apps. I mentioned in another post why this might be a good idea, but I wanted to expand on the theme.

Hiroshi Yamauchi and I are giving a talk at this year's JavaOne. Part of this talk will be about our experiences writing profilers at Google, and in preparing for it, I realized my previous entry on this subject only told half the story.

The topic of the original post is that there is an undocumented JVMTI call in OpenJDK that allows you to get a stack trace from a running thread, regardless of the state of that thread. On Unix-like systems, you can use a SIGPROF signal to call this function at (semi-)regular intervals, without having to do anything to your code. The follow-on post, intended to describe why you would use this, described a couple of things that are true:

  • That it tells you what is actually running on your CPU, not just what might be scheduled, which is what profilers that take a snapshot of every running thread do. This increases its accuracy dramatically. For example, there are many profilers that report a great deal of your time spent in blocking mechanisms, simply because threads frequently become scheduled as they exit blocking functions. However, those sorts of threads are all "ready to be scheduled" rather than "actually running". As a result, CPU time is charged to those threads inaccurately.

  • That it reports time spent in GC and in other VM-specific activities (like JIT compiling). At Google, we find that a substantial amount of our users' CPU time is spent in VM activities. In fact, GC brings the occasional Google service to its knees (not mentioning any names).

I now realize that I left all of the JIT compiler-related reasons out of that post. This post is an attempt to repair that problem, and describe why profiling with AsyncGetCallTrace and signals is a better approach than the typical state of the art.

Sampling profilers that take thread dumps from individual threads often do so by injecting thread stack trace gathering code into your code. This is suboptimal in many ways:

  • It introduces overhead into your code. This is the most obvious way. As soon as you introduce overhead, you are changing the performance characteristics.

  • Changing the size of the code changes the optimizing decisions made by the JIT. For example, one of the most important performance improvements made by a just-in-time compiler is when it inlines small methods. It won't inline if the method gets too large. Introducing sampling into the methods changes the size, and therefore affects inlining decisions, changing the performance characteristics of your code.

  • Changing the size of the code affects code layout. For relatively obvious reasons of caching and memory alignment, the placement of your code can affect its performance. Bigger code often means worse performance.

At this point, you might be thinking that the best thing to do is use something like Thread.getAllStackTraces and take your chances with its inaccuracy (which is what the aforementioned sampling profilers do). However, there is one more factor, which is significant in any of the documented and built-in stack trace sampling methods - that system-wide stack trace sampling happens for any given thread when it is at a safe point. Safe points are places in the code that the VM knows it can do a whole host of things - like initiate garbage collection - safely. The location of these safe points is determined by the JIT. It often puts them in places that aren't ideal for CPU profiling. For example, there may be a hot loop in your code that the JIT decides should not be interrupted by a safe point. If you use most standard profilers, this hot loop will never get profiled! As a result, the placement of safe points affects the sampling quality of standard sampling profiling techniques.
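For illustration, here is a minimal sketch of the kind of sampler described above, built on Thread.getAllStackTraces. The class and method names (NaiveSamplerSketch, sample) are hypothetical and are not taken from any real profiler. Every snapshot it collects is taken at a safe point, so it inherits the bias discussed in this paragraph.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a safe-point-biased sampler; not any real profiler's code.
class NaiveSamplerSketch {

  // Takes `samples` snapshots, `intervalMillis` apart, and counts how often
  // each method is the top frame of a thread in the RUNNABLE state.
  static Map<String, Integer> sample(int samples, long intervalMillis) {
    Map<String, Integer> hits = new HashMap<>();
    for (int n = 0; n < samples; n++) {
      // The VM brings threads to a safe point before it can walk their stacks.
      Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
      for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
        Thread t = e.getKey();
        StackTraceElement[] stack = e.getValue();
        if (t == Thread.currentThread()
            || stack.length == 0
            || t.getState() != Thread.State.RUNNABLE) {
          continue; // skip the sampler itself and threads that are not runnable
        }
        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(top, 1, Integer::sum);
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
        break;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    System.out.println(sample(20, 10));
  }
}
```

Note that RUNNABLE only means the JVM considers the thread eligible to run. The thread may still be waiting for a CPU, which is the "ready to be scheduled" versus "actually running" problem from earlier in this post.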

(Another interesting point, as made by Todd Mytkowicz and Amer Diwan of the University of Colorado: since JIT behavior really depends on everything in the system, and most profilers are part of the system, the decision about where to put a safe point will end up depending on which profiler you are using. This can make the results of the profilers clash violently: because of their differing behaviors, the safe points end up in different places, and the profilers end up tracking different places. See Mytkowicz and Diwan's recent PLDI paper for details.)

All of this JIT talk basically mirrors the Heisenberg Uncertainty Principle, but for profiling. You can either have exact information, or you can have your code behave as it is supposed to behave, but not both. You need to minimize the effects, so you want a profiler that doesn't interfere with or depend on JIT decisions at all. AsyncGetCallTrace fits this bill - you call it, and it returns a result. You don't call it directly from your code. It doesn't wait for a safe point. It doesn't change your code. The JIT doesn't care.

ETA: I believe that the profiler bundled with Sun Studio uses the AsyncGetCallTrace method call, but I'm not exactly sure how.

Sunday, February 7, 2010

Garbage Collection, [Soft]References, Finalizers and the Memory Model (Part 2)

In which I ramble on at greater length about finalizers and multithreading, as well as applying that discussion to SoftReferences. Read Part 1 first. It explains why finalizers are more multithreaded than you think, and why they, like all multithreaded code, aren't necessarily going to do what you think they should do.

I mentioned in the last entry that a co-worker asked me a question about when objects are allowed to be collected. My questioner had been arguing with his colleagues about the following example:

Object f() {
  Object o = new Object();
  SoftReference<Object> ref = new SoftReference<Object>(o);
  return ref.get();
}

He wanted to know whether f() was guaranteed to return o, or whether the new Object could be collected before f() returned. The principle he was operating under was that because the object reference was still on the stack, the object would not be collected. Those of you who read the last entry will know better, of course. To put it in the words of the Java Language Specification, optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable.

In practice, what this means is that the VM can:
  1. Decide the variable o is no longer used after the last time it appears in the program,

  2. Clear the SoftReference immediately after it is constructed, and

  3. Return null

Those of you who read my entry on how Hotspot decides to clear SoftReferences will know that this is extremely unlikely in Hotspot, where a SoftReference always survives for at least one collection. But in other VMs? I have no idea.

Anyway, the obvious question is: how do you prevent the collector from collecting this object? There are a few ways:

  1. The obvious answer is to return o instead of ref.get() (this was harder in the original code, which I've simplified to make my point).

  2. Another way is to make o reachable from a static field. That makes the object rather obviously reachable.

  3. A third way is to give the object a finalizer, and do synchronization in that finalizer, and then do synchronization on that object when you want the object to stay alive.

That last one probably requires a few more words. We wrote that into the spec so that the Finalizer Guardian idiom would work. The Finalizer Guardian idiom is described in Josh Bloch's invaluable book Effective Java. (Seriously, if you don't have a copy, go get one.)

Anyway, we put a nice little exegesis on the link between the Finalizer Guardian and this rule into the language spec, so just go ahead and read it there.
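As a rough sketch of the third option above: the class below synchronizes on itself in its finalizer, and callers hold the same lock while they need the object to stay alive. The class name, the handle field, and the read method are all made up for illustration.

```java
// Hypothetical sketch of the "synchronize in the finalizer" technique.
class GuardedResourceSketch {
  private int handle = 42; // stands in for some resource the finalizer releases

  @Override
  @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in newer JDKs
  protected void finalize() {
    // Because the finalizer synchronizes on this object, the object must be
    // treated as reachable whenever some thread holds its lock.
    synchronized (this) {
      handle = -1;
    }
  }

  int read() {
    // Holding the lock keeps the object alive until the read has completed.
    synchronized (this) {
      return handle;
    }
  }
}
```

The point of the design is that the lock does double duty: it orders the finalizer after the last synchronized use, and it stops the VM from deciding that the object is unreachable while that use is still in progress.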

Having said all of that, I will agree that this is a massive pain. It would be nice if keeping objects reachable-and-not-collected were a little easier. Doug Lea has proposed a method in his new Fences API called reachabilityFence. This method would keep an object reachable (in a garbage collection sense) until execution reached that point in the code. I'm on the fence about the Fences API, but I think that this particular method (which he alternatively called keepAlive) would be a useful tool.
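A method along these lines was later added to the JDK as java.lang.ref.Reference.reachabilityFence, available from Java 9 onward. Assuming that method, a minimal sketch of the earlier f() example might look like this (the class name KeepAliveSketch is made up):

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;

// Hypothetical sketch: keeps o strongly reachable until after ref.get() has run.
class KeepAliveSketch {
  static Object f() {
    Object o = new Object();
    SoftReference<Object> ref = new SoftReference<>(o);
    try {
      return ref.get();
    } finally {
      // o counts as reachable at least until execution gets here,
      // so the collector may not clear ref before the get() above.
      Reference.reachabilityFence(o);
    }
  }

  public static void main(String[] args) {
    System.out.println(f() != null);
  }
}
```

In code this small, returning o directly is still simpler. The fence is mainly useful when the object has to stay alive across calls that do not otherwise mention it.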

Sunday, January 31, 2010

Garbage Collection, References, Finalizers and the Memory Model (Part 1)

A little while ago, I got asked a question about when an object is allowed to be collected. It turns out that objects can be collected sooner than you think. In this entry, I'll talk a little about that.

When we were formulating the memory model, this question came up with finalizers. Finalizers run in separate threads (usually they run in a dedicated finalizer thread). As a result, we had to worry about memory model effects. The basic question we had to answer was, what writes are the finalizers guaranteed to see? (If that doesn't sound like an interesting question, you should either go read my blog entry on volatiles or admit to yourself that this is not a blog in which you have much interest.)

Let's start with a mini-puzzler. A brief digression: I'm calling it a mini-puzzler because in general, for puzzlers, if you actually run them, you will get weird behavior. In this case, you probably won't see the weird behavior. But the weird behavior is perfectly legal Java behavior. That's the problem with multithreading and the memory model — you really never know what the results of a program will be from doing something as distasteful as actually running it.

Anyway, suppose you have a class that looks like this:

class FinalizableObject {
  int i; // set in the constructor
  int j; // set by the setter, below
  static int k; // set by direct access
  public FinalizableObject(int i) {
    this.i = i;
  }
  public void setJ(int j) {
    this.j = j;
  }
  public void finalize() {
    System.out.println(i + " " + j + " " + k);
  }
}

And then you use this class, thus:

void f() {
  FinalizableObject fo = new FinalizableObject(1);
  fo.setJ(2);
  FinalizableObject.k = 3;
}

Let's say that by some miracle the finalizer actually runs (Rule 1 of why you don't use finalizers: they are not guaranteed to run in a timely fashion, or, in fact, at all). What do you think the program is guaranteed to print?

Those of you who are used to reading these entries will realize immediately that unless they actually already know the answer, they have no idea. Let's try to reason it out, then.

First, we notice that the object reference fo is live on the stack when all three variables are set. So, the object shouldn't get garbage collected, right? The finalizer should print out 1 2 3, yes?

Would I have asked if that were the answer?

It turns out that the VM, as usual, is going to play some tricks here. In the words of the JLS, optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. What this means is that the VM is going to make your object garbage sooner than you think.

The VM can do a few things to effect this (yes, this is the correct spelling of effect). First, it can notice that the object is never used after the call to setJ, and null out the reference to fo immediately after that. It's reasonably clear that if the finalizer ran immediately after that, you would see 1 2 0.

That's not the end of it, though. The VM can notice that:

  • This thread isn't using the value written by that write to j, and

  • There is no evidence that synchronization will make this write visible to another thread.

The VM can then decide that the write to j is redundant, and eliminate that write altogether. Woosh! You get 1 0 0.

At this point, you are probably expecting me to say that you can also get 0 0 0, because the programmer isn't actually using the write to i, either. As a matter of fact, I'm not going to say that. It turns out that the end of an object's constructor happens-before the execution of its finalize method. In practice, what this means is that any writes that occur in the constructor must be finished and visible to any reads of the same variable in the finalizer, just as if those variables were volatile. (This paragraph originally read incorrectly.)

The immediate question is, how does the programmer avoid this insanity? The answer is: don't use finalization!

Okay, that's not enough of an answer. Sometimes you need to use finalization. There's a hint several paragraphs up. The finalizer takes place in a separate thread. It turns out that what you need to do is — exactly what you would do to make the code thread-safe. Let's do that, and look at the code again.

class FinalizableObject {
  static final Object lockObject = new Object();
  int i; // set in the constructor
  int j; // set by the setter, below
  static int k; // set by direct access
  public FinalizableObject(int i) {
    this.i = i;
  }
  public void setJ(int j) {
    this.j = j;
  }
  public void finalize() {
    synchronized (lockObject) {
      System.out.println(i + " " + j + " " + k);
    }
  }
}

And then you use this class, thus:

void f() {
  // lockObject is a static field of FinalizableObject, so qualify it here
  synchronized (FinalizableObject.lockObject) {
    FinalizableObject fo = new FinalizableObject(1);
    fo.setJ(2);
    FinalizableObject.k = 3;
  }
}

The finalizer is now guaranteed not to execute until all of the fields are set. When that sucker runs, you will see 1 2 3.

Oddly, I've been writing for almost an hour and I haven't gotten to my coworker's question yet. In the interests of brevity, I'll make this a series. More later.

Wednesday, January 6, 2010

A Note on the Thread (Un)safety of Format Classes

I recently got a note on another blog post asking this question:

I have a general question on the thread safety and this is not directly related with your blog. I would appreciate if you could post it on your blog. I have a class that has only one static method that accepts two string parameters and does some operation locally (as shown below). Do I need to synchronize this method?

public class A {
  public static boolean getDate(String str, String str2){
    SimpleDateFormat formatter =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    boolean isBefore = false;
    Date date1;
    Date date2;
    try {
      date1 = formatter.parse(str);
      date2 = formatter.parse(str2);
      isBefore = date1.after(date2);
    } catch (Exception e) {
      e.getMessage();
    }
    return isBefore;
  }
}


Since it had nothing to do with the blog post in question, I'll answer it here.

The questioner is clearly asking about the well-documented lack of thread safety of the Format classes. This is a long-standing bug at Sun. If you create a Format object (or a MessageFormat, NumberFormat, DecimalFormat, ChoiceFormat, DateFormat or SimpleDateFormat object), it cannot safely be shared among threads without external synchronization. The above code does not share its SimpleDateFormat object among threads, so it is safe.

If formatter had been a static field of the class rather than a local variable, the method would not be thread-safe. However, this way, you have to create a (relatively) expensive SimpleDateFormat object every time you invoke the method. There are many ways around this. One answer (if you want to avoid locking) is to use a ThreadLocal:

// One SimpleDateFormat per thread, created lazily on first use
private static final ThreadLocal<SimpleDateFormat> formatters =
    new ThreadLocal<SimpleDateFormat>() {
      @Override public SimpleDateFormat initialValue() {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
      }
    };
public static boolean getDate(String str, String str2) {
  SimpleDateFormat formatter = formatters.get();
...

The even better answer is to use the thread-safe Joda-Time library instead. I hate to recommend using a non-standard library when there is an equivalent standard one, but this Format behavior is pretty broken.
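Returning to the ThreadLocal approach, a complete version might look something like the following sketch. The class name DateCompareSketch and the method name isBefore are made up, and the choice to rethrow a parse failure as an IllegalArgumentException is an assumption about what the caller wants, not something from the original question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch: one SimpleDateFormat per thread, so no locking is needed.
class DateCompareSketch {
  private static final ThreadLocal<SimpleDateFormat> FORMATTERS =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        }
      };

  // Returns true if the first timestamp is strictly earlier than the second.
  static boolean isBefore(String first, String second) {
    SimpleDateFormat formatter = FORMATTERS.get(); // this thread's own instance
    try {
      Date d1 = formatter.parse(first);
      Date d2 = formatter.parse(second);
      return d1.before(d2);
    } catch (ParseException e) {
      // Assumption: surface bad input to the caller instead of swallowing it.
      throw new IllegalArgumentException("Unparseable timestamp", e);
    }
  }
}
```

Each thread pays the construction cost once, on its first call, and reuses its own formatter after that.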