Sunday, January 31, 2010

Garbage Collection, References, Finalizers and the Memory Model (Part 1)

A little while ago, I got asked a question about when an object is allowed to be collected. It turns out that objects can be collected sooner than you think. In this entry, I'll talk a little about that.

When we were formulating the memory model, this question came up with finalizers. Finalizers run in separate threads (usually they run in a dedicated finalizer thread). As a result, we had to worry about memory model effects. The basic question we had to answer was, what writes are the finalizers guaranteed to see? (If that doesn't sound like an interesting question, you should either go read my blog entry on volatiles or admit to yourself that this is not a blog in which you have much interest.)

Let's start with a mini-puzzler. A brief digression: I'm calling it a mini-puzzler because in general, for puzzlers, if you actually run them, you will get weird behavior. In this case, you probably won't see the weird behavior. But the weird behavior is perfectly legal Java behavior. That's the problem with multithreading and the memory model — you really never know what the results of a program will be from doing something as distasteful as actually running it.

Anyway, suppose you have a class that looks like this:

class FinalizableObject {
  int i; // set in the constructor
  int j; // set by the setter, below
  static int k; // set by direct access
  public FinalizableObject(int i) {
    this.i = i;
  }
  public void setJ(int j) {
    this.j = j;
  }
  public void finalize() {
    System.out.println(i + " " + j + " " + k);
  }
}

And then you use this class, thus:

void f() {
  FinalizableObject fo = new FinalizableObject(1);
  fo.setJ(2);
  FinalizableObject.k = 3;
}

Let's say that by some miracle the finalizer actually runs (Rule 1 of why you don't use finalizers: they are not guaranteed to run in a timely fashion, or, in fact, at all). What do you think the program is guaranteed to print?

Those of you who are used to reading these entries will realize immediately that unless they actually already know the answer, they have no idea. Let's try to reason it out, then.

First, we notice that the object reference fo is live on the stack when all three variables are set. So, the object shouldn't get garbage collected, right? The finalizer should print out 1 2 3, yes?

Would I have asked if that were the answer?

It turns out that the VM, as usual, is going to play some tricks here. In the words of the JLS (section 12.6.1), "optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable." What this means is that the VM is going to make your object garbage sooner than you think.

The VM can do a few things to effect this (yes, this is the correct spelling of effect). First, it can notice that the object is never used after the call to setJ, and null out the reference to fo immediately after that. It's reasonably clear that if the finalizer ran immediately after that, you would see 1 2 0.

That's not the end of it, though. The VM can notice that:

  • This thread isn't using the value written by that write to j, and

  • There is no evidence that synchronization will make this write visible to another thread.

The VM can then decide that the write to j is redundant, and eliminate that write altogether. Woosh! You get 1 0 0.
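To make those two steps concrete, here is a rough source-level picture of what the VM is allowed to pretend you wrote. It is only a sketch: a real VM does this to compiled code rather than to your source, and the names Caller and fAsIfOptimized are made up for illustration.

```java
class FinalizableObject {
    int i;        // set in the constructor
    int j;        // set by the setter, below
    static int k; // set by direct access

    public FinalizableObject(int i) {
        this.i = i;
    }

    public void setJ(int j) {
        this.j = j;
    }

    @Override
    public void finalize() {
        System.out.println(i + " " + j + " " + k);
    }
}

// Hypothetical class, for illustration only.
class Caller {
    // What you wrote.
    static void f() {
        FinalizableObject fo = new FinalizableObject(1);
        fo.setJ(2);
        FinalizableObject.k = 3;
    }

    // A source-level picture of what the VM may treat f() as.
    static void fAsIfOptimized() {
        FinalizableObject fo = new FinalizableObject(1);
        // fo.setJ(2) is gone: nothing in this thread reads j, and no
        // synchronization publishes the write to any other thread.
        fo = null; // the object is unreachable from here on
        // The finalizer is allowed to run at this point and print "1 0 0".
        FinalizableObject.k = 3;
    }

    public static void main(String[] args) {
        f();
        fAsIfOptimized();
        // Whether and when a finalizer runs is up to the VM,
        // so nothing is guaranteed to be printed.
    }
}
```

Running this will most likely print nothing at all, which is rather the point: the interesting outcomes are the ones the specification permits, not the ones you happen to observe.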

At this point, you are probably expecting me to say that you can also get 0 0 0, because the programmer isn't actually using the write to i, either. As a matter of fact, I'm not going to say that. It turns out that the end of an object's constructor happens-before the execution of its finalize method. In practice, what this means is that any writes that occur in the constructor must be finished and visible to any reads of the same variable in the finalizer, just as if those variables were volatile. (This paragraph originally read incorrectly.)

The immediate question is, how does the programmer avoid this insanity? The answer is: don't use finalization!

Okay, that's not enough of an answer. Sometimes you need to use finalization. There's a hint several paragraphs up. The finalizer takes place in a separate thread. It turns out that what you need to do is — exactly what you would do to make the code thread-safe. Let's do that, and look at the code again.

class FinalizableObject {
  static final Object lockObject = new Object();
  int i; // set in the constructor
  int j; // set by the setter, below
  static int k; // set by direct access
  public FinalizableObject(int i) {
    this.i = i;
  }
  public void setJ(int j) {
    this.j = j;
  }
  public void finalize() {
    synchronized (lockObject) {
      System.out.println(i + " " + j + " " + k);
    }
  }
}

And then you use this class, thus:

void f() {
  // Same lock the finalizer takes; qualified because f() lives outside the class
  synchronized (FinalizableObject.lockObject) {
    FinalizableObject fo = new FinalizableObject(1);
    fo.setJ(2);
    FinalizableObject.k = 3;
  }
}

The finalizer is now guaranteed not to execute until all of the fields are set. When that sucker runs, you will see 1 2 3.

Oddly, I've been writing for almost an hour and I haven't gotten to my coworker's question yet. In the interests of brevity, I'll make this a series. More later.

Wednesday, January 6, 2010

A Note on the Thread (Un)safety of Format Classes

I recently got a note on another blog post asking this question:

I have a general question on the thread safety and this is not directly related with your blog. I would appreciate if you could post it on your blog. I have a class that has only one static method that accepts two string parameters and does some operation locally (as shown below). Do I need to synchronize this method?

public class A {
  public static boolean getDate(String str, String str2){
    SimpleDateFormat formatter =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    boolean isBefore = false;
    Date date1;
    Date date2;
    try {
      date1 = formatter.parse(str);
      date2 = formatter.parse(str);
      isBefore = date1.after(date2);
    } catch (Exception e) {
      e.getMessage();
    }
    return isBefore;
  }
}


Since it had nothing to do with the blog post in question, I'll answer it here.

The questioner is clearly asking about the well-documented lack of thread safety of the Format classes. This is a long-standing bug at Sun. If you create a Format object (or a MessageFormat, NumberFormat, DecimalFormat, ChoiceFormat, DateFormat or SimpleDateFormat object), it cannot be shared among threads. The above code does not share its SimpleDateFormat object among threads, so it is safe.

If formatter had been a static field of the class rather than a local variable, every caller would share it and the method would not be thread-safe. As written, however, you have to create a (relatively) expensive SimpleDateFormat object every time you invoke the method. There are many ways around this. One answer (if you want to avoid locking) is to use a ThreadLocal:

private static final ThreadLocal<SimpleDateFormat> formatters =
  new ThreadLocal<SimpleDateFormat>() {
    @Override public SimpleDateFormat initialValue() {
      return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    }
  };
public static boolean getDate(String str, String str2) {
  SimpleDateFormat formatter = formatters.get();
...
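For anyone who wants something to paste into an editor, here is one way a complete version might look. This is a sketch under my own assumptions rather than the questioner's actual code: the class name DateComparer and the methods isBefore and isBeforeLocked are hypothetical, and I have assumed the intent was to test whether the first timestamp is earlier than the second. The second method shows the locking alternative (one shared formatter, guarded by a lock) for comparison.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical class, for illustration only.
class DateComparer {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    // One SimpleDateFormat per thread; an instance is never shared across threads.
    private static final ThreadLocal<SimpleDateFormat> FORMATTERS =
        new ThreadLocal<SimpleDateFormat>() {
            @Override protected SimpleDateFormat initialValue() {
                return new SimpleDateFormat(PATTERN);
            }
        };

    // A single shared formatter, only ever touched while holding its own lock.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat(PATTERN);

    // Lock-free variant: each thread parses with its own formatter.
    static boolean isBefore(String first, String second) {
        SimpleDateFormat formatter = FORMATTERS.get();
        try {
            Date d1 = formatter.parse(first);
            Date d2 = formatter.parse(second);
            return d1.before(d2);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable timestamp", e);
        }
    }

    // Locking variant: correct, but all callers queue up on one lock.
    static boolean isBeforeLocked(String first, String second) {
        synchronized (SHARED) {
            try {
                return SHARED.parse(first).before(SHARED.parse(second));
            } catch (ParseException e) {
                throw new IllegalArgumentException("unparseable timestamp", e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    if (!isBefore("2010-01-06 09:00:00.000", "2010-01-31 09:00:00.000")) {
                        throw new IllegalStateException("unexpected comparison result");
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("done");
    }
}
```

The trade-off is the usual one: the ThreadLocal version costs one formatter per thread that ever calls the method, while the locked version costs contention.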
An even better answer is to use the thread-safe Joda-Time library instead. I hate to recommend using a non-standard library when there is an equivalent standard one, but this Format behavior is pretty broken.