Monday, July 21, 2008

Immutability in Java, Part 3: Deserialization and Reflection

This time, I'll talk about deserialization, immutability and final fields. I'll try to be a little shorter.

I spent the last two posts on immutability (here and here) talking about what it takes to make an object's fields immutable. This advice mostly boiled down to "it has to be final and set before the constructor ends".

When you have deserialization, though, the fields can't be set before the constructor ends. The object is created without running the serializable class's own constructor (only the no-arg constructor of its nearest non-serializable superclass runs), and the fields have to be filled in later. We gave deserialization special semantics, so that when an object was constructed this way, it would behave as if it had been constructed by an ordinary constructor. No problems there.

The problems come when and if you need to write custom deserialization (by giving your class, for example, a private readObject method that takes an ObjectInputStream). You can set final fields there using reflection:
Class<?> cl = enclosingObject.getClass(); // must be the class that declares the field
Field f = cl.getDeclaredField("myFinalField");
f.setAccessible(true);
f.set(enclosingObject, newValue);
Note that you need setAccessible to change final fields using reflection.

ETA: Jared Levy points out that you should probably use Class.getDeclaredField() if the field isn't public. The original code used Class.getField(). Thanks to him for pointing that out.

In the last post, I mentioned that something very like a "freeze" action happens at the end of the constructor; as long as your final fields (and everything reachable from them) are fully constructed by the end of that, your data are immutable. There is a similar "freeze" action after you modify final fields using reflection; as long as your final fields (and everything reachable from them) are fully constructed as of when you change them via reflection, your data are immutable. This means that if you have data you want your final field to point to, and you want other threads to see, they all have to be constructed by the time you set the final field. See the previous postings for examples of this.

The other restrictions on immutability apply, too. You can't let another thread see the object before the final field is set via reflection, for example. See the previous blog postings for more on this, too.
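Here's a minimal sketch of how those pieces might fit together in a custom readObject. The Blob class, its field and the roundTrip helper are all made up for illustration. The field is transient so that default serialization skips it and readObject has to restore it by hand. The point to notice is the order: the array is fully built first, and only then stored into the final field. (Newer JDKs may warn about, or be configured to reject, reflective writes to final fields, so treat this as a sketch rather than a recommendation.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// Hypothetical example class; not from the JDK or from the earlier posts.
final class Blob implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient, so default serialization skips it and readObject must restore it.
    private final transient byte[] bytes;

    Blob(byte[] value) {
        bytes = value.clone();
    }

    byte[] toByteArray() {
        return bytes.clone();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Finish building the array first...
        byte[] restored = new byte[in.readInt()];
        in.readFully(restored);
        // ...then set the final field, and never modify the array afterwards.
        try {
            Field f = Blob.class.getDeclaredField("bytes");
            f.setAccessible(true);
            f.set(this, restored);
        } catch (ReflectiveOperationException e) {
            throw new IOException("could not restore final field", e);
        }
    }

    // Serializes and deserializes in memory; a demonstration helper only.
    static Blob roundTrip(Blob original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(original);
            out.close();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            return (Blob) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling Blob.roundTrip(new Blob(someBytes)) should hand back a distinct Blob whose bytes match the original's.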

This is slightly different for static final fields, since they are often completely removed at javac compile time. You can't change static final fields after class initialization. Well, almost. You can change System.out, System.in and System.err using System.setOut(), System.setIn() and System.setErr(). We felt very dirty after writing these rules.

By the way, the general treatment of static final fields was a big mistake in the original spec. The compiler (javac) is forced to inline static final compile time constants everywhere it can. They can even be inlined into other classes. This means that you can compile a class A with a static final int x set to 1, compile another class B that prints A.x, recompile the first class with x set to 2, and then run the program and have class B print out "1" instead of "2". Weird and bogus. However, we are stuck with it.
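You can see a side effect of the inlining inside a single program, without the recompile dance. The sketch below uses made-up class names. It relies on the rule that reading a compile-time constant from another class does not initialize the declaring class, because no field read is left in the compiled code. A static final whose initializer is a method call is not a compile-time constant, so reading it is a real field access and does trigger initialization.

```java
// All names here are hypothetical, for illustration only.
class InitLog {
    static boolean constantsInitialized = false;
}

class Constants {
    static {
        // Runs only when the class is actually initialized.
        InitLog.constantsInitialized = true;
    }

    // Compile-time constant: javac copies the value into every class that reads it.
    static final int INLINED = 1;

    // Initialized by a method call, so not a compile-time constant;
    // readers perform a genuine field read at run time.
    static final int NOT_INLINED = Integer.parseInt("2");
}

class InliningDemo {
    static int readInlined() {
        return Constants.INLINED;      // compiled as the literal 1
    }

    static int readNotInlined() {
        return Constants.NOT_INLINED;  // compiled as a field access
    }
}
```

If you call readInlined() first, InitLog.constantsInitialized stays false; it only flips to true once readNotInlined() touches the real field. The same trick (initializing the field through a method call) is one way to keep a value from being baked into other classes.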

Monday, July 14, 2008

Immutability in Java, Part 2

I'd like to talk a little more about what it takes to ensure thread-safe immutability in Java, following on from a (semi)recent post I made on the subject.

The basic gist of that post was that if you make data immutable, then they can be shared between threads without additional synchronization. I call this "thread-safe immutability". In that post, I said this:

Now, in common parlance, immutability means "does not change". Immutability doesn't mean "does not change" in Java. It means "is transitively reachable from a final field, has not changed since the final field was set, and a reference to the object containing the final field did not escape the constructor".

I just wanted to go over these points again, because I get a lot of questions about them and there are a couple of things I glossed over. Let's take them one by one. Immutability means...
  1. the object is transitively reachable from a final field

    What this means is that you can reach the object or value by following references from the final field. Let's say we're making an immutable String class, like the one in the Java libraries. We might have something like this (notice I'm not checking for null pointers in the example code. Go ahead, make me care.):
    class String {
        private final byte[] bytes;
        public String(byte[] value) {
            byte[] newArray = new byte[value.length];
            System.arraycopy(value, 0, newArray,
                             0, value.length);
            bytes = newArray;
        }
    }
    The thing we are storing in our String is a final reference to the byte array, not the array itself. The contents of the byte array can be changed, according to the semantics of Java, but what is important, for the purposes of thread-safety, is that they are reachable by following a reference from the final field.

    ETA: So, apparently, I wasn't clear enough on this. Let's try again.

    Transitively reachable means that you can get from the final field to the data you want to be immutable by following zero or more references. So, if you had:
    class String {
        final byte[] bytes;
    }
    The array of bytes is reachable from the bytes field. If you have:
    class String {
        static class BytesHolder {
            byte[] bytes;
        }
        final BytesHolder holder;
    }
    The array of bytes is still transitively reachable from the final field, which is now holder. In fact, you can add as many layers of indirection as you want; as long as you can get there by following references from the field, it is still transitively reachable from the field.

  2. the object has not changed since the final field was set

    This one is simple, right? Here's the String class again, but the bytes are no longer immutable:
    class String {
        private final byte[] bytes;
        public String(byte[] value) {
            byte[] newArray = new byte[value.length];
            System.arraycopy(value, 0, newArray,
                             0, value.length);
            bytes = newArray;
        }
        public void setZerothElt(byte b) {
            bytes[0] = b;
        }
    }
    Well, now, obviously that's not thread-safe immutable. If two threads call setZerothElt(), then all sorts of bad thread horseplay can happen.

    However, I did mislead you a little here. I said that, if you want the data to be thread-safe immutable, it should not be changed after the final field gets set. It turns out that the immutable data can be changed after the final field is set. It just can't be changed after the constructor ends:
    class String {
        // Absolutely fine.
        private final byte[] bytes;
        public String(byte[] value) {
            bytes = new byte[value.length];
            System.arraycopy(value, 0,
                             bytes, 0, value.length);
        }
    }
    You can think of the end of the constructor as being something like a "freeze" operation. The data reachable by the object's final fields must be frozen by the end of the constructor. When figuring out how this should work, we made this decision because it is a big pain in the neck to try to get everything done by the time you actually set the final field, and besides, the semantics were simpler this way.

  3. A reference to the object containing the final field did not escape the constructor

    "Escaping" the constructor simply means that when you were inside the constructor, you stored a reference to the object under construction somewhere where another thread could get at it.

    In some ways, this is kind of a nasty restriction. There are all sorts of classes that let a reference to the object escape the constructor. Swing has lots of examples of this (I think — I don't feel like looking it up). Here's an example where our String class tries to keep track of all of its instances:
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    class String {
        // Don't do this.
        static Set<String> allStrings =
            Collections.synchronizedSet(new HashSet<String>());
        private final byte[] bytes;
        public String(byte[] value) {
            allStrings.add(this);
            bytes = new byte[value.length];
            System.arraycopy(value, 0,
                             bytes, 0, value.length);
        }
    }
    It's pretty clear that another thread could come along and read the object being constructed out of allStrings before it is done being constructed. So, there's no free ride there.

    But this one gets nastier, too! Let's take an example that looks correct at first blush, but isn't:
    class String {
        // Don't do this, either.
        static String lastConstructed;
        private final byte[] bytes;
        public String(byte[] value) {
            bytes = new byte[value.length];
            System.arraycopy(value, 0,
                             bytes, 0, value.length);
            lastConstructed = this;
        }
    }
    It looks as if, when you read lastConstructed, it will always give you the last String instance that was constructed. Don't be fooled. Even if this does give you the last String instance that was constructed (and it may or may not, but that's beside the point), it is quite possible for the reader thread to see uninitialized data. The compiler can take the assignment to lastConstructed and move it to before the assignment to bytes. So, you have to avoid allowing references to objects to escape constructors. (To preempt questions: yes, this example would work if you made lastConstructed volatile.)
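If you really do want to keep track of every instance, one common way around the escape problem is to hide the constructor behind a static factory and register the object only after the constructor has returned. This is a sketch of that general pattern, not something from the original examples; TrackedString, create and instanceCount are made-up names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class showing publication after construction, via a factory.
final class TrackedString {
    private static final Set<TrackedString> ALL_STRINGS =
            Collections.synchronizedSet(new HashSet<TrackedString>());

    private final byte[] bytes;

    // Private, so the only way to get an instance is through create().
    private TrackedString(byte[] value) {
        bytes = value.clone();
        // "this" is not stored anywhere in here, so it cannot escape.
    }

    static TrackedString create(byte[] value) {
        TrackedString s = new TrackedString(value);
        // The constructor has finished by now, so any thread that finds
        // the object in the set sees fully initialized final fields.
        ALL_STRINGS.add(s);
        return s;
    }

    static int instanceCount() {
        return ALL_STRINGS.size();
    }

    byte byteAt(int index) {
        return bytes[index];
    }
}
```

The cost is that callers write TrackedString.create(...) instead of new, which is usually a fair trade for not leaking half-built objects.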
A couple more points, too:
  • The byte array has to be reachable from the final field by the end of the constructor. If I had bytes point to a container object, and then made the container object point to the array after the constructor finished, then I wouldn't be able to claim it was thread-safe.
    // The bytes reachable from bytesHolder *after* later()
    // is called are not thread-safe immutable.
    class String {
        static class BytesHolder {
            byte[] bytes;
        }
        final BytesHolder bytesHolder;
        public String() {
            bytesHolder = new BytesHolder();
        }
        void later(byte[] value) {
            byte[] newArray = new byte[value.length];
            System.arraycopy(value, 0,
                             newArray, 0, value.length);
            bytesHolder.bytes = newArray;
        }
    }
    This, of course, violates the rule that the byte array is set before the constructor finishes — but even if you don't violate that rule, you don't get the thread-safety guarantee:
    // The bytes reachable from bytesHolder *after*
    // later() is called are still not thread-safe immutable.
    class String {
        static class BytesHolder {
            byte[] bytes;
        }
        final BytesHolder bytesHolder;
        byte[] bytes;
        public String(byte[] value) {
            bytesHolder = new BytesHolder();
            bytes = new byte[value.length];
            System.arraycopy(value, 0,
                             bytes, 0, value.length);
        }
        void later(byte[] value) {
            bytesHolder.bytes = bytes;
        }
    }
  • The other point is that you can't just acquire these guarantees by waiting an inordinately long time. Here's another example of an instance escaping its constructor:
    class String {
        // Don't do this, either.
        static String firstInstance;
        private final byte[] bytes;
        public String(byte[] value) {
            bytes = new byte[value.length];
            System.arraycopy(value, 0,
                             bytes, 0, value.length);
            if (firstInstance == null) {
                firstInstance = this;
            }
        }
        public static String getFirst() {
            String instance = firstInstance;
            if (instance == null) return null;
            // Wait a while for the constructor to end...
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {}
            return instance;
        }
    }
    (Obviously, your code wouldn't look like this; this is just an example.) The fact of the matter is that without using the guarantees of final fields or some form of explicit synchronization, there is no guarantee that a thread that reads the version that escapes the constructor will ever see the correct version. The reader can end up with a stale or half-built view in various ways, all of which have to do with sneaky compiler tricks.
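For contrast with the two BytesHolder examples, here is a sketch of a version that does get the guarantee: the holder and the array are both completely built inside the constructor, and nothing writes to them afterwards. HeldString and its accessors are made-up names; BytesHolder mirrors the holder used in the examples.

```java
// Hypothetical class: everything reachable from the final field is
// finished before the constructor ends, and never modified later.
final class HeldString {
    static final class BytesHolder {
        byte[] bytes;
    }

    private final BytesHolder bytesHolder;

    HeldString(byte[] value) {
        BytesHolder holder = new BytesHolder();
        holder.bytes = value.clone();  // private copy, filled in here
        bytesHolder = holder;
        // No later() method: nothing is written after this point.
    }

    int length() {
        return bytesHolder.bytes.length;
    }

    byte byteAt(int index) {
        return bytesHolder.bytes[index];
    }
}
```

As long as the reference to the HeldString doesn't escape the constructor, another thread that gets hold of it can read the bytes through bytesHolder without extra synchronization.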
There is, of course, far more to say about immutability. At some point, I should talk about its relationship with serialization and reflection. And best practices for making immutable objects when values aren't available for the constructor. And best practices for making immutable data types (like Sets and Maps). I think I've done enough typing for one evening, though. :)

If you liked this post, read the followup: Immutability in Java, Part 3.