Immutable objects that reference each other?

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.

public class A
{
public string Name { get; private set; }
public B B { get; private set; }
public A()
{
B = new B(this);
Name = "test";
}
}


public class B
{
public A A { get; private set; }
public B(A a)
{
//a.Name is null
A = a;
}
}

What I find interesting is that I cannot think of another way to observe object of type A in a state that is not yet fully constructed and that includes threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?

5879 次浏览

If you consider the initialization order

  • Derived static fields
  • Derived static constructor
  • Derived instance fields
  • Base static fields
  • Base static constructor
  • Base instance fields
  • Base instance constructor
  • Derived instance constructor

clearly through up-casting you can access the class BEFORE the derived instance constructor is called (this is the reason you shouldn't use virtual methods from constructors. They could easily access derived fields not initialized by the constructor/the constructor in the derived class could not have brought the derived class in a "consistent" state)

The principle is that don't let your this object escape from the constructor body.

Another way to observe such problem is by calling virtual methods inside the constructor.

You can avoid the problem by instancing B last in your constuctor:

 public A()
{
Name = "test";
B = new B(this);
}

If what you suggest was not possible, then A would not be immutable.

Edit: fixed, thanks to leppie.

"Fully constructed" is defined by your code, not by the language.

This is a variation on calling a virtual method from the constructor,
the general guideline is: don't do that.

To correctly implement the notion of "fully constructed", don't pass this out of your constructor.

Indeed, leaking the this reference out during the constructor will allow you to do this; it may cause problems if methods get invoked on the incomplete object, obviously. As for "other ways to observe the state of an object that is not fully constructed":

  • invoke a virtual method in a constructor; the subclass constructor will not have been called yet, so an override may try to access incomplete state (fields declared or initialized in the subclass, etc)
  • reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling the constructor at all)

Yes, this is the only way for two immutable objects to refer to each other - at least one of them must see the other in a not-fully-constructed way.

It's generally a bad idea to let this escape from your constructor but in cases where you're confident of what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.

Why is this even valid?

Why do you expect it to be invalid?

Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.

Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.

Are there any other ways to observe the state of an object that is not fully constructed?

Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:

  • Put "this" in a static field and reference it from another thread
  • make a method call or constructor call and pass "this" as an argument
  • make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.

I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!

Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.

Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.

If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type creates a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.

Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.

In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.

For more information on how value type constructors work, see:

Debunking another myth about value types

And for more information on how C# language semantics try to save you from yourself, see:

Why Do Initializers Run In The Opposite Order As Constructors? Part One

Why Do Initializers Run In The Opposite Order As Constructors? Part Two

I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.


Now let's consider the root cause of your question: how do you make immutable objects that reference each other?

Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:

  • An immutable list of immutable nodes, each of which contains a value.
  • An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.

Now the way you make nodes A and B "reference" each other is:

A = new Node("A");
B = new Node("B");
G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);

And you're done, you've got a graph where A and B "reference" each other.

The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.

As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes this from a constructor will know whether an object has been constructed well enough to satisfy his needs.

I would add, however, that for objects which are intended to be truly immutable, one must avoid passing this to any code which will examine the state of a field before it has been assigned its final value. This implies that this not be passed to arbitrary outside code, but does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.

If one were designing a language to facilitate the construction and use of immutable objects, it may be helpful for it to declare methods as being usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that should be non-dereferenceable. Under such a system, it would be possible for a compiler to allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.

Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness with Java and .net is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.