Why are C# 3.0 object initializer constructor parentheses optional?

It seems that the C# 3.0 object initializer syntax allows one to exclude the open/close pair of parentheses in the constructor when there is a parameterless constructor existing. Example:

var x = new XTypeName { PropA = value, PropB = value };

As opposed to:

var x = new XTypeName() { PropA = value, PropB = value };

I'm curious why the constructor open/close parentheses pair is optional here after XTypeName?

17517 次浏览

I am no Eric Lippert, so I can't say for sure, but I would assume it is because the empty parenthesis is not needed by the compiler in order to infer the initialization construct. Therefore it becomes redundant information, and not needed.

Because that's how the language was specified. They add no value, so why include them?

It's also very similar to implicity typed arrays

var a = new[] { 1, 10, 100, 1000 };            // int[]
var b = new[] { 1, 1.5, 2, 2.5 };            // double[]
var c = new[] { "hello", null, "world" };      // string[]
var d = new[] { 1, "one", 2, "two" };         // Error

Reference: http://msdn.microsoft.com/en-us/library/ms364047%28VS.80%29.aspx

This was done to simplify the construction of objects. The language designers have not (to my knowledge) specifically said why they felt that this was useful, though it is explicitly mentioned in the C# Version 3.0 Specification page:

An object creation expression can omit the constructor argument list and enclosing parentheses, provided it includes an object or collection initializer. Omitting the constructor argument list and enclosing parentheses is equivalent to specifying an empty argument list.

I suppose that they felt the parenthesis, in this instance, were not necessary in order to show developer intent, since the object initializer shows the intent to construct and set the properties of the object instead.

In your first example, the compiler infers that you're calling the default constructor (the C# 3.0 Language Specification states that if no parenthesis are provided, the default constructor is called).

In the second, you explicitly call the default constructor.

You can also use that syntax to set properties while explicitly passing values to the constructor. If you had the following class definition:

public class SomeTest
{
public string Value { get; private set; }
public string AnotherValue { get; set; }
public string YetAnotherValue { get; set;}


public SomeTest() { }


public SomeTest(string value)
{
Value = value;
}
}

All three statements are valid:

var obj = new SomeTest { AnotherValue = "Hello", YetAnotherValue = "World" };
var obj = new SomeTest() { AnotherValue = "Hello", YetAnotherValue = "World"};
var obj = new SomeTest("Hello") { AnotherValue = "World", YetAnotherValue = "!"};

This question was the subject of my blog on September 20th 2010. Josh and Chad's answers ("they add no value so why require them?" and "to eliminate redundancy") are basically correct. To flesh that out a bit more:

The feature of allowing you to elide the argument list as part of the "larger feature" of object initializers met our bar for "sugary" features. Some points we considered:

  • the design and specification cost was low
  • we were going to be extensively changing the parser code that handles object creation anyway; the additional development cost of making the parameter list optional was not large compared to the cost of the larger feature
  • the testing burden was relatively small compared to the cost of the larger feature
  • the documentation burden was relatively small compared...
  • the maintenance burden was anticipated to be small; I don't recall any bugs reported in this feature in the years since it shipped.
  • the feature does not pose any immediately obvious risks to future features in this area. (The last thing we want to do is make a cheap, easy feature now that makes it much harder to implement a more compelling feature in the future.)
  • the feature adds no new ambiguities to the lexical, grammatical or semantic analysis of the language. It poses no problems for the sort of "partial program" analysis that is performed by the IDE's "IntelliSense" engine while you are typing. And so on.
  • the feature hits a common "sweet spot" for the larger object initialization feature; typically if you are using an object initializer it is precisely because the constructor of the object does not allow you to set the properties you want. It is very common for such objects to simply be "property bags" that have no parameters in the ctor in the first place.

Why then did you not also make empty parentheses optional in the default constructor call of an object creation expression that does not have an object initializer?

Take another look at that list of criteria above. One of them is that the change does not introduce any new ambiguity in the lexical, grammatical or semantic analysis of a program. Your proposed change does introduce a semantic analysis ambiguity:

class P
{
class B
{
public class M { }
}
class C : B
{
new public void M(){}
}
static void Main()
{
new C().M(); // 1
new C.M();   // 2
}
}

Line 1 creates a new C, calls the default constructor, and then calls the instance method M on the new object. Line 2 creates a new instance of B.M and calls its default constructor. If the parentheses on line 1 were optional then line 2 would be ambiguous. We would then have to come up with a rule resolving the ambiguity; we could not make it an error because that would then be a breaking change that changes an existing legal C# program into a broken program.

Therefore the rule would have to be very complicated: essentially that the parentheses are only optional in cases where they don't introduce ambiguities. We'd have to analyze all the possible cases that introduce ambiguities and then write code in the compiler to detect them.

In that light, go back and look at all the costs I mention. How many of them now become large? Complicated rules have large design, spec, development, testing and documentation costs. Complicated rules are much more likely to cause problems with unexpected interactions with features in the future.

All for what? A tiny customer benefit that adds no new representational power to the language, but does add crazy corner cases just waiting to yell "gotcha" at some poor unsuspecting soul who runs into it. Features like that get cut immediately and put on the "never do this" list.

How did you determine that particular ambiguity?

That one was immediately clear; I am pretty familiar with the rules in C# for determining when a dotted name is expected.

When considering a new feature how do you determine whether it causes any ambiguity? By hand, by formal proof, by machine analysis, what?

All three. Mostly we just look at the spec and noodle on it, as I did above. For example, suppose we wanted to add a new prefix operator to C# called "frob":

x = frob 123 + 456;

(UPDATE: frob is of course await; the analysis here is essentially the analysis that the design team went through when adding await.)

"frob" here is like "new" or "++" - it comes before an expression of some sort. We'd work out the desired precedence and associativity and so on, and then start asking questions like "what if the program already has a type, field, property, event, method, constant, or local called frob?" That would immediately lead to cases like:

frob x = 10;

does that mean "do the frob operation on the result of x = 10, or create a variable of type frob called x and assign 10 to it?" (Or, if frobbing produces a variable, it could be an assignment of 10 to frob x. After all, *x = 10; parses and is legal if x is int*.)

G(frob + x)

Does that mean "frob the result of the unary plus operator on x" or "add expression frob to x"?

And so on. To resolve these ambiguities we might introduce heuristics. When you say "var x = 10;" that's ambiguous; it could mean "infer the type of x" or it could mean "x is of type var". So we have a heuristic: we first attempt to look up a type named var, and only if one does not exist do we infer the type of x.

Or, we might change the syntax so that it is not ambiguous. When they designed C# 2.0 they had this problem:

yield(x);

Does that mean "yield x in an iterator" or "call the yield method with argument x?" By changing it to

yield return(x);

it is now unambiguous.

In the case of optional parens in an object initializer it is straightforward to reason about whether there are ambiguities introduced or not because the number of situations in which it is permissible to introduce something that starts with { is very small. Basically just various statement contexts, statement lambdas, array initializers and that's about it. It's easy to reason through all the cases and show that there's no ambiguity. Making sure the IDE stays efficient is somewhat harder but can be done without too much trouble.

This sort of fiddling around with the spec usually is sufficient. If it is a particularly tricky feature then we pull out heavier tools. For example, when designing LINQ, one of the compiler guys and one of the IDE guys who both have a background in parser theory built themselves a parser generator that could analyze grammars looking for ambiguities, and then fed proposed C# grammars for query comprehensions into it; doing so found many cases where queries were ambiguous.

Or, when we did advanced type inference on lambdas in C# 3.0 we wrote up our proposals and then sent them over the pond to Microsoft Research in Cambridge where the languages team there was good enough to work up a formal proof that the type inference proposal was theoretically sound.

Are there ambiguities in C# today?

Sure.

G(F<A, B>(0))

In C# 1 it is clear what that means. It's the same as:

G( (F<A), (B>0) )

That is, it calls G with two arguments that are bools. In C# 2, that could mean what it meant in C# 1, but it could also mean "pass 0 to the generic method F that takes type parameters A and B, and then pass the result of F to G". We added a complicated heuristic to the parser which determines which of the two cases you probably meant.

Similarly, casts are ambiguous even in C# 1.0:

G((T)-x)

Is that "cast -x to T" or "subtract x from T"? Again, we have a heuristic that makes a good guess.