Why is "null" present in C# and Java?

We noticed that lots of bugs in our software developed in C# (or Java) cause a NullReferenceException.

Is there a reason why "null" has even been included in the language?

After all, if there were no "null", I would have no bug, right?

In other words, what feature in the language couldn't work without null?

Viewed 6,871 times

I can't speak to your specific issue, but it sounds like the problem isn't the existence of null. Null exists in databases, and you need some way to account for that at the application level. I don't think that's the only reason it exists in .NET, mind you, but I figure it's one of the reasons.

Null is an extremely powerful feature. What do you do if you have an absence of a value? It's NULL!

One school of thought is to never return null; another is to return it whenever a value is absent. For example, some say you should return a valid but empty object instead.

I prefer null because, to me, it's a truer indication of what is actually going on. If I can't retrieve an entity from my persistence layer, I want null. I don't want some empty value. But that's me.

It is especially handy with primitives. For example, suppose I have a true/false value, but it's used on a security form where a permission can be Allow, Deny, or not set. I want that "not set" to be null, so I can use bool?.
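As a minimal sketch of that tri-state idea (canEditReports is just a made-up permission name):

using System;

// null = not set, true = Allow, false = Deny
bool? canEditReports = null;

string description = canEditReports == null
    ? "Not set"
    : (canEditReports.Value ? "Allow" : "Deny");

Console.WriteLine(description);   // prints "Not set"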

There are a lot more I could go on about, but I will leave it there.

Null is an essential requirement of any OO language. Any object variable that hasn't been assigned an object reference has to be null.

Nullity is a natural consequence of reference types. If you have a reference, it has to refer to some object - or be null. If you were to prohibit nullity, you would always have to make sure that every variable was initialized with some non-null expression - and even then you'd have issues if variables were read during the initialization phase.

How would you propose removing the concept of nullity?

"Null" is included in the language because we have value types and reference types. It's probably a side effect, but a good one I think. It gives us a lot of power over how we manage memory effectively.

Why do we have null? ...

Value types are stored on the "stack"; their value sits directly in that piece of memory (i.e. int x = 5 means that the memory location for that variable contains "5").

Reference types on the other hand have a "pointer" on the stack pointing to the actual value on the heap (i.e. string x = "ello" means that the memory block on the stack only contains an address pointing to the actual value on the heap).

A null value simply means that our value on the stack does not point to any actual value on the heap - it's an empty pointer.
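A small sketch of the difference (assuming a console program):

using System;

int x = 5;          // value type: the 5 sits directly in the variable's memory
string s = "ello";  // reference type: the variable holds a reference to a string on the heap
string t = null;    // reference type with no target: the reference points at nothing

Console.WriteLine(x);         // 5
Console.WriteLine(s.Length);  // 4
Console.WriteLine(t.Length);  // throws NullReferenceException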

Hope I explained that well enough.

Removing null wouldn't solve much. You would need to have a default reference for most variables that is set on init. Instead of null-reference exceptions you would get unexpected behaviour because the variable is pointing to the wrong objects. At least null-references fail fast instead of causing unexpected behaviour.

You can look at the Null Object pattern for a way to solve part of this problem.
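A rough sketch of that pattern (the ILogger types here are invented for the example): instead of returning null, a factory returns a do-nothing implementation, so callers never need a null check.

using System;

interface ILogger
{
    void Log(string message);
}

class ConsoleLogger : ILogger
{
    public void Log(string message) => Console.WriteLine(message);
}

class NullLogger : ILogger
{
    // Deliberately does nothing; callers can call Log without checking for null.
    public void Log(string message) { }
}

class LoggerFactory
{
    // Hands back a NullLogger instead of null when logging is disabled.
    public static ILogger Create(bool loggingEnabled) =>
        loggingEnabled ? (ILogger)new ConsoleLogger() : new NullLogger();
}

// Usage:
// ILogger logger = LoggerFactory.Create(loggingEnabled: false);
// logger.Log("silently ignored, and no null check was needed");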

Anders Hejlsberg, "the father of C#", spoke about exactly that point in his Computerworld interview:

For example, in the type system we do not have separation between value and reference types and nullability of types. This may sound a little wonky or a little technical, but in C# reference types can be null, such as strings, but value types cannot be null. It sure would be nice to have had non-nullable reference types, so you could declare that ‘this string can never be null, and I want you, compiler, to check that I can never hit a null pointer here’.

50% of the bugs that people run into today, coding with C# in our platform, and the same is true of Java for that matter, are probably null reference exceptions. If we had had a stronger type system that would allow you to say that ‘this parameter may never be null, and you, compiler, please check that at every call, by doing static analysis of the code’, then we could have stamped out classes of bugs.

Cyrus Najmabadi, a former software design engineer on the C# team (now working at Google), discusses the subject on his blog (1st, 2nd, 3rd, 4th posts). It seems that the biggest hindrance to the adoption of non-nullable types is that the notation would disturb programmers' habits and existing code bases. Something like 70% of the references in C# programs are likely to end up being non-nullable.

If you really want to have non-nullable reference types in C#, you should try Spec#, a C# extension that allows the use of "!" as a non-null marker.

static string AcceptNotNullObject(object! s)
{
    return s.ToString();
}

Null in C# is mostly a carry-over from C++, which had pointers that didn't point to anything in memory (or rather, address 0x00). In this interview, Anders Hejlsberg says that he would have liked to have added non-nullable reference types to C#.

Null also has a legitimate place in a type system, however, as something akin to the bottom type (where object is the top type). In Lisp, the bottom type is NIL and in Scala it is Nothing.

It would have been possible to design C# without any nulls, but then you'd have to come up with an acceptable solution for the things people usually use null for, such as uninitialized-value, not-found, default-value, undefined-value, and None<T>. Even if that had succeeded, there would probably have been less adoption amongst C++ and Java programmers anyhow, at least until they saw that C# programs never had any null pointer exceptions.

If you create an object with an instance variable that is a reference to some other object, what value would you suggest this variable have before you assign it any object reference?

There are situations in which null is a nice way to signify that a reference has not been initialized. This is important in some scenarios.

For instance:

MyResource resource = null;
try
{
    resource = new MyResource();
    //
    // Do some work
    //
}
finally
{
    if (resource != null)
        resource.Close();
}

This is in most cases accomplished by the use of a using statement. But the pattern is still widely used.

With regard to your NullReferenceException, such errors are often easy to reduce by implementing a coding standard where all parameters are checked for validity. Depending on the nature of the project, I find that in most cases it's enough to check parameters on exposed members. If the parameters are not within the expected range, an ArgumentException of some kind is thrown, or an error result is returned, depending on the error handling pattern in use.

The parameter checking does not in itself remove bugs, but any bugs that occur are easier to locate and correct during the testing phase.
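For example, a typical guard clause on an exposed member might look like this (Customer and RegisterCustomer are hypothetical names; nameof assumes C# 6 or later):

using System;

public class Customer
{
    public string Name;
}

public class CustomerService
{
    public void RegisterCustomer(Customer customer)
    {
        // Validate at the public boundary so failures surface close to the actual bug.
        if (customer == null)
            throw new ArgumentNullException(nameof(customer));
        if (string.IsNullOrEmpty(customer.Name))
            throw new ArgumentException("A customer name is required.", nameof(customer));

        // ... proceed knowing the inputs are valid
    }
}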

As a note, Anders Hejlsberg has mentioned the lack of non-null enforcement as one of the biggest mistakes in the C# 1.0 specification and that including it now is "difficult".

If you still think that a statically enforced non-null reference type is of great importance, you could check out the Spec# language. It is an extension of C# where non-null references are part of the language. This ensures that a reference marked as non-null can never be assigned a null reference.

Null as it is available in C#/C++/Java/Ruby is best seen as an oddity of some obscure past (Algol) that somehow survived to this day.

You use it in two ways:

  1. To declare references without initializing them (bad).
  2. To denote optionality (OK).

As you guessed, 1) is what causes us endless trouble in common imperative languages and should have been banned long ago, 2) is the true essential feature.

There are languages out there that avoid 1) without preventing 2).

For example OCaml is such a language.

A simple function returning an ever incrementing integer starting from 1:

let counter = ref 0;;
let next_counter_value () = (counter := !counter + 1; !counter);;

And regarding optionality:

type distributed_computation_result = NotYetAvailable | Result of float;;
let print_result r = match r with
| Result(f) -> Printf.printf "result is %f\n" f
| NotYetAvailable -> Printf.printf "result not yet available\n";;

If you're getting a 'NullReferenceException', perhaps you keep referring to objects which no longer exist. This is not an issue with 'null'; it's an issue with your code pointing to non-existent addresses.

I'm surprised no one has talked about databases for their answer. Databases have nullable fields, and any language which will be receiving data from a DB needs to handle that. That means having a null value.

In fact, this is so important that basic types like int can be made nullable!

Also consider return values from functions. What if you wanted a function that divides a couple of numbers, and the denominator could be 0? The only "correct" answer in such a case would be null. (I know, in such a simple example an exception would likely be a better option... but there can be situations where all input values are valid yet the data can still produce an invalid or incalculable answer. I'm not sure an exception should be used in such cases...)
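A sketch of both ideas using a nullable int:

using System;

int? ageFromDatabase = null;   // the column allowed NULL, so "no value" is representable

// A division that reports "no meaningful answer" as null instead of throwing:
static int? Divide(int numerator, int denominator)
{
    if (denominator == 0)
        return null;
    return numerator / denominator;
}

int? result = Divide(10, 0);
Console.WriteLine(result.HasValue ? result.Value.ToString() : "no result");
Console.WriteLine(ageFromDatabase.HasValue ? "age known" : "age unknown");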

One response mentioned that there are nulls in databases. That's true, but they are very different from nulls in C#.

In C#, nulls are markers for a reference that doesn't refer to anything.

In databases, nulls are markers for value cells that don't contain a value. By value cells, I generally mean the intersection of a row and a column in a table, but the concept of value cells could be extended beyond tables.

The difference between the two seems trivial at first glance. But it's not.

After all, if there were no "null", I would have no bug, right?

The answer is NO. The problem is not that C# allows null, the problem is that you have bugs which happen to manifest themselves with the NullReferenceException. As has been stated already, nulls have a purpose in the language to indicate either an "empty" reference type, or a non-value (empty/nothing/unknown).

Nulls do not cause NullPointerExceptions...

Programmers cause NullPointerExceptions.

Without nulls we are back to using an arbitrary sentinel value to signal that the return value of a function or method was invalid. You would still have to check for the returned -1 (or whatever); removing nulls will not magically cure laziness, but merely obfuscate it.

I propose:

  1. Ban Null
  2. Extend Booleans: True, False and FileNotFound

Commonly, a NullReferenceException means that some method didn't like what it was handed and returned a null reference, which was later dereferenced without being checked.

That method could have thrown some more detailed exception instead of returning null, which fits the fail-fast school of thought.

Or the method might be returning null as a convenience to you, so that you can write if instead of try and avoid the "overhead" of an exception.
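A familiar example of that trade-off from the base class library: int.Parse throws on bad input, while int.TryParse lets you write an if instead of a try (for reference-type lookups, the non-throwing variant typically returns null instead):

using System;

// Throwing style: fail fast with a descriptive exception.
// int parsed = int.Parse("not a number");   // would throw FormatException

// Non-throwing style: the caller checks instead of catching.
if (int.TryParse("not a number", out int value))
    Console.WriteLine(value);
else
    Console.WriteLine("could not parse");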

Besides ALL of the reasons already mentioned, NULL is needed when you need a placeholder for a not-yet-created object. For example, if you have a circular reference between a pair of objects, then you need null, since you cannot instantiate both simultaneously.

class A {
    public B fieldb;
}

class B {
    public A fielda;
}

A a = new A();                  // a.fieldb is null
B b = new B() { fielda = a };   // b.fielda isn't
a.fieldb = b;                   // now a.fieldb isn't null anymore

Edit: You may be able to come up with a language that works without nulls, but it will definitely not be an object-oriented language. For example, Prolog doesn't have null values.

The question may be interpreted as "Is it better to have a default value for each reference type (like String.Empty) or null?". From this perspective I would prefer to have nulls, because:

  • I would not like to write a default constructor for each class I write.
  • I would not like unnecessary memory to be allocated for such default values.
  • Checking whether a reference is null is cheaper than value comparisons.
  • It is highly likely that we would have more bugs that are harder to detect, instead of NullReferenceExceptions. It is a good thing to have an exception which clearly indicates that I am doing (assuming) something wrong.

Like many things in object-oriented programming, it all goes back to ALGOL. Tony Hoare just called it his "billion-dollar mistake." If anything, that's an understatement.

Here is a really interesting thesis on how to make nullability not the default in Java. The parallels to C# are obvious.

Two cases where I use null:

  • Explicit nulling, though that is seldom necessary. Perhaps one can view it as a form of defensive programming.
  • Using it (or the Nullable<T> structure for non-nullable value types) as a flag to indicate a missing field when mapping fields from a data source (byte stream, database table) to an object. It can get very unwieldy to keep a boolean flag for every possibly-missing field, and it may be impossible to use sentinel values like -1 or 0 when every value in that field's range is valid. This is especially handy when there are many, many fields (see the sketch below).

Whether these are cases of use or abuse is subjective, but I use them sometimes.
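A sketch of that second case, with made-up field names:

using System;

// Fields read from a byte stream or database row; null means "not present in the source".
class CustomerRecord
{
    public string Name;          // reference type: null already expresses "missing"
    public int? Age;             // Nullable<int>: no separate bool flag and no -1 sentinel needed
    public DateTime? LastLogin;  // works even when every value in the type's range is legitimate
}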

If a framework allows the creation of an array of some type without specifying what should be done with the new items, that type must have some default value. For types which implement mutable reference semantics (*) there is, in the general case, no sensible default value. I consider it a weakness of the .NET Framework that there is no way to specify that a non-virtual method call should skip the null check. This would allow immutable types like String to behave as value types, by returning sensible values for properties like Length.

(*) Note that in VB.NET and C#, mutable reference semantics may be implemented by either class or struct types; a struct type would implement mutable reference semantics by acting as a proxy for a wrapped instance of a class object to which it holds an immutable reference.

It would also be helpful if one could specify that a class should have non-nullable mutable value-type semantics (implying, at minimum, that instantiating a field of that type would create a new object instance using a default constructor, and that copying a field of that type would create a new instance by copying the old one, recursively handling any nested value-type classes).

It's unclear, however, exactly how much support should be built into the framework for this. Having the framework itself recognize the distinctions between mutable value types, mutable reference types, and immutable types would allow classes which hold references to a mixture of mutable and immutable types from outside classes to efficiently avoid making unnecessary copies of deeply immutable objects.

Sorry for answering four years late; I am amazed that none of the answers so far have answered the original question this way:

Languages like C# and Java, like C and other languages before them, have null so that the programmer can write fast, optimized code by using pointers in an efficient way.


  • Low-level view

A little history first. The reason null was invented is efficiency. When doing low-level programming in assembly, there is no abstraction: you have values in registers and you want to make the most of them. Defining zero to not be a valid pointer value is an excellent strategy for representing either an object or nothing.

Why waste a whole extra word of memory on an "is present" flag when reserving a single pointer value gives you a zero-memory-overhead, really fast implementation of the optional-value pattern? This is why null is so useful.

  • High-level view

Semantically, null is in no way necessary to programming languages. For example, in classic functional languages like Haskell or in the ML family, there is no null, but rather types named Maybe or Option. They represent the more high-level concept of optional value without being concerned in any way by what the generated assembly code will look like (that will be the compiler's job).

And this is very useful too, because it enables the compiler to catch more bugs, and that means less NullReferenceExceptions.
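As a rough sketch of what such an Option-style type could look like if hand-written in C# (the names are invented; this is not a standard library type):

using System;

// A value that is either Some(value) or None, with no null involved.
public readonly struct Option<T>
{
    private readonly T value;
    public bool HasValue { get; }

    private Option(T value)
    {
        this.value = value;
        HasValue = true;
    }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default(Option<T>);

    // The caller must handle both cases, so the "missing" case cannot be forgotten.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none) =>
        HasValue ? some(value) : none();
}

// Usage:
// Option<int> found = Option<int>.Some(42);
// string text = found.Match(v => "got " + v, () => "nothing");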

  • Bringing them together

In contrast to these very high-level programming languages, C# and Java allow a possible value of null for every reference type (which is another name for a type that will end up being implemented using pointers).

This may seem like a bad thing, but what's good about it is that the programmer can use the knowledge of how it works under the hood, to create more efficient code (even though the language has garbage collection).

This is the reason why null still exists in languages nowadays: a trade-off between the need of a general concept of optional value and the ever-present need for efficiency.

The feature that couldn't work without null is being able to represent "the absence of an object".

The absence of an object is an important concept. In object-oriented programming, we need it in order to represent an association between objects that is optional: object A can be attached to an object B, or A might not have an object B. Without null we could still emulate this: for instance, we could use a list of objects to associate B with A. That list could contain one element (one B) or be empty. This is somewhat inconvenient and doesn't really solve anything. Code which assumes that there is a B, such as aobj.blist.first().method(), is going to blow up in a way similar to a null reference exception (if blist is empty, what is the behavior of blist.first()?).

Speaking of lists, null lets you terminate a linked list. A ListNode can contain a reference to another ListNode which can be null. The same can be said about other dynamic set structures such as trees. Null lets you have an ordinary binary tree whose leaf nodes are marked by having child references that are null.

Lists and trees can be built without null, but they have to be circular, or else infinite/lazy. That would probably be regarded as an unacceptable constraint by most programmers, who would prefer to have choices in designing data structures.
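For instance, a sketch of such node types in C#:

// A singly linked list node; a null Next marks the end of the list.
class ListNode
{
    public int Value;
    public ListNode Next;
}

// A binary tree node; null children mark the leaves.
class TreeNode
{
    public int Value;
    public TreeNode Left;
    public TreeNode Right;
}

// Building a two-element list terminated by null:
// var head = new ListNode { Value = 1, Next = new ListNode { Value = 2 } };  // the last Next stays null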

The pains associated with null references, like null references arising accidentally due to bugs and causing exceptions, are partially a consequence of the static type system, which introduces a null value into every type: there is a null String, null Integer, null Widget, ...

In a dynamically typed language, there can be a single null object, which has its own type. The upshot of this is that you have all the representational advantages of null, plus greater safety. For instance if you write a method which accepts a String parameter, then you're guaranteed that the parameter will be a string object, and not null. There is no null reference in the String class: something that is known to be a String cannot be the object null. References do not have type in a dynamic language. A storage location such as a class member or function parameter contains a value which can be a reference to an object. That object has type, not the reference.

So these languages provide a clean, more or less mathematically pure model of "null", and then the statically typed ones turn it into somewhat of a Frankenstein's monster.