It can also be a bit confusing because we also use the term "serialize" to refer to converting an object into a data stream for storage or network transmission.
The central distinction between the two is that serializability is a global property: a property of an entire history of operations/transactions. Linearizability is a local property: a property of a single operation/transaction. Another distinction is that linearizability includes a notion of real time, which serializability does not: the linearization point of an operation must lie between its invocation and response times. (See Tim Harris: Transactional Memory, 2ed. For some examples and proofs, see the section on Linearizability in Herlihy's slides from The Art of Multiprocessor Programming.)
Both properties are aimed at the same goal: sequential consistency. From Herlihy's paper:
Much work on databases and distributed systems uses serializability as the basic correctness condition for concurrent computations. In this model, a transaction is a thread of control that applies a finite sequence of primitive operations to a set of objects shared with other transactions. A history is serializable if it is equivalent to one in which transactions appear to execute sequentially, i.e., without interleaving. A (partial) precedence order can be defined on non-overlapping pairs of transactions in the obvious way. A history is strictly serializable if the transactions’ order in the sequential history is compatible with their precedence order...
...Linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object. Nevertheless, this single-operation restriction has far-reaching practical and formal consequences, giving linearizable computations a different flavor from their serializable counterparts. An immediate practical consequence is that concurrency control mechanisms appropriate for serializability are typically inappropriate for linearizability because they introduce unnecessary overhead and place unnecessary restrictions on concurrency.
Herlihy, Maurice and Jeannette Wing: Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Prog. Lang. and Sys. Vol. 12, No. 3, July 1990, Pages 463-492. URL: http://www.cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
If you really care about this, read the paper that introduced the definitions. For linearizability, that's Linearizability: A Correctness Condition for Concurrent Objects, Herlihy and Wing. It's dense, but worth the attention. Note that in the software transactional memory community, it's an open question whether linearizability is the right goal / property to aim for.
Serializability is about the outcome of a collection of operations/the "system" being expressible as a specific ordering ("as if execution took place in a specific order...") of all the operations. Linearizability is a property of a single subset of operations in the system... an operation/set of operations are linearizable if they appear to the other operations as if they occurred at a specific instant in (logical) time with respect to the others. The canonical paper here is Papadimitriou, The Serializability of Concurrent Database Updates.
Think "atomic operation" when you're thinking about "linearizable." A (set of) operations are linearizable when they (appear to) occur atomically with respect to other parts of the system. A common formulation is "provide the illusion that each operation takes effect instantaneously between its invocation and response." The formulation of linearizability is due to Herlihy, which emphasizes that this is a local property, vs. other kinds of sequential consistency properties like "serializability" which are global.
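To make the "atomic operation" intuition concrete in Java, here is a minimal sketch using AtomicInteger, whose incrementAndGet() takes effect atomically with respect to other threads. The class name LinearizableCounterDemo and the helper runAtomic are made up for this illustration; this is only one example of a linearizable operation, not a general recipe.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LinearizableCounterDemo {
    // Starts `threads` workers that each call incrementAndGet() `perThread` times,
    // waits for all of them, and returns the final counter value.
    static int runAtomic(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Each increment appears to happen at a single instant
                    // between its invocation and its response.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runAtomic(4, 10_000)); // prints 40000
    }
}
```

Because no increment can be observed "half done", none is lost, so the total always equals threads * perThread. A plain int field with counter++ gives no such guarantee.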
See @andersoj's answer for a clear description of the difference between serializability and linearizability.
This is only indirectly relevant to Java concurrent programming. In general, a concurrent Java program does not need to have either a serializable or linearizable history. In the cases that do, serializability is generally sufficient for a program (Java or otherwise) for "correctness", though particular problems could require the stronger linearizability property. But either way, it is the problem that determines the correctness requirements, not Java.
"In plain English, under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write."
"Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions."
A good way to understand this is to look at the problem from a database standpoint. (I know you asked about this in the context of Java, sorry.)
Assume you are a database. You accept multiple transactions operating on the same object concurrently, but you only have a single disk arm.
When you receive multiple transactions at the same time, you have to re-order the operations within those transactions in some way so your poor disk arm can handle them one by one.
Serializable
You have the ability to re-arrange those transactions to make it look like they happen sequentially (one by one).
As you can imagine, that's not always possible if you accept arbitrary transactions (e.g. one bad transaction lasts 10 years).
So naturally, you enforce some restrictions or conflict prevention mechanisms, and then you can say "I'm serializable! :)".
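As a rough illustration of "looks like they happened one by one", the sketch below enumerates every serial order of a few transactions on a single integer x and checks whether an observed final value matches one of them. This is a deliberate simplification (it compares final state only and ignores what each transaction read), and the names SerialOutcomes, exampleTxns and matchesSomeSerialOrder are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntUnaryOperator;

class SerialOutcomes {
    // Two hypothetical transactions on a single value x: T1 adds 10, T2 doubles.
    static List<IntUnaryOperator> exampleTxns() {
        List<IntUnaryOperator> txns = new ArrayList<>();
        txns.add(x -> x + 10);
        txns.add(x -> x * 2);
        return txns;
    }

    // Collects the final value of x for every serial (non-interleaved) order.
    static Set<Integer> serialResults(int initial, List<IntUnaryOperator> txns) {
        Set<Integer> results = new HashSet<>();
        permute(initial, txns, new boolean[txns.size()], 0, results);
        return results;
    }

    private static void permute(int x, List<IntUnaryOperator> txns,
                                boolean[] used, int placed, Set<Integer> out) {
        if (placed == txns.size()) {
            out.add(x);
            return;
        }
        for (int i = 0; i < txns.size(); i++) {
            if (used[i]) continue;
            used[i] = true;
            permute(txns.get(i).applyAsInt(x), txns, used, placed + 1, out);
            used[i] = false;
        }
    }

    // True if the observed final value equals the outcome of some serial order.
    static boolean matchesSomeSerialOrder(int initial, List<IntUnaryOperator> txns, int observed) {
        return serialResults(initial, txns).contains(observed);
    }

    public static void main(String[] args) {
        List<IntUnaryOperator> txns = exampleTxns();
        System.out.println(matchesSomeSerialOrder(5, txns, 30)); // T1 then T2: true
        System.out.println(matchesSomeSerialOrder(5, txns, 20)); // T2 then T1: true
        System.out.println(matchesSomeSerialOrder(5, txns, 15)); // lost update: false
    }
}
```

The value 15 is what you get if both transactions read x = 5 and T1's write lands last, so T2's update is lost. No serial order produces it, which is exactly the kind of interleaving a serializable database has to prevent.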
Linearizable
Not only do you need to do what serializability needs you to do, you also take a good look at those transactions and try very hard to re-arrange them in a sequential fashion without breaking the semantic order of transactions. As you might have noticed, semantic order is the key.
Basically, in order to claim that you are linearizable, you will have to assume/find a linearization point for every transaction and then order the transactions according to their linearization points.
Therefore, it's uncommon for a versatile RDBMS to say "Hey, I'm linearizable!".
But, it's not uncommon if you are a Key-Value database.
e.g. As a KV database, you can say "I am linearizable!" if you can ensure a read will always get the latest possible write.
(assuming the moment of sending response for the read operation is the linearization point)
This sounds trivial, but will be a major challenge if you are a distributed KV database.
Also note that serializability doesn't require you to give the same guarantee.
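For the single-key case above, "a read always gets the latest possible write" can be checked by brute force on a tiny history. The sketch below assumes each operation carries hypothetical start/end timestamps and tries to find a sequential order that (a) respects real time, meaning an operation that finished before another started must come first, and (b) makes every read return the most recent write. The class and method names are made up, and the search is exponential, so it is only meant for toy histories.

```java
import java.util.Arrays;
import java.util.List;

class RegisterLinearizabilityCheck {
    // One completed operation on a single integer register (initial value 0).
    static final class Op {
        final boolean isWrite;
        final int value;   // value written, or value the read returned
        final long start;  // invocation time
        final long end;    // response time

        Op(boolean isWrite, int value, long start, long end) {
            this.isWrite = isWrite;
            this.value = value;
            this.start = start;
            this.end = end;
        }
    }

    static Op write(int value, long start, long end) { return new Op(true, value, start, end); }
    static Op read(int value, long start, long end) { return new Op(false, value, start, end); }

    static boolean isLinearizable(List<Op> history) {
        return search(history, new boolean[history.size()], 0, 0);
    }

    // Tries to extend a partial sequential order; `current` is the register value so far.
    private static boolean search(List<Op> h, boolean[] used, int placed, int current) {
        if (placed == h.size()) return true;
        for (int i = 0; i < h.size(); i++) {
            if (used[i]) continue;
            Op op = h.get(i);
            // Real-time rule: op cannot go next if an unplaced op finished before op started.
            boolean blocked = false;
            for (int j = 0; j < h.size(); j++) {
                if (j != i && !used[j] && h.get(j).end < op.start) {
                    blocked = true;
                    break;
                }
            }
            if (blocked) continue;
            // Register rule: a read must return the current value.
            if (!op.isWrite && op.value != current) continue;
            used[i] = true;
            if (search(h, used, placed + 1, op.isWrite ? op.value : current)) return true;
            used[i] = false;
        }
        return false;
    }

    public static void main(String[] args) {
        // Read starts after the write finished and sees it.
        System.out.println(isLinearizable(Arrays.asList(write(1, 0, 10), read(1, 20, 30)))); // true
        // Read starts after the write finished but returns the old value (stale read).
        System.out.println(isLinearizable(Arrays.asList(write(1, 0, 10), read(0, 20, 30)))); // false
        // Read overlaps the write, so it may be ordered before it.
        System.out.println(isLinearizable(Arrays.asList(write(1, 0, 30), read(0, 10, 20)))); // true
    }
}
```

If you drop the real-time rule, the second history becomes acceptable (read first, then write), which is the sense in which serializability alone does not give the same guarantee.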
Imagine you have a distributed system running on 3 machines (3 replicas). We simply want to write and read the value of key "A".
Linearizable
In this model, our system works like this: as soon as write(A) is done, we can read the latest value from any of the three replicas. That is, once an action (write/read) is done, its effect is visible to all future actions across the whole system (like an atomic operation).
Here, the order of operations that overlap in time doesn't matter. So if we issue a write and a read concurrently, the system can process the read before the write, and that's fine: it will still be linearizable. Paxos/Raft algorithms provide this by having a majority of nodes agree on each operation.
Serializable
In this model, all the replicas (machines) process actions/events in the same order. So, if one of the replicas does write(A) and then read(A), the other replicas will do them in the same order.
Here, it's possible to get stale data from other replicas. Let's say replica_1 does write(A). read(A) from replica_1 will return the latest value, but if we read from replica_2, we may get the old value. The only guarantee is that whenever replica_2 catches up (there is no time limit), it will process the operations in the same order: write and then read. ZooKeeper provides this.
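The stale-read scenario can be simulated in a few lines. The sketch below assumes all replicas apply the same ordered log of writes but at their own pace; ReplicatedLogDemo, Replica and catchUp are hypothetical names for this illustration and do not correspond to any real ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicatedLogDemo {
    // A single write in the shared, totally ordered log.
    static final class Write {
        final String key;
        final String value;

        Write(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Replica {
        private final Map<String, String> state = new HashMap<>();
        private int applied = 0; // number of log entries applied so far

        // Applies log entries strictly in log order until `upTo` entries are applied.
        void catchUp(List<Write> log, int upTo) {
            while (applied < upTo && applied < log.size()) {
                Write w = log.get(applied++);
                state.put(w.key, w.value);
            }
        }

        String read(String key) {
            return state.get(key);
        }
    }

    public static void main(String[] args) {
        List<Write> log = new ArrayList<>();
        log.add(new Write("A", "1"));
        log.add(new Write("A", "2"));

        Replica replica1 = new Replica();
        Replica replica2 = new Replica();
        replica1.catchUp(log, 2); // fully up to date
        replica2.catchUp(log, 1); // lagging by one entry

        System.out.println(replica1.read("A")); // prints 2
        System.out.println(replica2.read("A")); // prints 1 (stale)

        replica2.catchUp(log, 2); // eventually catches up, in the same order
        System.out.println(replica2.read("A")); // prints 2
    }
}
```

Both replicas see A go from 1 to 2 in the same order, yet a client reading from replica_2 at the wrong moment still gets the old value, which is what the linearizable model rules out.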