Depending on details the question leaves open, you may want to wrap the original dictionary and do a kind of copy-on-write.
The "copy" is then a dictionary that looks keys up in the "parent" dictionary when it doesn't contain them itself, but stores all modifications in itself.
This assumes that you won't be modifying the original and that the extra lookups don't end up costing more than the copy would have.
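One way to sketch this idea is with the standard library's collections.ChainMap, which searches a chain of mappings on lookup and writes only to the first one. This is just an illustration, not necessarily the fastest option; the names original and view are made up for the example.

```python
from collections import ChainMap

original = {"a": 1, "b": 2}

# An empty dict sits in front; lookups fall through to `original`,
# while writes land only in the front dict.
view = ChainMap({}, original)

view["b"] = 20   # stored in the front dict, `original` is untouched
view["c"] = 3

print(view["a"])   # 1, found in the parent
print(view["b"])   # 20, overridden in the front dict
print(original)    # {'a': 1, 'b': 2}
```

Note that deleting a key that exists only in the parent raises KeyError, because ChainMap only ever mutates its first mapping.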
Looking at the C source for the Python dict operations, you can see that they do a pretty naive (but efficient) copy. It essentially boils down to a call to PyDict_Merge:
PyDict_Merge(PyObject *a, PyObject *b, int override)
This first does the quick checks, such as whether the two dicts are the same object and whether the source has any entries at all. After that it does a generous one-time resize/alloc of the target dict and then copies the elements one by one. I don't see you getting much faster than the built-in copy().
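Seen from the Python side, copying the elements one by one means only the references are copied, so the result is a shallow copy. A small sketch of what that implies (the variable names are illustrative):

```python
import copy

original = {"nums": [1, 2, 3], "name": "x"}

shallow = original.copy()        # new dict, same value objects
deep = copy.deepcopy(original)   # new dict, recursively copied values

# Mutate a value through the original dict.
original["nums"].append(4)

print(shallow["nums"] is original["nums"])  # True, the list is shared
print(shallow["nums"])                      # [1, 2, 3, 4]
print(deep["nums"])                         # [1, 2, 3]
```

The recursive work done by deepcopy is also why it is much slower than a plain copy.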
The measurements depend on the dictionary size, though. For 10000 entries, copy(a) and a.copy() take almost the same time.
from copy import copy, deepcopy
a = {b: b for b in range(10000)}
In [5]: %timeit copy(a)
10000 loops, best of 3: 186 µs per loop
In [6]: %timeit deepcopy(a)
100 loops, best of 3: 14.1 ms per loop
In [7]: %timeit a.copy()
1000 loops, best of 3: 180 µs per loop
I realise this is an old thread, but this is a high result in search engines for "dict copy python", and the top result for "dict copy performance", and I believe this is relevant.
From Python 3.7, newDict = oldDict.copy() is up to 5.5x faster than it was previously. Notably, right now, newDict = dict(oldDict) does not seem to have this performance increase.
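If you want to check this on your own interpreter, here is a rough sketch using the standard timeit module. The dictionary size and repeat count are arbitrary choices, and the numbers will vary by machine and Python version, so treat it as a quick comparison rather than a proper benchmark.

```python
import sys
import timeit

d = {i: i for i in range(10000)}
n = 1000  # number of calls per measurement (arbitrary)

# Both calls are wrapped in a lambda so they carry the same call overhead.
t_copy = timeit.timeit(lambda: d.copy(), number=n)
t_ctor = timeit.timeit(lambda: dict(d), number=n)

print("Python", sys.version.split()[0])
print(f"d.copy(): {t_copy / n * 1e6:.1f} µs per call")
print(f"dict(d):  {t_ctor / n * 1e6:.1f} µs per call")
```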