That way you're guaranteed not to get "invalid" unicode sequences such as the first half of a surrogate pair without the second half. Nothing's going to decide to normalize the data into something strange (it's all ASCII). There's no chance of using code points which aren't registered in Unicode, or anything like that. Oh, and you can cut and paste without much fear, too.
Yes, you end up with 4 characters for every 3 bytes - but that's a small price to pay for the knowledge that your data won't be corrupted.