Binary    Hex          Comments
0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding
10xxxxxx  0x80..0xBF   Continuation byte (a multi-byte character has 1 to 3 of these)
110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding
1110xxxx  0xE0..0xEF   First byte of a 3-byte character encoding
11110xxx  0xF0..0xF4   First byte of a 4-byte character encoding
(The last line looks as if it should read 0xF0..0xF7; however, the Unicode code space (U+0000..U+10FFFF) needs only 21 bits, and encoding its largest value, U+10FFFF, gives a first byte of 0xF4. So 0xF4 is the maximum valid first byte, and the values 0xF5..0xF7 cannot occur in valid UTF-8.)
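The table translates directly into a first-byte classifier. The sketch below is in Python; `sequence_length` is a hypothetical helper, not a library function. Following the note above, it caps 4-byte first bytes at 0xF4, and it also rejects 0xC0 and 0xC1, which (as explained under non-minimal sequences) can only begin non-minimal encodings.

```python
def sequence_length(lead: int) -> int:
    """Total length of the UTF-8 sequence that byte `lead` begins,
    or 0 if `lead` cannot begin a sequence at all (hypothetical helper)."""
    if lead <= 0x7F:      # 0xxxxxxx: the only byte of a 1-byte character
        return 1
    if lead <= 0xBF:      # 10xxxxxx: continuation byte, never a first byte
        return 0
    if lead <= 0xC1:      # 0xC0, 0xC1: could only start non-minimal sequences
        return 0
    if lead <= 0xDF:      # 110xxxxx: first byte of a 2-byte character
        return 2
    if lead <= 0xEF:      # 1110xxxx: first byte of a 3-byte character
        return 3
    if lead <= 0xF4:      # 11110xxx, capped at 0xF4 because of U+10FFFF
        return 4
    return 0              # 0xF5..0xFF never occur in valid UTF-8


for byte in (0x41, 0x80, 0xC0, 0xC3, 0xE2, 0xF0, 0xF5, 0xFF):
    print(f"0x{byte:02X} -> {sequence_length(byte)}")
```

A first byte that passes this check still needs its continuation bytes validated: for example, 0xF4 may only be followed by 0x80..0x8F, since anything larger would encode a value above U+10FFFF.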
Deciding whether a particular sequence of bytes is valid UTF-8 means you need to think about:
Continuation bytes appearing where not expected
Non-continuation bytes appearing where a continuation byte is expected
Incomplete characters at end of string (variation of 'continuation byte expected')
Non-minimal sequences
UTF-16 surrogates
In valid UTF-8, the bytes 0xF5..0xFF cannot occur.
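Those checks combine into a single pass over the bytes. This is a minimal Python sketch (the name `is_valid_utf8` is hypothetical): it decodes each multi-byte sequence to a value and then rejects it if it is non-minimal, a UTF-16 surrogate, or above U+10FFFF. In real Python code you would simply call `bytes.decode("utf-8")`, which is strict by default and performs all of these checks.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Validate `data` as UTF-8: first/continuation byte structure,
    complete sequences, minimal encodings, no surrogates, <= U+10FFFF."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:                  # single-byte character
            i += 1
            continue
        # Sequence length, payload bits of the first byte, and the smallest
        # value that legitimately needs this many bytes (rejects overlongs).
        if 0xC0 <= lead <= 0xDF:
            length, cp, min_cp = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, cp, min_cp = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF7:
            length, cp, min_cp = 4, lead & 0x07, 0x10000
        else:
            return False                  # stray continuation byte, or 0xF8..0xFF
        if i + length > n:
            return False                  # incomplete character at end of data
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                return False              # continuation byte expected, not found
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:
            return False                  # non-minimal sequence (0xC0/0xC1 land here)
        if 0xD800 <= cp <= 0xDFFF:
            return False                  # UTF-16 surrogate
        if cp > 0x10FFFF:
            return False                  # beyond Unicode (0xF5..0xF7 land here)
        i += length
    return True


print(is_valid_utf8("h\u00e9llo \u20ac".encode("utf-8")))
print(is_valid_utf8(b"\xc0\x80"))
```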
Non-minimal sequences
There are multiple possible representations for some characters. For example, the Unicode character U+0000 (ASCII NUL) could be represented by:
0x00
0xC0 0x80
0xE0 0x80 0x80
0xF0 0x80 0x80 0x80
However, the Unicode standard clearly states that the last three alternatives are not acceptable because they are not minimal. As a consequence, the bytes 0xC0 and 0xC1 can never appear in valid UTF-8: a 2-byte sequence starting with either of them can only encode a value in the range U+0000..U+007F, and those characters must be encoded minimally, as single bytes 0x00..0x7F.
UTF-16 Surrogates
Within the Basic Multilingual Plane (BMP), the code points U+D800..U+DFFF are reserved for UTF-16 surrogates and cannot appear encoded in valid UTF-8. If they were valid in UTF-8 (which, I emphasize, they are not), the surrogates would be encoded as:
U+D800 — 0xED 0xA0 0x80 (smallest high surrogate)
U+DBFF — 0xED 0xAF 0xBF (largest high surrogate)
U+DC00 — 0xED 0xB0 0x80 (smallest low surrogate)
U+DFFF — 0xED 0xBF 0xBF (largest low surrogate)
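These byte sequences follow mechanically from the 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, as this Python sketch shows (`pack_3byte` is a hypothetical helper that deliberately skips validation). The arithmetic also shows that every surrogate encodes as 0xED followed by a byte in 0xA0..0xBF, so a validator can reject all of them by requiring the byte after 0xED to be in 0x80..0x9F. Python's strict `utf-8` codec refuses them; its `surrogatepass` error handler exists specifically to produce or accept these otherwise invalid sequences.

```python
def pack_3byte(cp: int) -> bytes:
    """Pack a value in 0x0800..0xFFFF into 1110xxxx 10xxxxxx 10xxxxxx,
    with NO validity checks (illustration only)."""
    return bytes([
        0xE0 | (cp >> 12),           # top 4 bits
        0x80 | ((cp >> 6) & 0x3F),   # middle 6 bits
        0x80 | (cp & 0x3F),          # low 6 bits
    ])


for cp in (0xD800, 0xDBFF, 0xDC00, 0xDFFF):
    print(f"U+{cp:04X}", pack_3byte(cp).hex(" ").upper())

# A strict decoder refuses these sequences.
try:
    b"\xed\xa0\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```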
Bad Data
So your BAD data should contain samples that violate each of these rules:
Continuation byte not preceded by one of the initial byte values
Initial bytes of multi-byte characters not followed by enough continuation bytes
Non-minimal multi-byte characters
UTF-16 surrogates
Invalid bytes (0xC0, 0xC1, 0xF5..0xFF).
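One way to build such a corpus is one small sample per category, checked against a decoder known to be strict. The Python sketch below uses `bytes.decode("utf-8")`, which by default rejects every one of these; the sample names and the helper `check_all_rejected` are illustrative assumptions, not a standard test suite. The "beyond U+10FFFF" sample covers the 0xF4 limit on first bytes.

```python
BAD_SAMPLES = {
    "unexpected continuation byte": b"A\x80B",
    "too few continuation bytes (truncated U+20AC)": b"\xe2\x82",
    "non-continuation where continuation expected": b"\xc3A",
    "non-minimal 2-byte U+0000": b"\xc0\x80",
    "non-minimal 3-byte U+002F": b"\xe0\x80\xaf",
    "UTF-16 high surrogate U+D800": b"\xed\xa0\x80",
    "UTF-16 low surrogate U+DFFF": b"\xed\xbf\xbf",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
    "invalid byte 0xC1": b"\xc1\xbf",
    "invalid byte 0xF5": b"\xf5\x80\x80\x80",
    "invalid byte 0xFF": b"\xff",
}


def check_all_rejected(samples: dict) -> list:
    """Return the names of samples that the strict decoder wrongly accepts."""
    accepted = []
    for name, data in samples.items():
        try:
            data.decode("utf-8")      # strict by default: raises on bad input
        except UnicodeDecodeError:
            continue                  # rejected, as it should be
        accepted.append(name)
    return accepted


print(check_all_rejected(BAD_SAMPLES))
```

It should print an empty list; any name that does appear means either the sample or the decoder under test is wrong.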
Note that a byte-order mark (BOM) U+FEFF, aka zero-width no-break space (ZWNBSP), cannot appear unencoded in UTF-8: its raw UTF-16 forms consist of the bytes 0xFE and 0xFF, neither of which is permitted in valid UTF-8. An encoded ZWNBSP can appear in a UTF-8 file as 0xEF 0xBB 0xBF, but the BOM is completely superfluous in UTF-8, which has no byte order to mark.
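In Python, for example, the standard `codecs.BOM_UTF8` constant holds exactly those three bytes, and the `utf-8-sig` codec strips one leading BOM when decoding, whereas plain `utf-8` keeps it as an ordinary U+FEFF character. A short sketch:

```python
import codecs

# A UTF-8 "BOM" is simply U+FEFF encoded as UTF-8.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(data[:3].hex(" "))               # ef bb bf

# Plain "utf-8" keeps it as a character; "utf-8-sig" strips it.
print(repr(data.decode("utf-8")))      # '\ufeffhello'
print(repr(data.decode("utf-8-sig")))  # 'hello'

# The unencoded UTF-16 BOM bytes are not valid UTF-8 at all.
try:
    b"\xff\xfehello".decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid byte at position", exc.start)
```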
There are also some noncharacters in Unicode. U+FFFE and U+FFFF are two such noncharacters (and the last two code points in each plane, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, ... U+10FFFE, U+10FFFF, are others). These should not normally appear in data exchanged between applications, but may be used internally (private use). See the Unicode FAQ for lots of sordid details, including the rather complex history of noncharacters in Unicode. (Corrigendum #9: Clarification About Noncharacters, released in January 2013, does what its title suggests: it clarifies the meaning of noncharacters.)
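Whether to flag noncharacters is an application policy decision rather than a UTF-8 validity question: they are well-formed and encode like any other code point, so a UTF-8 validator must accept them. A hypothetical Python helper `is_noncharacter` might look like this; note that it also covers U+FDD0..U+FDEF, a contiguous block of 32 further noncharacters not listed above, bringing the total to 66.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points of each of the 17 planes (U+xxFFFE, U+xxFFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Masking with 0xFFFE keeps the low 16 bits minus bit 0, so the
    # comparison matches exactly the ...FFFE and ...FFFF of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Noncharacters still round-trip through a strict UTF-8 codec.
for cp in (0xFDD0, 0xFFFE, 0xFFFF, 0x1FFFE, 0x10FFFF):
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", encoded.hex(" "), is_noncharacter(cp))
```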