To match individual characters, you can simply include them in a character class, either as literals or via the \u03FB syntax.
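As a minimal sketch of that idea (the class name, method name, and the particular characters are my own choices for illustration), an explicit character class can mix literals with an escaped code unit. The backslash is doubled in the Java string literal so that the regex engine itself sees the escape:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical pattern: one or more characters from an explicit set.
    // The set holds two literals (a, b) plus U+03FB written as a regex escape.
    static final Pattern EXPLICIT = Pattern.compile("[ab\\u03FB]+");

    static boolean matchesExplicit(String s) {
        return EXPLICIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesExplicit("ab\u03FB")); // true
        System.out.println(matchesExplicit("abc"));      // false, c is not in the set
    }
}
```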
Obviously, for ideographic scripts you often cannot list every allowed character. To make the regex treat Unicode characters according to their type or code block, various other escapes are supported; they are defined in the java.util.regex.Pattern documentation. Look at the section "Unicode support", particularly the references to the Character class and to the Unicode Standard itself.
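For instance, here is a rough sketch that matches by script and by block instead of listing characters. The class and method names are made up for this example; \p{IsHan} (script, Java 7 or later) and \p{InGreek} (block) are the kind of escapes the Pattern documentation describes:

```java
import java.util.regex.Pattern;

public class ScriptAndBlockDemo {
    // Script escape: any run of characters belonging to the Han script.
    static final Pattern HAN = Pattern.compile("\\p{IsHan}+");
    // Block escape: any run of characters from the Greek block.
    static final Pattern GREEK_BLOCK = Pattern.compile("\\p{InGreek}+");

    static boolean isAllHan(String s) {
        return HAN.matcher(s).matches();
    }

    static boolean isAllGreekBlock(String s) {
        return GREEK_BLOCK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllHan("\u6F22\u5B57"));        // true, two Han ideographs
        System.out.println(isAllHan("abc"));                 // false
        System.out.println(isAllGreekBlock("\u03BB\u03C0")); // true, lambda and pi
    }
}
```

Scripts and blocks are not the same thing: a script can span several blocks, so which one you want depends on what you are trying to accept.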
The Java regular expression API works on the char type.
A char is a UTF-16 code unit, so Java strings are implicitly UTF-16.
If you have UTF-8 data, you will need to transcode it to UTF-16 on input, if this is not already being done.
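The transcoding step can be sketched roughly like this, assuming the UTF-8 data arrives as a byte array (the class and method names are hypothetical). Decoding with an explicit charset yields an ordinary UTF-16 String that the regex API can work on:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class Utf8Decode {
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Decode raw UTF-8 bytes into a Java String (UTF-16 internally).
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    static boolean isAllLetters(String s) {
        return LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // C3 A9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] accented = {(byte) 0xC3, (byte) 0xA9};
        // F0 A0 80 80 is the UTF-8 encoding of U+20000, outside the BMP.
        byte[] supplementary = {(byte) 0xF0, (byte) 0xA0, (byte) 0x80, (byte) 0x80};

        String a = decode(accented);
        String b = decode(supplementary);
        System.out.println(a.length());                      // 1 char
        System.out.println(b.length());                      // 2 chars (a surrogate pair)
        System.out.println(b.codePointCount(0, b.length())); // 1 code point
        System.out.println(isAllLetters(a) && isAllLetters(b));
    }
}
```

If you read from a stream instead, passing the charset to an InputStreamReader does the same job; relying on the platform default charset is where things usually go wrong.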
Unicode is the universal character set, and UTF-8 can encode all of it (including control characters, punctuation, symbols, letters, etc.). You will have to be more specific about what you want to include and what you want to exclude. Java regular expressions use the \p{category} syntax to match code points by category, for example \p{L} for letters or \p{P} for punctuation. See the Unicode Standard for the list of categories.
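As an illustration of matching by category, here is a small sketch (class and method names are my own) that keeps only letters, whatever script they come from, by deleting everything outside \p{L}:

```java
public class CategoryFilter {
    // Remove every code point that is not in the general category L (letters).
    static String lettersOnly(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    // Remove punctuation (category P) only, leaving everything else in place.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    public static void main(String[] args) {
        String input = "Hi, \u4E16\u754C! 42";
        System.out.println(lettersOnly(input).length());  // 4 letters remain
        System.out.println(stripPunctuation("a,b.c!"));   // abc
    }
}
```

Swapping \p{L} for another category, or combining several inside one character class, is how you spell out what to include and what to exclude.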
If you want to identify and separate words in a sequence of ideographs, you will need to look at a more sophisticated API. I would start with the java.text.BreakIterator type.
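A starting point with BreakIterator might look like the following sketch. The class and method names are hypothetical, and the boundaries you actually get for ideographic text depend on the locale data shipped with your JDK, so treat the segmentation as something to test rather than assume:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreaks {
    // Split text at the word boundaries reported for the given locale.
    // Whitespace and punctuation runs come back as segments of their own.
    static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("hello world", Locale.ENGLISH));
        System.out.println(segments("\u4ECA\u65E5\u306F\u6674\u308C\u3067\u3059", Locale.JAPANESE));
    }
}
```

To keep only the word-like pieces, you could filter the segments with a category test such as \p{L}. If the built-in boundaries are too coarse for your language, a dedicated segmentation library is the usual next step.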