如何在 Java 中删除输入文本中的标点符号?

我试图用 Java 中的用户输入得到一个句子,我需要把它变成小写并删除所有标点符号。这是我的代码:

    String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;

我似乎找不到任何办法删除所有非字母字符。我曾经尝试过使用正则表达式和迭代器,但没有成功。谢谢你的帮助。

187692 次浏览

You may try this:-

Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);

[^\w] matches a non-word character, so the above regular expression will match and remove all non-word characters.

If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:

public String modified(final String input){
final StringBuilder builder = new StringBuilder();
for(final char c : input.toCharArray())
if(Character.isLetterOrDigit(c))
builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
return builder.toString();
}

It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:

String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the elements.

I don't like to use regex, so here is another simple solution.

public String removePunctuations(String s) {
String res = "";
for (Character c : s.toCharArray()) {
if(Character.isLetterOrDigit(c))
res += c;
}
return res;
}

Note: This will include both Letters and Digits

You can use following regular expression construct

Punctuation: One of !"#$%&'()*+,-./:;<=>?@[]^_`{|}~

inputString.replaceAll("\\p{Punct}", "");

If your goal is to REMOVE punctuation, then refer to the above. If the goal is to find words, none of the above solutions does that. INPUT: "This. and:that. with'the-other". OUTPUT: ["This", "and", "that", "with", "the", "other"] but what most of these "replaceAll" solutions is actually giving you is: OUTPUT: ["This", "andthat", "withtheother"]