Regular expression to match a word of a certain length

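I would like to know the regex to match a word such that the word has a maximum length. For example, if a word is at most 10 characters long, I want the regex to match, but if the length exceeds 10, the regex should not match.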

I tried:

^(\w{10})$

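But this only gives me a match when the word is at least 10 characters long. If the word has more than 10 characters, it still matches, but it matches only the first 10 characters.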

I think you want \b\w{1,10}\b. The \b matches a word boundary.

Of course, you could also replace the \b and use ^\w{1,10}$. This will match a word of at most 10 characters as long as it's the only content of the string. I think this is what you were doing before.

Since it's Java, you'll actually have to escape the backslashes: "\\b\\w{1,10}\\b". You probably knew this already, but it's gotten me before.
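
As a minimal sketch of the difference between the two forms (the class and method names here are made up for illustration and are not from the answer): the \b version picks short words out of a longer string with find(), while the ^...$ version tests whether the whole string is one short word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxLengthWord {
    // Whole words of 1 to 10 word characters anywhere in the input.
    private static final Pattern WORD = Pattern.compile("\\b\\w{1,10}\\b");
    // The entire input must be a single word of 1 to 10 word characters.
    private static final Pattern WHOLE = Pattern.compile("^\\w{1,10}$");

    public static List<String> findShortWords(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static boolean isShortWord(String input) {
        return WHOLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The 15-character word is skipped entirely, not truncated.
        System.out.println(findShortWords("an extraordinarily short word")); // [an, short, word]
        System.out.println(isShortWord("short"));           // true
        System.out.println(isShortWord("extraordinarily")); // false
    }
}
```

Because \b is required on both sides, a word longer than 10 characters produces no match at all rather than a 10-character fragment.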

^\w{0,10}$ # allows words of up to 10 characters.
^\w{5,}$   # allows words of more than 4 characters.
^\w{5,10}$ # allows words of between 5 and 10 characters.
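
A quick way to check these bounds in Java, as a sketch only (the class name LengthBounds and the helper fits are hypothetical). Note that {0,10} also accepts the empty string.

```java
import java.util.regex.Pattern;

public class LengthBounds {
    // Hypothetical helper: true when the whole input satisfies the regex.
    public static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fits("^\\w{0,10}$", ""));            // true: zero characters are allowed
        System.out.println(fits("^\\w{0,10}$", "elevenchars")); // false: 11 characters
        System.out.println(fits("^\\w{5,}$", "four"));          // false: only 4 characters
        System.out.println(fits("^\\w{5,10}$", "sevenxx"));     // true: 7 characters
    }
}
```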

The quantifier sets the number of characters to be matched:

{n,m}  n <= length <= m
{n}    length == n
{n,}   length >= n

By default, the engine matches this pattern greedily. For example, if the input is 123456789, \d{2,5} will match 12345, which has length 5.

If you want the engine to stop as soon as a length of 2 has matched, use the lazy form \d{2,5}?
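
The greedy and lazy behaviour can be compared side by side in Java. This is only a sketch; the class GreedyVsLazy and the helper allMatches are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Collects every successive match of regex in input.
    public static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: takes as many digits as allowed (up to 5) each time.
        System.out.println(allMatches("\\d{2,5}", "123456789"));  // [12345, 6789]
        // Lazy: stops at the minimum of 2 digits; the trailing 9 is too short to match.
        System.out.println(allMatches("\\d{2,5}?", "123456789")); // [12, 34, 56, 78]
    }
}
```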

I was looking for the same regex, but I also wanted to include all the special characters and blank spaces. Here is the regex for that:

^[A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{0,10}$
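
A small Java sketch of how this character class behaves (the class name SpecialCharsLimit and the method isAllowed are assumptions made for the example). One thing it makes visible: the underscore is not in the listed set, so input containing it is rejected even though \w would accept it.

```java
import java.util.regex.Pattern;

public class SpecialCharsLimit {
    // Same expression as above, with backslashes doubled for a Java string literal.
    private static final Pattern LIMIT =
        Pattern.compile("^[A-Za-z0-9\\s$&+,:;=?@#|'<>.^*()%!-]{0,10}$");

    public static boolean isAllowed(String input) {
        return LIMIT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("a b&c!"));       // true: 6 allowed characters
        System.out.println(isAllowed("hello world!")); // false: 12 characters
        System.out.println(isAllowed("snake_case"));   // false: '_' is not in the class
    }
}
```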

Method 1

Word boundaries would work perfectly here, such as with:

\b\w{3,8}\b
\b\w{2,}
\b\w{,10}\b
\b\w{5}\b

Java

Some languages, such as Java and C++, require the backslashes to be escaped twice in string literals:

\\b\\w{3,8}\\b
\\b\\w{2,}
\\b\\w{,10}\\b
\\b\\w{5}\\b

PS: \\b\\w{,10}\\b may not work in every language or flavor; writing the lower bound explicitly, as in {0,10} or {1,10}, is the portable form.
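
To see whether a given flavor accepts the open lower bound, the pattern can simply be compiled. The sketch below does that for java.util.regex; the class OpenLowerBound and the helper compiles are hypothetical names, and the expected results in the comments reflect my understanding that Java reports "Illegal repetition" for {,10}.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OpenLowerBound {
    // Returns true if java.util.regex accepts the expression.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\b\\w{,10}\\b"));  // false: lower bound omitted
        System.out.println(compiles("\\b\\w{0,10}\\b")); // true: lower bound written out
    }
}
```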

Test 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {

    public static void main(String[] args) {
        final String regex = "\\b\\w{3,8}\\b";
        final String string = "words with length three to eight";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }
    }
}

Output 1

Full match: words
Full match: with
Full match: length
Full match: three
Full match: eight

Method 2

Another good-to-know method is to use negative lookarounds:

(?<!\w)\w{3,8}(?!\w)
(?<!\w)\w{2,}
(?<!\w)\w{,10}(?!\w)
(?<!\w)\w{5}(?!\w)

Java

(?<!\\w)\\w{3,8}(?!\\w)
(?<!\\w)\\w{2,}
(?<!\\w)\\w{,10}(?!\\w)
(?<!\\w)\\w{5}(?!\\w)

Test 2

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {

    public static void main(String[] args) {
        final String regex = "(?<!\\w)\\w{1,10}(?!\\w)";
        final String string = "words with length three to eight";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }
    }
}

Output 2

Full match: words
Full match: with
Full match: length
Full match: three
Full match: to
Full match: eight

RegEx Circuit

jex.im visualizes regular expressions.


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.


I liked Pardeep's answer, but I needed whole-word boundaries in a string or title that can be any messed-up string an advertising department can think up.

\b\w([A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{1,22})\b

This should iterate through a string (tested in Notepad++) and capture the largest group of words within the range, here 1 to 22 characters, without splitting mid-word.

Here is the final command I used in Python to add some line feeds:

import re
name = re.sub(r"\b(\w[A-Za-z0-9\s$&+,:;=?@#|'<>.^*()%!-]{1,22})\b", r"\1\n", name)

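Simple, complete and tested Java code for finding words of a certain length n:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsOfLength {
    public static void main(String[] args) {
        int n = 10; // exact word length to look for
        String regex = "\\b\\w{" + n + "}\\b";
        String str = "Hello, this is a test 1234567890";
        ArrayList<String> words = new ArrayList<>();
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            words.add(matcher.group(0));
        }
        System.out.println(words); // prints [1234567890]
    }
}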

For more explanations and different options - see other answers.