Difference between the regular expressions [A-z] and [a-zA-Z]

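I am using a regular expression to write an input validator for a text box that should accept only alphabetic characters. I would like to know whether [A-z] and [a-zA-Z] are equivalent, or whether there is any difference in performance.

In my searching I keep reading about [a-zA-Z], but I have found no mention of [A-z].

I am using Java's String.matches(regex).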

regex


小开

Best answer

[A-z] will match ASCII characters in the range from A to z, while [a-zA-Z] will match ASCII characters in the range from A to Z and in the range from a to z. At first glance these might seem equivalent. However, if you look at a table of ASCII characters, you'll see that A-z includes several other characters. Specifically, they are [, \, ], ^, _, and ` (which you clearly don't want).
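To make the difference concrete, here is a minimal sketch using String.matches, the method mentioned in the question. The class name CharClassDemo, the helper method names, and the sample inputs are made up for illustration; they are not from the original posts.

```java
final class CharClassDemo {
    // Hypothetical helper: validates with the overly broad class [A-z].
    static boolean matchesLoose(String input) {
        return input.matches("[A-z]+");
    }

    // Hypothetical helper: validates with the explicit class [a-zA-Z].
    static boolean matchesStrict(String input) {
        return input.matches("[a-zA-Z]+");
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only.
        String[] samples = {"Hello", "snake_case", "a[b]", "abc123"};
        for (String s : samples) {
            System.out.println(s + " loose=" + matchesLoose(s)
                    + " strict=" + matchesStrict(s));
        }
    }
}
```

Inputs such as "snake_case" and "a[b]" pass the [A-z] check but fail the [a-zA-Z] check, because _ , [ and ] sit between Z and a in ASCII. Digits are outside both ranges, so "abc123" fails both.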

小开

Take a look at the ASCII table. You'll see that there are some characters between Z and a, so you will match more than you intended to.

小开

a-z matches 'a' to 'z', and A-Z matches 'A' to 'Z'. A-z matches all of these as well as the characters between 'Z' and 'a', which are [ \ ] ^ _ `

Refer to http://www.asciitable.com/

小开

The square brackets create a character class, and the hyphen is shorthand for every character between the two given characters. For example, [A-F] can be written as [ABCDEF].

The character class [A-z] will match every character between those characters, which in ASCII includes some other characters such as '[', '\' and ']'.

An alternative to specifying both cases would be to make the regular expression case-insensitive, by using the /i modifier (in Java, the equivalent is the inline (?i) flag or Pattern.CASE_INSENSITIVE).
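Since the question is about Java, a case-insensitive version might look like the sketch below. Pattern.CASE_INSENSITIVE and the inline (?i) flag are part of java.util.regex; the class name CaseInsensitiveDemo and the method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

final class CaseInsensitiveDemo {
    // Compiled once and reused; the flag makes [a-z] also accept A-Z.
    static final Pattern LETTERS =
            Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper using the precompiled pattern.
    static boolean isLetters(String input) {
        return LETTERS.matcher(input).matches();
    }

    // Same idea with the inline flag, usable directly with String.matches.
    static boolean isLettersInline(String input) {
        return input.matches("(?i)[a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(isLetters("HeLLo"));            // letters only
        System.out.println(isLettersInline("snake_case")); // contains '_'
    }
}
```

Unlike [A-z], this does not pick up the punctuation between Z and a, because the class itself only spans a to z and the flag merely folds case.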

小开

Take a look at the ASCII chart (which Java characters are based on): there are quite a few punctuation characters situated between Z and a, namely these:

[\]^_`

小开

When you take a look at the ASCII table, you will see the following:

A = 65
Z = 90
a = 97
z = 122

So, [A-z] will match every char from 65 to 122. This includes these characters (91 -> 96) as well:

[\]^_`

This means [A-Za-z] will match only the alphabet, without the extra characters above.
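The arithmetic above can be checked with a short loop. This is only a sketch: the class name RangeGapDemo and the method name are invented for illustration. It walks every code from 'A' (65) to 'z' (122) and collects the characters that [A-z] accepts but [A-Za-z] rejects.

```java
final class RangeGapDemo {
    // Hypothetical helper: returns the characters matched by [A-z]
    // that are not matched by [A-Za-z].
    static String extrasMatchedByLooseClass() {
        StringBuilder extras = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            String s = String.valueOf(c);
            if (s.matches("[A-z]") && !s.matches("[A-Za-z]")) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        // Prints the six characters with codes 91 through 96.
        System.out.println(extrasMatchedByLooseClass());
    }
}
```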