Regular expression to match numbers with or without commas and decimals in text

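I'm trying to locate and replace all numbers in a body of text. I've found a few example regexes that almost solve the problem, but none are perfect yet. The problem I have is that the numbers in my text may or may not have decimals and commas. For example:

"The 5000 lb. fox jumped over a 99,999.99998713 foot fence."

The regex should return "5000" and "99,999.99998713". The examples I've found either break the numbers up on the comma or are limited to two decimal places. I'm starting to understand regexes well enough to see why some examples are limited to two decimal places, but I haven't yet learned how to overcome that and also include the comma to get the entire sequence.

Here is my latest version: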

[0-9]+(\.[0-9][0-9]?)?

Which returns, "5000", "99,99", "9.99", and "998713" for the above text.


The regex below will match both numbers from your example.

\b\d[\d,.]*\b

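It will return 5000 and 99,999.99998713, matching your requirements.

EDIT: Since this has gotten a fair number of views, let me start by giving everybody what they Googled for: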

#ALL THESE REQUIRE THE WHOLE STRING TO BE A NUMBER
#For numbers embedded in sentences, see discussion below


#### NUMBERS AND DECIMALS ONLY ####
#No commas allowed
#Pass: (1000.0), (001), (.001)
#Fail: (1,000.0)
^\d*\.?\d+$


#No commas allowed
#Can't start with "."
#Pass: (0.01)
#Fail: (.01)
^(\d+\.)?\d+$


#### CURRENCY ####
#No commas allowed
#"$" optional
#Can't start with "."
#Either 0 or 2 decimal digits
#Pass: ($1000), (1.00), ($0.11)
#Fail: ($1.0), (1.), ($1.000), ($.11)
^\$?\d+(\.\d{2})?$


#### COMMA-GROUPED ####
#Commas required between powers of 1,000
#Can't start with "."
#Pass: (1,000,000), (0.001)
#Fail: (1000000), (1,00,00,00), (.001)
^\d{1,3}(,\d{3})*(\.\d+)?$


#Commas required
#Cannot be empty
#Pass: (1,000.100), (.001)
#Fail: (1000), ()
^(?=.)(\d{1,3}(,\d{3})*)?(\.\d+)?$


#Commas optional as long as they're consistent
#Can't start with "."
#Pass: (1,000,000), (1000000)
#Fail: (10000,000), (1,00,00)
^(\d+|\d{1,3}(,\d{3})*)(\.\d+)?$


#### LEADING AND TRAILING ZEROES ####
#No commas allowed
#Can't start with "."
#No leading zeroes in integer part
#Pass: (1.00), (0.00)
#Fail: (001)
^([1-9]\d*|0)(\.\d+)?$


#No commas allowed
#Can't start with "."
#No trailing zeroes in decimal part
#Pass: (1), (0.1)
#Fail: (1.00), (0.1000)
^\d+(\.\d*[1-9])?$
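To sanity-check the whole-string patterns above, here is a small Python sketch (Python only because it matches the code elsewhere in this thread). It runs each pattern with re.search against the Pass and Fail examples from its own comment block; the names CASES and check are illustrative. It only covers the listed examples, so treat it as a spot check rather than a proof.

```python
import re

# Pattern -> (examples its comment says should pass, examples that should fail)
CASES = {
    r"^\d*\.?\d+$": (["1000.0", "001", ".001"], ["1,000.0"]),
    r"^(\d+\.)?\d+$": (["0.01"], [".01"]),
    r"^\$?\d+(\.\d{2})?$": (["$1000", "1.00", "$0.11"], ["$1.0", "1.", "$1.000", "$.11"]),
    r"^\d{1,3}(,\d{3})*(\.\d+)?$": (["1,000,000", "0.001"], ["1000000", "1,00,00,00", ".001"]),
    r"^(?=.)(\d{1,3}(,\d{3})*)?(\.\d+)?$": (["1,000.100", ".001"], ["1000", ""]),
    r"^(\d+|\d{1,3}(,\d{3})*)(\.\d+)?$": (["1,000,000", "1000000"], ["10000,000", "1,00,00"]),
    r"^([1-9]\d*|0)(\.\d+)?$": (["1.00", "0.00"], ["001"]),
    r"^\d+(\.\d*[1-9])?$": (["1", "0.1"], ["1.00", "0.1000"]),
}


def check(pattern, should_pass, should_fail):
    """Return the examples the pattern classifies differently from its comment."""
    wrong = [s for s in should_pass if not re.search(pattern, s)]
    wrong += [s for s in should_fail if re.search(pattern, s)]
    return wrong


for pattern, (ok, bad) in CASES.items():
    print(pattern, "->", check(pattern, ok, bad) or "as documented")
```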

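Now that that's out of the way, most of what follows is commentary on how complex regex can get if you try to be clever with it, and why you should look for alternatives. Read at your own risk.


This is a very common task, but all the answers I see here so far will accept inputs like ,111, 9,9,9, or even .,,. that don't match your number format. That's simple enough to fix, even if the numbers are embedded in other text. IMHO, anything that fails to pull 1,234.56 and 1234 (and only those numbers) out of abc22 1,234.56 9.9.9.9 def 1234 is a wrong answer.

First of all, if you don't need to do this all in one regex, don't. A single regex for two different number formats is hard to maintain even when they aren't embedded in other text. What you should really do is split everything on whitespace, then run two or three smaller regexes on the results. If that's not an option for you, keep reading.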
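A minimal sketch of that split-first approach in Python, assuming numbers are always separated from surrounding text by whitespace. The helper name find_numbers is hypothetical; the two patterns are the 0000 and 0,000 formats discussed in this answer, applied to one token at a time with fullmatch.

```python
import re

# The two formats from this answer, each applied to a single token.
PLAIN = re.compile(r"\d*\.?\d+")                    # 0000 format, no commas
GROUPED = re.compile(r"\d{1,3}(,\d{3})*(\.\d+)?")   # 0,000 format


def find_numbers(text):
    """Hypothetical helper: split on whitespace, keep tokens that are entirely a number."""
    return [tok for tok in text.split()
            if PLAIN.fullmatch(tok) or GROUPED.fullmatch(tok)]


print(find_numbers("abc22 1,234.56 9.9.9.9 def 1234"))  # ['1,234.56', '1234']
```

Because each small pattern only ever sees one token, neither needs any lookaround to avoid partial matches like the 22 in abc22.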

Basic pattern

Considering the examples you've given, here's a simple regex that allows pretty much any integer or decimal in 0000 format and blocks everything else:

^\d*\.?\d+$

Here's one that requires 0,000 format:

^\d{1,3}(,\d{3})*(\.\d+)?$

Put them together, and commas become optional as long as they're consistent:

^(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)$

Embedded numbers

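The patterns above require the entire input to be a number. You're looking for numbers embedded in text, so you have to loosen that part. On the other hand, you don't want it to see catch22 and think it's found the number 22. If you're using something with lookbehind support (like C#, .NET 4.0+), this is pretty easy: replace ^ with (?<!\S) and $ with (?!\S), and you're good to go: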

(?<!\S)(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)(?!\S)

If you're working with JavaScript or Ruby or something similar, things start looking more complex:

(?:^|\s)(\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)

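You'll have to use capture groups; I can't think of an alternative without lookbehind support. The numbers you want will be in group 1 (assuming the whole match is group 0).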
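For comparison, here are both embedded-number patterns run through Python's re module (which does support fixed-width lookbehind) on the abc22 1,234.56 9.9.9.9 def 1234 example. With the lookbehind form the whole match is the number; with the (?:^|\s) form the leading space is consumed, so the number has to be read from group 1.

```python
import re

TEXT = "abc22 1,234.56 9.9.9.9 def 1234"

# Lookbehind form: (?<!\S) and (?!\S) stand in for ^ and $.
LOOKBEHIND = re.compile(r"(?<!\S)(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)(?!\S)")

# No-lookbehind form: (?:^|\s) consumes the preceding whitespace,
# so the number itself is taken from group 1.
CAPTURE = re.compile(r"(?:^|\s)(\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)")

print([m.group(0) for m in LOOKBEHIND.finditer(TEXT)])  # ['1,234.56', '1234']
print([m.group(1) for m in CAPTURE.finditer(TEXT)])     # ['1,234.56', '1234']
```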

Validation and more complex rules

I think that covers your question, so if that's all you need, stop reading now. If you want to get fancier, things turn very complex very quickly. Depending on your situation, you may want to block any or all of the following:

  • Empty input
  • Leading zeroes (e.g. 000123)
  • Trailing zeroes (e.g. 1.2340000)
  • Decimals starting with a decimal point (e.g. .001 as opposed to 0.001)

Just for the hell of it, let's assume you want to block the first 3, but allow the last one. What should you do? I'll tell you what you should do, you should use a different regex for each rule and progressively narrow down your matches. But for the sake of the challenge, here's how you do it all in one giant pattern:

(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)

Here's what it means:

(?<!\S) to (?!\S) #The whole match must be surrounded by either whitespace or line boundaries. So if you see something bogus like :;:9.:, ignore the 9.
(?=.)             #The whole thing can't be blank.


(                    #Rules for the integer part:
  0                  #1. The integer part could just be 0...
  |                  #
  [1-9]              #   ...otherwise, it can't have leading zeroes.
  (                  #
    \d*              #2. It could use no commas at all...
    |                #
    \d{0,2}(,\d{3})* #   ...or it could be comma-separated groups of 3 digits each.
  )                  #
)?                   #3. Or there could be no integer part at all.


(       #Rules for the decimal part:
  \.    #1. It must start with a decimal point...
  \d*   #2. ...followed by a string of numeric digits only.
  [1-9] #3. It can't be just the decimal point, and it can't end in 0.
)?      #4. The whole decimal part is also optional. Remember, we checked at the beginning to make sure the whole thing wasn't blank.

Tested here: http://rextester.com/YPG96786

This will allow things like:

100,000
999.999
90.0009
1,000,023.999
0.111
.111
0

It will block things like:

1,1,1.111
000,001.111
999.
0.
111.110000
1.1.1.111
9.909,888
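A quick way to confirm the allow and block lists, sketched in Python (an assumption; any engine with fixed-width lookbehind should behave the same). Each sample is checked on its own with fullmatch, so the surrounding lookarounds are trivially satisfied and only the integer and decimal rules are being exercised.

```python
import re

STRICT = re.compile(r"(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)")

ALLOWED = ["100,000", "999.999", "90.0009", "1,000,023.999", "0.111", ".111", "0"]
BLOCKED = ["1,1,1.111", "000,001.111", "999.", "0.", "111.110000", "1.1.1.111", "9.909,888"]

# Each sample is tested on its own, so fullmatch decides whether the
# whole token satisfies the integer and decimal rules.
print(all(STRICT.fullmatch(s) for s in ALLOWED))      # True
print(not any(STRICT.fullmatch(s) for s in BLOCKED))  # True
```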

There are several ways to make this regex simpler and shorter, but understand that changing the pattern will loosen what it considers a number.

Since many regex engines (e.g. JavaScript and Ruby) don't support negative lookbehind, the only way to do this correctly is with capture groups:

(?:^|\s)(?=.)((?:0|(?:[1-9](?:\d*|\d{0,2}(?:,\d{3})*)))?(?:\.\d*[1-9])?)(?!\S)

The numbers you're looking for will be in capture group 1.

Tested here: http://rubular.com/r/3HCSkndzhT
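The capture-group version also runs unchanged in Python. One thing worth handling, as far as I can tell: because both the integer and decimal parts are optional and (?=.) is happy to see a space, group 1 can come back empty when two whitespace characters are adjacent. The hypothetical helper numbers_in below simply drops those empty captures.

```python
import re

CAPTURE = re.compile(
    r"(?:^|\s)(?=.)((?:0|(?:[1-9](?:\d*|\d{0,2}(?:,\d{3})*)))?(?:\.\d*[1-9])?)(?!\S)"
)


def numbers_in(text):
    """Return group 1 of every match, skipping empty captures between spaces."""
    return [m.group(1) for m in CAPTURE.finditer(text) if m.group(1)]


print(numbers_in("abc22 1,234.56 9.9.9.9 def 1234"))  # ['1,234.56', '1234']
print(numbers_in("1  2"))                             # ['1', '2']
```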

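Final note

Obviously, this is a massive, complicated, nearly unreadable regex. I enjoyed the challenge, but you should consider whether you really want to use this in a production environment. Instead of trying to do everything in one step, you could do it in two: one regex to catch anything that might be a number, then another to weed out anything that isn't a number. Or you could do some basic processing, then use your language's built-in number parsing functions. Your choice.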
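A sketch of the two-step idea in Python, under these assumptions: numbers are whitespace-delimited, a candidate may contain only digits, commas and dots, commas must form groups of three, and decimal.Decimal acts as the built-in parser that rejects things like 9.9.9.9. The names CANDIDATE, GROUPED and parse_numbers are illustrative only. Decimal accepts a trailing point such as 999., so this is looser than the giant pattern on that rule.

```python
import re
from decimal import Decimal, InvalidOperation

# Step 1: a deliberately loose pattern for anything that *might* be a number.
CANDIDATE = re.compile(r"(?<!\S)[\d.,]+(?!\S)")
# Step 2a: if commas are present, they must form groups of three.
GROUPED = re.compile(r"\d{1,3}(,\d{3})+(\.\d+)?")


def parse_numbers(text):
    """Hypothetical helper: loose regex first, then validate and parse each candidate."""
    results = []
    for token in CANDIDATE.findall(text):
        if "," in token:
            if not GROUPED.fullmatch(token):
                continue                    # e.g. ",111" or "9,9,9"
            token = token.replace(",", "")
        try:
            results.append(Decimal(token))  # Step 2b: built-in parser
        except InvalidOperation:
            continue                        # e.g. "9.9.9.9" or "."
    return results


print(parse_numbers("abc22 1,234.56 9.9.9.9 def 1234"))
# [Decimal('1234.56'), Decimal('1234')]
```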

With some leeway on the requirements, you're looking for

\d+([\d,]?\d)*(\.\d+)?

But note that this will match, e.g., 11,11,1

\d+(,\d+)*(\.\d+)?

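This assumes there is always at least one digit before and after any comma or decimal point, and also assumes there is at most one decimal point and that all commas come before it.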
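Both patterns from this answer, tried in Python on the question's sentence. finditer with group(0) is used because findall would return the capture groups instead of the whole match. As far as I can tell, the second pattern accepts 11,11,1 as well; it only guarantees a digit on each side of every comma, not groups of three.

```python
import re

LOOSE = re.compile(r"\d+([\d,]?\d)*(\.\d+)?")
SIMPLER = re.compile(r"\d+(,\d+)*(\.\d+)?")

text = "The 5000 lb. fox jumped over a 99,999.99998713 foot fence."

# group(0) is the whole match; findall would return the capture groups instead.
print([m.group(0) for m in LOOSE.finditer(text)])    # ['5000', '99,999.99998713']
print([m.group(0) for m in SIMPLER.finditer(text)])  # ['5000', '99,999.99998713']
print(bool(LOOSE.fullmatch("11,11,1")), bool(SIMPLER.fullmatch("11,11,1")))  # True True
```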

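A few days ago, I worked on the problem of removing trailing zeros from the string of a number.

Continuing from that problem, I find this one interesting because it widens the problem to numbers containing commas.

I took the regex pattern I had written for that previous problem and improved it so that it can handle numbers with commas, as an answer to this question.

I got carried away by my enthusiasm and my fondness for regexes. I don't know whether the result fits exactly the need expressed by Michael Prescott. I'd be interested to know which points in my regex are excessive or lacking, so I can correct it and make it more suitable for you.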

Now, after a long session of work on this regex, I have a sort of weight in the brain, so I'm not fresh enough to give a lot of explanation. If points are obscure, and if anybody may come to be interested enough, please, ask me.

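The regex is designed to detect numbers expressed in scientific notation, such as 2E10 or even 5,22,454.12E-00.0478, while also stripping unnecessary zeros from both parts of such numbers. If an exponent is equal to zero, the number is modified so that it no longer has an exponent.

I put some validation in the pattern so that some particular cases won't match; for example, '12..57' won't match. But in ',111' the string '111' matches, because the preceding comma is considered a comma of the sentence rather than part of a number.

I think the handling of commas should be improved, because it seems to me that Indian numbering puts only 2 digits between commas. I guess that won't be difficult to correct.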
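On the Indian-numbering point: a hypothetical pattern (my own sketch, not part of the regex above) for lakh/crore grouping, where the last group before the decimal point has three digits and earlier groups have two. It ignores signs, exponents and zero-shaving, and as written it also rejects unseparated numbers longer than three digits, such as 1000.

```python
import re

# Hypothetical Indian-style grouping: up to 3 plain digits, or
# 1-2 leading digits, any number of 2-digit groups, then a final 3-digit group.
INDIAN = re.compile(r"(\d{1,3}|\d{1,2}(,\d{2})*,\d{3})(\.\d+)?")

for s in ["5,22,454.12", "12,34,56,789", "1,00,000", "999", "123,456", "1,0000"]:
    print(s, bool(INDIAN.fullmatch(s)))
```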

Below is code demonstrating how my regex works. There are two functions, depending on whether you want numbers like '.1245' to be transformed into '0.1245' or not. I wouldn't be surprised if errors, unwanted matches, or missed matches remain for certain kinds of number strings; if so, I'd like to know about those cases so I can understand and correct the deficiency.

I apologize for this code being written in Python, but regexes are cross-language, and I think everyone will be able to understand the regex pattern.

import re


# Raw strings (r'...') keep escapes such as \d, \. and \Z intact for the regex engine.
regx = re.compile(r'(?<![\d.])(?!\.\.)(?<![\d.][eE][+-])(?<![\d.][eE])(?<!\d[.,])'
                  r''  #---------------------------------
                  r'([+-]?)'
                  r'(?![\d,]*?\.[\d,]*?\.[\d,]*?)'
                  r'(?:0|,(?=0)|(?<!\d),)*'
                  r'(?:'
                  r'((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
                  r'|\.(0)'
                  r'|((?<!\.)\.\d+?)'
                  r'|([\d,]+\.\d+?))'
                  r'0*'
                  r''  #---------------------------------
                  r'(?:'
                  r'([eE][+-]?)(?:0|,(?=0))*'
                  r'(?:'
                  r'(?!0+(?=\D|\Z))((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
                  r'|((?<!\.)\.(?!0+(?=\D|\Z))\d+?)'
                  r'|([\d,]+\.(?!0+(?=\D|\Z))\d+?))'
                  r'0*'
                  r')?'
                  r''  #---------------------------------
                  r'(?![.,]?\d)')




def dzs_numbs(x, regx=regx):  # dzs = detect and zeros-shave
    if not regx.findall(x):
        yield ('No match,', 'No catched string,', 'No groups.')
    for mat in regx.finditer(x):
        # (whole match, zero-shaved number built from the groups, raw groups)
        yield (mat.group(), ''.join(mat.groups('')), mat.groups(''))


def dzs_numbs2(x, regx=regx):  # same, but groups like '.1245' become '0.1245'
    if not regx.findall(x):
        yield ('No match,', 'No catched string,', 'No groups.')
    for mat in regx.finditer(x):
        yield (mat.group(),
               ''.join(('0' if n.startswith('.') else '') + n for n in mat.groups('')),
               mat.groups(''))


NS = ['  23456000and23456000. or23456000.000  00023456000 s000023456000.  000023456000.000 ',
'arf 10000 sea10000.+10000.000  00010000-00010000. kant00010000.000 ',
'  24:  24,  24.   24.000  24.000,   00024r 00024. blue 00024.000  ',
'  8zoom8.  8.000  0008  0008. and0008.000  ',
'  0   00000M0. = 000.  0.0  0.000    000.0   000.000   .000000   .0   ',
'  .0000023456    .0000023456000   '
'  .0005872    .0005872000   .00503   .00503000   ',
'  .068    .0680000   .8   .8000  .123456123456    .123456123456000    ',
'  .657   .657000   .45    .4500000   .7    .70000  0.0000023230000   000.0000023230000   ',
'  0.0081000    0000.0081000  0.059000   0000.059000     ',
'  0.78987400000 snow  00000.78987400000  0.4400000   00000.4400000   ',
'  -0.5000  -0000.5000   0.90   000.90   0.7   000.7   ',
'  2.6    00002.6   00002.60000  4.71   0004.71    0004.7100   ',
'  23.49   00023.49   00023.490000  103.45   0000103.45   0000103.45000    ',
'  10003.45067   000010003.45067   000010003.4506700 ',
'  +15000.0012   +000015000.0012   +000015000.0012000    ',
'  78000.89   000078000.89   000078000.89000    ',
'  .0457e10   .0457000e10   00000.0457000e10  ',
'   258e8   2580000e4   0000000002580000e4   ',
'  0.782e10   0000.782e10   0000.7820000e10  ',
'  1.23E2   0001.23E2  0001.2300000E2   ',
'  432e-102  0000432e-102   004320000e-106   ',
'  1.46e10and0001.46e10  0001.4600000e10   ',
'  1.077e-300  0001.077e-300  0001.077000e-300   ',
'  1.069e10   0001.069e10   0001.069000e10   ',
'  105040.03e10  000105040.03e10  105040.0300e10    ',
'  +286E000024.487900  -78.4500e.14500   .0140E789.  ',
'  081,12.40E07,95.0120     0045,78,123.03500e-0.00  ',
'  0096,78,473.0380e-0.    0008,78,373.066000E0.    0004512300.E0000  ',
'  ..18000  25..00 36...77   2..8  ',
'  3.8..9    .12500.     12.51.400  ',
'  00099,111.8713000   -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must',
'  00099,44,and   0000,099,88,44.bom',
'00,000,00.587000  77,98,23,45.,  this,that ',
'  ,111  145.20  +9,9,9  0012800  .,,.  1  100,000 ',
'1,1,1.111  000,001.111   -999.  0.  111.110000  1.1.1.111  9.909,888']




for ch in NS:
    print('string: ' + repr(ch))
    for strmatch, modified, the_groups in dzs_numbs2(ch):
        print(strmatch.rjust(20), '', modified, '', the_groups)
    print()

Results:

string: '  23456000and23456000. or23456000.000  00023456000 s000023456000.  000023456000.000 '
23456000  23456000  ('', '23456000', '', '', '', '', '', '', '')
23456000.  23456000  ('', '23456000', '', '', '', '', '', '', '')
23456000.000  23456000  ('', '23456000', '', '', '', '', '', '', '')
00023456000  23456000  ('', '23456000', '', '', '', '', '', '', '')
000023456000.  23456000  ('', '23456000', '', '', '', '', '', '', '')
000023456000.000  23456000  ('', '23456000', '', '', '', '', '', '', '')


string: 'arf 10000 sea10000.+10000.000  00010000-00010000. kant00010000.000 '
10000  10000  ('', '10000', '', '', '', '', '', '', '')
10000.  10000  ('', '10000', '', '', '', '', '', '', '')
10000.000  10000  ('', '10000', '', '', '', '', '', '', '')
00010000  10000  ('', '10000', '', '', '', '', '', '', '')
00010000.  10000  ('', '10000', '', '', '', '', '', '', '')
00010000.000  10000  ('', '10000', '', '', '', '', '', '', '')


string: '  24:  24,  24.   24.000  24.000,   00024r 00024. blue 00024.000  '
24  24  ('', '24', '', '', '', '', '', '', '')
24,  24  ('', '24', '', '', '', '', '', '', '')
24.  24  ('', '24', '', '', '', '', '', '', '')
24.000  24  ('', '24', '', '', '', '', '', '', '')
24.000  24  ('', '24', '', '', '', '', '', '', '')
00024  24  ('', '24', '', '', '', '', '', '', '')
00024.  24  ('', '24', '', '', '', '', '', '', '')
00024.000  24  ('', '24', '', '', '', '', '', '', '')


string: '  8zoom8.  8.000  0008  0008. and0008.000  '
8  8  ('', '8', '', '', '', '', '', '', '')
8.  8  ('', '8', '', '', '', '', '', '', '')
8.000  8  ('', '8', '', '', '', '', '', '', '')
0008  8  ('', '8', '', '', '', '', '', '', '')
0008.  8  ('', '8', '', '', '', '', '', '', '')
0008.000  8  ('', '8', '', '', '', '', '', '', '')


string: '  0   00000M0. = 000.  0.0  0.000    000.0   000.000   .000000   .0   '
0  0  ('', '0', '', '', '', '', '', '', '')
00000  0  ('', '0', '', '', '', '', '', '', '')
0.  0  ('', '0', '', '', '', '', '', '', '')
000.  0  ('', '0', '', '', '', '', '', '', '')
0.0  0  ('', '', '0', '', '', '', '', '', '')
0.000  0  ('', '', '0', '', '', '', '', '', '')
000.0  0  ('', '', '0', '', '', '', '', '', '')
000.000  0  ('', '', '0', '', '', '', '', '', '')
.000000  0  ('', '', '0', '', '', '', '', '', '')
.0  0  ('', '', '0', '', '', '', '', '', '')


string: '  .0000023456    .0000023456000     .0005872    .0005872000   .00503   .00503000   '
.0000023456  0.0000023456  ('', '', '', '.0000023456', '', '', '', '', '')
.0000023456000  0.0000023456  ('', '', '', '.0000023456', '', '', '', '', '')
.0005872  0.0005872  ('', '', '', '.0005872', '', '', '', '', '')
.0005872000  0.0005872  ('', '', '', '.0005872', '', '', '', '', '')
.00503  0.00503  ('', '', '', '.00503', '', '', '', '', '')
.00503000  0.00503  ('', '', '', '.00503', '', '', '', '', '')


string: '  .068    .0680000   .8   .8000  .123456123456    .123456123456000    '
.068  0.068  ('', '', '', '.068', '', '', '', '', '')
.0680000  0.068  ('', '', '', '.068', '', '', '', '', '')
.8  0.8  ('', '', '', '.8', '', '', '', '', '')
.8000  0.8  ('', '', '', '.8', '', '', '', '', '')
.123456123456  0.123456123456  ('', '', '', '.123456123456', '', '', '', '', '')
.123456123456000  0.123456123456  ('', '', '', '.123456123456', '', '', '', '', '')


string: '  .657   .657000   .45    .4500000   .7    .70000  0.0000023230000   000.0000023230000   '
.657  0.657  ('', '', '', '.657', '', '', '', '', '')
.657000  0.657  ('', '', '', '.657', '', '', '', '', '')
.45  0.45  ('', '', '', '.45', '', '', '', '', '')
.4500000  0.45  ('', '', '', '.45', '', '', '', '', '')
.7  0.7  ('', '', '', '.7', '', '', '', '', '')
.70000  0.7  ('', '', '', '.7', '', '', '', '', '')
0.0000023230000  0.000002323  ('', '', '', '.000002323', '', '', '', '', '')
000.0000023230000  0.000002323  ('', '', '', '.000002323', '', '', '', '', '')


string: '  0.0081000    0000.0081000  0.059000   0000.059000     '
0.0081000  0.0081  ('', '', '', '.0081', '', '', '', '', '')
0000.0081000  0.0081  ('', '', '', '.0081', '', '', '', '', '')
0.059000  0.059  ('', '', '', '.059', '', '', '', '', '')
0000.059000  0.059  ('', '', '', '.059', '', '', '', '', '')


string: '  0.78987400000 snow  00000.78987400000  0.4400000   00000.4400000   '
0.78987400000  0.789874  ('', '', '', '.789874', '', '', '', '', '')
00000.78987400000  0.789874  ('', '', '', '.789874', '', '', '', '', '')
0.4400000  0.44  ('', '', '', '.44', '', '', '', '', '')
00000.4400000  0.44  ('', '', '', '.44', '', '', '', '', '')


string: '  -0.5000  -0000.5000   0.90   000.90   0.7   000.7   '
-0.5000  -0.5  ('-', '', '', '.5', '', '', '', '', '')
-0000.5000  -0.5  ('-', '', '', '.5', '', '', '', '', '')
0.90  0.9  ('', '', '', '.9', '', '', '', '', '')
000.90  0.9  ('', '', '', '.9', '', '', '', '', '')
0.7  0.7  ('', '', '', '.7', '', '', '', '', '')
000.7  0.7  ('', '', '', '.7', '', '', '', '', '')


string: '  2.6    00002.6   00002.60000  4.71   0004.71    0004.7100   '
2.6  2.6  ('', '', '', '', '2.6', '', '', '', '')
00002.6  2.6  ('', '', '', '', '2.6', '', '', '', '')
00002.60000  2.6  ('', '', '', '', '2.6', '', '', '', '')
4.71  4.71  ('', '', '', '', '4.71', '', '', '', '')
0004.71  4.71  ('', '', '', '', '4.71', '', '', '', '')
0004.7100  4.71  ('', '', '', '', '4.71', '', '', '', '')


string: '  23.49   00023.49   00023.490000  103.45   0000103.45   0000103.45000    '
23.49  23.49  ('', '', '', '', '23.49', '', '', '', '')
00023.49  23.49  ('', '', '', '', '23.49', '', '', '', '')
00023.490000  23.49  ('', '', '', '', '23.49', '', '', '', '')
103.45  103.45  ('', '', '', '', '103.45', '', '', '', '')
0000103.45  103.45  ('', '', '', '', '103.45', '', '', '', '')
0000103.45000  103.45  ('', '', '', '', '103.45', '', '', '', '')


string: '  10003.45067   000010003.45067   000010003.4506700 '
10003.45067  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')
000010003.45067  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')
000010003.4506700  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')


string: '  +15000.0012   +000015000.0012   +000015000.0012000    '
+15000.0012  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012000  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')


string: '  78000.89   000078000.89   000078000.89000    '
78000.89  78000.89  ('', '', '', '', '78000.89', '', '', '', '')
000078000.89  78000.89  ('', '', '', '', '78000.89', '', '', '', '')
000078000.89000  78000.89  ('', '', '', '', '78000.89', '', '', '', '')


string: '  .0457e10   .0457000e10   00000.0457000e10  '
.0457e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')
.0457000e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')
00000.0457000e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')


string: '   258e8   2580000e4   0000000002580000e4   '
258e8  258e8  ('', '258', '', '', '', 'e', '8', '', '')
2580000e4  2580000e4  ('', '2580000', '', '', '', 'e', '4', '', '')
0000000002580000e4  2580000e4  ('', '2580000', '', '', '', 'e', '4', '', '')


string: '  0.782e10   0000.782e10   0000.7820000e10  '
0.782e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')
0000.782e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')
0000.7820000e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')


string: '  1.23E2   0001.23E2  0001.2300000E2   '
1.23E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')
0001.23E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')
0001.2300000E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')


string: '  432e-102  0000432e-102   004320000e-106   '
432e-102  432e-102  ('', '432', '', '', '', 'e-', '102', '', '')
0000432e-102  432e-102  ('', '432', '', '', '', 'e-', '102', '', '')
004320000e-106  4320000e-106  ('', '4320000', '', '', '', 'e-', '106', '', '')


string: '  1.46e10and0001.46e10  0001.4600000e10   '
1.46e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')
0001.46e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')
0001.4600000e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')


string: '  1.077e-300  0001.077e-300  0001.077000e-300   '
1.077e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077000e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')


string: '  1.069e10   0001.069e10   0001.069000e10   '
1.069e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069000e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')


string: '  105040.03e10  000105040.03e10  105040.0300e10    '
105040.03e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')
000105040.03e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')
105040.0300e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')


string: '  +286E000024.487900  -78.4500e.14500   .0140E789.  '
+286E000024.487900  +286E24.4879  ('+', '286', '', '', '', 'E', '', '', '24.4879')
-78.4500e.14500  -78.45e0.145  ('-', '', '', '', '78.45', 'e', '', '.145', '')
.0140E789.  0.014E789  ('', '', '', '.014', '', 'E', '789', '', '')


string: '  081,12.40E07,95.0120     0045,78,123.03500e-0.00  '
081,12.40E07,95.0120  81,12.4E7,95.012  ('', '', '', '', '81,12.4', 'E', '', '', '7,95.012')
0045,78,123.03500  45,78,123.035  ('', '', '', '', '45,78,123.035', '', '', '', '')


string: '  0096,78,473.0380e-0.    0008,78,373.066000E0.    0004512300.E0000  '
0096,78,473.0380  96,78,473.038  ('', '', '', '', '96,78,473.038', '', '', '', '')
0008,78,373.066000  8,78,373.066  ('', '', '', '', '8,78,373.066', '', '', '', '')
0004512300.  4512300  ('', '4512300', '', '', '', '', '', '', '')


string: '  ..18000  25..00 36...77   2..8  '
No match,  No catched string,  No groups.


string: '  3.8..9    .12500.     12.51.400  '
No match,  No catched string,  No groups.


string: '  00099,111.8713000   -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must'
00099,111.8713000  99,111.8713  ('', '', '', '', '99,111.8713', '', '', '', '')
-0012,45,83,987.26  -12,45,83,987.26  ('-', '', '', '', '12,45,83,987.26', '', '', '', '')
00,00,00.00  0  ('', '', '0', '', '', '', '', '', '')


string: '  00099,44,and   0000,099,88,44.bom'
00099,44,  99,44  ('', '99,44', '', '', '', '', '', '', '')
0000,099,88,44.  99,88,44  ('', '99,88,44', '', '', '', '', '', '', '')


string: '00,000,00.587000  77,98,23,45.,  this,that '
00,000,00.587000  0.587  ('', '', '', '.587', '', '', '', '', '')
77,98,23,45.  77,98,23,45  ('', '77,98,23,45', '', '', '', '', '', '', '')


string: '  ,111  145.20  +9,9,9  0012800  .,,.  1  100,000 '
,111  111  ('', '111', '', '', '', '', '', '', '')
145.20  145.2  ('', '', '', '', '145.2', '', '', '', '')
+9,9,9  +9,9,9  ('+', '9,9,9', '', '', '', '', '', '', '')
0012800  12800  ('', '12800', '', '', '', '', '', '', '')
1  1  ('', '1', '', '', '', '', '', '', '')
100,000  100,000  ('', '100,000', '', '', '', '', '', '', '')


string: '1,1,1.111  000,001.111   -999.  0.  111.110000  1.1.1.111  9.909,888'
1,1,1.111  1,1,1.111  ('', '', '', '', '1,1,1.111', '', '', '', '')
000,001.111  1.111  ('', '', '', '', '1.111', '', '', '', '')
-999.  -999  ('-', '999', '', '', '', '', '', '', '')
0.  0  ('', '0', '', '', '', '', '', '', '')
111.110000  111.11  ('', '', '', '', '111.11', '', '', '', '')

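Here's another construction that starts with the simplest number format and then, in a non-overlapping way, progressively adds more complex number formats:

Java regexp: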

(\d)|([1-9]\d+)|(\.\d+)|(\d\.\d*)|([1-9]\d+\.\d*)|([1-9]\d{0,2}(,\d{3})+(\.\d*)?)

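As a Java String (note the doubled backslashes: every \ in the regexp must itself be escaped inside a Java string literal, including the \ that escapes ., since . has a special meaning in a regexp on its own):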

String myregexp="(\\d)|([1-9]\\d+)|(\\.\\d+)|(\\d\\.\\d*)|([1-9]\\d+\\.\\d*)|([1-9]\\d{0,2}(,\\d{3})+(\\.\\d*)?)";

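Explanation:

  1. This regexp has the form A|B|C|D|E|F, where A, B, C, D, E, F are themselves regexps that don't overlap. In general, I find it easier to start with the simplest possible match, A. If A misses matches you want, create a B that is a minor modification of A and includes a bit more of what you want. Then, based on B, create a C that catches more, and so on. I also find it easier to create regexps that don't overlap; a regexp made of 20 simple non-overlapping regexps joined with ORs is easier to understand than a few regexps with more complex matching. But, each to their own!

  2. A is (\d) and matches exactly one of 0,1,2,3,4,5,6,7,8,9, which can't be any simpler!

  3. B is ([1-9]\d+) and only matches numbers with two or more digits, the first of which is not 0. B matches exactly one of 10,11,12,... B doesn't overlap A but is a small modification of A.

  4. C is (\.\d+) and only matches a decimal point followed by one or more digits. C matches exactly one of .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .00 .01 .02 ... .23000 ... C allows trailing zeros on the right, which I prefer: if this is measurement data, the number of trailing zeros indicates the level of precision. If you don't want trailing zeros on the right, change (\.\d+) to (\.\d*[1-9]), but this also excludes .0, which I think should be allowed. C is also a small modification of A.

  5. D is (\d\.\d*), which is A plus decimals with trailing zeros on the right. D only matches a single digit, followed by a decimal point, followed by zero or more digits. D matches 0. 0.0 0.1 0.2 ... 0.01000 ... 9. 9.0 9.1 ... 0.0230000 ... 9.9999999999... If you want to exclude "0.", change D to (\d\.\d+). If you want to exclude trailing zeros on the right, change D to (\d\.\d*[1-9]), but this excludes 2.0, which I think should be included. D does not overlap A, B, or C.

  6. E is ([1-9]\d+\.\d*), which is B plus decimals with trailing zeros on the right. If you want to exclude "13.", for example, change E to ([1-9]\d+\.\d+). E doesn't overlap A, B, C, or D. E matches 10. 10.0 10.0100 ... 99.9999999999... Trailing zeros can be handled as in 4. and 5.

  7. F is ([1-9]\d{0,2}(,\d{3})+(\.\d*)?) and only matches numbers with commas and possibly decimals, allowing trailing zeros on the right. The first group, ([1-9]\d{0,2}), matches a non-zero digit followed by zero, one, or two more digits. The second group, (,\d{3})+, matches a 4-character group (a comma followed by exactly three digits), and this group can match one or more times (no match would mean no commas!). Finally, (\.\d*)? matches nothing, or a decimal point followed by any number of digits, possibly none. Again, to exclude things like "1,111.", change (\.\d*) to (\.\d+). Trailing zeros can be handled as in 4. or 5. F doesn't overlap A, B, C, D, or E. I couldn't think of a simpler regexp for F.

Let me know if you're interested, and I can edit the above to handle trailing zeros on the right as needed.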

Here is what matches regexp and what does not:

0
1
02 <- invalid
20
22
003 <- invalid
030 <- invalid
300
033 <- invalid
303
330
333
0004 <- invalid
0040 <- invalid
0400 <- invalid
4000
0044 <- invalid
0404 <- invalid
0440 <- invalid
4004
4040
4400
0444 <- invalid
4044
4404
4440
4444
00005 <- invalid
00050 <- invalid
00500 <- invalid
05000 <- invalid
50000
00055 <- invalid
00505 <- invalid
00550 <- invalid
05050 <- invalid
05500 <- invalid
50500
55000
00555 <- invalid
05055 <- invalid
05505 <- invalid
05550 <- invalid
50550
55050
55500
. <- invalid
.. <- invalid
.0
0.
.1
1.
.00
0.0
00. <- invalid
.02
0.2
02. <- invalid
.20
2.0
20.
.22
2.2
22.
.000
0.00
00.0 <- invalid
000. <- invalid
.003
0.03
00.3 <- invalid
003. <- invalid
.030
0.30
03.0 <- invalid
030. <- invalid
.033
0.33
03.3 <- invalid
033. <- invalid
.303
3.03
30.3
303.
.333
3.33
33.3
333.
.0000
0.000
00.00 <- invalid
000.0 <- invalid
0000. <- invalid
.0004
0.0004
00.04 <- invalid
000.4 <- invalid
0004. <- invalid
.0044
0.044
00.44 <- invalid
004.4 <- invalid
0044. <- invalid
.0404
0.404
04.04 <- invalid
040.4 <- invalid
0404. <- invalid
.0444
0.444
04.44 <- invalid
044.4 <- invalid
0444. <- invalid
.4444
4.444
44.44
444.4
4444.
.00000
0.0000
00.000 <- invalid
000.00 <- invalid
0000.0 <- invalid
00000. <- invalid
.00005
0.0005
00.005 <- invalid
000.05 <- invalid
0000.5 <- invalid
00005. <- invalid
.00055
0.0055
00.055 <- invalid
000.55 <- invalid
0005.5 <- invalid
00055. <- invalid
.00505
0.0505
00.505 <- invalid
005.05 <- invalid
0050.5 <- invalid
00505. <- invalid
.00550
0.0550
00.550 <- invalid
005.50 <- invalid
0055.0 <- invalid
00550. <- invalid
.05050
0.5050
05.050 <- invalid
050.50 <- invalid
0505.0 <- invalid
05050. <- invalid
.05500
0.5500
05.500 <- invalid
055.00 <- invalid
0550.0 <- invalid
05500. <- invalid
.50500
5.0500
50.500
505.00
5050.0
50500.
.55000
5.5000
55.000
550.00
5500.0
55000.
.00555
0.0555
00.555 <- invalid
005.55 <- invalid
0055.5 <- invalid
00555. <- invalid
.05055
0.5055
05.055 <- invalid
050.55 <- invalid
0505.5 <- invalid
05055. <- invalid
.05505
0.5505
05.505 <- invalid
055.05 <- invalid
0550.5 <- invalid
05505. <- invalid
.05550
0.5550
05.550 <- invalid
055.50 <- invalid
0555.0 <- invalid
05550. <- invalid
.50550
5.0550
50.550
505.50
5055.0
50550.
.55050
5.5050
55.050
550.50
5505.0
55050.
.55500
5.5500
55.500
555.00
5550.0
55500.
.05555
0.5555
05.555 <- invalid
055.55 <- invalid
0555.5 <- invalid
05555. <- invalid
.50555
5.0555
50.555
505.55
5055.5
50555.
.55055
5.5055
55.055
550.55
5505.5
55055.
.55505
5.5505
55.505
555.05
5550.5
55505.
.55550
5.5550
55.550
555.50
5555.0
55550.
.55555
5.5555
55.555
555.55
5555.5
55555.
, <- invalid
,, <- invalid
1, <- invalid
,1 <- invalid
22, <- invalid
2,2 <- invalid
,22 <- invalid
2,2, <- invalid
2,2, <- invalid
,22, <- invalid
333, <- invalid
33,3 <- invalid
3,33 <- invalid
,333 <- invalid
3,33, <- invalid
3,3,3 <- invalid
3,,33 <- invalid
,,333 <- invalid
4444, <- invalid
444,4 <- invalid
44,44 <- invalid
4,444
,4444 <- invalid
55555, <- invalid
5555,5 <- invalid
555,55 <- invalid
55,555
5,5555 <- invalid
,55555 <- invalid
666666, <- invalid
66666,6 <- invalid
6666,66 <- invalid
666,666
66,6666 <- invalid
6,66666 <- invalid
66,66,66 <- invalid
6,66,666 <- invalid
,666,666 <- invalid
1,111.
1,111.11
1,111.110
01,111.110 <- invalid
0,111.100 <- invalid
11,11. <- invalid
1,111,.11 <- invalid
1111.1,10 <- invalid
01111.11,0 <- invalid
0111.100, <- invalid
1,111,111.
1,111,111.11
1,111,111.110
01,111,111.110 <- invalid
0,111,111.100 <- invalid
1,111,111.
1,1111,11.11 <- invalid
11,111,11.110 <- invalid
01,11,1111.110 <- invalid
0,111111.100 <- invalid
0002,22.2230 <- invalid
.,5.,., <- invalid
2.0,345,345 <- invalid
2.334.456 <- invalid
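The regexp can be checked against a handful of entries from this list in Python. The pattern is the same as the Java one, written as a raw Python string so each backslash appears once; re.fullmatch stands in for Java's String.matches(), which also requires the whole input to match. The sample selection is mine.

```python
import re

# Same alternation as the Java string; a raw Python string needs single backslashes.
JAVA_STYLE = re.compile(
    r"(\d)|([1-9]\d+)|(\.\d+)|(\d\.\d*)|([1-9]\d+\.\d*)|([1-9]\d{0,2}(,\d{3})+(\.\d*)?)"
)

valid = ["0", "20", "303", ".0", "0.", "5.5555", "4,444", "1,111.", "1,111,111.110"]
invalid = ["02", "00.", "2,2", "6,66,666", "0,111.100", "2.334.456"]

# fullmatch plays the role of Java's String.matches(): the whole input must match.
print(all(JAVA_STYLE.fullmatch(s) for s in valid))        # True
print(not any(JAVA_STYLE.fullmatch(s) for s in invalid))  # True
```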

This regex:

(\d{1,3},\d{3}(,\d{3})*)(\.\d*)?|\d+\.?\d*

matches every number in the string:

1 1.0 0.1 1.001 1,000 1,000,000 1000.1 1,000.1 1,323,444,000 1,999 1,222,455,666.0 1,244
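A quick Python check, assuming the numbers in that string are meant to be space-separated. finditer with group(0) is used because the pattern contains capture groups, which findall would return instead of the whole matches.

```python
import re

MIXED = re.compile(r"(\d{1,3},\d{3}(,\d{3})*)(\.\d*)?|\d+\.?\d*")

text = ("1 1.0 0.1 1.001 1,000 1,000,000 1000.1 1,000.1 "
        "1,323,444,000 1,999 1,222,455,666.0 1,244")

found = [m.group(0) for m in MIXED.finditer(text)]
print(found == text.split())  # True
```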

Here is a regex:

(?:\d+)((\d{1,3})*([\,\ ]\d{3})*)(\.\d+)?

It accepts numbers:

  • without spaces and/or decimals, e.g. 123456789, 123.123
  • with commas or spaces as thousands separators and/or decimals, e.g. 123 456 789, 123 456 789.100, 123,456, 3,232,300,000.00

Tests: http://regexr.com/3h1a2
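A Python sketch that runs this pattern over the examples above with fullmatch. One caveat: since a single space is accepted as a thousands separator, two unrelated numbers such as "12 345" in running text come back as one match.

```python
import re

SPACED = re.compile(r"(?:\d+)((\d{1,3})*([\,\ ]\d{3})*)(\.\d+)?")

examples = ["123456789", "123.123", "123 456 789", "123 456 789.100",
            "123,456", "3,232,300,000.00"]
print(all(SPACED.fullmatch(s) for s in examples))  # True

# A single space also counts as a separator, so these fuse into one match.
print([m.group(0) for m in SPACED.finditer("page 12 345 lines")])  # ['12 345']
```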

\b\d+,

\b --> word boundary

\d+ --> one or more digits

, --> containing a comma

For example:

70,000

5,44,43435.7788,44555

It will match:

70,

5,

44,

,44

(,*[\d]+,*[\d]*)+

This will match any small or large number, with or without commas, such as the following:

1
100
1,262
1,56,262
10,78,999
12,34,56,789

or

1
100
1262
156262
1078999
123456789
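Checking this pattern in Python with fullmatch against the numbers listed above. As far as I can tell it is permissive about comma placement, so inputs such as ,111 and 9,9,9 (the kind of input criticised earlier in this thread) are accepted too.

```python
import re

# The pattern from this answer, checked against whole strings.
ANSWER = re.compile(r"(,*[\d]+,*[\d]*)+")

samples = ["1", "100", "1,262", "1,56,262", "10,78,999", "12,34,56,789",
           "1262", "156262", "1078999", "123456789"]
print(all(ANSWER.fullmatch(s) for s in samples))  # True

# Comma placement is not checked, so these pass as well:
print([s for s in [",111", "9,9,9", "1,,2"] if ANSWER.fullmatch(s)])
```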

My answer:

(\d+(,?.?))*