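Is there a way to convert number words to integers?

I need to convert one into 1, two into 2, and so on.

Is there a way to do this with a library or a class or anything?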


Here's the trivial case approach:

>>> number = {'one':1,
...           'two':2,
...           'three':3,}
>>>
>>> number['two']
2

Or are you looking for something that can handle "twelve thousand, one hundred seventy-two"?

This could easily be hardcoded into a dictionary if there's a limited set of numbers you'd like to parse.

For slightly more complex cases, you'll probably want to generate this dictionary automatically, based on the relatively simple grammar of number words. Something along these lines (generalized, of course):

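single_digits = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
my_dict = {}
for i in range(10):
    my_dict["thirty-" + single_digits[i] if i else "thirty"] = 30 + i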
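Generalized to every number from 0 to 99, the same loop might look like the minimal sketch below. It assumes hyphenated compounds such as "thirty-seven"; the names `UNITS`, `TENS`, and `build_word_to_number` are illustrative, not from any library.

```python
# Illustrative word lists; index = value for UNITS, index * 10 = value for TENS.
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def build_word_to_number():
    """Generate a word -> int dict for 0..99, e.g. 'thirty-seven' -> 37."""
    mapping = {word: value for value, word in enumerate(UNITS)}
    for t in range(2, 10):
        mapping[TENS[t]] = t * 10  # bare tens word, e.g. "thirty"
        for u in range(1, 10):
            mapping[f"{TENS[t]}-{UNITS[u]}"] = t * 10 + u
    return mapping


word_to_number = build_word_to_number()
print(word_to_number["thirty-seven"])  # 37
```

Generating the entries from two short lists keeps the table consistent; spelling variants (for example "thirty seven" without the hyphen) would need to be added or normalized before lookup.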

If you need something more extensive, then it looks like you'll need natural language processing tools. This article might be a good starting point.

Most of this code sets up the numwords dict, which is done only on the first call.

def text2int(textnum, numwords={}):
    # The mutable default dict persists between calls, so the table is built only once
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        # Each word maps to (scale, increment)
        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):    numwords[word] = (1, idx)
        for idx, word in enumerate(tens):     numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales):   numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        # "thousand" and larger close the current group and add it to the result
        if scale > 100:
            result += current
            current = 0

    return result + current

print(text2int("seven billion one hundred million thirty one thousand three hundred thirty seven"))
#7100031337
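To see why the (scale, increment) pairs work, here is a small variant that records `current` and `result` after every word. It is a hypothetical helper (`text2int_trace` is not part of the answer) that reuses the same table and update rule; unlike the original, it lets unknown words raise a plain `KeyError`.

```python
def text2int_trace(textnum):
    """Same accumulation as text2int, but also returns (word, current, result) per word."""
    units = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
             "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
             "sixteen", "seventeen", "eighteen", "nineteen"]
    tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
    scales = ["hundred", "thousand", "million", "billion", "trillion"]

    numwords = {"and": (1, 0)}
    for idx, word in enumerate(units):
        numwords[word] = (1, idx)
    for idx, word in enumerate(tens):
        if word:  # skip the two placeholder entries
            numwords[word] = (1, idx * 10)
    for idx, word in enumerate(scales):
        numwords[word] = (10 ** (idx * 3 or 2), 0)  # 100, 1000, 10**6, ...

    current = result = 0
    steps = []
    for word in textnum.split():
        scale, increment = numwords[word]  # KeyError for unknown words
        current = current * scale + increment
        if scale > 100:  # thousand and above flush the group into result
            result += current
            current = 0
        steps.append((word, current, result))
    return result + current, steps


total, steps = text2int_trace("two thousand three hundred five")
for word, current, result in steps:
    print(f"{word:>9}  current={current:<5} result={result}")
print("total:", total)  # total: 2305
```

In the trace, "hundred" only multiplies `current` (2 → 200 would stay in the group), while "thousand" moves the finished group into `result` and starts a new one.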

I needed to handle a couple of extra parsing cases, such as ordinal words ("first", "second"), hyphenated words ("one-hundred"), and hyphenated ordinal words like "fifty-seventh", so I added a few lines:

def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):   numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    # Irregular ordinals, plus suffix rules for the regular ones ("twentieth" -> "twenty")
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current
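The ordinal handling can also be pulled out into a standalone helper. The sketch below is an illustrative variant, not the answer's code: `ordinal_to_cardinal` returns cardinal words rather than integers, and it only accepts a stripped suffix when the result is a real number word, so ordinary words such as "month" are left alone.

```python
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
SCALES = ["hundred", "thousand", "million", "billion", "trillion"]
CARDINALS = set(UNITS + TENS + SCALES)

# Ordinals that suffix stripping cannot turn back into a cardinal word
IRREGULAR = {"first": "one", "second": "two", "third": "three", "fifth": "five",
             "eighth": "eight", "ninth": "nine", "twelfth": "twelve"}


def ordinal_to_cardinal(word):
    """Map 'twentieth' -> 'twenty', 'fourth' -> 'four'; other words come back unchanged."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for ending, replacement in (("ieth", "y"), ("th", "")):
        if word.endswith(ending):
            candidate = word[: -len(ending)] + replacement
            # Accept the stripped form only if it really is a number word
            if candidate in CARDINALS:
                return candidate
    return word


print([ordinal_to_cardinal(w) for w in "fifty-seventh".split("-")])  # ['fifty', 'seven']
```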

I made a change so that text2int(scale) returns the correct conversion, e.g. text2int("hundred") => 100.

import re


numwords = {}


def text2int(textnum):
    if not numwords:
        units = [ "zero", "one", "two", "three", "four", "five", "six",
                  "seven", "eight", "nine", "ten", "eleven", "twelve",
                  "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                  "eighteen", "nineteen"]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
                "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion",
                  'quadrillion', 'quintillion', 'sextillion', 'septillion',
                  'octillion', 'nonillion', 'decillion' ]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5,
                     'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]
    current = result = 0
    # Split on whitespace and hyphens; strip() avoids empty leading/trailing tokens
    tokens = re.split(r"[\s-]+", textnum.strip())
    for word in tokens:
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    word = "%s%s" % (word[:-len(ending)], replacement)

            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]

        # A scale word with nothing before it ("hundred") counts as "one hundred"
        if scale > 1:
            current = max(1, current)

        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

A quick solution is to use inflect.py to generate a dictionary for translation.

inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Unfortunately, its reverse (which would let you avoid the translation-dictionary route) isn't offered. All the same, you can use that function to build the translation dictionary:

>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     word_to_number_mapping[word_form] = i
...
>>> print(word_to_number_mapping['one'])
1
>>> print(word_to_number_mapping['eleven'])
11
>>> print(word_to_number_mapping['forty-three'])
43

If you're willing to commit some time, it might be possible to examine the inner workings of inflect.py's number_to_words() function and build your own code to do this dynamically (I haven't tried this).

There's a ruby gem by Marc Burns that does it. I recently forked it to add support for years. You can call ruby code from python.

require 'numbers_in_words'
require 'numbers_in_words/duck_punch'

nums = ["fifteen sixteen", "eighty five sixteen", "nineteen ninety six",
        "one hundred and seventy nine", "thirteen hundred", "nine thousand two hundred and ninety seven"]
nums.each {|n| p n; p n.in_numbers}

results:

"fifteen sixteen"
1516
"eighty five sixteen"
8516
"nineteen ninety six"
1996
"one hundred and seventy nine"
179
"thirteen hundred"
1300
"nine thousand two hundred and ninety seven"
9297

I have just released a Python module to PyPI called word2number for exactly this purpose: https://github.com/akshaynagpal/w2n

Install it using:

pip install word2number

Make sure your pip is updated to the latest version.

Usage:

from word2number import w2n

print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))
2003984

If anyone is interested, I hacked up a version that keeps the rest of the string intact (though it may have bugs; I haven't tested it much).

def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):  numwords[word] = (1, idx)
        for idx, word in enumerate(tens):   numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ""      # output text built so far
    onnumber = False    # True while consecutive number words are being accumulated
    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    # Only strip the ending if that yields a number word, so
                    # ordinary words such as "month" are copied through unchanged
                    candidate = "%s%s" % (word[:-len(ending)], replacement)
                    if candidate in numwords:
                        word = candidate
                        break

            if word not in numwords:
                # Not a number word: flush any pending number, then copy the word
                if onnumber:
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
            else:
                scale, increment = numwords[word]

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0
                onnumber = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

Example:

>>> print(text2int("I want fifty five hot dogs for two hundred dollars."))
I want 55 hot dogs for 200 dollars.

There could be issues with input such as "$200". But this was a really rough version.
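One way around the punctuation problem ("$two", "hundred.") is a different technique: find runs of number words with a regular expression and convert each run separately, so surrounding punctuation is never part of a token. This is a minimal sketch under that assumption; `NUMWORDS`, `RUN_RE`, `run_to_int`, and `replace_number_words` are hypothetical names, and it does not handle ordinals or the zipcode-style digit sequences discussed in a later answer.

```python
import re

UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ["hundred", "thousand", "million", "billion", "trillion"]

# (scale, increment) table, the same idea as text2int above
NUMWORDS = {w: (1, i) for i, w in enumerate(UNITS)}
NUMWORDS.update({w: (1, (i + 2) * 10) for i, w in enumerate(TENS)})
NUMWORDS.update({w: (10 ** (i * 3 or 2), 0) for i, w in enumerate(SCALES)})

# One whole number word; longest alternatives first so "seventeen" wins over "seven"
WORD = r"(?:%s)\b" % "|".join(sorted(NUMWORDS, key=len, reverse=True))
# A run of number words separated by spaces or hyphens; "and" is allowed only
# between two number words, so "bread and butter" is left untouched
RUN_RE = re.compile(r"\b%s(?:(?:[ -]+and)?[ -]+%s)*" % (WORD, WORD), re.IGNORECASE)


def run_to_int(run):
    """Convert one matched run using the current/result accumulation."""
    current = result = 0
    for word in re.split(r"[ -]+", run.lower()):
        if word == "and":
            continue
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


def replace_number_words(text):
    """Replace every run of number words in text with its digits."""
    return RUN_RE.sub(lambda m: str(run_to_int(m.group(0))), text)


print(replace_number_words("I want fifty five hot dogs for $two hundred."))
# I want 55 hot dogs for $200.
```

Because the regex only matches whole words, fragments like "one" in "someone" or "ten" in "often" are not converted. Like the answer's version, adjacent single digits are still summed ("one two" becomes 3).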

This code works only for numbers below 100, in both directions (word to int and int to word). Handling larger numbers needs another 10-20 lines of fairly simple logic; this is just simple code for beginners:

num = input("Enter the number you want to convert : ").strip()
mydict = {'0': 'Zero', '1': 'One', '2': 'Two', '3': 'Three', '4': 'Four', '5': 'Five', '6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine', '10': 'Ten', '11': 'Eleven', '12': 'Twelve', '13': 'Thirteen', '14': 'Fourteen', '15': 'Fifteen', '16': 'Sixteen', '17': 'Seventeen', '18': 'Eighteen', '19': 'Nineteen'}
mydict2 = ['', '', 'Twenty', 'Thirty', 'Forty', 'Fifty', 'Sixty', 'Seventy', 'Eighty', 'Ninety']

if num.isdigit():
    # Number -> words
    n = int(num)
    if n < 20:
        print(" :---> " + mydict[str(n)])
    elif n < 100:
        var1 = n % 10
        var2 = n // 10
        # 42 -> "Forty" + "Two"; 40 -> just "Forty"
        print(" :---> " + mydict2[var2] + (mydict[str(var1)] if var1 else ""))
    else:
        print("----->Only numbers below 100 are supported<-----")
else:
    # Words -> number; accept "twentyfive", "twenty five" and "twenty-five" alike
    num = num.lower().replace(" ", "").replace("-", "")
    dict_w = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}
    mydict2 = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    if num:
        if num in dict_w:
            print(" :---> " + str(dict_w[num]))
        elif num in mydict2:
            # A bare tens word such as "sixty"
            print(" :---> " + str(mydict2.index(num) * 10))
        else:
            # Split "twentyfive" into the tens part "twenty" and the units part "five"
            cut = num.find("ty") + 2
            str2, str1 = num[:cut], num[cut:]
            if str2 in mydict2 and dict_w.get(str1, 0) in range(1, 10):
                print(" :---> " + str(mydict2.index(str2) * 10 + dict_w[str1]))
            else:
                print("----->Invalid Input<-----")
    else:
        print("----->Please Enter Input<-----")

I needed something a bit different since my input is from a speech-to-text conversion and the solution is not always to sum the numbers. For example, "my zipcode is one two three four five" should not convert to "my zipcode is 15".

I took Andrew's answer and tweaked it to handle a few other cases people highlighted as errors, and also added support for examples like the zipcode one I mentioned above. Some basic test cases are shown below, but I'm sure there is still room for improvement.

def is_number(x):
    if isinstance(x, str):
        x = x.replace(',', '')
    try:
        float(x)
    except ValueError:
        return False
    return True


def text2int(textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        # Digit strings ("1", "1,000") and number words both count
        return is_number(x) or x in numwords

    def from_numword(x):
        if is_number(x):
            # Scale 0 makes a digit string replace, not extend, the current value
            scale = 0
            value = float(x.replace(',', ''))
            increment = int(value) if value.is_integer() else value
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    # Only strip the ending when that yields a number ("nineteenth"
                    # -> "nineteen", "5th" -> "5"); "month" stays "month"
                    candidate = "%s%s" % (word[:-len(ending)], replacement)
                    if is_numword(candidate):
                        word = candidate
                        break

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

Some tests...

one two three -> 123
three forty five -> 345
three and forty five -> 3 and 45
three hundred and forty five -> 345
three hundred -> 300
twenty five hundred -> 2500
three thousand and six -> 3006
three thousand six -> 3006
nineteenth -> 19
twentieth -> 20
first -> 1
my zip is one two three four five -> my zip is 12345
nineteen ninety six -> 1996
fifty-seventh -> 57
one million -> 1000000
first hundred -> 100
I will buy the first thousand -> I will buy the 1000  # probably should leave ordinal in the string
thousand -> 1000
hundred and six -> 106
1 million -> 1000000
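If the only case you need is the zipcode one, a much smaller sketch may be enough: collapse runs of two or more single-digit words into a digit string and leave everything else alone. `join_spoken_digits` and its `min_run` parameter are illustrative names; the sketch assumes whitespace-separated tokens without attached punctuation.

```python
DIGITS = {w: str(i) for i, w in enumerate("zero one two three four five six seven eight nine".split())}


def join_spoken_digits(text, min_run=2):
    """Collapse runs of >= min_run digit words ('one two three') into '123'.
    Other words, including a lone digit word, are kept as they are."""
    out, run = [], []
    for word in text.split() + [None]:  # None is a sentinel that flushes the last run
        if word is not None and word.lower() in DIGITS:
            run.append(word)
            continue
        if len(run) >= min_run:
            out.append("".join(DIGITS[w.lower()] for w in run))
        else:
            out.extend(run)
        run = []
        if word is not None:
            out.append(word)
    return " ".join(out)


print(join_spoken_digits("my zip is one two three four five"))  # my zip is 12345
```

Requiring at least two digits in a row keeps phrases like "I have one dog" unchanged; anything more elaborate ("three forty five") still needs the fuller function above.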

Make use of the Python package: WordToDigits

pip install wordtodigits

It can find numbers written in word form in a sentence and convert them to the proper numeric format. It also takes care of the decimal part, if present. The word representation of the numbers can be anywhere in the passage.

I took @recursive's logic and converted it to Ruby. I've also hardcoded the lookup table, so it's not as cool, but it might help a newbie understand what is going on.

WORDNUMS = {"zero"=> [1,0], "one"=> [1,1], "two"=> [1,2], "three"=> [1,3],
            "four"=> [1,4], "five"=> [1,5], "six"=> [1,6], "seven"=> [1,7],
            "eight"=> [1,8], "nine"=> [1,9], "ten"=> [1,10],
            "eleven"=> [1,11], "twelve"=> [1,12], "thirteen"=> [1,13],
            "fourteen"=> [1,14], "fifteen"=> [1,15], "sixteen"=> [1,16],
            "seventeen"=> [1,17], "eighteen"=> [1,18], "nineteen"=> [1,19],
            "twenty"=> [1,20], "thirty" => [1,30], "forty" => [1,40],
            "fifty" => [1,50], "sixty" => [1,60], "seventy" => [1,70],
            "eighty" => [1,80], "ninety" => [1,90],
            "hundred" => [100,0], "thousand" => [1000,0],
            "million" => [1000000, 0]}

def text_2_int(string)
  # Hyphens become spaces and "and" is dropped before the lookup
  numberWords = string.gsub('-', ' ').split - %w{and}
  current = result = 0
  numberWords.each do |word|
    # fetch raises KeyError for a word missing from WORDNUMS
    scale, increment = WORDNUMS.fetch(word)
    current = current * scale + increment
    if scale > 100
      result += current
      current = 0
    end
  end
  return result + current
end

I was looking to handle strings like "two thousand one hundred and forty-six".

This code works for Series data (w2n comes from the word2number package mentioned above):

import pandas as pd
from word2number import w2n

mylist = pd.Series(['one', 'two', 'three'])
mylist1 = [w2n.word_to_num(x) for x in mylist]
print(mylist1)

def parse_int(string):
    ONES = {'zero': 0,
            'one': 1,
            'two': 2,
            'three': 3,
            'four': 4,
            'five': 5,
            'six': 6,
            'seven': 7,
            'eight': 8,
            'nine': 9,
            'ten': 10,
            'eleven': 11,
            'twelve': 12,
            'thirteen': 13,
            'fourteen': 14,
            'fifteen': 15,
            'sixteen': 16,
            'seventeen': 17,
            'eighteen': 18,
            'nineteen': 19,
            'twenty': 20,
            'thirty': 30,
            'forty': 40,
            'fifty': 50,
            'sixty': 60,
            'seventy': 70,
            'eighty': 80,
            'ninety': 90,
            }

    numbers = []
    # Other tokens, such as "and", are silently ignored
    for token in string.replace('-', ' ').split():
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            # Scale only values below 1000, so the million part of
            # "one million two hundred thousand" is not multiplied again
            numbers = [x * 1000 if x < 1000 else x for x in numbers]
        elif token == 'million':
            numbers = [x * 1000000 if x < 1000000 else x for x in numbers]
    return sum(numbers)

Tested with 700 random numbers in the range 1 to one million; it works well.
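The answer doesn't show how those 700 numbers were checked. One way to run such a round-trip test, as a sketch: spell random numbers with a small hypothetical `int_to_words` (US style, no "and") and parse them back. The parser here repeats `parse_int`'s approach so the block is self-contained; its thousand and million branches only scale the smaller values before them.

```python
import random

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
         "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
         "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
ONES = {w: i for i, w in enumerate(UNITS)}
ONES.update({w: i * 10 for i, w in enumerate(TENS) if w})


def int_to_words(n):
    """Spell 1 <= n <= 1_000_000, e.g. 2146 -> 'two thousand one hundred forty-six'."""
    if n == 1_000_000:
        return "one million"
    if n < 20:
        return UNITS[n]
    if n < 100:
        return TENS[n // 10] + ("-" + UNITS[n % 10] if n % 10 else "")
    if n < 1000:
        rest = " " + int_to_words(n % 100) if n % 100 else ""
        return UNITS[n // 100] + " hundred" + rest
    rest = " " + int_to_words(n % 1000) if n % 1000 else ""
    return int_to_words(n // 1000) + " thousand" + rest


def parse_int(string):
    """Same list-of-values approach as parse_int above."""
    numbers = []
    for token in string.replace('-', ' ').split():
        if token in ONES:
            numbers.append(ONES[token])
        elif token == 'hundred':
            numbers[-1] *= 100
        elif token == 'thousand':
            numbers = [x * 1000 if x < 1000 else x for x in numbers]
        elif token == 'million':
            numbers = [x * 1_000_000 if x < 1_000_000 else x for x in numbers]
    return sum(numbers)


def round_trip_failures(samples=700, seed=0):
    """Return every sampled n in [1, 1_000_000] that does not survive int -> words -> int."""
    rng = random.Random(seed)
    sample = [rng.randint(1, 1_000_000) for _ in range(samples)]
    return [n for n in sample if parse_int(int_to_words(n)) != n]


print(round_trip_failures())  # an empty list means every sample round-tripped
```

A fixed seed makes a failure reproducible; spelling with "and" ("one hundred and five") would exercise the parser's habit of ignoring unknown tokens.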

This handles numbers written in Indian-style words, some fractions, combinations of numbers and words, and also addition.

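def words_to_number(words):
    numbers = {"zero":0, "a":1, "half":0.5, "quarter":0.25, "one":1, "two":2,
               "three":3, "four":4, "five":5, "six":6, "seven":7, "eight":8,
               "nine":9, "ten":10, "eleven":11, "twelve":12, "thirteen":13,
               "fourteen":14, "fifteen":15, "sixteen":16, "seventeen":17,
               "eighteen":18, "nineteen":19, "twenty":20, "thirty":30, "forty":40,
               "fifty":50, "sixty":60, "seventy":70, "eighty":80, "ninety":90}

    groups = {"hundred":100, "thousand":1_000,
              "lac":1_00_000, "lakh":1_00_000,
              "million":1_000_000, "crore":10**7,
              "billion":10**9, "trillion":10**12}

    split_at = ["and", "plus"]

    result = 0    # value of the completed groups (thousand, lakh, million, ...)
    current = 0   # value of the group being built
    largest = 0   # largest group multiplier applied so far
    words_array = words.split()
    for i, word in enumerate(words_array):
        if word in groups:
            g = groups[word]
            if g == 100:
                current *= 100
            elif g > largest:
                # e.g. "ten thousand crore": scale everything seen so far
                result = (result + current) * g
                current = 0
                largest = g
            else:
                # e.g. "one million two hundred thousand": scale only the new group,
                # so "one thousand two hundred" gives 1200 rather than 100200
                result += current * g
                current = 0
        elif word in numbers:
            current += numbers[word]
        elif word in split_at:
            # Parse the rest of the phrase as a separate number and add it
            remaining = ' '.join(words_array[i+1:])
            return result + current + words_to_number(remaining)
        else:
            try:
                current += float(word)
            except ValueError as e:
                raise ValueError(f"Invalid word {word}") from e
    return result + current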

TEST:

print(words_to_number("a million and one"))
>> 1000001


print(words_to_number("one crore and one"))
>> 10000001


print(words_to_number("0.5 million one"))
>> 500001.0


print(words_to_number("half million and one hundred"))
>> 500100.0


print(words_to_number("quarter"))
>> 0.25


print(words_to_number("one hundred plus one"))
>> 101
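Results such as "one crore and one" are easier to check by eye when printed with Indian digit grouping (last three digits, then pairs). A small sketch of such a formatter; `indian_format` is a hypothetical helper, not part of the answer.

```python
def indian_format(n):
    """Group an integer's digits Indian-style: 10000001 -> '1,00,00,001'."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    while head:  # peel off two digits at a time from the right
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(indian_format(10000001))  # 1,00,00,001
```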

I found a faster way:

Da_Unità_a_Cifre = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15, 'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19}

Da_Lettere_a_Decine = {"tw": 20, "th": 30, "fo": 40, "fi": 50, "si": 60, "se": 70, "ei": 80, "ni": 90}

elemento = input("insert the word: ")
Val_Num = 0
try:
    # String methods return new strings, so the result must be reassigned
    elemento = elemento.lower().strip()
    Unità = elemento[elemento.find("ty")+2:]  # the units part, e.g. "five" in "twentyfive"

    if elemento in Da_Unità_a_Cifre:
        # one to nineteen
        Val_Num = Da_Unità_a_Cifre[elemento]
        print(Val_Num)
    elif elemento[-1] == "y":
        # A bare tens word: its first two letters identify it ("tw" -> 20)
        Val_Num = Da_Lettere_a_Decine[elemento[0] + elemento[1]]
        print(Val_Num)
    elif elemento == "onehundred":
        Val_Num = 100
        print(Val_Num)
    else:
        Cifre_Unità = Da_Unità_a_Cifre[Unità]
        Cifre_Decine = Da_Lettere_a_Decine[elemento[0] + elemento[1]]
        Val_Num = Cifre_Decine + Cifre_Unità
        print(Val_Num)
except (KeyError, IndexError):
    print("invalid input")