Ordinal replacement

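I am currently looking for a way to replace words like first, second, third, ... with the appropriate ordinal number representation (1st, 2nd, 3rd). I have been googling for the last week and did not find any useful standard tool or any function from NLTK.

So is there one, or should I write some regular expressions manually?

Thanks for any advice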

The accepted answer to a previous question has an algorithm for half of this: it turns "first" into 1. To go from there to "1st", do something like:

suffixes = ["th", "st", "nd", "rd", ] + ["th"] * 16
suffixed_num = str(num) + suffixes[num % 100]

This only works for numbers 0-19.
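One way to lift the 0-19 restriction while keeping the same lookup table is to index by the last two digits only when they are below 20, and by the last digit otherwise. This is a minimal sketch, assuming non-negative integers; the name `suffixed` is made up for illustration:

```python
# Same 20-entry table as above: index 0-19 maps to the suffix for that number.
SUFFIXES = ["th", "st", "nd", "rd"] + ["th"] * 16


def suffixed(num):
    """Return num with its English ordinal suffix (non-negative ints assumed)."""
    last_two = num % 100
    # 0-19 (including the irregular 11, 12, 13) are looked up directly;
    # for 20-99 only the final digit decides the suffix.
    index = last_two if last_two < 20 else last_two % 10
    return str(num) + SUFFIXES[index]


print([suffixed(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22, 23, 101, 111, 112)])
```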

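I wanted to use ordinals for a project of mine, and after a few prototypes I think this method, although not small, will work for any positive integer, yes, any integer.

It works by determining whether the number is above or below 20. If the number is below 20, it turns the int 1 into the string 1st, 2 into 2nd and 3 into 3rd; the rest get "th" added.

For numbers of 20 and above it takes the last and second-to-last digits, which I have called the units and tens respectively, and tests them to see what to add to the number.

This is in Python by the way, so I'm not sure whether other languages can find the last or second-to-last digit of a string; if they can, it should translate pretty easily.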

def o(numb):
    if numb < 20:  # determining suffix for < 20
        if numb == 1:
            suffix = 'st'
        elif numb == 2:
            suffix = 'nd'
        elif numb == 3:
            suffix = 'rd'
        else:
            suffix = 'th'
    else:  # determining suffix for >= 20
        tens = str(numb)
        tens = tens[-2]
        unit = str(numb)
        unit = unit[-1]
        if tens == "1":
            suffix = "th"
        else:
            if unit == "1":
                suffix = 'st'
            elif unit == "2":
                suffix = 'nd'
            elif unit == "3":
                suffix = 'rd'
            else:
                suffix = 'th'
    return str(numb) + suffix

For ease of use I called the function "o". I saved it in a file named "ordinal", so it can be used with import ordinal followed by ordinal.o(number).

Let me know what you think
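The same tens/units test can also be done with integer arithmetic instead of string indexing, which carries over directly to languages that cannot index a string from the end. This is a sketch under that assumption; the name `ordinal_arith` is hypothetical and only non-negative integers are considered:

```python
def ordinal_arith(numb):
    """Append the English ordinal suffix using only integer arithmetic."""
    # Split the last two digits, e.g. 112 -> tens=1, unit=2.
    tens, unit = divmod(numb % 100, 10)
    if tens == 1:
        suffix = 'th'  # 10-19, 110-119, ... always take 'th'
    elif unit == 1:
        suffix = 'st'
    elif unit == 2:
        suffix = 'nd'
    elif unit == 3:
        suffix = 'rd'
    else:
        suffix = 'th'
    return str(numb) + suffix


print([ordinal_arith(n) for n in (1, 12, 22, 103, 113)])
```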

The number-parser package can parse ordinal words ("first", "second", etc.) to integers.

from number_parser import parse_ordinal
n = parse_ordinal("first")

To convert an integer to "1st", "2nd", etc., you can use the following (taken from Gareth on codegolf):

ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])

This works on any number:

print([ordinal(n) for n in range(1,32)])


['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
'11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th',
'20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th',
'29th', '30th', '31st']
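The one-liner is dense, so here is a step-by-step reading of it. The string "tsnrhtdd" interleaves the four suffixes: taking every fourth character starting at offset 0, 1, 2 or 3 yields "th", "st", "nd" or "rd". The sketch below computes that offset explicitly; the name `ordinal_explained` is made up, and non-negative integers are assumed:

```python
def ordinal_explained(n):
    """Readable equivalent of the "tsnrhtdd" slicing trick."""
    not_teen = (n // 10 % 10 != 1)  # False for 10-19, 110-119, ...
    small_unit = (n % 10 < 4)       # True when the last digit is 0, 1, 2 or 3
    # In the one-liner the booleans are multiplied as 0/1, so the offset is
    # the last digit only when both conditions hold, and 0 otherwise.
    offset = (n % 10) if (not_teen and small_unit) else 0
    # Offsets 0..3 select "th", "st", "nd", "rd" respectively.
    suffix = "tsnrhtdd"[offset::4]
    return "%d%s" % (n, suffix)


# The original one-liner, for comparison.
golfed = lambda n: "%d%s" % (n, "tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])

assert all(ordinal_explained(n) == golfed(n) for n in range(1000))
```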

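I found myself doing something similar, needing to convert addresses with ordinal words ('Third St') to a format that a geocoder could understand ('3rd St'). While this isn't very elegant, one quick and dirty solution is to use inflect.py to generate a dictionary for translation.

inflect.py has a number_to_words() function that turns a number (e.g. 2) into its word form (e.g. 'two'). Additionally, there is an ordinal() function that takes any number (numeral or word form) and turns it into its ordinal form (e.g. 4 -> 4th, six -> sixth). Neither of those does what you're looking for on its own, but together you can use them to generate a dictionary that translates any supplied ordinal word (within a reasonable range) to its numeric ordinal. Take a look: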

>>> import inflect
>>> p = inflect.engine()
>>> word_to_number_mapping = {}
>>>
>>> for i in range(1, 100):
...     word_form = p.number_to_words(i)  # 1 -> 'one'
...     ordinal_word = p.ordinal(word_form)  # 'one' -> 'first'
...     ordinal_number = p.ordinal(i)  # 1 -> '1st'
...     word_to_number_mapping[ordinal_word] = ordinal_number  # 'first': '1st'
...
>>> print(word_to_number_mapping['sixth'])
6th
>>> print(word_to_number_mapping['eleventh'])
11th
>>> print(word_to_number_mapping['forty-third'])
43rd

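If you're willing to commit some time, it might be possible to examine the inner workings of both those functions and build your own code to do this dynamically (I haven't tried to do this).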
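Once such a mapping exists, applying it to free text takes a single re.sub call. The sketch below does not depend on inflect: it uses a small hand-written mapping as a stand-in (an assumption for illustration; in practice the generated dictionary would be passed in), and the helper name `replace_ordinal_words` is made up.

```python
import re

# Stand-in mapping, hand-written for this example only.
WORD_TO_ORDINAL = {
    'first': '1st', 'second': '2nd', 'third': '3rd', 'fourth': '4th',
    'fifth': '5th', 'forty-third': '43rd',
}


def replace_ordinal_words(text, mapping=WORD_TO_ORDINAL):
    """Replace whole-word ordinal words in text using mapping (case-insensitive)."""
    # Try longer keys first so a key that is a prefix of another cannot shadow it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, keys)) + r')\b',
                         re.IGNORECASE)
    # The mapping keys are lowercase, so normalise the matched word first.
    return pattern.sub(lambda m: mapping[m.group(1).lower()], text)


print(replace_ordinal_words('12 Third St and 5 Forty-Third Ave'))
# 12 3rd St and 5 43rd Ave
```

The word boundaries (\b) keep words such as "thirdly" untouched, which matters when the input is arbitrary address text.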

How about this:

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n%100 if (n%100)<20 else n%10,"th"))
print([suf(n) for n in range(1,32)])


['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
'11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th',
'20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th',
'29th', '30th', '31st']

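Here is a more complicated solution I just wrote that takes compound ordinals into account. So it works from first all the way to nine hundred and ninety ninth. I needed it to convert string street names to numeric ordinals: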

import re
from collections import OrderedDict


ONETHS = {
'first': '1ST', 'second': '2ND', 'third': '3RD', 'fourth': '4TH', 'fifth': '5TH', 'sixth': '6TH', 'seventh': '7TH',
'eighth': '8TH', 'ninth': '9TH'
}


TEENTHS = {
'tenth': '10TH', 'eleventh': '11TH', 'twelfth': '12TH', 'thirteenth': '13TH',
'fourteenth': '14TH', 'fifteenth': '15TH', 'sixteenth': '16TH', 'seventeenth': '17TH', 'eighteenth': '18TH',
'nineteenth': '19TH'
}


TENTHS = {
'twentieth': '20TH', 'thirtieth': '30TH', 'fortieth': '40TH', 'fiftieth': '50TH', 'sixtieth': '60TH',
'seventieth': '70TH', 'eightieth': '80TH', 'ninetieth': '90TH',
}


HUNDREDTH = {'hundredth': '100TH'}  # HUNDREDTH not s


ONES = {'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8',
'nine': '9'}


TENS = {'twenty': '20', 'thirty': '30', 'forty': '40', 'fifty': '50', 'sixty': '60', 'seventy': '70', 'eighty': '80',
'ninety': '90'}


HUNDRED = {'hundred': '100'}


# Used below for ALL_ORDINALS
ALL_THS = {}
ALL_THS.update(ONETHS)
ALL_THS.update(TEENTHS)
ALL_THS.update(TENTHS)
ALL_THS.update(HUNDREDTH)


ALL_ORDINALS = OrderedDict()
ALL_ORDINALS.update(ALL_THS)
ALL_ORDINALS.update(TENS)
ALL_ORDINALS.update(HUNDRED)
ALL_ORDINALS.update(ONES)




def split_ordinal_word(word):
    ordinals = []
    if not word:
        return ordinals

    for key, value in ALL_ORDINALS.items():
        if word.startswith(key):
            ordinals.append(key)
            ordinals += split_ordinal_word(word[len(key):])
            break
    return ordinals


def get_ordinals(s):
    ordinals, start, end = [], [], []
    # Lowercase first so 'And' is removed too; \b leaves words like 'grand' intact
    s = s.strip().lower().replace('-', ' ')
    s = re.sub(r'\band\b', ' ', s)
    s = re.sub(' +', ' ', s).strip()  # Replace multiple spaces with a single space
    s = s.split(' ')

    for word in s:
        found_ordinals = split_ordinal_word(word)
        if found_ordinals:
            ordinals += found_ordinals
        else:  # else if word, for covering blanks
            if ordinals:  # Already have some ordinals
                end.append(word)
            else:
                start.append(word)
    return start, ordinals, end




def detect_ordinal_pattern(ordinals):
    ordinal_length = len(ordinals)
    ordinal_string = ''  # ' '.join(ordinals)
    if ordinal_length == 1:
        ordinal_string = ALL_ORDINALS[ordinals[0]]
    elif ordinal_length == 2:
        if ordinals[0] in ONES.keys() and ordinals[1] in HUNDREDTH.keys():
            ordinal_string = ONES[ordinals[0]] + '00TH'
        elif ordinals[0] in HUNDRED.keys() and ordinals[1] in ONETHS.keys():
            ordinal_string = HUNDRED[ordinals[0]][:-1] + ONETHS[ordinals[1]]
        elif ordinals[0] in TENS.keys() and ordinals[1] in ONETHS.keys():
            ordinal_string = TENS[ordinals[0]][0] + ONETHS[ordinals[1]]
    elif ordinal_length == 3:
        if ordinals[0] in HUNDRED.keys() and ordinals[1] in TENS.keys() and ordinals[2] in ONETHS.keys():
            ordinal_string = HUNDRED[ordinals[0]][0] + TENS[ordinals[1]][0] + ONETHS[ordinals[2]]
        elif ordinals[0] in ONES.keys() and ordinals[1] in HUNDRED.keys() and ordinals[2] in ALL_THS.keys():
            # zfill pads single-digit ordinals: 'nine hundred third' -> '9' + '03RD'
            ordinal_string = ONES[ordinals[0]] + ALL_THS[ordinals[2]].zfill(4)
    elif ordinal_length == 4:
        if ordinals[0] in ONES.keys() and ordinals[1] in HUNDRED.keys() and ordinals[2] in TENS.keys() and \
                ordinals[3] in ONETHS.keys():
            ordinal_string = ONES[ordinals[0]] + TENS[ordinals[2]][0] + ONETHS[ordinals[3]]

    return ordinal_string

Here is some sample usage:

# s = '32 one   hundred and forty-third st toronto, on'
#s = '32 forty-third st toronto, on'
#s = '32 one-hundredth st toronto, on'
#s = '32 hundred and third st toronto, on'
#s = '32 hundred and thirty first st toronto, on'
# s = '32 nine hundred and twenty third st toronto, on'
#s = '32 nine hundred and ninety ninth st toronto, on'
s = '32 sixty sixth toronto, on'


st, ords, en = get_ordinals(s)
print(st, detect_ordinal_pattern(ords), en)
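An alternative to concatenating digit strings is to add up the numeric value of each word and attach the suffix at the end. The following is only a sketch under stated assumptions: it covers values below one thousand, the tables and the names `ordinal_words_to_int` and `words_to_numeric_ordinal` are made up for illustration, and the input is expected to contain nothing but the number words.

```python
import re

CARDINALS = {
    'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6,
    'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11,
    'twelve': 12, 'thirteen': 13, 'fourteen': 14, 'fifteen': 15,
    'sixteen': 16, 'seventeen': 17, 'eighteen': 18, 'nineteen': 19,
    'twenty': 20, 'thirty': 30, 'forty': 40, 'fifty': 50, 'sixty': 60,
    'seventy': 70, 'eighty': 80, 'ninety': 90,
}
ORDINALS = {
    'first': 1, 'second': 2, 'third': 3, 'fourth': 4, 'fifth': 5,
    'sixth': 6, 'seventh': 7, 'eighth': 8, 'ninth': 9, 'tenth': 10,
    'eleventh': 11, 'twelfth': 12, 'thirteenth': 13, 'fourteenth': 14,
    'fifteenth': 15, 'sixteenth': 16, 'seventeenth': 17, 'eighteenth': 18,
    'nineteenth': 19, 'twentieth': 20, 'thirtieth': 30, 'fortieth': 40,
    'fiftieth': 50, 'sixtieth': 60, 'seventieth': 70, 'eightieth': 80,
    'ninetieth': 90,
}


def ordinal_words_to_int(phrase):
    """Sum the value of number words, e.g. 'hundred and third' -> 103."""
    total = 0
    for token in re.findall(r'[a-z]+', phrase.lower()):
        if token == 'and':
            continue
        if token in ('hundred', 'hundredth'):
            total = max(total, 1) * 100  # a bare 'hundred' means one hundred
        elif token in CARDINALS:
            total += CARDINALS[token]
        elif token in ORDINALS:
            total += ORDINALS[token]
        else:
            raise ValueError('not a number word: %r' % token)
    return total


def words_to_numeric_ordinal(phrase):
    """Convert ordinal words to the upper-case style used above, e.g. '43RD'."""
    n = ordinal_words_to_int(phrase)
    if 11 <= n % 100 <= 13:
        suffix = 'TH'
    else:
        suffix = {1: 'ST', 2: 'ND', 3: 'RD'}.get(n % 10, 'TH')
    return str(n) + suffix


print(words_to_numeric_ordinal('nine hundred and ninety ninth'))
```

Working with integers avoids a separate branch for each word pattern, at the cost of having to strip the surrounding address text before calling it.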

This function works well for every number n. If n is negative, it is converted to positive. If n is not an integer, it is converted to an integer.

def ordinal( n ):

    suffix = ['th', 'st', 'nd', 'rd', 'th', 'th', 'th', 'th', 'th', 'th']

    if n < 0:
        n *= -1

    n = int(n)

    if n % 100 in (11,12,13):
        s = 'th'
    else:
        s = suffix[n % 10]

    return str(n) + s

If using django, you could do:

from django.contrib.humanize.templatetags.humanize import ordinal
var = ordinal(number)

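(Or use ordinal in a django template as the template filter it was intended to be, though calling it like this from Python code works as well.)

If not using django, you could steal their implementation, which is very neat.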

I salute Gareth's lambda code. So elegant. I only half-understand how it works, though. So I tried to deconstruct it and came up with this:

def ordinal(integer):

    int_to_string = str(integer)

    if int_to_string == '1' or int_to_string == '-1':
        print(int_to_string + 'st')
        return int_to_string + 'st'
    elif int_to_string == '2' or int_to_string == '-2':
        print(int_to_string + 'nd')
        return int_to_string + 'nd'
    elif int_to_string == '3' or int_to_string == '-3':
        print(int_to_string + 'rd')
        return int_to_string + 'rd'

    elif int_to_string[-1] == '1' and int_to_string[-2] != '1':
        print(int_to_string + 'st')
        return int_to_string + 'st'
    elif int_to_string[-1] == '2' and int_to_string[-2] != '1':
        print(int_to_string + 'nd')
        return int_to_string + 'nd'
    elif int_to_string[-1] == '3' and int_to_string[-2] != '1':
        print(int_to_string + 'rd')
        return int_to_string + 'rd'

    else:
        print(int_to_string + 'th')
        return int_to_string + 'th'




>>> print([ordinal(n) for n in range(1,25)])
1st
2nd
3rd
4th
5th
6th
7th
8th
9th
10th
11th
12th
13th
14th
15th
16th
17th
18th
19th
20th
21st
22nd
23rd
24th
['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th',
'11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th',
'20th', '21st', '22nd', '23rd', '24th']

Gareth's code expressed using the modern .format()

# // (floor division) keeps the tens-digit test correct on Python 3
ordinal = lambda n: "{}{}".format(n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])

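If you don't want to import an external module and prefer a one-line solution, then the following is probably (slightly) more readable than the accepted answer: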

def suffix(i):
    return {1:"st", 2:"nd", 3:"rd"}.get(i%10*(i%100 not in [11,12,13]), "th")

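It uses dictionary .get, as suggested in https://codereview.stackexchange.com/a/41300/90593 and https://stackoverflow.com/a/36977549/5069869.

I made use of multiplication with a boolean to handle the special cases (11, 12, 13) without having to start an if-block. If the condition (i%100 not in [11,12,13]) evaluates to False, the whole number is 0 and we get the default 'th' case.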
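To see the boolean multiplication at work, the lookup key can be printed for a few inputs. This is just an illustration; `lookup_key` is a hypothetical helper that isolates the expression passed to .get:

```python
def lookup_key(i):
    """The key that the one-liner passes to dict.get."""
    # True behaves as 1 and False as 0, so the product is either the last
    # digit or 0 (which is not in the dictionary and falls back to 'th').
    return i % 10 * (i % 100 not in [11, 12, 13])


def suffix(i):
    return {1: "st", 2: "nd", 3: "rd"}.get(lookup_key(i), "th")


for i in (1, 2, 3, 4, 11, 12, 13, 21, 112):
    print(i, lookup_key(i), suffix(i))
```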

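Another solution is the num2words library (pip | github). It especially offers different languages, so localization/internationalization (aka l10n/i18n) is a no-brainer.

Usage is easy after you have installed it with pip install num2words: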

from num2words import num2words
# english is default
num2words(4458, to="ordinal_num")
'4458th'


# examples for other languages
num2words(4458, lang="en", to="ordinal_num")
'4458th'


num2words(4458, lang="es", to="ordinal_num")
'4458º'


num2words(4458, lang="de", to="ordinal_num")
'4458.'


num2words(4458, lang="id", to="ordinal_num")
'ke-4458'

Bonus:

num2words(4458, lang="en", to="ordinal")
'four thousand, four hundred and fifty-eighth'

This can handle numbers of any length, the exceptions for ...#11 to ...#13, and negative integers.

def ith(i):return(('th'*(10<(abs(i)%100)<14))+['st','nd','rd',*['th']*7][(abs(i)-1)%10])[0:2]

I suggest using ith() as the name to avoid overriding the builtin ord().

# test routine
for i in range(-200,200):
    print(i,ith(i))

Note: Tested with Python 3.6; the abs() function was available without explicitly including a math module.
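For readers who find the one-liner hard to follow, here is a more explicit equivalent. It is a sketch rather than the author's code; the name `ith_readable` is made up, and the loop at the end checks it against the original over the same range as the test routine.

```python
def ith(i): return (('th'*(10<(abs(i)%100)<14))+['st','nd','rd',*['th']*7][(abs(i)-1)%10])[0:2]


def ith_readable(i):
    """Return only the ordinal suffix for i; the sign is ignored."""
    n = abs(i)
    if 10 < n % 100 < 14:  # 11, 12, 13, 111, 112, ...
        return 'th'
    # Otherwise the last digit decides; anything but 1, 2, 3 takes 'th'.
    return {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')


for i in range(-200, 200):
    assert ith(i) == ith_readable(i)
```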

Here's an alternative option using the num2words package.

>>> from num2words import num2words
>>> num2words(42, to='ordinal_num')
'42nd'

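If you don't want to pull in an additional dependency on an external library (as suggested by luckydonald) but also don't want the future maintainer of the code to hunt you down and kill you (because you used golfed code in production), then here's a short-but-maintainable variant: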

def make_ordinal(n):
    '''
    Convert an integer into its ordinal representation::

        make_ordinal(0)   => '0th'
        make_ordinal(3)   => '3rd'
        make_ordinal(122) => '122nd'
        make_ordinal(213) => '213th'
    '''
    n = int(n)
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    else:
        suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    return str(n) + suffix

Try this

import sys

a = int(sys.argv[1])

for i in range(1, a + 1):
    # 11, 12 and 13 (and 111, 112, ...) always take 'th'
    if i % 100 in (11, 12, 13):
        print("%dth Hello" % i)
        continue
    # otherwise the last digit decides the suffix
    if i % 10 == 1:
        print("%dst Hello" % i)
    elif i % 10 == 2:
        print("%dnd Hello" % i)
    elif i % 10 == 3:
        print("%drd Hello" % i)
    else:
        print("%dth Hello" % i)

There's an ordinal function in humanize

pip install humanize

>>> [(x, humanize.ordinal(x)) for x in (1, 2, 3, 4, 20, 21, 22, 23, 24, 100, 101,
...                                     102, 103, 113, -1, 0, 1.2, 13.6)]
[(1, '1st'), (2, '2nd'), (3, '3rd'), (4, '4th'), (20, '20th'), (21, '21st'),
(22, '22nd'), (23, '23rd'), (24, '24th'), (100, '100th'), (101, '101st'),
(102, '102nd'), (103, '103rd'), (113, '113th'), (-1, '-1th'), (0, '0th'),
(1.2, '1st'), (13.6, '13th')]


Import the humanize module and use its ordinal function.

import humanize
humanize.ordinal(4)

Output

>>> '4th'