Check if string has date, any format

How do I check if a string can be parsed to a date?

  • Jan 19, 1990
  • January 19, 1990
  • Jan 19,1990
  • 01/19/1990
  • 01/19/90
  • 1990
  • Jan 1990
  • January1990

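These are all valid dates. If there's any concern about the missing space in the third item and the last item above, that can easily be remedied by automatically inserting a space between letters/characters and numbers, if needed.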
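The space insertion mentioned above can be done with a regular expression. This is a minimal sketch: the helper name `add_missing_spaces` is hypothetical, and it assumes a space is wanted wherever a letter or comma is directly followed by a digit, or a digit is directly followed by a letter.

```python
import re

# Zero-width match at every boundary where a letter or comma is directly
# followed by a digit, or a digit is directly followed by a letter.
_BOUNDARY = re.compile(r"(?<=[A-Za-z,])(?=\d)|(?<=\d)(?=[A-Za-z])")


def add_missing_spaces(text):
    """Insert a single space at each letter/digit or comma/digit boundary."""
    return _BOUNDARY.sub(" ", text)


print(add_missing_spaces("Jan 19,1990"))  # Jan 19, 1990
print(add_missing_spaces("January1990"))  # January 1990
```

Strings that already have no such boundary, like "01/19/90", pass through unchanged.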

But first, the basics:

I tried putting it in an if statement:

if datetime.strptime(item, '%Y') or datetime.strptime(item, '%b %d %y') or datetime.strptime(item, '%b %d %Y')  or datetime.strptime(item, '%B %d %y') or datetime.strptime(item, '%B %d %Y'):

But this is inside a try-except block, and it keeps returning something like:

16343 time data 'JUNE1890' does not match format '%Y'

unless the string meets the first condition in the if statement.
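The `or` chain above cannot work as written: `datetime.strptime` never returns a falsy value on a mismatch, it raises `ValueError`, so the first format that fails aborts the whole `if` expression and control jumps to the `except` clause. A minimal sketch of a fix, using a hypothetical wrapper `try_strptime` that turns the exception into `None` so the chain can short-circuit as intended:

```python
from datetime import datetime


def try_strptime(text, fmt):
    """Return the parsed datetime, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


item = "JUNE1890"
# Each call now yields a datetime (always truthy) or None (falsy), so `or`
# moves on to the next format instead of raising on the first mismatch.
if (try_strptime(item, "%Y") or try_strptime(item, "%b %Y")
        or try_strptime(item, "%B%Y") or try_strptime(item, "%b %d, %Y")):
    print(item)
else:
    print("Not a date")
```

`%b` and `%B` match month names case-insensitively, but they follow the current locale, so this assumes an English (or C) locale.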

To clarify, I don't actually need the value of the date; I just want to know whether it is one. Ideally, it would be something like:

if item is date:
    print date
else:
    print "Not a date"

Is there any way to do this?


The parse function in dateutil.parser can parse many date string formats into a datetime object.

If you simply want to know whether a particular string could represent or contain a valid date, you could try the following simple function:

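from dateutil.parser import parse


def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True

    except (ValueError, OverflowError):
        # ValueError (ParserError, a ValueError subclass, in recent dateutil
        # versions) for unparseable input; OverflowError for numbers too
        # large to build a date from.
        return False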

Then you have:

>>> is_date("1990-12-1")
True
>>> is_date("2005/3")
True
>>> is_date("Jan 19, 1990")
True
>>> is_date("today is 2019-03-27")
False
>>> is_date("today is 2019-03-27", fuzzy=True)
True
>>> is_date("Monday at 12:01am")
True
>>> is_date("xyz_not_a_date")
False
>>> is_date("yesterday")
False

Custom parsing

parse might recognise some strings as dates which you don't want to treat as dates. For example:

  • Parsing "12" and "1999" will return a datetime object representing the current date, with the day or the year (respectively) replaced by the number in the string.

  • "23, 4" and "23 4" will be parsed as datetime.datetime(2023, 4, 16, 0, 0).

  • "Friday" will return the date of the next Friday (today's date, if today is a Friday).
  • Similarly, "August" corresponds to the current date with the month changed to August.
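One way to reject such partial matches is to parse the string twice with two different `default` datetimes: a field that is actually present in the string comes out identical in both results, while a filled-in field differs. This is a sketch of that technique, not something dateutil provides itself; the helper names `supplied_fields` and `has_year` are hypothetical, and it assumes an explicit year is the minimum you want to require (so "1990" and "Jan 1990" still pass).

```python
from datetime import datetime

from dateutil.parser import parse

# Two defaults that differ in year, month and day. Both years are leap years
# and both months have 31 days, so any day/month the string supplies on its
# own (e.g. "Feb 29" or "31") stays valid under either default.
_DEFAULT_A = datetime(2000, 1, 1)
_DEFAULT_B = datetime(2004, 3, 2)


def supplied_fields(text):
    """Return the set of date fields present in text, or None if unparseable."""
    try:
        a = parse(text, default=_DEFAULT_A)
        b = parse(text, default=_DEFAULT_B)
    except (ValueError, OverflowError):
        return None
    return {name for name in ("year", "month", "day")
            if getattr(a, name) == getattr(b, name)}


def has_year(text):
    """True if text parses as a date and explicitly states a year."""
    fields = supplied_fields(text)
    return fields is not None and "year" in fields


for s in ("Jan 19, 1990", "1990", "12", "August", "Friday"):
    print(s, has_year(s))  # True, True, False, False, False
```

A bare weekday such as "Friday" is shifted relative to each default, so none of its fields agree and it is rejected as well.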

Also, parse is not locale-aware, so it does not recognise month or weekday names in languages other than English.

Both of these issues can be addressed to some extent by using a custom parserinfo class, which defines how month and day names are recognised:

from dateutil.parser import parserinfo


class CustomParserInfo(parserinfo):

    # three months in Spanish for illustration
    MONTHS = [("Enero", "Enero"), ("Feb", "Febrero"), ("Marzo", "Marzo")]

An instance of this class can then be used with parse:

>>> parse("Enero 1990")
# ValueError: Unknown string format
>>> parse("Enero 1990", parserinfo=CustomParserInfo())
datetime.datetime(1990, 1, 27, 0, 0)  # the day of the month is taken from today's date
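The three-month class above replaces the English month names entirely, so English input stops parsing with it. A sketch that keeps them and adds Spanish names alongside, relying on each `MONTHS` entry being a tuple of accepted spellings for that month. The class name `EnglishSpanishParserInfo` is hypothetical, and the Spanish abbreviations are assumptions chosen for illustration.

```python
from dateutil.parser import parse, parserinfo


class EnglishSpanishParserInfo(parserinfo):
    # One tuple per month, January first; every spelling in a tuple maps to
    # that month. Lookups are case-insensitive.
    MONTHS = [
        ("Jan", "January", "Ene", "Enero"),
        ("Feb", "February", "Febrero"),
        ("Mar", "March", "Marzo"),
        ("Apr", "April", "Abr", "Abril"),
        ("May", "Mayo"),
        ("Jun", "June", "Junio"),
        ("Jul", "July", "Julio"),
        ("Aug", "August", "Ago", "Agosto"),
        ("Sep", "Sept", "September", "Septiembre", "Setiembre"),
        ("Oct", "October", "Octubre"),
        ("Nov", "November", "Noviembre"),
        ("Dec", "December", "Dic", "Diciembre"),
    ]


info = EnglishSpanishParserInfo()
print(parse("19 Enero 1990", parserinfo=info))  # 1990-01-19 00:00:00
print(parse("Jan 19, 1990", parserinfo=info))   # 1990-01-19 00:00:00
```

Words such as "de" in "19 de enero de 1990" are still unknown tokens, so strings like that need `fuzzy=True` as well.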

If you want to parse those particular formats, you can just match against a list of formats:

import datetime as dt


txt = '''\
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990'''


fmts = ('%Y', '%b %d, %Y', '%B %d, %Y', '%B %d %Y', '%m/%d/%Y', '%m/%d/%y',
        '%b %Y', '%B%Y', '%b %d,%Y')


parsed = []
for e in txt.splitlines():
    for fmt in fmts:
        try:
            t = dt.datetime.strptime(e, fmt)
            parsed.append((e, fmt, t))
            break  # stop at the first format that matches
        except ValueError:
            pass  # try the next format


# check that all the cases are handled
success = {t[0] for t in parsed}
for e in txt.splitlines():
    if e not in success:
        print(e)


for t in parsed:
    print('"{:20}" => "{:20}" => {}'.format(*t))

Prints:

"Jan 19, 1990        " => "%b %d, %Y           " => 1990-01-19 00:00:00
"January 19, 1990    " => "%B %d, %Y           " => 1990-01-19 00:00:00
"Jan 19,1990         " => "%b %d,%Y            " => 1990-01-19 00:00:00
"01/19/1990          " => "%m/%d/%Y            " => 1990-01-19 00:00:00
"01/19/90            " => "%m/%d/%y            " => 1990-01-19 00:00:00
"1990                " => "%Y                  " => 1990-01-01 00:00:00
"Jan 1990            " => "%b %Y               " => 1990-01-01 00:00:00
"January1990         " => "%B%Y                " => 1990-01-01 00:00:00
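If, as in the question, only a yes/no answer is needed, the same loop can be folded into a function that returns the first matching format, or `None` when nothing matches, so its truthiness answers "is this a date?". A minimal sketch; the name `matching_format` is hypothetical, and the format tuple is the one from this answer with the duplicate `'%b %d, %Y'` entry dropped.

```python
from datetime import datetime

FORMATS = ('%Y', '%b %d, %Y', '%B %d, %Y', '%B %d %Y', '%m/%d/%Y',
           '%m/%d/%y', '%b %Y', '%B%Y', '%b %d,%Y')


def matching_format(text, formats=FORMATS):
    """Return the first format in formats that parses text, else None."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
        return fmt
    return None


for item in ("Jan 19,1990", "01/19/90", "JUNE1890", "not a date"):
    fmt = matching_format(item)
    print(item, "->", fmt if fmt else "Not a date")
```

Order matters only when two formats could both match the same string; for this list the first match is also the intended one.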