>>> mytext = u'i am a foobar bazbar'
>>> mytext.capitalize()
u'I am a foobar bazbar'
>>>
OK, as said in the reply above, you have to write a custom capitalize:
mytext = u'i am a foobar bazbar'

def xcapitalize(word):
    # words in the skip list are returned unchanged
    skip_list = ['a', 'an', 'the', 'am']
    if word not in skip_list:
        return word.capitalize()
    return word

k = mytext.split(" ")
l = map(xcapitalize, k)
print(" ".join(l))
not_these = ['a', 'the', 'of']
thestring = 'the secret of a disappointed programmer'
print(' '.join(word
               if word in not_these
               else word.title()
               for word in thestring.capitalize().split(' ')))
"""Output:
The Secret of a Disappointed Programmer
"""
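Because capitalize() is applied to the whole string first, the title starts with the capitalized word 'The', which no longer matches the lowercase article 'the' in not_these and therefore stays capitalized.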
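To make that point concrete, here is a small check of the membership test, using the same list and string as above. The variable names first_is_exception and third_is_exception are only for illustration.

```python
not_these = ['a', 'the', 'of']
words = 'the secret of a disappointed programmer'.capitalize().split(' ')

# capitalize() turned the leading 'the' into 'The', so the lookup fails
first_is_exception = words[0] in not_these
# 'of' is still lowercase, so it is found and left alone
third_is_exception = words[2] in not_these

print(words[0], first_is_exception)
print(words[2], third_is_exception)
```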
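There are a few problems with this. If you use split and join, some whitespace can be lost: depending on how you split, runs of whitespace may be collapsed, and tabs or newlines may not be treated as separators at all. The built-in capitalize and title methods do not ignore whitespace, but title capitalizes every word: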
>>> 'There is a way'.title()
'There Is A Way'
If a sentence starts with an article, you do not want the first word of a title in lowercase.
Keeping these in mind:
import re

def title_except(s, exceptions):
    word_list = re.split(' ', s)  # re.split behaves as expected
    # always capitalize the first word, even if it is an exception
    final = [word_list[0].capitalize()]
    for word in word_list[1:]:
        final.append(word if word in exceptions else word.capitalize())
    return " ".join(final)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except('there is a way', articles))
# There is a Way
print(title_except('a whim of an elephant', articles))
# A Whim of an Elephant
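If tabs or repeated spaces matter, one possible variation is to split with a capturing group so the whitespace runs are kept and can be put back unchanged. This is only a sketch; title_keep_whitespace is a made-up name and not part of the answer above.

```python
import re

def title_keep_whitespace(s, exceptions):
    # The capturing group makes re.split return the whitespace runs too
    parts = re.split(r'(\s+)', s)
    out = []
    first_done = False
    for part in parts:
        if not part or part.isspace():
            out.append(part)               # keep separators exactly as given
        elif not first_done:
            out.append(part.capitalize())  # first word is always capitalized
            first_done = True
        elif part in exceptions:
            out.append(part)
        else:
            out.append(part.capitalize())
    return ''.join(out)

articles = ['a', 'an', 'of', 'the', 'is']
print(repr(title_keep_whitespace('a  whim\tof an elephant', articles)))
```

For comparison, ' '.join('a  whim'.split()) gives 'a whim', so the double space is lost with a plain split() and join.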
The scripts do not capitalize small words like if, in, of, on, etc., and will un-capitalize them if they’re erroneously capitalized in the input.
The scripts assume that words with capitalized letters other than the first character are already correctly capitalized. This means they will leave a word like “iTunes” alone, rather than mangling it into “ITunes” or, worse, “Itunes”.
They skip over any words with inline dots; “example.com” and “del.icio.us” will remain lowercase.
They have hard-coded hacks specifically to deal with odd cases, like “AT&T” and “Q&A”, both of which contain small words (at and a) which normally should be lowercase.
The first and last word of the title are always capitalized, so input such as “Nothing to be afraid of” will be turned into “Nothing to Be Afraid Of”.
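As a rough illustration of how rules like these fit together, here is a minimal sketch. It is not the code of those scripts: simple_titlecase and SMALL_WORDS are hypothetical names, the small-word list is an assumption, and “AT&T” and “Q&A” are left alone only because of the internal-capitals rule, not through hard-coded special cases.

```python
# Assumed list of small words; real style guides differ
SMALL_WORDS = frozenset([
    'a', 'an', 'and', 'as', 'at', 'but', 'by', 'for',
    'if', 'in', 'of', 'on', 'or', 'the', 'to',
])

def simple_titlecase(text, small_words=SMALL_WORDS):
    words = text.split(' ')
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        if any(ch.isupper() for ch in word[1:]):
            result.append(word)            # e.g. iTunes, AT&T: trust the input
        elif '.' in word[1:-1]:
            result.append(word)            # inline dot, e.g. example.com
        elif i == 0 or i == last:
            result.append(word[:1].upper() + word[1:])  # first and last word
        elif word.lower() in small_words:
            result.append(word.lower())    # un-capitalize stray small words
        else:
            result.append(word[:1].upper() + word[1:])
    return ' '.join(result)

print(simple_titlecase('Nothing to be afraid of'))
print(simple_titlecase('visit example.com Of iTunes fame'))
```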
One important case that is not being considered is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid down-casing. With this approach, acronyms that are already upper case remain in upper case. The following code is a modification of that originally provided by dheerosaur.
# This is an attempt to provide an alternative to ''.title() that works with
# acronyms.
# There are several tricky cases to worry about in typical order of importance:
# 0. Upper case first letter of each word that is not a 'minor' word.
# 1. Always upper case first word.
# 2. Do not down case acronyms
# 3. Quotes
# 4. Hyphenated words: drive-in
# 5. Titles within titles: 2001 A Space Odyssey
# 6. Maintain leading spacing
# 7. Maintain given spacing: This is a test. This is only a test.
# The following code addresses 0-3 & 7. It was felt that addressing the others
# would add considerable complexity.
def titlecase(
    s,
    exceptions=(
        'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
        'for', 'in', 'of', 'on', 'per', 'to'
    )
):
    # remove leading and trailing spaces -- needed for first word casing
    # split on single space to maintain word spacing
    words = s.strip().split(' ')

    def upper(s):
        # upper-case the first letter, skipping over any leading quotes
        if s:
            if s[0] in '‘“"‛‟' + "'":
                return s[0] + upper(s[1:])
            return s[0].upper() + s[1:]
        return ''

    # always capitalize the first word
    first = upper(words[0])

    return ' '.join([first] + [
        word if word.lower() in exceptions else upper(word)
        for word in words[1:]
    ])
cases = '''
CDC warns about "aggressive" rats as coronavirus shuts down restaurants
L.A. County opens churches, stores, pools, drive-in theaters
UConn senior accused of killing two men was looking for young woman
Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,’ study reveals
Maintain given spacing: This is a test. This is only a test.
'''.strip().splitlines()
for case in cases:
    print(titlecase(case))
When run, it produces the following:
CDC Warns About "Aggressive" Rats as Coronavirus Shuts Down Restaurants
L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
UConn Senior Accused of Killing Two Men Was Looking for Young Woman
Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,’ Study Reveals
Maintain Given Spacing: This Is a Test. This Is Only a Test.
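Case 6 (maintain leading spacing) is not handled above because the input is stripped first. One way it might be added without touching the function is a small wrapper that puts the leading whitespace back. This is a sketch only: keep_leading_space is a hypothetical helper, and str.title stands in for any title-casing function here so that the snippet runs on its own.

```python
def keep_leading_space(title_func, s):
    # Split off the leading whitespace, title-case the rest, then re-attach it
    stripped = s.lstrip()
    prefix = s[:len(s) - len(stripped)]
    return prefix + title_func(stripped)

# str.title is only a stand-in; any function taking and returning a string works
print(repr(keep_leading_space(str.title, '   a test')))
```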