Pandas: How do I change all the values of a column?

I have a data frame with a column named "Date", and I want every value in that column to have the same form (the year only). For example:

City     Date
Paris    01/04/2004
Lisbon   01/09/2004
Madrid   2004
Pekin    31/2004

What I want is:

City     Date
Paris    2004
Lisbon   2004
Madrid   2004
Pekin    2004

Here is my code:

import pandas as pd

fr61_70xls = pd.ExcelFile('AMADEUS FRANCE 1961-1970.xlsx')


#Here we import the individual sheets and clean the sheets
years=(['1961','1962','1963','1964','1965','1966','1967','1968','1969','1970'])


fr={}


header=(['City','Country','NACE','Cons','Last_year','Op_Rev_EUR_Last_avail_yr','BvD_Indep_Indic','GUO_Name','Legal_status','Date_of_incorporation','Legal_status_date'])


for year in years:
    # save every sheet in variable fr['1961'], fr['1962'] and so on
    fr[year]=fr61_70xls.parse(year,header=0,parse_cols=10)
    fr[year].columns=header
    # drop the entire Legal status date column
    fr[year]=fr[year].drop(['Legal_status_date','Date_of_incorporation'],axis=1)
    # drop every row where GUO Name is empty
    fr[year]=fr[year].dropna(axis=0,how='all',subset=[['GUO_Name']])
    fr[year]=fr[year].set_index(['GUO_Name','Date_of_incorporation'])

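It so happens that in my DataFrames (for example fr['1961']), the values of Date_of_incorporation can be anything (strings, integers, and so on), so maybe it would be best to drop this column completely and then attach another column containing only the year to the DataFrames?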


As @DSM points out, you can do this more directly using the vectorised string methods:

df['Date'].str[-4:].astype(int)

Or using extract (assuming there is only one set of digits of length 4 somewhere in each string):

df['Date'].str.extract(r'(?P<year>\d{4})').astype(int)
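To see both vectorised approaches on the question's sample rows, here is a minimal sketch. It assumes the column mixes strings with at least one integer (the integer 2004 for Madrid is an assumption for illustration), which is why the values are cast to str before the .str accessor is used. The names as_text, year_by_slice and year_by_regex are made up for this example.

```python
import pandas as pd

# Sample rows from the question; the integer 2004 is an assumed mixed-type value.
df = pd.DataFrame({
    "City": ["Paris", "Lisbon", "Madrid", "Pekin"],
    "Date": ["01/04/2004", "01/09/2004", 2004, "31/2004"],
})

# Cast to str first so the .str accessor also handles non-string cells.
as_text = df["Date"].astype(str)

# Option 1: keep the last four characters of each value.
year_by_slice = as_text.str[-4:].astype(int)

# Option 2: take the first run of four consecutive digits.
# expand=False returns a Series rather than a one-column DataFrame.
year_by_regex = as_text.str.extract(r"(\d{4})", expand=False).astype(int)

print(year_by_slice.tolist())
print(year_by_regex.tolist())
```

Without the astype(str) step, non-string cells come back as NaN from the .str accessor, and the final astype(int) would then fail.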

An alternative, slightly more flexible way is to use apply (or, equivalently, map):

df['Date'] = df['Date'].apply(lambda x: int(str(x)[-4:]))
#  converts the last 4 characters of the string to an integer

The lambda function takes each value from the Date column and converts it to a year.
You could (and perhaps should) write this more verbosely as:

def convert_to_year(date_in_some_format):
    date_as_string = str(date_in_some_format)  # cast to string
    year_as_string = date_as_string[-4:]  # last four characters
    return int(year_as_string)


df['Date'] = df['Date'].apply(convert_to_year)

Perhaps 'Year' is a better name for this column...
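One way to follow that naming suggestion, and the question's idea of replacing the column outright, is to write the result to a new Year column and then drop the original. This is only a sketch on the question's sample rows; the integer 2004 for Madrid is an assumed mixed-type value.

```python
import pandas as pd


def convert_to_year(date_in_some_format):
    """Return the last four characters of the value as an int."""
    return int(str(date_in_some_format)[-4:])


# Sample rows from the question; the integer 2004 is an assumed mixed-type value.
df = pd.DataFrame({
    "City": ["Paris", "Lisbon", "Madrid", "Pekin"],
    "Date": ["01/04/2004", "01/09/2004", 2004, "31/2004"],
})

# Build the year-only column first, then remove the inconsistent original.
df["Year"] = df["Date"].apply(convert_to_year)
df = df.drop(columns="Date")

print(df)
```

Any value whose last four characters are not digits (a missing value, for instance) raises a ValueError inside int(), so such rows would need to be handled before the apply call.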

You can do a column transformation by using apply.

Define a clean function to remove the dollar and commas and convert your data to float.

def clean(x):
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)

Next, call it on your column like this.

data['Revenue'] = data['Revenue'].apply(clean)

Or, if you want to use a lambda function inside apply:

data['Revenue']=data['Revenue'].apply(lambda x:float(x.replace("$","").replace(",", "").replace(" ", "")))
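The same cleanup can also be sketched with the vectorised string methods mentioned in the first answer. This is a different technique from the apply-based version, not a restatement of it. The sample Revenue values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample values in the format the clean function expects.
data = pd.DataFrame({"Revenue": ["$1,200.50", "$ 300", "45,000"]})

# Strip dollar signs, commas and whitespace in one pass, then convert to float.
data["Revenue"] = (
    data["Revenue"]
    .str.replace(r"[$,\s]", "", regex=True)
    .astype(float)
)

print(data["Revenue"].tolist())
```

Note that \s removes every whitespace character, while the clean function above removes only plain spaces.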