How can I access a field of a namedtuple using a variable for the field name?

I can access elements of a named tuple by name as follows (*):

from collections import namedtuple
Car = namedtuple('Car', 'color mileage')
my_car = Car('red', 100)
print my_car.color

But how can I use a variable to specify the name of the field I want to access?

field = 'color'
my_car[field] # doesn't work
my_car.field # doesn't work

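My actual use case is iterating over a pandas DataFrame with for row in data.itertuples(). I am operating on the value from a particular column, and I want to be able to specify which column to use, by name, as a parameter to the method containing this loop.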

(*) Example taken from here. I am using Python 2.7.

You can use getattr

getattr(my_car, field)
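
A minimal sketch of how this might look inside a function that takes the field name as a parameter, which is the use case described in the question. The helper name `field_values` and the sample data are hypothetical; rows produced by `itertuples()` are normally namedtuples as well, so the same call should work on them.

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage')


def field_values(rows, field):
    """Return the value of `field` from every namedtuple in `rows`."""
    return [getattr(row, field) for row in rows]


cars = [Car('red', 100), Car('blue', 250)]
colors = field_values(cars, 'color')
mileages = field_values(cars, 'mileage')

# A third argument to getattr supplies a default for a missing field
# instead of raising AttributeError.
missing = getattr(cars[0], 'owner', None)
```

Without the default argument, asking for a field that does not exist raises AttributeError.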

The 'getattr' answer works, but there is another option which is slightly faster.

idx = {name: i for i, name in enumerate(list(df), start=1)}
for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]

Explanation

Make a dictionary mapping the column names to the row position. Call 'itertuples' with "name=None". Then access the desired values in each tuple using the indexes obtained using the column name from the dictionary.

  1. Make a dictionary to find the indexes.

idx = {name: i for i, name in enumerate(list(df), start=1)}

  2. Use the dictionary to access the desired values by name in the row tuples.

for row in df.itertuples(name=None):
    example_value = row[idx['product_price']]

Note: Use start=0 in enumerate if you call itertuples with index=False

Here is a working example showing both methods and the timing of both methods.

import numpy as np
import pandas as pd
import timeit

data_length = 3 * 10**5
fake_data = {
    "id_code": list(range(data_length)),
    "letter_code": np.random.choice(list('abcdefgz'), size=data_length),
    "pine_cones": np.random.randint(low=1, high=100, size=data_length),
    "area": np.random.randint(low=1, high=100, size=data_length),
    "temperature": np.random.randint(low=1, high=100, size=data_length),
    "elevation": np.random.randint(low=1, high=100, size=data_length),
}
df = pd.DataFrame(fake_data)


def iter_with_idx():
    result_data = []
    # Map each column name to its position in the row tuple.
    # start=1 because position 0 holds the DataFrame index.
    idx = {name: i for i, name in enumerate(list(df), start=1)}
    for row in df.itertuples(name=None):
        row_calc = row[idx['pine_cones']] / row[idx['area']]
        result_data.append(row_calc)
    return result_data


def iter_with_getattr():
    result_data = []
    for row in df.itertuples():
        row_calc = getattr(row, 'pine_cones') / getattr(row, 'area')
        result_data.append(row_calc)
    return result_data


dict_idx_method = timeit.timeit(iter_with_idx, number=100)
get_attr_method = timeit.timeit(iter_with_getattr, number=100)

print(f'Dictionary index Method {dict_idx_method:0.4f} seconds')
print(f'Get attribute method {get_attr_method:0.4f} seconds')

Result:

Dictionary index Method 49.1814 seconds
Get attribute method 80.1912 seconds

I assume the difference comes from the lower overhead of creating a plain tuple rather than a named tuple, and of accessing it by index rather than through getattr, but both of those are just guesses. If anyone knows better, please comment.

I have not explored how the number of columns versus the number of rows affects the timing results.

Another way of accessing them can be:

field_idx = my_car._fields.index(field)
my_car[field_idx]

Extract index of the field and then use it to index the namedtuple.
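
A short sketch of this approach applied to several tuples, assuming the `Car` type from the question and hypothetical sample data. Since `_fields` is identical for every instance of a namedtuple type, the position can be looked up once on the class and reused inside the loop.

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage')
cars = [Car('red', 100), Car('blue', 250)]

field = 'mileage'

# Look the position up once, outside the loop.
field_idx = Car._fields.index(field)

# Plain integer indexing on each tuple.
total = sum(car[field_idx] for car in cars)
```

Note that `index` raises ValueError if the name is not one of the fields.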

Since Python 3.6, one can inherit from typing.NamedTuple and add name-based indexing:

import typing as tp


class HistoryItem(tp.NamedTuple):
    inp: str
    tsb: float
    rtn: int
    frequency: int = None

    def __getitem__(self, item):
        # Accept either a position or a field name.
        if isinstance(item, int):
            item = self._fields[item]
        return getattr(self, item)

    def get(self, item, default=None):
        try:
            return self[item]
        except (KeyError, AttributeError, IndexError):
            return default


item = HistoryItem("inp", 10, 10, 10)

print(item[0])  # 'inp'
print(item["inp"])  # 'inp'

Use the following code

for i,x in enumerate(my_car._fields):
    print(x, my_car[i])