使用 Python 读取 Excel File 时,如何获取具有指定列名的特定列的值?

我有一个 Excel 文件:

Arm_id      DSPName        DSPCode          HubCode          PinCode    PPTL
1            JaVAS            01              AGR             282001    1,2
2            JaVAS            01              AGR             282002    3,4
3            JaVAS            01              AGR             282003    5,6

我想以 Arm_id,DSPCode,Pincode格式保存一个字符串。这种格式是可配置的,也就是说,它可能会更改为 DSPCode,Arm_id,Pincode。我把它保存在一个列表中,比如:

FORMAT = ['Arm_id', 'DSPName', 'Pincode']

如果 FORMAT是可配置的,那么如何读取具有提供名称的特定列的内容?

这就是我所尝试的。目前我能够读取文件中的所有内容

from xlrd import open_workbook
wb = open_workbook('sample.xls')
for s in wb.sheets():
#print 'Sheet:',s.name
values = []
for row in range(s.nrows):
col_value = []
for col in range(s.ncols):
value  = (s.cell(row,col).value)
try : value = str(int(value))
except : pass
col_value.append(value)
values.append(col_value)
print values

我的输出是:

[
[u'Arm_id', u'DSPName', u'DSPCode', u'HubCode', u'PinCode', u'PPTL'],
['1', u'JaVAS', '1', u'AGR', '282001', u'1,2'],
['2', u'JaVAS', '1', u'AGR', '282002', u'3,4'],
['3', u'JaVAS', '1', u'AGR', '282003', u'5,6']
]

然后我绕着 values[0]循环,试图找出 values[0]中的 FORMAT内容,然后得到 values[0]Arm_id, DSPname and Pincode的索引,然后从下一个循环我知道所有 FORMAT因子的索引,从而知道我需要得到哪个值。

但这个解决方案太糟糕了。

如何获取 Excel 文件中具有名称的特定列的值?

411036 次浏览

So the key parts are to grab the header ( col_names = s.row(0) ) and when iterating through the rows, to skip the first row which isn't needed for row in range(1, s.nrows) - done by using range from 1 onwards (not the implicit 0). You then use zip to step through the rows holding 'name' as the header of the column.

from xlrd import open_workbook


wb = open_workbook('Book2.xls')
values = []
for s in wb.sheets():
#print 'Sheet:',s.name
for row in range(1, s.nrows):
col_names = s.row(0)
col_value = []
for name, col in zip(col_names, range(s.ncols)):
value  = (s.cell(row,col).value)
try : value = str(int(value))
except : pass
col_value.append((name.value, value))
values.append(col_value)
print values

This is one approach:

from xlrd import open_workbook


class Arm(object):
def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
self.id = id
self.dsp_name = dsp_name
self.dsp_code = dsp_code
self.hub_code = hub_code
self.pin_code = pin_code
self.pptl = pptl


def __str__(self):
return("Arm object:\n"
"  Arm_id = {0}\n"
"  DSPName = {1}\n"
"  DSPCode = {2}\n"
"  HubCode = {3}\n"
"  PinCode = {4} \n"
"  PPTL = {5}"
.format(self.id, self.dsp_name, self.dsp_code,
self.hub_code, self.pin_code, self.pptl))


wb = open_workbook('sample.xls')
for sheet in wb.sheets():
number_of_rows = sheet.nrows
number_of_columns = sheet.ncols


items = []


rows = []
for row in range(1, number_of_rows):
values = []
for col in range(number_of_columns):
value  = (sheet.cell(row,col).value)
try:
value = str(int(value))
except ValueError:
pass
finally:
values.append(value)
item = Arm(*values)
items.append(item)


for item in items:
print item
print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
print

You don't have to use a custom class, you can simply take a dict(). If you use a class however, you can access all values via dot-notation, as you see above.

Here is the output of the script above:

Arm object:
Arm_id = 1
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282001
PPTL = 1
Accessing one single value (eg. DSPName): JaVAS


Arm object:
Arm_id = 2
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282002
PPTL = 3
Accessing one single value (eg. DSPName): JaVAS


Arm object:
Arm_id = 3
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282003
PPTL = 5
Accessing one single value (eg. DSPName): JaVAS

The approach I took reads the header information from the first row to determine the indexes of the columns of interest.

You mentioned in the question that you also want the values output to a string. I dynamically build a format string for the output from the FORMAT column list. Rows are appended to the values string separated by a new line char.

The output column order is determined by the order of the column names in the FORMAT list.

In my code below the case of the column name in the FORMAT list is important. In the question above you've got 'Pincode' in your FORMAT list, but 'PinCode' in your excel. This wouldn't work below, it would need to be 'PinCode'.

from xlrd import open_workbook
wb = open_workbook('sample.xls')


FORMAT = ['Arm_id', 'DSPName', 'PinCode']
values = ""


for s in wb.sheets():
headerRow = s.row(0)
columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == firstRow[x].value]
formatString = ("%s,"*len(columnIndex))[0:-1] + "\n"


for row in range(1,s.nrows):
currentRow = s.row(row)
currentRowValues = [currentRow[x].value for x in columnIndex]
values += formatString % tuple(currentRowValues)


print values

For the sample input you gave above this code outputs:

>>> 1.0,JaVAS,282001.0
2.0,JaVAS,282002.0
3.0,JaVAS,282003.0

And because I'm a python noob, props be to: this answer, this answer, this question, this question and this answer.

A somewhat late answer, but with pandas, it is possible to get directly a column of an excel file:

import pandas


df = pandas.read_excel('sample.xls')
#print the column names
print df.columns
#get the values for a given column
values = df['Arm_id'].values
#get a data frame with selected columns
FORMAT = ['Arm_id', 'DSPName', 'Pincode']
df_selected = df[FORMAT]

Make sure you have installed xlrd and pandas:

pip install pandas xlrd

By using pandas we can read excel easily.

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile


DataF=pd.read_excel("Test.xlsx",sheet_name='Sheet1')


print("Column headings:")
print(DataF.columns)

Test at :https://repl.it Reference: https://pythonspot.com/read-excel-with-pandas/

Although I almost always just use pandas for this, my current little tool is being packaged into an executable and including pandas is overkill. So I created a version of poida's solution that resulted in a list of named tuples. His code with this change would look like this:

from xlrd import open_workbook
from collections import namedtuple
from pprint import pprint


wb = open_workbook('sample.xls')


FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', ' '.join(FORMAT))
all_rows = []


for s in wb.sheets():
headerRow = s.row(0)
columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]


for row in range(1,s.nrows):
currentRow = s.row(row)
currentRowValues = [currentRow[x].value for x in columnIndex]
all_rows.append(OneRow(*currentRowValues))


pprint(all_rows)

Here is the code to read an excel file and and print all the cells present in column 1 (except the first cell i.e the header):

import xlrd


file_location="C:\pythonprog\xxx.xlsv"
workbook=xlrd.open_workbook(file_location)
sheet=workbook.sheet_by_index(0)
print(sheet.cell_value(0,0))


for row in range(1,sheet.nrows):
print(sheet.cell_value(row,0))

I have read using openpyxl library,

import openpyxl
from pathlib import Path


xlsx_file = Path('C:\\Users\\Amit\\Desktop\\ReadExcel', 'ReadData.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)


# Read the active sheet:
sheet = wb_obj.active


for i in range(sheet.max_column):
print(f'i = {i}')
for row in sheet.iter_rows():
print(row[i].value)