Sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()


regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values)


ValueError: Found arrays with inconsistent numbers of samples: [  1 999]



It looks like sklearn requires X to have the shape (number of rows, number of columns). If your data has the shape (number of rows,), like (999,), it does not work. Use numpy.reshape() to change the shape of the array to (999, 1), for example by calling .reshape(-1, 1) on it.


In my case, it worked with that.
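A minimal sketch of that reshape, assuming synthetic data (the arrays x and y are made up for illustration and are not the asker's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic 1-D data: 999 samples of a single feature (illustrative only).
x = np.arange(999, dtype=float)
y = 2.0 * x + 1.0

print(x.shape)  # (999,)

# reshape(-1, 1) turns the 1-D array into a column: 999 rows, 1 feature.
X = x.reshape(-1, 1)
print(X.shape)  # (999, 1)

regr = LinearRegression()
regr.fit(X, y)  # X is 2-D, y is 1-D with the same number of rows
```

The target y can stay 1-D; only the feature array needs the extra axis.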

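I think the "X" argument of regr.fit needs to be a matrix (a 2-D array), so the following should work. Selecting the column with a list, [5], keeps the result two-dimensional.

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, [5]].values, df2.iloc[1:1000, 2].values)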
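To see why the list index matters, here is a small sketch comparing the two selections. The DataFrame below is a hypothetical stand-in for df2, not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 10 rows, 6 numeric columns.
df2 = pd.DataFrame(np.arange(60).reshape(10, 6))

as_series = df2.iloc[1:10, 5]    # integer column index -> pandas Series
as_frame = df2.iloc[1:10, [5]]   # list column index -> pandas DataFrame

print(as_series.values.shape)  # (9,)   1-D, rejected as X
print(as_frame.values.shape)   # (9, 1) 2-D, accepted as X
```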
fit expects X to be a feature matrix (2-D).

Try to put your feature names in a list like this:

features = ['TV', 'Radio', 'Newspaper']
X = data[features]
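A runnable sketch of this approach follows. The numbers are made up, and the 'Sales' target column is an assumption for illustration; only the three feature names come from the answer above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data; 'Sales' is a hypothetical target column.
data = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5],
    'Radio': [37.8, 39.3, 45.9, 41.3],
    'Newspaper': [69.2, 45.1, 69.3, 58.5],
    'Sales': [22.1, 10.4, 9.3, 18.5],
})

features = ['TV', 'Radio', 'Newspaper']
X = data[features]   # indexing with a list returns a DataFrame: shape (4, 3)
y = data['Sales']    # indexing with a single name returns a Series: shape (4,)

model = LinearRegression().fit(X, y)
print(X.shape, y.shape)
```

Indexing with a list always yields a 2-D DataFrame, even when the list holds a single column name.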

Looks like you are using pandas dataframe (from the name df2).

You could also do the following:

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())

NOTE: I have removed "values", since it converts the pandas Series to a numpy.ndarray, and numpy.ndarray has no to_frame() method.

I encountered this error because I converted my data to an np.array. I fixed the problem by converting my data to an np.matrix instead and taking the transpose.

ValueError: regr.fit(…, np.array(y_list))

Correct: regr.fit(…, np.transpose(np.matrix(y_list)))

To analyze two arrays (array1 and array2) they need to meet the following two requirements:

1) They need to be a numpy.ndarray

Check with

type(array1)
# and
type(array2)

If that is not the case for at least one of them, perform

array1 = numpy.asarray(array1)
# or
array2 = numpy.asarray(array2)

2) The dimensions need to be as follows:

array1.shape #shall give (N, 1)
array2.shape #shall give (N,)

N is the number of items that are in the array. To provide array1 with the right number of axes perform:

array1 = array1[:, numpy.newaxis]
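Both requirements can be checked together in a short sketch. The list values are made up, and numpy.asarray is used for the conversion because calling the numpy.ndarray constructor directly interprets its argument as a shape rather than as data:

```python
import numpy as np

# Plain Python lists as input (illustrative values only).
array1 = [1.0, 2.0, 3.0, 4.0]
array2 = [2.0, 4.0, 6.0, 8.0]

# Requirement 1: both must be numpy.ndarray.
array1 = np.asarray(array1)
array2 = np.asarray(array2)
print(type(array1), type(array2))

# Requirement 2: array1 needs shape (N, 1), array2 shape (N,).
array1 = array1[:, np.newaxis]
print(array1.shape)  # (4, 1)
print(array2.shape)  # (4,)
```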

As mentioned above, the X argument must be a matrix, i.e. a 2-D numpy array. Selecting the columns with a slice keeps the result two-dimensional, so you could probably use this:

df2.iloc[1:1000, 5:some_last_index].values

This way your dataframe is converted to a 2-D array and you won't need to reshape it.

Seen on the Udacity deep learning foundation course:

df = pd.read_csv('my.csv')
regr = LinearRegression()
regr.fit(df[['column x']], df[['column y']])

I faced a similar problem. In my case, the number of rows in X was not equal to the number of rows in y.

That is, the number of entries in the feature columns was not equal to the number of entries in the target variable, since I had dropped some rows from the feature columns.
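A small sketch of how this mismatch can arise and one way to repair it, assuming a made-up DataFrame with hypothetical column names 'feature' and 'target':

```python
import numpy as np
import pandas as pd

# Made-up data with one missing feature value.
df = pd.DataFrame({
    'feature': [1.0, np.nan, 3.0, 4.0],
    'target': [2.0, 4.0, 6.0, 8.0],
})

X = df[['feature']].dropna()   # the row with NaN is dropped: 3 rows remain
y = df['target']               # still has all 4 rows
print(len(X), len(y))          # lengths differ, so fit(X, y) would raise

# Re-align the target to the rows that survived in X.
y_aligned = y.loc[X.index]
print(len(X), len(y_aligned))  # lengths now match
```

Dropping rows from the whole DataFrame first and only then splitting it into X and y avoids the problem altogether.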

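You might have made a mistake during the train/test split. train_test_split returns its results in the order X_train, X_test, y_train, y_test. If you unpack them in a different order, for example X_train, y_train, X_test, y_test, the variable you later pass as the target actually holds the test features, whose number of rows differs from that of the training features.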