How do I convert a Pandas dataframe to a PyTorch tensor?

How do I train a simple neural network with PyTorch on a pandas dataframe df?

The column df["Target"] is the target (e.g. labels) of the network. This doesn't work:

import pandas as pd
import torch.utils.data as data_utils


target = pd.DataFrame(df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

I'm answering the question in the title, since the body doesn't specify much else: how to convert a DataFrame into a PyTorch tensor.

Without information about your data, I'm using float values as example targets here.

Convert Pandas dataframe to PyTorch tensor?

import pandas as pd
import torch
import random


# creating dummy targets (float values)
targets_data = [random.random() for i in range(10)]


# creating DataFrame from targets_data
targets_df = pd.DataFrame(data=targets_data)
targets_df.columns = ['targets']


# creating tensor from targets_df
torch_tensor = torch.tensor(targets_df['targets'].values)


# printing out result
print(torch_tensor)

Output:

tensor([ 0.5827,  0.5881,  0.1543,  0.6815,  0.9400,  0.8683,  0.4289,
0.5940,  0.6438,  0.7514], dtype=torch.float64)

Tested with PyTorch 0.4.0.

I hope this helps. If you have any further questions, just ask. :)
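
Note the dtype=torch.float64 in the output: pandas stores these floats as float64 and torch.tensor keeps that dtype. Layers in torch.nn use float32 parameters by default, so a cast is often needed before training. A minimal sketch, reusing the dummy-targets idea above (the names tensor64 and tensor32 are illustrative only):

```python
import random

import pandas as pd
import torch

# Dummy float targets, as in the example above
targets_df = pd.DataFrame({'targets': [random.random() for _ in range(10)]})

# torch.tensor keeps NumPy's float64 dtype
tensor64 = torch.tensor(targets_df['targets'].values)

# Cast to float32, the default dtype of torch.nn layer parameters
tensor32 = tensor64.float()

print(tensor64.dtype)  # torch.float64
print(tensor32.dtype)  # torch.float32
```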

Maybe try this to see if it fixes your problem (based on your sample code):

import numpy as np
import torch
import torch.utils.data as data_utils

batch_size = 10  # same value as in the question
train_target = torch.tensor(train['Target'].values.astype(np.float32))  # `train` is still the DataFrame here
train = torch.tensor(train.drop('Target', axis=1).values.astype(np.float32))  # now the feature tensor
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset=train_tensor, batch_size=batch_size, shuffle=True)

Simply convert the pandas DataFrame -> NumPy array -> PyTorch tensor. An example is shown below:

import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils


df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

Hopefully this helps you create your own datasets with PyTorch (it was compatible with the latest PyTorch version at the time of writing).
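
Since the question also asks about training, here is a rough sketch of how such a loader might feed a small network. Everything beyond the DataFrame -> NumPy -> tensor conversion is an assumption made for illustration: the synthetic data standing in for train.csv, the column names f0 to f3, the tiny model, the MSE loss and the SGD settings.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils

# Synthetic stand-in for train.csv (assumption): 4 feature columns plus 'target'
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 4)), columns=['f0', 'f1', 'f2', 'f3'])
df['target'] = df.sum(axis=1)

# DataFrame -> NumPy array -> float32 tensor
features = torch.tensor(df.drop(columns=['target']).to_numpy(dtype=np.float32))
target = torch.tensor(df[['target']].to_numpy(dtype=np.float32))  # shape (50, 1)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

# Hypothetical small regression network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(5):
    epoch_loss = 0.0
    for batch_features, batch_target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_target)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"epoch {epoch}: mean batch loss {epoch_loss / len(train_loader):.4f}")
```

Selecting the target with df[['target']] (double brackets) keeps it two-dimensional, so its shape matches the (batch, 1) output of the last linear layer.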

You can use the functions below to convert any DataFrame or pandas Series to a PyTorch tensor:

import pandas as pd
import torch


# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')  # don't have GPU
    return device


# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)


df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)
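
The df and series variables above are not defined in the snippet, so here is a self-contained sketch with made-up data (the column names a and b are hypothetical). The two helpers are repeated in condensed form so the example runs on its own:

```python
import pandas as pd
import torch


def get_device():
    # Use the first GPU if available, otherwise fall back to the CPU
    return torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


def df_to_tensor(df):
    # Works for a DataFrame or a Series with numeric values only
    return torch.from_numpy(df.values).float().to(get_device())


# Hypothetical example data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, 1.5, 2.5]})
series = df['a']

df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)

print(df_tensor.shape, df_tensor.dtype)          # torch.Size([3, 2]) torch.float32
print(series_tensor.shape, series_tensor.dtype)  # torch.Size([3]) torch.float32
```

If the DataFrame contains non-numeric columns (strings, for example), df.values has dtype object and torch.from_numpy raises a TypeError, so such columns need to be encoded or dropped first.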

You can take the df.values attribute (a NumPy array) and wrap it in a tensor before passing it to the TensorDataset constructor. TensorDataset expects tensors, so passing the NumPy arrays directly fails:

import torch
import torch.utils.data as data_utils


# Creating np arrays
target = df['Target'].values
features = df.drop('Target', axis=1).values


# Converting to tensors and passing to DataLoader
train = data_utils.TensorDataset(torch.tensor(features), torch.tensor(target))
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

Note: Your features (df) also contain the target variable (df['Target']), i.e. your network is 'cheating', since it can see the targets in the input. You need to remove this column from the set of features.
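
To make the note concrete, here is a small sketch that separates the target from the features and checks that the column is really gone. The DataFrame contents and the column names x1 and x2 are made up, and treating Target as integer class labels is an assumption:

```python
import pandas as pd
import torch

# Hypothetical DataFrame with two feature columns and a 'Target' column
df = pd.DataFrame({
    'x1': [0.1, 0.2, 0.3, 0.4],
    'x2': [1.0, 0.0, 1.0, 0.0],
    'Target': [0, 1, 0, 1],
})

# Keep the target out of the feature matrix
feature_df = df.drop(columns=['Target'])
assert 'Target' not in feature_df.columns

features = torch.tensor(feature_df.values, dtype=torch.float32)
target = torch.tensor(df['Target'].values, dtype=torch.long)  # class labels

print(features.shape)  # torch.Size([4, 2])
print(target.shape)    # torch.Size([4])
```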

# This works for me
import torch
import torch.utils.data as data_utils


target = torch.tensor(df['Targets'].values)
features = torch.tensor(df.drop('Targets', axis = 1).values)


train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
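
To check what the loader actually yields, you can iterate over it once. A sketch under the assumption of a random 25-row DataFrame with three feature columns (a, b, c are hypothetical names) and a Targets column:

```python
import numpy as np
import pandas as pd
import torch
import torch.utils.data as data_utils

# Hypothetical DataFrame: 25 rows, 3 feature columns and a 'Targets' column
df = pd.DataFrame(np.random.rand(25, 4), columns=['a', 'b', 'c', 'Targets'])

target = torch.tensor(df['Targets'].values)
features = torch.tensor(df.drop('Targets', axis=1).values)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)

# Each iteration yields one (features, target) pair of batched tensors
batch_sizes = []
for batch_features, batch_target in train_loader:
    batch_sizes.append(batch_features.shape[0])
    print(batch_features.shape, batch_target.shape)

print(batch_sizes)  # [10, 10, 5]: the last batch holds the remainder
```

Passing drop_last=True to DataLoader would skip the final, smaller batch.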

To convert a DataFrame to a PyTorch tensor (this works for any DataFrame whose columns are all numeric):

steps:

  • convert df to a NumPy array using df.to_numpy(), or df.to_numpy().astype(np.float32) to cast the array to float32
  • convert the NumPy array to a tensor using torch.from_numpy()

example:

tensor_ = torch.from_numpy(df.to_numpy().astype(np.float32))
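
One detail worth knowing about the second step: torch.from_numpy shares memory with the NumPy array it is given, whereas torch.tensor makes a copy. A small sketch with a made-up two-column DataFrame (the names array, shared and copied are illustrative):

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical numeric DataFrame
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Step 1: DataFrame -> float32 NumPy array (astype returns a new array)
array = df.to_numpy().astype(np.float32)

# Step 2: NumPy array -> tensor, sharing memory with the array
shared = torch.from_numpy(array)

# torch.tensor makes an independent copy instead
copied = torch.tensor(array)

array[0, 0] = 99.0
print(shared[0, 0].item())  # 99.0, follows the change to the array
print(copied[0, 0].item())  # 1.0, unaffected
```

Because astype(np.float32) already produced a new array here, the DataFrame itself is not affected by the in-place change.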