How do I load a list of numpy arrays into a PyTorch DataLoader?

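I have a huge list of numpy arrays, where each array represents an image, and I want to load it using a torch.utils.data.DataLoader object. But the documentation of torch.utils.data.DataLoader mentions that it loads data directly from a folder. How do I modify it for my use case? I am new to PyTorch and any help would be greatly appreciated. My numpy array for a single image looks something like this. The image is an RGB image.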

[[[ 70  82  94]
[ 67  81  93]
[ 66  82  94]
...,
[182 182 188]
[183 183 189]
[188 186 192]]


[[ 66  80  92]
[ 62  78  91]
[ 64  79  95]
...,
[176 176 182]
[178 178 184]
[180 180 186]]


[[ 62  82  93]
[ 62  81  96]
[ 65  80  99]
...,
[169 172 177]
[173 173 179]
[172 172 178]]


...,

I think what DataLoader actually requires is an input that subclasses Dataset. You can either write your own dataset class that subclasses Dataset or use TensorDataset, as I have done below:

import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader


my_x = [np.array([[1.0,2],[3,4]]),np.array([[5.,6],[7,8]])] # a list of numpy arrays
my_y = [np.array([4.]), np.array([2.])] # another list of numpy arrays (targets)


tensor_x = torch.Tensor(np.array(my_x)) # stack the list into one array, then convert to a float32 tensor
tensor_y = torch.Tensor(np.array(my_y))


my_dataset = TensorDataset(tensor_x,tensor_y) # create your dataset
my_dataloader = DataLoader(my_dataset) # create your dataloader

Works for me. Hope it helps you.
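
To check that a loader like the one above behaves as expected, it can help to iterate over it once. This is only a minimal sketch on the same toy data; the batch_size and shuffle values are arbitrary choices for illustration and are not part of the answer.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Same toy data as above: two 2x2 "images" and one target per image.
my_x = [np.array([[1.0, 2], [3, 4]]), np.array([[5.0, 6], [7, 8]])]
my_y = [np.array([4.0]), np.array([2.0])]

# Stack the list into one array first, then convert it in a single step.
tensor_x = torch.from_numpy(np.stack(my_x)).float()  # shape (2, 2, 2)
tensor_y = torch.from_numpy(np.stack(my_y)).float()  # shape (2, 1)

dataset = TensorDataset(tensor_x, tensor_y)
# batch_size=2 and shuffle=False are illustrative values only.
loader = DataLoader(dataset, batch_size=2, shuffle=False)

batch_shapes = []
for x_batch, y_batch in loader:
    # Each iteration yields one tensor per tensor wrapped in the dataset.
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))
    print(x_batch.shape, y_batch.shape)
```

The first dimension of each batch is the batch size; the remaining dimensions are those of a single sample.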

PyTorch's DataLoader needs a Dataset, as you can check in the docs. The right way to do that is to use:

torch.utils.data.TensorDataset(*tensors)

This is a Dataset for wrapping tensors, where each sample is retrieved by indexing the tensors along the first dimension. The *tensors parameter accepts any number of tensors, all of which must have the same size in the first dimension.

The other class torch.utils.data.Dataset is an abstract class.
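
As a rough illustration of "indexing along the first dimension", the sketch below wraps two made-up tensors (features and labels are hypothetical names) and looks at one sample. It also assumes the size check in TensorDataset is implemented as an assert, which is how it behaves in the versions I have used.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical toy data: 4 samples with 3 values each, and 4 labels.
features = torch.arange(12, dtype=torch.float32).reshape(4, 3)
labels = torch.tensor([0, 1, 0, 1])

dataset = TensorDataset(features, labels)

# Indexing returns a tuple holding the i-th slice of every wrapped tensor.
x2, y2 = dataset[2]
print(len(dataset))  # number of samples = size of the first dimension
print(x2, y2)

# Tensors whose first dimensions differ are rejected at construction time
# (assumption: the check is an assert, so it raises AssertionError).
try:
    TensorDataset(features, torch.tensor([0, 1, 0]))
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
print(mismatch_rejected)
```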

Here is how to convert numpy arrays to tensors:

import torch
import numpy as np
n = np.arange(10)
print(n) #[0 1 2 3 4 5 6 7 8 9]
t1 = torch.Tensor(n)  # as torch.float32
print(t1) #tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
t2 = torch.from_numpy(n)  # keeps numpy's dtype (int32 here; int64 on many platforms)
print(t2) #tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.int32)
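
One more difference between the two conversions that is worth knowing: torch.from_numpy shares memory with the numpy array, while torch.Tensor here has to convert to float32 and therefore produces a copy. A small sketch, with the numpy dtype fixed to int64 only so that the result does not depend on the platform:

```python
import numpy as np
import torch

# dtype is set explicitly so the example behaves the same everywhere.
n = np.arange(10, dtype=np.int64)

t_copy = torch.Tensor(n)        # converted to float32, so the data is copied
t_shared = torch.from_numpy(n)  # keeps int64 and shares memory with n

n[0] = 100  # modify the numpy array in place

print(t_copy.dtype, t_copy[0].item())      # torch.float32 0.0
print(t_shared.dtype, t_shared[0].item())  # torch.int64 100
```

So if the numpy arrays are modified later, a tensor created with torch.from_numpy will see the change.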

The accepted answer used the torch.Tensor constructor. If you have an image with pixel values in the range 0-255, you may use this:

timg = torch.from_numpy(img).float()

Or use torchvision's to_tensor function, which converts a PIL Image or numpy.ndarray to a tensor.


But here is a little trick: you can pass your numpy arrays to the DataLoader directly.

x1 = np.array([1,2,3])
d1 = DataLoader( x1, batch_size=3)

This also works, but if you print the type of d1.dataset:

print(type(d1.dataset)) # <class 'numpy.ndarray'>

However, we need Tensors to work with CUDA, so it is better to feed the DataLoader Tensors.
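
For what it is worth, the batches that come out of such a DataLoader are tensors even though the stored dataset is not, because the default collate function converts numpy values when it builds a batch. A quick check, as a sketch relying on that default behaviour:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

x1 = np.array([1, 2, 3])
d1 = DataLoader(x1, batch_size=3)

batch = next(iter(d1))
print(type(d1.dataset))  # the stored dataset is still a numpy array
print(type(batch))       # the batch itself is a torch.Tensor
```

Converting up front still keeps dtype and layout under your control instead of leaving them to the collate step.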

Since you have images, you probably want to perform transformations on them, so TensorDataset is not the best option here. Instead, you can create your own Dataset, something like this:

import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import numpy as np
from PIL import Image




class MyDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data  # list of C x H x W numpy arrays
        self.targets = torch.LongTensor(targets)
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.targets[index]

        if self.transform:
            # PIL expects H x W x C uint8, so move the channel axis last first
            x = Image.fromarray(x.astype(np.uint8).transpose(1, 2, 0))
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)


# Let's create 10 RGB images of size 128x128 and 10 labels {0, 1}
data = list(np.random.randint(0, 256, size=(10, 3, 128, 128)))  # upper bound is exclusive, so 256 gives 0-255
targets = list(np.random.randint(2, size=10))


transform = transforms.Compose([transforms.Resize(64), transforms.ToTensor()])
dataset = MyDataset(data, targets, transform=transform)
dataloader = DataLoader(dataset, batch_size=5)
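
The arrays printed in the question look like they are in H x W x 3 (channel-last) layout, whereas the example above assumes 3 x H x W. If no torchvision transforms are needed, a variant without PIL might look like the sketch below. NumpyImageDataset is a hypothetical name, and dividing by 255 is my own choice of normalisation, not something the question asks for.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpyImageDataset(Dataset):  # hypothetical name, not a PyTorch class
    """Wraps a list of H x W x 3 uint8 arrays and integer labels."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = torch.as_tensor(np.asarray(labels), dtype=torch.long)

    def __getitem__(self, index):
        img = torch.from_numpy(self.images[index])   # H x W x C, uint8
        img = img.permute(2, 0, 1).float() / 255.0   # C x H x W, float32 in [0, 1]
        return img, self.labels[index]

    def __len__(self):
        return len(self.images)


# 10 random channel-last RGB images of size 32x32 and 10 labels in {0, 1}
images = list(np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8))
labels = list(np.random.randint(2, size=10))

loader = DataLoader(NumpyImageDataset(images, labels), batch_size=5, shuffle=True)
x_batch, y_batch = next(iter(loader))
print(x_batch.shape, y_batch.shape)
```

This relies on all images having the same size, since the default batching stacks the samples into one tensor.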