When fitting a model, what should the batch size and number of epochs be?

My training set has 970 samples and my validation set has 243 samples.

When fitting a model to optimize val_acc, how large should the batch size and the number of epochs be? Are there any rules of thumb based on the size of the data input?


Since you have a pretty small dataset (~ 1000 samples), you would probably be safe using a batch size of 32, which is pretty standard. It won't make a huge difference for your problem unless you're training on hundreds of thousands or millions of observations.

To answer your questions on Batch Size and Epochs:

In general: Larger batch sizes result in faster progress in training, but don't always converge as fast. Smaller batch sizes train slower, but can converge faster. It's definitely problem dependent.

In general, the models improve with more epochs of training, to a point. They'll start to plateau in accuracy as they converge. Try something like 50 and plot number of epochs (x axis) vs. accuracy (y axis). You'll see where it levels out.
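For illustration, a minimal sketch of that plot (assuming a compiled Keras model with metrics=["accuracy"] plus x_train/y_train and x_val/y_val arrays, none of which come from the question; on older Keras the history key is "acc" rather than "accuracy"):

import matplotlib.pyplot as plt

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    batch_size=32, epochs=50)

plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="validation")
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.legend()
plt.show()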

What is the type and/or shape of your data? Are these images, or just tabular data? This is an important detail.

I use Keras to perform non-linear regression on speech data. Each of my speech files gives me features stored as 25000 rows in a text file, with each row containing 257 real-valued numbers. I use a batch size of 100 and 50 epochs to train a Sequential model in Keras with one hidden layer. After 50 epochs of training, it converges quite well to a low val_loss.

I used Keras to perform non-linear regression for market mix modelling. I got the best results with a batch size of 32 and epochs = 100 while training a Sequential model in Keras with 3 hidden layers. Generally a batch size of 32 or 25 is good, with epochs = 100, unless you have a large dataset. In the case of a large dataset you can go with a batch size of 10 and epochs between 50 and 100. Again, the figures mentioned above have worked fine for me.
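For reference, a minimal sketch of the kind of model described above (the layer widths and n_features are placeholders, not part of the original answer):

from tensorflow import keras

n_features = 10  # placeholder: width of your input table

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),  # single real-valued regression output
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, batch_size=32, epochs=100)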

Great answers above. Everyone gave good input.

Ideally, this is the sequence of batch sizes to try:

  • {1, 2, 4, 8, 16} - slow
  • {[32, 64], [128, 256]} - good starters
  • [32, 64] - CPU
  • [128, 256] - GPU, for more of a boost

The number of epochs is up to you, depending on when validation loss stops improving. As for batch size, the helper below makes a suggestion:


# Heuristic helper that suggests a batch size from GPU availability,
# model size, and current memory pressure.


def FindBatchSize(model):
    """model: a model architecture that is yet to be trained."""
    import os, gc, psutil
    from keras import backend as K
    BatchFound = 16

    try:
        total_params = int(model.count_params())
        GCPU = "CPU"
        # Find whether a GPU is available.
        try:
            if K.tensorflow_backend._get_available_gpus() == []:
                GCPU = "CPU"
            else:
                GCPU = "GPU"
        except Exception:
            from tensorflow.python.client import device_lib

            def get_available_gpus():
                local_device_protos = device_lib.list_local_devices()
                return [x.name for x in local_device_protos
                        if x.device_type == 'GPU']

            if "gpu" not in str(get_available_gpus()).lower():
                GCPU = "CPU"
            else:
                GCPU = "GPU"

        # Decide batch size on the basis of GPU availability and model complexity.
        if GCPU == "GPU" and os.cpu_count() > 15 and total_params < 1000000:
            BatchFound = 64
        if os.cpu_count() < 16 and total_params < 500000:
            BatchFound = 64
        if GCPU == "GPU" and os.cpu_count() > 15 and 1000000 <= total_params < 2000000:
            BatchFound = 32
        if GCPU == "GPU" and os.cpu_count() > 15 and 2000000 <= total_params < 10000000:
            BatchFound = 16
        if GCPU == "GPU" and os.cpu_count() > 15 and total_params >= 10000000:
            BatchFound = 8
        if os.cpu_count() < 16 and total_params > 5000000:
            BatchFound = 8
        if total_params > 100000000:
            BatchFound = 1
    except Exception:
        pass

    try:
        # Shrink the batch size further if system RAM is already under pressure.
        memoryused = psutil.virtual_memory().percent
        if memoryused > 75.0:
            BatchFound = 8
        if memoryused > 85.0:
            BatchFound = 4
        if memoryused > 90.0:
            BatchFound = 2
        if total_params > 100000000:
            BatchFound = 1
        print("Batch Size:  " + str(BatchFound))
        gc.collect()
    except Exception:
        pass

    return BatchFound
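Hypothetical usage (the data arrays are stand-ins; the model only needs to be built, not trained):

batch_size = FindBatchSize(model)   # prints and returns the suggestion
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)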

tf.keras.callbacks.EarlyStopping

With Keras you can make use of tf.keras.callbacks.EarlyStopping, which automatically stops training when the monitored loss has stopped improving. You can allow epochs with no improvement via the patience parameter.

It helps you find the plateau, from which you can go on refining the number of epochs, or it may even suffice to reach your goal without your having to deal with the number of epochs at all.
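A minimal sketch, assuming a compiled model and training/validation arrays already exist:

from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # quantity to watch
    patience=5,                  # epochs with no improvement before stopping
    restore_best_weights=True,   # roll back to the best epoch seen
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=200,            # an upper bound; training stops earlier
          batch_size=32,
          callbacks=[early_stop])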

Well, I hadn't seen the answer I was looking for, so I did some research myself.

This article says:

  • Stochastic means 1 sample, minibatch means a batch of a few samples, and batch means the full training dataset = this I found here
  • PROS of a smaller batch: faster training, less RAM needed
  • CONS: the smaller the batch, the less accurate the estimate of the gradient will be

In this paper, they tried batch sizes of 256, 512 and 1024, and the performance of all the models was within a standard deviation of each other. This means that the batch size didn't have any significant influence on performance.

Final word:

  • If have problem with RAM = decrease batch size
  • If you need to calculate faster = decrease batch size
  • If the performance decreased after using a smaller batch = increase batch size


From one study, a rule of thumb is that batch size and learning rate are strongly correlated with respect to achieving good performance.

In the study below, a high learning rate means 0.001 and a small learning rate means 0.0001.

In my case, I usually use a large batch size of 1024 to 2048 for a dataset of, say, a million records, with the learning rate at 0.001 (the default of the Adam optimizer). However, I also use a cyclical learning-rate scheduler which changes this value during fitting, which is another topic.
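As a rough sketch of that setup (the loss, metric, and data names are placeholders, not part of the original answer):

from tensorflow import keras

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=1024, epochs=30)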

From the study:

'In this paper, we compared the performance of CNN using different batch sizes and different learning rates. According to our results, we can conclude that the learning rate and the batch size have a significant impact on the performance of the network. There is a high correlation between the learning rate and the batch size, when the learning rates are high, the large batch size performs better than with small learning rates. We recommend choosing small batch size with low learning rate. In practical terms, to determine the optimum batch size, we recommend trying smaller batch sizes first(usually 32 or 64), also keeping in mind that small batch sizes require small learning rates. The number of batch sizes should be a power of 2 to take full advantage of the GPUs processing. Subsequently, it is possible to increase the batch size value till satisfactory results are obtained.' - https://www.sciencedirect.com/science/article/pii/S2405959519303455
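A hypothetical sketch of that procedure: rebuild the model from fresh weights for each power-of-2 batch size, starting small, and keep whichever gives the best validation accuracy (build_model and the data arrays are stand-ins for your own):

from tensorflow import keras

def build_model():
    # Stand-in for your own architecture; rebuilt so that each batch
    # size starts from fresh weights.
    m = keras.Sequential([keras.layers.Dense(64, activation="relu"),
                          keras.layers.Dense(1, activation="sigmoid")])
    m.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
    return m

best_acc, best_bs = 0.0, None
for bs in [32, 64, 128, 256, 512]:   # powers of 2, smallest first
    hist = build_model().fit(x_train, y_train, batch_size=bs, epochs=10,
                             validation_data=(x_val, y_val), verbose=0)
    acc = max(hist.history["val_accuracy"])
    if acc > best_acc:
        best_acc, best_bs = acc, bs

print("best batch size:", best_bs, "val accuracy:", best_acc)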