小开

Best answer

You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)


sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

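per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that the process will use on each GPU on the same machine. Currently, this fraction is applied uniformly to all GPUs on the same machine; there is no way to set it on a per-GPU basis.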
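As a quick check of the arithmetic behind the fraction, here is a minimal sketch in plain Python. The helper name memory_fraction is hypothetical and not part of TensorFlow; it only reproduces the "4GB out of 12GB" calculation from the snippet above.

```python
def memory_fraction(desired_gb, total_gb):
    """Return the share of total GPU memory that desired_gb represents."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 < desired_gb <= total_gb:
        raise ValueError("desired_gb must be in the range (0, total_gb]")
    return desired_gb / total_gb


# 4GB on a 12GB card, as in the example above
fraction = memory_fraction(4, 12)
print(round(fraction, 3))  # 0.333
```

The result is what you would pass as per_process_gpu_memory_fraction. Because the same fraction applies to every GPU, on a machine with cards of different sizes the absolute cap will differ per card.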

小开

config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)

https://github.com/tensorflow/tensorflow/issues/1578

小开

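Shameless plug: if you install the GPU-enabled TensorFlow, the session will first allocate all GPUs, whether you set it to use only the CPU or the GPU. I would add the tip that even if you set the graph to use the CPU only, you should apply the same configuration (as answered above :)) to prevent unwanted GPU occupation.

In an interactive interface such as IPython or Jupyter you should also set that configuration; otherwise it will allocate all the memory and leave almost none for other processes. This is sometimes hard to notice.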

小开

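Here is an excerpt from the book Deep Learning with TensorFlow:

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as the process needs it. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as sessions run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.

1) Allow growth: (more flexible)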

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

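The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. Note: memory does not need to be released; doing so can even worsen memory fragmentation.

2) Allocate fixed memory:

To allocate only 40% of the total memory of each GPU: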

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Note: this is only useful if you truly want to bound the amount of GPU memory available to the TensorFlow process.
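The two options described above can be kept side by side with a small helper. This is only a sketch: the function name gpu_options_kwargs and its parameter names are hypothetical, and it assumes tf.GPUOptions accepts its fields (allow_growth, per_process_gpu_memory_fraction) as keyword arguments, as the first answer's snippet does. The helper itself does not import TensorFlow.

```python
def gpu_options_kwargs(allow_growth=False, memory_fraction=None):
    """Build keyword arguments for tf.GPUOptions from the two settings.

    allow_growth:    start small and grow GPU memory on demand.
    memory_fraction: hard cap, as a fraction of each GPU's total memory.
    """
    kwargs = {}
    if allow_growth:
        kwargs["allow_growth"] = True
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in the range (0, 1]")
        kwargs["per_process_gpu_memory_fraction"] = memory_fraction
    return kwargs


growth_kwargs = gpu_options_kwargs(allow_growth=True)
capped_kwargs = gpu_options_kwargs(memory_fraction=0.4)

# With TensorFlow 1.x imported as tf, either dict could then be used like:
#   config = tf.ConfigProto(gpu_options=tf.GPUOptions(**capped_kwargs))
#   session = tf.Session(config=config)
```

Validating the fraction up front is a design choice of this sketch, so that a value such as 40 (meant as a percentage) fails early instead of reaching the session.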

小开

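All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow.

When using the tf.Estimator framework (TensorFlow 1.4 and above), the way to pass the fraction along to the implicitly created MonitoredTrainingSession is: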

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
trainingConfig = tf.estimator.RunConfig(session_config=conf, ...)
tf.estimator.Estimator(model_fn=...,
config=trainingConfig)

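Similarly, in Eager mode (TensorFlow 1.5 and above):

import tensorflow as tf
import tensorflow.contrib.eager as tfe  # eager API location in TF 1.5-era releases

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
tfe.enable_eager_execution(config=conf)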

Edit: 11-04-2018 As an example, if you are going to use tf.contrib.gan.train, you can use something similar to the following:

tf.contrib.gan.gan_train(........, config=conf)

小开

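I tried to train a U-Net on the VOC dataset, but because of the huge image size I ran out of memory. I tried all the tips above, and even tried batch size == 1, with no improvement. Sometimes the TensorFlow version itself causes memory issues. Try using

pip install tensorflow-gpu==1.8.0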

小开

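Well, I am new to TensorFlow. I have a GeForce 740M or some similar GPU with 2GB of RAM, and I was running an MNIST-style handwriting example for a native language, with training data of 38700 images and 4300 test images. I was trying to get precision, recall and F1 using the following code, because sklearn was not giving me precise results. Once I added this to my existing code, I started getting GPU errors.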

TP = tf.count_nonzero(predicted * actual)
TN = tf.count_nonzero((predicted - 1) * (actual - 1))
FP = tf.count_nonzero(predicted * (actual - 1))
FN = tf.count_nonzero((predicted - 1) * actual)


prec = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * prec * recall / (prec + recall)

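On top of that, I guess my model was heavy: I was getting a memory error after 147 or 148 epochs. Then I thought, why not create functions for these tasks? I don't know whether TensorFlow actually works this way, but my reasoning was that a local variable might release its memory when it goes out of scope. After I defined the elements above for training and testing in separate modules, I was able to reach 10000 epochs without any issues. I hope this helps.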
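To sanity-check the counting trick used in the snippet above without touching the GPU, here is a plain-Python sketch of the same arithmetic. It assumes predicted and actual are sequences of 0/1 labels; the function names confusion_counts and precision_recall_f1 are made up for this illustration.

```python
def confusion_counts(predicted, actual):
    """Count TP, TN, FP, FN for 0/1 labels with the same products as above."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p * a != 0)              # both 1
    tn = sum(1 for p, a in pairs if (p - 1) * (a - 1) != 0)  # both 0
    fp = sum(1 for p, a in pairs if p * (a - 1) != 0)        # predicted 1, actual 0
    fn = sum(1 for p, a in pairs if (p - 1) * a != 0)        # predicted 0, actual 1
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Same formulas as above; raises ZeroDivisionError if a denominator is 0."""
    prec = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * prec * recall / (prec + recall)
    return prec, recall, f1


predicted = [1, 0, 1, 1, 0, 0]
actual = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(predicted, actual)
prec, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, tn, fp, fn)  # 2 2 1 1
```

Computing the final metrics on the host from four scalar counts is one way to keep extra tensors off a small GPU, though whether that was the cause of the errors described above is not something this sketch can confirm.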

小开

For TensorFlow 2.0 and 2.1 (docs):

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)

For TensorFlow 2.2+ (docs):

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

The docs also list some more methods:

Set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a virtual GPU device.
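For the environment-variable route, a minimal sketch of setting it from inside a Python script follows. The assumption here is that the variable has to be in place before TensorFlow sets up its GPU devices, so the sketch sets it before the import; exporting it in the shell that launches the script would work as well.

```python
import os

# Set the variable before TensorFlow is imported, so that it is already
# present when the GPU allocator is created (conservative assumption).
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Import TensorFlow only after the variable is set:
# import tensorflow as tf
```

This is handy when you cannot change the code that builds the session or the model, for example when running a third-party training script.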

小开

You can set

TF_FORCE_GPU_ALLOW_GROWTH=true

in your environment variables.

In the TensorFlow source code:

bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) {
  const char* force_allow_growth_string =
      std::getenv("TF_FORCE_GPU_ALLOW_GROWTH");
  if (force_allow_growth_string == nullptr) {
    return gpu_options.allow_growth();
  }
  // ... (the excerpt ends here; the rest of the function is not shown)

小开

TensorFlow 2.0 Beta and (probably) later

The API changed again. It can now be found at:

tf.config.experimental.set_memory_growth(
device,
enable
)

Aliases:

tf.compat.v1.config.experimental.set_memory_growth
tf.compat.v2.config.experimental.set_memory_growth

References:

See also: TensorFlow - Use a GPU: https://www.tensorflow.org/guide/gpu

For TensorFlow 2.0 Alpha, see: this answer

小开

# allocate 60% of GPU memory
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
set_session(tf.Session(config=config))

小开

For TensorFlow 2.0, this solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)

小开

For TensorFlow 2.0 and 2.1, use the following snippet:

import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu_devices[0], True)

For earlier versions, the following snippet used to work for me:

import tensorflow as tf
tf_config=tf.ConfigProto()
tf_config.gpu_options.allow_growth=True
sess = tf.Session(config=tf_config)

小开

If you are using TensorFlow 2, try the following:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

小开

This code worked for me:

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)

小开

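All the answers above refer either to capping memory at a certain amount in TensorFlow 1.X or to allowing memory growth in TensorFlow 2.X.

The tf.config.experimental.set_memory_growth method does work for allowing dynamic growth during allocation/preprocessing. Nevertheless, one may prefer to allocate a specific upper limit of GPU memory from the start.

Another reason for allocating a specific amount of GPU memory is to prevent OOM errors during training. For example, if you train while Chrome tabs or any other process that consumes video memory are open, tf.config.experimental.set_memory_growth(gpu, True) can result in OOM errors being thrown, hence the need in certain cases to allocate more memory from the start.

The recommended and correct way to allot memory per GPU in TensorFlow 2.X is the following: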

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
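If the same cap should apply to every GPU rather than only the first one, the idea could be extended roughly as follows. This is a sketch, not an official recipe: the helper names gb_to_mb and limit_all_gpus are hypothetical, memory_limit is taken to be in megabytes (consistent with 1024 meaning 1GB above), and TensorFlow 2.x is imported lazily inside the function so the conversion helper can be used without it.

```python
def gb_to_mb(gigabytes):
    """Convert gigabytes to the whole megabytes expected by memory_limit."""
    return int(gigabytes * 1024)


def limit_all_gpus(limit_gb):
    """Apply the same memory cap to every visible GPU.

    Returns the number of GPUs configured. Must be called before the GPUs
    are initialized; otherwise TensorFlow raises a RuntimeError.
    """
    import tensorflow as tf  # lazy import: only needed when actually called

    gpus = tf.config.experimental.list_physical_devices('GPU')
    limit = tf.config.experimental.VirtualDeviceConfiguration(
        memory_limit=gb_to_mb(limit_gb))
    for gpu in gpus:
        tf.config.experimental.set_virtual_device_configuration(gpu, [limit])
    return len(gpus)


print(gb_to_mb(1))  # 1024
```

A call such as limit_all_gpus(1.5) at the very top of a training script would then cap each card at 1536MB, assuming nothing has touched the GPUs before that point.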
