Overflow sort stage buffered data usage exceeds internal limit

Using the code:

all_reviews = db_handle.find().sort('reviewDate', pymongo.ASCENDING)
print all_reviews.count()


print all_reviews[0]
print all_reviews[2000000]

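The count prints 2043484, and all_reviews[0] prints as well.

However, when printing all_reviews[2000000], I get the error:

pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33554495 bytes exceeds internal limit of 33554432 bytes

How do I handle this?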


Solved with indexing:

db_handle.ensure_index([("reviewDate", pymongo.ASCENDING)])

You're running into the 32MB limit on an in-memory sort:

https://docs.mongodb.com/manual/reference/limits/#Sort-Operations

Add an index to the sort field. That allows MongoDB to stream documents to you in sorted order, rather than attempting to load them all into memory on the server and sort them in memory before sending them to the client.
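A minimal PyMongo sketch of that advice, assuming db_handle from the question is a collection object. The helper name sorted_reviews is made up for illustration, and create_index is used because newer PyMongo releases dropped ensure_index:

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def sorted_reviews(collection, field="reviewDate"):
    """Index `field`, then return a cursor sorted on it (hypothetical helper)."""
    # create_index does nothing if an identical index already exists.
    collection.create_index([(field, ASCENDING)])
    # With the index in place the server can walk it in order
    # instead of buffering every document for an in-memory sort.
    return collection.find().sort(field, ASCENDING)
```

With a real collection this would be called as all_reviews = sorted_reviews(db_handle), after which all_reviews[2000000] should no longer need the in-memory sort.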

As kumar_harsh said in the comments, I would like to add another point.

You can view the current buffer limit by running the command below against the admin database:

> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }
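To put the numbers side by side, here is a small arithmetic check using only the figures quoted in this thread. It suggests the sort in the question went over the limit by just 63 bytes; presumably that is simply where the buffer stood when the limit was crossed, not the total the sort would have needed.

```python
MIB = 1024 * 1024

default_limit = 32 * MIB      # the value getParameter reports
reported_usage = 33554495     # from the error message in the question

overshoot = reported_usage - default_limit
print(default_limit, overshoot)  # prints: 33554432 63
```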

internalQueryExecMaxBlockingSortBytes defaults to 32 MB (33554432 bytes). In this case the sort is running out of buffer space, so you can raise the limit to a value of your own choosing, for example roughly 50 MB, as below:

>  db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes:50151432})
{ "was" : 33554432, "ok" : 1 }
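The same two commands can probably be issued from PyMongo through the admin database. This is a sketch, assuming client is a pymongo.MongoClient connected to a server version that exposes this parameter; the function names are invented for illustration.

```python
PARAM = "internalQueryExecMaxBlockingSortBytes"


def get_sort_limit(client):
    """Read the current blocking-sort limit in bytes (hypothetical helper)."""
    reply = client.admin.command({"getParameter": 1, PARAM: 1})
    return reply[PARAM]


def set_sort_limit(client, limit_bytes):
    """Change the limit and return the previous value (hypothetical helper)."""
    reply = client.admin.command({"setParameter": 1, PARAM: limit_bytes})
    return reply["was"]
```

Like the shell version, a change made this way lasts only until the server restarts.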

You can also set this limit permanently with the following parameter in the MongoDB config file:

setParameter=internalQueryExecMaxBlockingSortBytes=309715200

Hope this helps !!!

Note: These commands are supported only in version 3.0 and later.

In my case, I had to fix the necessary index definitions in code and recreate them:

rake db:mongoid:create_indexes RAILS_ENV=production

The memory overflow does not occur when the sort field has the index it needs.

P.S. Before this, I had to disable the errors raised when creating indexes with overly long keys:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

A reIndex may also be needed:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> use your_db
switched to db your_db
> db.getCollectionNames().forEach( function(collection){ db[collection].reIndex() } )

If you want to avoid creating an index (e.g. you just want a quick-and-dirty check to explore the data), you can use aggregation with disk usage:

all_reviews = db_handle.aggregate([{$sort: {'reviewDate': 1}}], {allowDiskUse: true})

(Not sure how to do this in pymongo, though).
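For what it's worth, PyMongo's Collection.aggregate accepts allowDiskUse as a keyword argument, so a rough equivalent might look like this. It assumes db_handle is a collection object; the helper name is hypothetical.

```python
def sorted_reviews_on_disk(collection, field="reviewDate"):
    """Sort through the aggregation pipeline, letting the server spill to disk."""
    pipeline = [{"$sort": {field: 1}}]  # 1 means ascending
    return collection.aggregate(pipeline, allowDiskUse=True)
```

The result is a cursor to iterate over (for review in sorted_reviews_on_disk(db_handle): ...); unlike a find() cursor it does not support indexing such as [2000000].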

JavaScript API syntax for the index:

db_handle.ensureIndex({executedDate: 1})