How to identify the request queue for a Linux block device

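I am working on a driver that connects hard disks over the network. There is a bug: if I enable two or more hard disks on the computer, only the first one gets its partitions scanned and identified. So if I have one partition on hda and one partition on hdb, then as soon as I connect hda there is a partition that can be mounted, and hda1 gets a blkid of xyz123 as soon as it is mounted. But when I go ahead and mount hdb1, it comes up with the same blkid, and in fact the driver is reading it from hda, not hdb.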

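I think I have found the place where the driver goes wrong. Below is debug output that includes a dump_stack, which I placed at the first spot where the driver seems to access the wrong device.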

Here is the code section:

/*basically, this is just the request_queue processor. In the log output that
follows, the second device, (hdb) has just been connected, right after hda
was connected and hda1 was mounted to the system. */


void nblk_request_proc(struct request_queue *q)
{
    struct request *req;
    ndas_error_t err = NDAS_OK;

    dump_stack();

    while ((req = NBLK_NEXT_REQUEST(q)) != NULL)
    {
        dbgl_blk(8, "processing queue request from slot %d", SLOT_R(req));

        if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) {
            printk("ndas: Queue is suspended\n");
            /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
            blk_start_request(req);
#else
            blkdev_dequeue_request(req);
#endif

Here is the log output. I have added some comments to help explain what is happening and where the bad call seems to occur.

  /* Just below here you can see "slot" mentioned many times. This is the
identification for the network case in which the hd is connected to the
network. So you will see slot 2 in this log because the first device has
already been connected and mounted. */


kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10


/* So at this point the hard disk is added (gendisk=c6...) and the identifications
all match the network device. The driver is now about to begin scanning the
hard drive for existing partitions. The little 'ed' at the end of the previous
line indicates that revalidate_disk has finished its job.


Also, I think the request queue is indicated by the output dpcd1 near the very
end of the line.


Now below we have entered the function that is pasted above. In the function
you can see that the slot can be determined by the queue. And the log output
after the stack dump shows it is from slot 1. (The first network drive that was
already mounted.) */


kernel: [231644.155677]  ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P           2.6.32-5-686 #1
kernel: [231644.155711] Call Trace:
kernel: [231644.155723]  [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
kernel: [231644.155732]  [<c11298db>] ? __generic_unplug_device+0x23/0x25
kernel: [231644.155737]  [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
kernel: [231644.155743]  [<c1123090>] ? blk_unplug+0x2e/0x31
kernel: [231644.155750]  [<c10cceec>] ? block_sync_page+0x33/0x34
kernel: [231644.155756]  [<c108770c>] ? sync_page+0x35/0x3d
kernel: [231644.155763]  [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
kernel: [231644.155768]  [<c10876d7>] ? sync_page+0x0/0x3d
kernel: [231644.155773]  [<c10876aa>] ? __lock_page+0x76/0x7e
kernel: [231644.155780]  [<c1043f1f>] ? wake_bit_function+0x0/0x3c
kernel: [231644.155785]  [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
kernel: [231644.155791]  [<c10d21b9>] ? blkdev_readpage+0x0/0xc
kernel: [231644.155796]  [<c1087bbc>] ? read_cache_page_async+0x14/0x18
kernel: [231644.155801]  [<c1087bc9>] ? read_cache_page+0x9/0xf
kernel: [231644.155808]  [<c10ed6fc>] ? read_dev_sector+0x26/0x60
kernel: [231644.155813]  [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
kernel: [231644.155819]  [<c10ee138>] ? rescan_partitions+0x17e/0x378
kernel: [231644.155825]  [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
kernel: [231644.155830]  [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
kernel: [231644.155836]  [<c10ed7e6>] ? register_disk+0xb0/0xfd
kernel: [231644.155843]  [<c112e33b>] ? add_disk+0x9a/0xe8
kernel: [231644.155848]  [<c112dafd>] ? exact_match+0x0/0x4
kernel: [231644.155853]  [<c112deae>] ? exact_lock+0x0/0xd
kernel: [231644.155861]  [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
kernel: [231644.155868]  [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
kernel: [231644.155874]  [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
kernel: [231644.155891]  [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
kernel: [231644.155906]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155919]  [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
kernel: [231644.155933]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155941]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10


/* Here is the driver's debug output. It shows that this operation is
being performed on the first device's request queue. */


kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10

I hope this is enough background information. At this point, an obvious question might be "when and where is the request_queue allocated?"

That is handled shortly before the add_disk call. The "adding disk" message is the first line of the log output.

slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
nblk_request_proc,
&slot->lock
);

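As far as I know, this is standard practice. Back to my original question: is there somewhere I can look up the request queue and make sure it is incremented or unique for each new device, or does the Linux kernel use only one queue per major number? I want to find out why this driver is loading the same queue for two different block devices, and determine whether that is causing the duplicate blkid during the initial registration process.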

Thanks for looking into this for me.

小开

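I am sharing the solution to the bug that led me to post this question, although it does not actually answer the question of how to identify a device's request queue.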

The code above contains the following:

if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED,
&(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))

Well, that "SLOT_R(req)" was causing the trouble. It is defined elsewhere to return the gendisk device.

#define SLOT_R(_request_) SLOT((_request_)->rq_disk)

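This returned the disk, but not a proper value for the various operations later on. So as additional block devices were loaded, this macro basically kept returning 1. (I think the result was being treated as a boolean.) As a result, all requests were piled onto the request queue for disk 1.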

The fix was to read the correct disk identifier, which had already been stored in the disk's private_data when the disk was added to the system.

Correct identifier definition:
#define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )


How the correct disk number was stored:
slot->disk->queue = slot->queue;
slot->disk->private_data = (void*) (long) s;  /* 's' is the disk id */
slot->queue_flags = 0;

Now the correct disk id is returned from private_data, so all requests go to the correct queue.
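
To make the round trip concrete, here is a minimal user-space sketch of the same pattern. The mock_gendisk and mock_request structures and the slot_store / slot_of_request helpers are stand-ins made up for illustration, not the kernel's definitions; only the rq_disk and private_data field names come from the driver code above. The cast goes through intptr_t, which is an assumption on my part rather than what the driver does, so that the integer survives being stored in a pointer on both 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gendisk. */
struct mock_gendisk {
    void *private_data;   /* driver-owned cookie; here it holds the slot id */
};

/* Hypothetical stand-in for struct request. */
struct mock_request {
    struct mock_gendisk *rq_disk;   /* disk this request targets */
};

/* Store the slot id in the disk, as the driver does when adding the disk. */
void slot_store(struct mock_gendisk *disk, int slot)
{
    disk->private_data = (void *)(intptr_t)slot;
}

/* Recover the slot id from a request (the role SLOT_R plays in the driver). */
int slot_of_request(const struct mock_request *req)
{
    assert(req != NULL && req->rq_disk != NULL);
    return (int)(intptr_t)req->rq_disk->private_data;
}
```

Because each disk carries its own id, two requests aimed at different disks resolve to different slots, which is exactly what the old SLOT_R definition failed to do.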

As stated, this does not show how to identify the queue. An uneducated guess might be:

 x = (int) _request_->rq_disk->queue->id;

Ref. the request_queue structure in Linux: http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270 & 321
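
For what it's worth, a queue can also be told apart without any numeric id. Every call to blk_init_queue returns a separate struct request_queue, so the pointer itself is unique per device, and the structure has a queuedata field on which a driver can hang its own per-device pointer. Below is a user-space sketch of that idea. The mock_queue and mock_slot types and the slot_attach_queue / slot_of_queue helpers are hypothetical names for illustration; whether the ndas driver sets queuedata is not shown in the code above, so treat that part as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request_queue. */
struct mock_queue {
    void *queuedata;   /* driver-private pointer, mirrors q->queuedata */
};

/* Hypothetical per-device state, loosely modelled on the driver's slot. */
struct mock_slot {
    int id;
    struct mock_queue queue;   /* one queue per slot */
};

/* Link the queue back to its slot once, at setup time. */
void slot_attach_queue(struct mock_slot *slot, int id)
{
    slot->id = id;
    slot->queue.queuedata = slot;
}

/* Inside a request function, map the queue argument back to its slot. */
struct mock_slot *slot_of_queue(struct mock_queue *q)
{
    assert(q != NULL);
    return (struct mock_slot *)q->queuedata;
}
```

With something along these lines, the request function could compare the slot derived from its queue argument against the slot derived from the request, and log a warning when the two disagree.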

Thanks, everyone, for the help!