Ignore URLs with specific parameters in robots.txt?

I want Google to ignore URLs like this:

http://www.mydomain.example/new-printers?dir=asc&order=price&p=3

In other words, every URL containing the parameters dir, order and price should be ignored. How can I do this with robots.txt?


Here's a solution if you want to disallow all query strings:

Disallow: /*?*

Or, if you want to be more precise about the query string:

Disallow: /*?dir=*&order=*&p=*
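To see what these two patterns match, here is a simplified Python sketch of Google-style wildcard matching. The helper names are hypothetical, and it only handles * (any run of characters) and a trailing $ (end anchor); real crawlers also deal with percent-encoding and other edge cases. It also shows that the more precise rule depends on parameter order: it only matches when dir comes first.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (simplified sketch).

    '*' matches any run of characters and a trailing '$' anchors the end;
    every other character, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so rules act like prefixes of path+query.
    return rule_to_regex(pattern).match(path) is not None

url = "/new-printers?dir=asc&order=price&p=3"
print(matches("/*?*", url))                  # True
print(matches("/*?*", "/new-printers"))      # False: no query string
print(matches("/*?dir=*&order=*&p=*", url))  # True
# Same parameters in a different order: the precise rule no longer matches.
print(matches("/*?dir=*&order=*&p=*", "/new-printers?order=price&dir=asc&p=3"))  # False
```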

You can also specify in robots.txt which URL to allow:

Allow: /new-printers$

The $ anchors the match at the end of the URL, so only /new-printers itself (with no query string) is allowed.
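The $ matters because Google applies the most specific (longest) matching rule, with Allow winning ties. The simplified Python sketch below uses the question's path /new-printers and hypothetical helper names; it shows that without the $, the longer Allow rule would override Disallow: /*?* and let the parameterized URL through.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """Pick the longest matching rule; on a tie, Allow wins. No match means allowed."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

query_url = "/new-printers?dir=asc&order=price&p=3"
anchored = [("disallow", "/*?*"), ("allow", "/new-printers$")]
unanchored = [("disallow", "/*?*"), ("allow", "/new-printers")]

print(is_allowed(anchored, "/new-printers"))  # True
print(is_allowed(anchored, query_url))        # False: Allow no longer matches
print(is_allowed(unanchored, query_url))      # True: the longer Allow rule wins
```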

More info:

http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/

Register your website with Google Webmaster Tools. There you can tell Google how to handle your parameters:

Site Configuration -> URL Parameters

You should have the pages that contain those parameters indicate, via the robots meta tag, that they should be excluded from indexing, e.g. <meta name="robots" content="noindex">.
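A minimal server-side sketch in Python, assuming your application builds each page's <head> from the request URL. The function name robots_meta_tag and the parameter set are hypothetical. Keep in mind that Google can only see this tag if the URL is not also blocked in robots.txt, so this is an alternative to the Disallow rules rather than something to combine with them.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of parameter names, taken from the example URL in the question.
NOINDEX_PARAMS = {"dir", "order", "p"}

def robots_meta_tag(url: str) -> str:
    """Return the robots meta tag to emit in the <head> of the page at `url`.

    Pages whose query string carries any of NOINDEX_PARAMS get noindex;
    other pages get no tag, since index/follow is the default.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if any(name in query for name in NOINDEX_PARAMS):
        # noindex keeps the page out of search results; follow still lets links be crawled.
        return '<meta name="robots" content="noindex, follow">'
    return ""

print(robots_meta_tag("http://www.mydomain.example/new-printers?dir=asc&order=price&p=3"))
print(repr(robots_meta_tag("http://www.mydomain.example/new-printers")))  # ''
```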

You can block those specific query-string parameters with the following lines:

Disallow: /*?*dir=
Disallow: /*?*order=
Disallow: /*?*p=

So if any URL contains dir=, order=, or p= anywhere in the query string, it will be blocked.
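A simplified Python sketch (hypothetical helper names, Google-style * and trailing $ handling only) confirms that these three rules catch the parameters in any order. It also shows the flip side of substring matching: p= also matches inside other parameter names, such as a hypothetical top= parameter.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

DISALLOW = ["/*?*dir=", "/*?*order=", "/*?*p="]

def is_blocked(path: str) -> bool:
    # There are no Allow rules here, so any matching Disallow blocks the URL.
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)

print(is_blocked("/new-printers?dir=asc&order=price&p=3"))  # True
print(is_blocked("/new-printers?p=3&dir=asc"))              # True: order does not matter
print(is_blocked("/new-printers?top=5"))                    # True: "top=" contains "p="
print(is_blocked("/new-printers?color=red"))                # False
```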