Easy way to test a URL for 404 in PHP?

Answer

I found this answer:

$twitter_XML_raw = @file_get_contents($timeline); // @ suppresses the warning PHP emits on 4xx/5xx
if ($twitter_XML_raw === false) {
    // Retrieve the HTTP status code from the status line, e.g. "HTTP/1.1 404 Not Found".
    // $http_response_header is only populated if the server actually answered.
    $status_code = 0;
    if (isset($http_response_header[0])) {
        $parts = explode(' ', $http_response_header[0], 3);
        $status_code = isset($parts[1]) ? (int) $parts[1] : 0;
    }


    // Check the HTTP status code
    switch ($status_code) {
        case 200:
            $error_status = "200: Success";
            break;
        case 401:
            $error_status = "401: Login failure.  Try logging out and back in.  Passwords are ONLY used when posting.";
            break;
        case 400:
            $error_status = "400: Invalid request.  You may have exceeded your rate limit.";
            break;
        case 404:
            $error_status = "404: Not found.  This shouldn't happen.  Please let me know what happened using the feedback link above.";
            break;
        case 500:
            $error_status = "500: Twitter servers replied with an error. Hopefully they'll be OK soon!";
            break;
        case 502:
            $error_status = "502: Twitter servers may be down or being upgraded. Hopefully they'll be OK soon!";
            break;
        case 503:
            $error_status = "503: Twitter service unavailable. Hopefully they'll be OK soon!";
            break;
        default:
            $error_status = "Undocumented error: " . $status_code;
            break;
    }
}

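Here, file_get_contents() is used to retrieve the URL. PHP automatically fills the local variable $http_response_header with the response headers; its first element is the status line, which contains the status code.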
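Splitting the status line on spaces works for the usual "HTTP/1.1 404 Not Found", but a regular expression is a little more forgiving (it also copes with a missing reason phrase) and reports input that is not a status line at all. A minimal sketch; the helper name is hypothetical and the status line is a made-up sample, so no request is made:

```php
<?php
// Hypothetical helper: extract the numeric code from a status line such as
// $http_response_header[0] ("HTTP/1.1 404 Not Found"). Returns null when the
// line does not look like an HTTP status line.
function status_code_from_status_line(string $line): ?int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Usage with a made-up status line (no network access needed):
echo status_code_from_status_line('HTTP/1.1 404 Not Found'), "\n"; // 404
```

In real code you would pass $http_response_header[0] after checking that it is set.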

Best answer

If you are using PHP's cURL bindings, you can check the status code with curl_getinfo():

$handle = curl_init($url);
curl_setopt($handle,  CURLOPT_RETURNTRANSFER, TRUE);


/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);


/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
/* Handle 404 here. */
}


curl_close($handle);


/* Handle $response here. */
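curl_getinfo() reports 0 for CURLINFO_HTTP_CODE when no HTTP response was received at all (for example a DNS failure, a refused connection or a timeout), so a 0 should not be confused with a 404. A sketch of mapping the code to a coarse category; the helper name and the category labels are illustrative, not part of cURL:

```php
<?php
// Hypothetical helper: turn the value returned by
// curl_getinfo($handle, CURLINFO_HTTP_CODE) into a coarse category.
function describe_http_code(int $code): string
{
    if ($code === 0) {
        return 'no response'; // transport failure: inspect curl_error()
    }
    if ($code === 404) {
        return 'not found';
    }
    if ($code >= 200 && $code < 300) {
        return 'success';
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';
    }
    if ($code >= 400 && $code < 500) {
        return 'client error';
    }
    if ($code >= 500 && $code < 600) {
        return 'server error';
    }
    return 'unknown';
}

foreach ([0, 200, 302, 404, 410, 503] as $code) {
    echo $code, ' => ', describe_http_code($code), "\n";
}
```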

Answer

As strager suggested, consider using cURL. You may also want to set CURLOPT_NOBODY with curl_setopt() to skip downloading the whole page (you only need the headers).

Answer

This is just a small piece of code; I hope it's useful to you:

function get_http_code($url) // wrapper name is illustrative; the original ran at top level
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // the code reported is that of the final response
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);


    $response = curl_exec($ch);
    $errno    = curl_errno($ch); // non-zero on transport errors (DNS, timeout, ...)
    $error    = curl_error($ch);


    $info = curl_getinfo($ch);
    curl_close($ch);
    return $info['http_code']; // 0 if no HTTP response was received
}


echo get_http_code('http://example.com');

Answer

If you are running PHP 5, you can use:

$url = 'http://www.example.com';
print_r(get_headers($url, 1));
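get_headers() sends a GET request by default. A minimal sketch, assuming PHP 7.1 or later (where get_headers() accepts a stream context as its third argument), of building a context that sends HEAD instead so no body is transferred; the helper name head_request_context() is hypothetical:

```php
<?php
// Hypothetical helper: build a stream context so get_headers() sends a HEAD
// request (no body transferred) with a bounded socket timeout.
function head_request_context(float $timeout = 10.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'HEAD',
            'timeout' => $timeout, // seconds to wait on the socket
        ],
    ]);
}

$context = head_request_context(5.0);

// Real request, left commented out because it needs network access:
// $headers = get_headers('http://www.example.com', true, $context);
// var_dump($headers === false ? null : $headers[0]);
```

On older versions, the PHP manual shows the same idea using stream_context_set_default() instead of a third argument.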

Alternatively, a PHP 4 user contributed the following:

/**
This is a modified version of code from "stuart at sixletterwords dot com", at 14-Sep-2005 04:52. This version tries to emulate the get_headers() function in PHP 4. I think it works fairly well, and it is simple. It is not the best emulation available, but it works.


Features:
- supports (and requires) full URLs.
- supports changing the default port in the URL.
- requests the URL's own path and query string (not just "/").
- stops reading from the socket as soon as the end of the headers is detected.


Limitations:
- does not support HTTPS (nor the default HTTPS port).
- does not follow redirects.
*/


if (!function_exists('get_headers'))
{
    function get_headers($url, $format = 0)
    {
        $url = parse_url($url);
        $end = "\r\n\r\n";
        $path = isset($url['path']) ? $url['path'] : '/';
        if (isset($url['query']))
            $path .= '?' . $url['query'];
        $fp = fsockopen($url['host'], (empty($url['port']) ? 80 : $url['port']), $errno, $errstr, 30);
        if (!$fp)
            return false; // like the real get_headers() on failure

        $out  = "GET " . $path . " HTTP/1.1\r\n";
        $out .= "Host: " . $url['host'] . "\r\n";
        $out .= "Connection: Close\r\n\r\n";
        $var  = '';
        fwrite($fp, $out);
        while (!feof($fp))
        {
            $var .= fgets($fp, 1280);
            if (strpos($var, $end) !== false)
                break;
        }
        fclose($fp);


        $var = preg_replace("/\r\n\r\n.*\$/s", '', $var); // drop anything after the headers
        $var = explode("\r\n", $var);
        if ($format)
        {
            $v = array();
            foreach ($var as $i)
            {
                if (preg_match('/^([a-zA-Z -]+): +(.*)$/', $i, $parts))
                    $v[$parts[1]] = $parts[2];
                else
                    $v[] = $i; // status line, kept under a numeric key like get_headers()
            }
            return $v;
        }
        else
            return $var;
    }
}

Both produce output similar to:

Array
(
[0] => HTTP/1.1 200 OK
[Date] => Sat, 29 May 2004 12:28:14 GMT
[Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
[Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
[ETag] => "3f80f-1b6-3e1cb03b"
[Accept-Ranges] => bytes
[Content-Length] => 438
[Connection] => close
[Content-Type] => text/html
)

So you can check whether the response is OK, for example:

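$headers = get_headers($url, 1); // false on failure
// Match the code rather than the whole line, since the protocol version and reason phrase can vary.
// [0] is the first response; after redirects, later status lines are at [1], [2], ...
if ($headers !== false && preg_match('#^HTTP/\S+ 200\b#', $headers[0])) {
//valid
}


if ($headers !== false && preg_match('#^HTTP/\S+ 301\b#', $headers[0])) {
//moved or redirect page
}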

See the W3C list of HTTP status codes and their definitions.
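Because get_headers() follows redirects by default, a redirected URL yields several status lines stored under numeric keys (0, 1, ...), and [0] is only the first response. A minimal sketch for reading the final status; the helper name is hypothetical, the sample array is made up, and since get_headers() returns false on failure the caller should check for that before passing the result in:

```php
<?php
// Hypothetical helper: return the status code of the last response in a
// get_headers() result (works for both $format = 0 and $format = 1).
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $value) {
        if (is_int($key) && is_string($value)
            && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $value, $m) === 1) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Made-up sample shaped like get_headers($url, 1) after one redirect:
$sample = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => 'https://www.example.com/',
    1 => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html',
];
echo final_status_code($sample), "\n"; // 200
```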

Answer

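Using strager's code, you can also check CURLINFO_HTTP_CODE for other codes. Some sites don't report a 404; instead they simply redirect to a custom 404 page and return 302 (redirect) or something similar. I use this to check whether an actual file (e.g. robots.txt) exists on the server. Obviously, if such a file exists it won't cause a redirect, but if it doesn't, the server redirects to a 404 page which, as I said, may not return a 404 code.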

function is_404($url) {
$handle = curl_init($url);
curl_setopt($handle,  CURLOPT_RETURNTRANSFER, TRUE);


/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);


/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);


/* If the document has loaded successfully without any redirection or error */
if ($httpCode >= 200 && $httpCode < 300) {
return false;
} else {
return true;
}
}

Answer

If you're looking for a simple solution, you can try this on PHP 5:

@file_get_contents('http://www.yoursite.com'); // the scheme is required; without it PHP reads a local file
//and check by echoing
echo $http_response_header[0];

Answer

As an additional tip to the accepted answer:

When using a variant of the proposed solution, I got errors because of PHP's "max_execution_time" setting. So I did the following:

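$previous_limit = (int) ini_get('max_execution_time'); // read before changing it
set_time_limit(120);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true);
$result = curl_exec($curl);
set_time_limit($previous_limit);
curl_close($curl);

First I raise the time limit to a higher number of seconds, and at the end I set it back to the value that was in effect before. The old value is read before the first call because set_time_limit() updates the max_execution_time setting itself, so reading it afterwards would just return 120.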
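A variant of the same idea, sketched under the assumption that you want the old limit restored even when the request code throws: wrap the work in try/finally. The helper name with_time_limit() is hypothetical, and the closure below only stands in for the real cURL call:

```php
<?php
// Hypothetical helper: run $fn with a temporary execution-time limit and
// restore the previous limit afterwards, even if $fn throws.
function with_time_limit(int $seconds, callable $fn)
{
    // Read the current value first: set_time_limit() also updates the
    // max_execution_time ini setting.
    $previous = (int) ini_get('max_execution_time');
    set_time_limit($seconds);
    try {
        return $fn();
    } finally {
        set_time_limit($previous); // also restarts the timeout counter
    }
}

$before = ini_get('max_execution_time');
$result = with_time_limit(120, function () {
    return 'done'; // a real caller would run curl_exec() here
});
echo $result, "\n";
```

Note that each set_time_limit() call restarts the counter from zero, so the restored limit applies afresh to the rest of the script.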

Answer

You can also use this code to check the status of any link:

<?php


function get_url_status($url, $timeout = 10)
{
$ch = curl_init();
// set cURL options
$opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
CURLOPT_URL => $url,            // set URL
CURLOPT_NOBODY => true,         // do a HEAD request only
CURLOPT_TIMEOUT => $timeout);   // set timeout
curl_setopt_array($ch, $opts);
curl_exec($ch); // do it!
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // find HTTP status
curl_close($ch); // close handle
return $status;
}


$status = get_url_status('http://yourpage.com');
echo $status;
//example checking
if ($status == 302) { echo 'HEY, redirection'; }
?>

Answer

Here is a short solution:

$handle = curl_init($uri);
curl_setopt($handle,  CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle,CURLOPT_HTTPHEADER,array ("Accept: application/rdf+xml"));
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 200||$httpCode == 303)
{
echo "you might get a reply";
}
curl_close($handle);

In your case, you can change application/rdf+xml to whatever content type you use.

Answer

I tested these three methods with performance in mind.

The result, at least in my test environment:

cURL wins.

The test covers the case where only the headers are needed (no body). Test it yourself:

$url = "http://de.wikipedia.org/wiki/Pinocchio";


$start_time = microtime(TRUE);
$headers = get_headers($url);
echo $headers[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";




$start_time = microtime(TRUE);
$response = file_get_contents($url);
echo $http_response_header[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";


$start_time = microtime(TRUE);
$handle = curl_init($url);
curl_setopt($handle,  CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle, CURLOPT_NOBODY, 1); // and *only* get the header
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
// if($httpCode == 404) {
// /* Handle 404 here. */
// }
echo $httpCode."<br>";
curl_close($handle);
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";
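A single measurement like the ones above is noisy (DNS caching, connection reuse, server load). A minimal sketch of averaging several runs with hrtime(), a monotonic timer available since PHP 7.3; the helper name is hypothetical and the usleep() workload only stands in for the request being measured:

```php
<?php
// Hypothetical helper: average wall-clock time of $fn over $runs calls, in
// seconds. hrtime(true) is a monotonic nanosecond counter, so it is not
// affected by system clock adjustments the way microtime() can be.
function time_it(callable $fn, int $runs = 5): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $total = 0;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $fn();
        $total += hrtime(true) - $start;
    }
    return $total / $runs / 1e9;
}

// Stand-in workload; replace with e.g. a get_headers() or cURL call.
$avg = time_it(function () {
    usleep(1000); // about 1 ms
}, 3);
printf("%.4f s\n", $avg);
```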

Answer

<?php


$url= 'www.something.com';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.4");
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);




echo $httpcode;
?>

Answer

This returns true if the URL does not return 200:

function check_404($url) {
    $headers = @get_headers($url, 1);
    if ($headers === false) return true; // no response at all
    return !preg_match('#^HTTP/\S+ 200\b#', $headers[0]); // true unless the (first) status is 200
}

Answer

Here's one way!

<?php


$url = "http://www.google.com";


if (@file_get_contents($url) !== false) { // !== false so an existing but empty page still counts
echo "Url Exists!";
} else {
echo "Url Doesn't Exist!";
}


?>

This simple script just requests the URL's source. If the request succeeds, it prints "Url Exists!"; otherwise, it prints "Url Doesn't Exist!".

Answer

This function returns the status code of a URL in PHP 7:

/**
* @param string $url
* @return bool
*/
function isHttpStatusCode200(string $url): bool
{
return getHttpResponseCode($url) === 200;
}


/**
* @param string $url
* @return int
*/
function getHttpResponseCode(string $url): int
{
    $headers = get_headers($url);
    if ($headers === false) {
        return 0; // no response received
    }
    return (int) substr($headers[0], 9, 3); // "HTTP/1.1 200 OK" -> 200
}

例如:

var_dump(isHttpStatusCode200('https://www.google.com'));
//displays: bool(true)