How do I find where I will be redirected to using cURL in PHP?

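I'm trying to make curl follow a redirect, but I can't get it to work properly. I have a string that I want to send to a server as a GET parameter, and I want to get the resulting URL.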

Example:

String = Kobold Worker
www.wowhead.com/search?q=kobold+worker

If you go to that URL, it will redirect you to "www.wowhead.com/npc=257". I want curl to return this URL to my PHP code so that I can extract the "npc=257" part and use it.

Current code:

function npcID($name) {
    $urltopost = "http://www.wowhead.com/search?q=" . $name;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
    curl_setopt($ch, CURLOPT_URL, $urltopost);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.wowhead.com");
    curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type:application/x-www-form-urlencoded"));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

However, this returns www.wowhead.com/search?q=kobold+worker instead of www.wowhead.com/npc=257.

I suspect PHP is returning before the external redirect happens. How can I fix this?


To make cURL follow a redirect, use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

Erm... I don't think you're actually executing the curl request... Try:

curl_exec($ch);

... after setting the options, and before the curl_getinfo() call.
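Putting that together with the question's code, a corrected version might look like the sketch below. It is a sketch under assumptions rather than a tested answer: it URL-encodes the search term (the original concatenated it raw), it assumes wowhead still redirects a single-hit search to the NPC page, and both function names (npcUrl and extractNpcId) are hypothetical.

```php
<?php
// Return the URL the search ends up on after redirects, or null on a cURL error.
function npcUrl(string $name): ?string
{
    $ch = curl_init('http://www.wowhead.com/search?q=' . urlencode($name));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // keep the body out of the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);           // avoid endless redirect loops
    curl_setopt($ch, CURLOPT_REFERER, 'http://www.wowhead.com');
    $ok = curl_exec($ch);                             // the call the question was missing
    $final = ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}

// Hypothetical helper: "http://www.wowhead.com/npc=257" -> 257, otherwise null.
function extractNpcId(string $url): ?int
{
    $path = parse_url($url, PHP_URL_PATH);
    return (is_string($path) && preg_match('~/npc=(\d+)~', $path, $m)) ? (int) $m[1] : null;
}

// Usage (performs a real HTTP request):
// $url = npcUrl('Kobold Worker');
// $id  = ($url === null) ? null : extractNpcId($url);
```

Splitting the network call from the string parsing keeps the "npc=257" extraction testable on its own, without touching the network.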

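EDIT: If you just want to find out where a page redirects to, I'd use the advice here and just use curl to grab the headers and extract the Location: header from them: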

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if (preg_match('~Location: (.*)~i', $result, $match)) {
    $location = trim($match[1]);
}
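The regex above scans the whole response, body included. A slightly safer sketch searches only the header block (everything before the first blank line) and matches the header name case-insensitively at the start of a line. The function name locationFromRawResponse is hypothetical, and the sketch assumes CURLOPT_FOLLOWLOCATION is off, so the output contains a single header block.

```php
<?php
// Given the raw output of curl_exec() with CURLOPT_HEADER enabled, return the
// value of the Location header, or null if there is none.
function locationFromRawResponse(string $raw): ?string
{
    // Split headers from body at the first empty line (CRLF CRLF or LF LF),
    // so a "Location:" string inside the HTML body cannot match.
    $parts = preg_split("/\r?\n\r?\n/", $raw, 2);
    $headerBlock = $parts[0];

    // Header names are case-insensitive; anchor the match at a line start.
    if (preg_match('/^Location:[ \t]*(.*)$/mi', $headerBlock, $m)) {
        return trim($m[1]); // trim() also drops the trailing "\r"
    }
    return null;
}
```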

The answer above didn't work on one of my servers (something to do with basedir), so I reworked it a bit. The code below works on all of my servers.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
curl_close( $ch );
// the returned headers
$headers = explode("\n",$a);
// if there is no redirection this will be the final url
$redir = $url;
// loop through the headers and check for a Location: str
$j = count($headers);
for($i = 0; $i < $j; $i++){
    // if we find the Location header strip it and fill the redir var
    if(strpos($headers[$i],"Location:") !== false){
        $redir = trim(str_replace("Location:","",$headers[$i]));
        break;
    }
}
// do whatever you want with the result
echo $redir;
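Another way to avoid splitting the raw response by hand is cURL's CURLOPT_HEADERFUNCTION option, which passes each header line to a callback; the callback must return the number of bytes it handled. A minimal sketch, where the helper locationCollector is hypothetical and only the last Location header seen is kept:

```php
<?php
// Build a callback for CURLOPT_HEADERFUNCTION that stores the Location header
// value in $location. cURL calls it once per received header line.
function locationCollector(?string &$location): callable
{
    return function ($ch, string $line) use (&$location): int {
        // Header names are case-insensitive ("Location", "location", ...).
        if (stripos($line, 'location:') === 0) {
            $location = trim(substr($line, strlen('location:')));
        }
        return strlen($line); // report the whole line as handled
    };
}

// Usage (performs a real HTTP request):
// $location = null;
// $ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
// curl_setopt($ch, CURLOPT_HEADERFUNCTION, locationCollector($location));
// curl_exec($ch);
// curl_close($ch);
// echo $location ?? $url;
```

Because the callback only ever sees header lines, the body never has to be inspected, and CURLOPT_HEADER can stay off.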

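The accepted answer here is decent, but it's case-sensitive, doesn't handle relative Location: headers (which some sites send), and can be fooled by pages that actually contain the phrase "Location:" in their content (Zillow currently does this).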

It's a bit sloppy, but a few quick edits make it somewhat smarter:

function getOriginalURL($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $result = curl_exec($ch);
    $httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // if it's not a redirection (3XX), move along
    if ($httpStatus < 300 || $httpStatus >= 400)
        return $url;

    // look for a location: header at the start of a line to find the target URL
    if (preg_match('/^location:[ \t]*(.*)$/mi', $result, $r)) {
        $location = trim($r[1]);

        // if the location is a relative URL, attempt to make it absolute
        if (preg_match('/^\/(.*)/', $location)) {
            $urlParts = parse_url($url);
            $baseURL = '';
            // parse_url() omits missing parts, so check before reading them
            if (!empty($urlParts['scheme']))
                $baseURL = $urlParts['scheme'].'://';

            if (!empty($urlParts['host']))
                $baseURL .= $urlParts['host'];

            if (!empty($urlParts['port']))
                $baseURL .= ':'.$urlParts['port'];

            return $baseURL.$location;
        }

        return $location;
    }
    return $url;
}

Note that this still only goes one redirect deep. To go deeper, you actually need to fetch the content and follow the redirects.
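To go more than one hop, the single-request logic can be wrapped in a loop. A minimal sketch under these assumptions: the names resolveLocation, followRedirects and curlFetch are hypothetical, relative Location values are resolved with a simplified rule set that does not normalise "../" segments, and the request itself is passed in as a callable so the loop can be exercised without network access.

```php
<?php
// Resolve a Location value against the URL that returned it. Handles absolute,
// protocol-relative ("//host/path"), root-relative ("/path") and
// document-relative ("path") values. Dot segments are NOT normalised.
function resolveLocation(string $base, string $location): string
{
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $location)) {
        return $location;                                  // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;                  // protocol-relative
    }
    $origin = $scheme . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($location, '/') === 0) {
        return $origin . $location;                        // root-relative
    }
    // Document-relative: replace everything after the last "/" of the base path.
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

// Follow up to $max redirects by hand. $fetch takes a URL and returns
// [int $status, ?string $location] for that single request.
function followRedirects(string $url, callable $fetch, int $max = 10): string
{
    for ($i = 0; $i < $max; $i++) {
        [$status, $location] = $fetch($url);
        if ($status < 300 || $status >= 400 || $location === null) {
            return $url;                                   // not a redirect: done
        }
        $url = resolveLocation($url, $location);
    }
    throw new RuntimeException("More than $max redirects, last URL: $url");
}

// One possible real fetcher built on cURL (performs a network request).
function curlFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);       // one hop per call
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // With FOLLOWLOCATION off, CURLINFO_REDIRECT_URL holds the next hop, if any.
    $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL) ?: null;
    curl_close($ch);
    return [$status, $location];
}

// Usage (performs real HTTP requests):
// echo followRedirects('http://www.wowhead.com/search?q=kobold+worker', 'curlFetch');
```

Injecting the fetcher keeps the redirect bookkeeping (hop limit, relative URLs) separate from the HTTP details, so either piece can be swapped independently.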

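Sometimes you need to get the HTTP headers, but at the same time you don't want to return those headers.

This skeleton handles cookies and HTTP redirects using recursion. The main idea here is to avoid returning HTTP headers to the client code.

You can build a very robust curl class on top of it.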

<?php

class curl {

    static private $cookie_file            = '';
    static private $user_agent             = '';
    static private $max_redirects          = 10;
    static private $followlocation_allowed = true;

    function __construct()
    {
        // set a file to store cookies
        self::$cookie_file = 'cookies.txt';

        // set some general User Agent
        self::$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';

        if ( ! file_exists(self::$cookie_file) || ! is_writable(self::$cookie_file))
        {
            throw new Exception('Cookie file missing or not writable.');
        }

        // check for PHP settings that prevent
        // CURLOPT_FOLLOWLOCATION from working correctly
        // (ini_get() returns "1", not "On", for an enabled flag)
        if (ini_get('open_basedir') != '' || ini_get('safe_mode'))
        {
            self::$followlocation_allowed = false;
        }
    }

    /**
     * Main method for GET requests
     * @param  string $url URI to get
     * @return string      request's body
     */
    static public function get($url)
    {
        $process = curl_init($url);

        self::_set_basic_options($process);

        // this function is in charge of outputting the request's body,
        // so DO NOT include HTTP headers
        curl_setopt($process, CURLOPT_HEADER, 0);

        if (self::$followlocation_allowed)
        {
            // if PHP settings allow it, use AUTOMATIC REDIRECTION
            curl_setopt($process, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($process, CURLOPT_MAXREDIRS, self::$max_redirects);
        }
        else
        {
            curl_setopt($process, CURLOPT_FOLLOWLOCATION, false);
        }

        $return = curl_exec($process);

        if ($return === false)
        {
            throw new Exception('Curl error: ' . curl_error($process));
        }

        // test for redirection HTTP codes
        $code = curl_getinfo($process, CURLINFO_HTTP_CODE);
        if (in_array($code, array(301, 302, 303, 307, 308)))
        {
            curl_close($process);

            // go extract the new Location URI (throws on failure)
            $location = self::_parse_redirection_header($url);

            // IMPORTANT return
            return self::get($location);
        }

        curl_close($process);

        return $return;
    }

    static function _set_basic_options($process)
    {
        curl_setopt($process, CURLOPT_USERAGENT, self::$user_agent);
        curl_setopt($process, CURLOPT_COOKIEFILE, self::$cookie_file);
        curl_setopt($process, CURLOPT_COOKIEJAR, self::$cookie_file);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        // curl_setopt($process, CURLOPT_VERBOSE, 1);
        // curl_setopt($process, CURLOPT_SSL_VERIFYHOST, false);
        // curl_setopt($process, CURLOPT_SSL_VERIFYPEER, false);
    }

    static function _parse_redirection_header($url)
    {
        $process = curl_init($url);

        self::_set_basic_options($process);

        // NOW we need to parse HTTP headers
        curl_setopt($process, CURLOPT_HEADER, 1);

        $return = curl_exec($process);

        if ($return === false)
        {
            throw new Exception('Curl error: ' . curl_error($process));
        }

        curl_close($process);

        // header names are case-insensitive: match "Location:" in any case
        // at the start of a line
        if ( ! preg_match('#^Location:[ \t]*(.*)$#mi', $return, $location))
        {
            throw new Exception('No Location found');
        }

        if (self::$max_redirects-- <= 0)
        {
            throw new Exception('Max redirections reached trying to get: ' . $url);
        }

        return trim($location[1]);
    }
}

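You can use the following after curl_exec(); with CURLOPT_FOLLOWLOCATION disabled, it returns the URL that the response redirects to: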

$redirectURL = curl_getinfo($ch,CURLINFO_REDIRECT_URL);

Add this line to the curl initialization:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

and, after curl_exec() but before curl_close(), use getinfo:

$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );

Answer:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$html = curl_exec($ch);
$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
curl_close($ch);

There are a lot of regexes here. Even though I really like them, this approach might be more stable for me:

$resultCurl=curl_exec($curl); //get curl result
//Optional line if you want to store the http status code
$headerHttpCode=curl_getinfo($curl,CURLINFO_HTTP_CODE);

//let's use dom and xpath
$dom = new \DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($resultCurl, LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors(false);
$xpath = new \DOMXPath($dom);
$head=$xpath->query("/html/body/p/a/@href");

//guard against a body that contains no such link
$newUrl = ($head->length > 0) ? $head[0]->nodeValue : null;

The Location target also appears as a link in the HTML body that Apache sends with the redirect, so XPath is a good fit for recovering it.
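The same idea as a self-contained function that can be tried on a saved response body. It assumes the redirect page has Apache's default shape, with the target as the first link inside a paragraph under the body ("The document has moved <a href=...>here</a>."). The function name linkFromRedirectBody is hypothetical, and pages with a different layout will need a different XPath expression; when a Location header is present, it is generally the more dependable source.

```php
<?php
// Return the first /html/body/p/a/@href of an HTML redirect page, or null.
// Requires PHP's DOM extension.
function linkFromRedirectBody(string $html): ?string
{
    if (trim($html) === '') {
        return null;                              // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = (new DOMXPath($dom))->query('/html/body/p/a/@href');
    if ($hrefs === false || $hrefs->length === 0) {
        return null;                              // no link in the expected place
    }
    return $hrefs->item(0)->nodeValue;            // attribute value = target URL
}
```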