How to get cookies from a PHP curl response into a variable

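So some guy at another company thought it would be awesome if, instead of using SOAP or XML-RPC or REST or any other reasonable communication protocol, he just embedded his whole response as cookies in the headers.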

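I need to pull these cookies out of the curl response, ideally as an array. If I have to waste a big chunk of my life writing a parser for this, I will be very unhappy.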

Does anyone know a simple way to do this, preferably without writing anything to a file?

I would be very grateful if anyone could help me out with this.


If you use CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, curl will read the cookies from, and write them to, a file. Once curl is done, you can read and/or modify that file however you need.
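A minimal sketch of the cookie-jar route, assuming PHP 7 or later and a writable temp directory; `parseNetscapeCookieJar()` and `fetchCookiesViaJar()` are hypothetical helper names. libcurl writes the jar in the Netscape cookie-file format: one cookie per line with seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry timestamp, name, value), `#` comment lines, and a `#HttpOnly_` prefix on the domain of HttpOnly cookies.

```php
<?php
// Parse the Netscape cookie-file format that libcurl writes for CURLOPT_COOKIEJAR.
// Returns name => value; a later line with the same name overwrites an earlier one.
function parseNetscapeCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies look like comments; strip the prefix so they are kept.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }
        $fields = explode("\t", $line);
        if (count($fields) < 7) {
            continue; // not a cookie line
        }
        $cookies[$fields[5]] = $fields[6];
    }
    return $cookies;
}

// Hypothetical wrapper: fetch $url with a temporary jar and return its cookies.
function fetchCookiesViaJar(string $url): array
{
    $jar = tempnam(sys_get_temp_dir(), 'cookies');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // write received cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // and send them on later requests
    curl_exec($ch);
    // Write the jar now instead of waiting for the handle to be destroyed.
    curl_setopt($ch, CURLOPT_COOKIELIST, 'FLUSH');
    unset($ch);
    $cookies = parseNetscapeCookieJar((string) file_get_contents($jar));
    unlink($jar);
    return $cookies;
}
```

Pointing both options at the same path means later requests on the same handle send the cookies back, which is usually what a session needs.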

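My understanding is that cookies from curl must be written out to a file (curl -c cookie_file). If you are running curl through PHP's exec or system functions (or anything in that family), you should be able to save the cookies to a file, then open the file and read them in.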
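Since the question asks to avoid files: the curl command-line tool writes its cookie jar to standard output when the file name given to `-c` is `-`. A rough sketch, assuming PHP 7+, a `curl` binary on the `PATH`, and a Unix-like system (for `/dev/null` and `escapeshellarg()` quoting); all function names are hypothetical:

```php
<?php
// Discard the body (-o /dev/null) and print the cookie jar to stdout (-c -).
function buildCookieDumpCommand(string $url): string
{
    return 'curl -s -o /dev/null -c - ' . escapeshellarg($url);
}

// Turn the Netscape-format jar lines printed by "curl -c -" into name => value.
function cookiesFromJarLines(array $lines): array
{
    $cookies = [];
    foreach ($lines as $line) {
        $line = preg_replace('/^#HttpOnly_/', '', $line); // keep HttpOnly cookies
        $fields = explode("\t", $line);
        if ($line === '' || $line[0] === '#' || count($fields) < 7) {
            continue; // comments, blank lines, anything that is not a cookie
        }
        $cookies[$fields[5]] = $fields[6]; // field 6 is the name, field 7 the value
    }
    return $cookies;
}

// Hypothetical usage: exec() collects stdout line by line into $lines.
function fetchCookiesWithCli(string $url): array
{
    exec(buildCookieDumpCommand($url), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("curl exited with status $status");
    }
    return cookiesFromJarLines($lines);
}
```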

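$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// get headers too with this line
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
// get cookie
// multi-cookie variant contributed by @Combuster in comments
// [^;\r\n] stops at the end of the header line; plain [^;] would run on into the next header
preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $result, $matches);
$cookies = array();
foreach ($matches[1] as $item) {
    // split on the first "=" only; parse_str() would rename cookies containing "." or spaces,
    // turn "+" in values into spaces, and array_merge() would renumber numeric names
    $pair = explode('=', $item, 2);
    $cookies[$pair[0]] = isset($pair[1]) ? $pair[1] : '';
}
var_dump($cookies);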
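A variation on the code above, assuming PHP 7+ and that you want to keep the body out of the regex: `curl_getinfo($ch, CURLINFO_HEADER_SIZE)` reports how many bytes at the start of the returned string are headers (including those of any followed redirects), so the `Set-Cookie` search can be limited to that part. `cookiesFromResponse()` and `fetchCookies()` are hypothetical names:

```php
<?php
// Extract name => value pairs from the Set-Cookie headers of a response fetched
// with CURLOPT_HEADER enabled. $headerSize should come from CURLINFO_HEADER_SIZE.
function cookiesFromResponse(string $response, int $headerSize): array
{
    $headers = substr($response, 0, $headerSize); // the body is never searched
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $headers, $matches);
    $cookies = [];
    foreach ($matches[1] as $pair) {
        // Split on the first "=" only; values such as base64 may contain more.
        $parts = explode('=', $pair, 2);
        $cookies[$parts[0]] = $parts[1] ?? '';
    }
    return $cookies;
}

// Hypothetical usage with a live request.
function fetchCookies(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return cookiesFromResponse($response, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
}
```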

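libcurl also offers CURLINFO_COOKIELIST, which extracts all known cookies (CURLOPT_COOKIELIST is the matching option for adding or modifying them). All you need is to make sure the PHP/CURL binding can use it.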
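The PHP binding does expose it: `curl_getinfo($ch, CURLINFO_COOKIELIST)` returns the cookies libcurl currently holds as an array of Netscape-format lines. The cookie engine has to be on, and setting `CURLOPT_COOKIEFILE` to an empty string is the usual way to enable it without reading or writing any file. A sketch assuming PHP 7+; `cookieListToArray()` and `fetchCookiesInMemory()` are hypothetical names:

```php
<?php
// Convert the Netscape-format lines from CURLINFO_COOKIELIST into name => value.
function cookieListToArray(array $lines): array
{
    $cookies = [];
    foreach ($lines as $line) {
        // Fields: domain, subdomains flag, path, secure flag, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) >= 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Hypothetical usage: cookies stay in memory, no file is involved.
function fetchCookiesInMemory(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, ''); // enable the cookie engine without a file
    curl_exec($ch);
    return cookieListToArray(curl_getinfo($ch, CURLINFO_COOKIELIST));
}
```

Because the engine keeps cookies set during redirects as well, this returns everything the server set along the way, not just the cookies of the final response.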

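This does it without regexps, but it requires the PECL HTTP extension (the procedural http_parse_headers() and http_parse_cookie() functions belong to its 1.x series).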

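$ch = curl_init($url); // $url: placeholder for the page you are requesting
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
// keep only the header part of the response
$headerText = substr($result, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
curl_close($ch);

$headers = http_parse_headers($headerText);
$cookobjs = array();
foreach ($headers as $k => $v) {
    if (strtolower($k) == "set-cookie") {
        // a single Set-Cookie header comes back as a string, several as an array
        foreach ((array) $v as $v2) {
            $cookobjs[] = http_parse_cookie($v2);
        }
    }
}

$cookies = array();
foreach ($cookobjs as $row) {
    $cookies[] = $row->cookies;
}

$tmp = array();
// flatten into name => value format
foreach ($cookies as $v) {
    foreach ($v as $k1 => $v1) {
        $tmp[$k1] = $v1;
    }
}

$cookies = $tmp;
print_r($cookies);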
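If the PECL extension is not available, the per-header step can be done in plain PHP without a regex. This sketch assumes PHP 7+; `parseSetCookieValue()` is a hypothetical name. It splits one `Set-Cookie` header value into the cookie pair and its attributes, and does no URL-decoding or date parsing:

```php
<?php
// Split one Set-Cookie header value, e.g. "sid=abc; Path=/; HttpOnly",
// into the cookie itself and its attributes (attribute names lower-cased).
function parseSetCookieValue(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $pair = explode('=', array_shift($parts), 2); // first part is name=value
    $result = [
        'name' => $pair[0],
        'value' => $pair[1] ?? '',
        'attributes' => [],
    ];
    foreach ($parts as $part) {
        if ($part === '') {
            continue; // tolerate a trailing ";"
        }
        $attr = explode('=', $part, 2);
        // Flag attributes such as Secure or HttpOnly have no value; store true.
        $result['attributes'][strtolower($attr[0])] = $attr[1] ?? true;
    }
    return $result;
}
```

Applied to each `Set-Cookie` value, `$cookies[$c['name']] = $c['value']` builds the same name => value array as the code above.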

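Although this question is quite old and the accepted answer works, I find it a bit uncomfortable, because the content of the HTTP response (HTML, XML, JSON, binary or whatever) ends up mixed with the headers.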

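I've found another option. curl provides an option (CURLOPT_HEADERFUNCTION) to set a callback that will be called for each response header line. The function receives the curl handle and a string containing the header line, and must return the number of bytes it handled.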

You can use code like this (adapted from TML's answer):

$cookies = array();
$ch = curl_init('http://www.google.com/');
// keep the body out of the script's output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Ask for the callback.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, "curlResponseHeaderCallback");
$result = curl_exec($ch);
var_dump($cookies);

function curlResponseHeaderCallback($ch, $headerLine) {
    global $cookies;
    // the header line ends in "\r\n", so stop the capture at ";" or the line break
    if (preg_match('/^Set-Cookie:\s*([^;\r\n]*)/i', $headerLine, $cookie) == 1) {
        $cookies[] = $cookie[1]; // the "name=value" part only
    }
    return strlen($headerLine); // Needed by curl
}

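This solution has the drawback of using a global variable, but I guess that is not an issue for short scripts. And if curl is wrapped in a class, you can always use static methods and properties instead.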
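Another way to avoid the global, sketched under the assumption of PHP 7+: pass a closure that captures the result array by reference. `collectCookies()` and `fetchWithHeaderCallback()` are hypothetical names:

```php
<?php
// Build a CURLOPT_HEADERFUNCTION callback that stores name => value pairs from
// every Set-Cookie header line in $cookies (captured by reference).
function collectCookies(array &$cookies): callable
{
    return function ($ch, string $headerLine) use (&$cookies): int {
        if (preg_match('/^Set-Cookie:\s*([^;\r\n]*)/i', $headerLine, $m) === 1) {
            $parts = explode('=', $m[1], 2);
            $cookies[$parts[0]] = $parts[1] ?? '';
        }
        // curl treats any other return value as an error and aborts the transfer.
        return strlen($headerLine);
    };
}

// Hypothetical usage with a live request.
function fetchWithHeaderCallback(string $url): array
{
    $cookies = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body out of the output
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, collectCookies($cookies));
    curl_exec($ch);
    return $cookies;
}
```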

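Someone here may find this useful. hhb_curl_exec2 works pretty much like curl_exec, but arg3 is an array that will be populated with the returned HTTP headers (numerically indexed), arg4 is an array that will be populated with the returned cookies ($cookies["expires"] => "Fri, 06-May-2016 05:58:51 GMT"), and arg5 will be populated with... information about the raw request curl made.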

The downside is that it requires CURLOPT_RETURNTRANSFER to be on, otherwise it errors out, and it will overwrite CURLOPT_STDERR and CURLOPT_VERBOSE if you were already using them for something else. (I might fix this later.)

Example of how to use it:

<?php
header("content-type: text/plain;charset=utf8");
$ch=curl_init();
$headers=array();
$cookies=array();
$debuginfo="";
$body="";
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$body=hhb_curl_exec2($ch,'https://www.youtube.com/',$headers,$cookies,$debuginfo);
var_dump('$cookies:',$cookies,'$headers:',$headers,'$debuginfo:',$debuginfo,'$body:',$body);

And the function itself:

function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "")
{
    $returnHeaders    = array();
    $returnCookies    = array();
    $verboseDebugInfo = "";
    // curl handles are resources before PHP 8 and CurlHandle objects from PHP 8 on
    if (!($ch instanceof CurlHandle) && (!is_resource($ch) || get_resource_type($ch) !== 'curl')) {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }
    $verbosefileh = tmpfile();
    $verbosefile  = stream_get_meta_data($verbosefileh);
    $verbosefile  = $verbosefile['uri'];
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $verbosefileh);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $html             = hhb_curl_exec($ch, $url);
    $verboseDebugInfo = file_get_contents($verbosefile);
    curl_setopt($ch, CURLOPT_STDERR, NULL);
    fclose($verbosefileh);
    unset($verbosefile, $verbosefileh);
    $crlf = "\x0d\x0a";
    // CURLINFO_HEADER_SIZE counts every header block (e.g. followed redirects),
    // so the body is cut off in the right place instead of at the first blank line
    $thepos        = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $headersString = rtrim(substr($html, 0, $thepos));
    $returnHeaders = array_values(array_filter(explode($crlf, $headersString), 'strlen'));
    unset($headersString);
    $htmlBody = (string) substr($html, $thepos);
    unset($html);
    //I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly..
    //at least it's tested and seems to work perfectly...
    $grabCookieName = function($str)
    {
        $ret = "";
        for ($i = 0; $i < strlen($str); ++$i) {
            if ($str[$i] === ' ') {
                continue;
            }
            if ($str[$i] === '=') {
                break;
            }
            $ret .= $str[$i];
        }
        return urldecode($ret);
    };
    foreach ($returnHeaders as $header) {
        //Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else
        /*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/
        Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;
        //Cookie names cannot contain any of the following '=,; \t\r\n\013\014'
        //
        */
        if (stripos($header, "Set-Cookie:") !== 0) {
            continue;
        }
        $header = trim(substr($header, strlen("Set-Cookie:")));
        while (strlen($header) > 0) {
            $cookiename                 = $grabCookieName($header);
            $returnCookies[$cookiename] = '';
            $header                     = substr($header, strlen($cookiename) + 1); //also remove the =
            if (strlen($header) < 1) {
                break;
            }
            $thepos = strpos($header, ';');
            if ($thepos === false) { //last cookie in this Set-Cookie.
                $returnCookies[$cookiename] = urldecode($header);
                break;
            }
            $returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos));
            $header                     = trim(substr($header, $thepos + 1)); //also remove the ;
        }
    }
    unset($header, $cookiename, $thepos);
    return $htmlBody;
}


function hhb_curl_exec($ch, $url)
{
    static $hhb_curl_domainCache = "";
    //$hhb_curl_domainCache=&$this->hhb_curl_domainCache;
    //$ch=&$this->curlh;
    // curl handles are resources before PHP 8 and CurlHandle objects from PHP 8 on
    if (!($ch instanceof CurlHandle) && (!is_resource($ch) || get_resource_type($ch) !== 'curl')) {
        throw new InvalidArgumentException('$ch must be a curl handle!');
    }
    if (!is_string($url)) {
        throw new InvalidArgumentException('$url must be a string!');
    }

    $tmpvar = "";
    if (parse_url($url, PHP_URL_HOST) === null) {
        if (substr($url, 0, 1) !== '/') {
            $url = $hhb_curl_domainCache . '/' . $url;
        } else {
            $url = $hhb_curl_domainCache . $url;
        }
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    $html = curl_exec($ch);
    if (curl_errno($ch)) {
        throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch));
        // echo 'Curl error: ' . curl_error($ch);
    }
    if ($html === '' && 204 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*204 is "No Content", i.e. success without a body*/ ) {
        throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true));
    }
    //curl only follows "Location: " redirects when CURLOPT_FOLLOWLOCATION is on; the effective URL reflects that
    //keep the scheme too, so a relative URL on an https site is not requested over plain http
    $effectiveUrl         = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    $hhb_curl_domainCache = parse_url($effectiveUrl, PHP_URL_SCHEME) . '://' . parse_url($effectiveUrl, PHP_URL_HOST);
    return $html;
}

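The accepted answer seems to search through the entire response message. That could give false matches for cookie headers if the word "Set-Cookie" appears at the start of a line in the body. While it should be fine in most cases, the safer way is to read the message from the start up to the first empty line, which marks the end of the message headers. This is an alternative solution that looks for the first blank line and then uses preg_grep on the lines before it to find only the "Set-Cookie" headers.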

curl_setopt($ch, CURLOPT_HEADER, 1);
//Return everything
$res = curl_exec($ch);
//Split into lines
$lines = explode("\n", $res);
$headers = array();
$body = "";
$cookies = array();
foreach ($lines as $num => $line) {
    $l = str_replace("\r", "", $line);
    //Empty line indicates the start of the message body and end of headers
    if (trim($l) == "") {
        $headers = array_slice($lines, 0, $num);
        //The body is everything after the blank line, not just the next line
        $body = implode("\n", array_slice($lines, $num + 1));
        //Pull only cookies out of the headers (header names are case-insensitive; HTTP/2 sends them lower-case)
        $cookies = preg_grep('/^Set-Cookie:/i', $headers);
        break;
    }
}
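The first-blank-line approach only sees the first header block. With `CURLOPT_FOLLOWLOCATION` enabled (or after an interim `100 Continue`), the returned string starts with one header block per response, so cookies set by a redirect end up in the "body". A sketch assuming PHP 7.4+ (for argument unpacking into `array_merge()`) that uses `CURLINFO_HEADER_SIZE` to cut off the body and then splits the headers per response; `splitHeaderBlocks()` and `cookieLinesFromChain()` are hypothetical names:

```php
<?php
// Split the header part of a response (the first CURLINFO_HEADER_SIZE bytes)
// into one array of header lines per HTTP response in the chain.
function splitHeaderBlocks(string $rawHeaders): array
{
    $blocks = [];
    foreach (preg_split('/\r?\n\r?\n/', trim($rawHeaders)) as $block) {
        $blocks[] = preg_split('/\r?\n/', $block);
    }
    return $blocks;
}

// Hypothetical usage: Set-Cookie lines from every response, in order.
function cookieLinesFromChain(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    $raw = substr($response, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $allLines = array_merge([], ...splitHeaderBlocks($raw));
    return array_values(preg_grep('/^Set-Cookie:/i', $allLines));
}
```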