Disable warnings when loading malformed HTML with DomDocument (PHP)

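I need to parse some HTML files, but they are not well-formed and PHP prints out warnings. I want to avoid this debugging/warning behavior programmatically. Please advise. Thank you!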

Code:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument;
// this dumps out the warnings
$xmlDoc->loadHTML($fetchResult);

This:

@$xmlDoc->loadHTML($fetchResult)

suppresses the warnings, but how can I capture them programmatically?


You can install a temporary error handler with set_error_handler:

class ErrorTrap {
    protected $callback;
    protected $errors = array();
    function __construct($callback) {
        $this->callback = $callback;
    }
    // Invoke the callback with a temporary error handler installed.
    function call() {
        set_error_handler(array($this, 'onError'));
        try {
            return call_user_func_array($this->callback, func_get_args());
        } finally {
            // Restore the previous handler even if the callback throws
            // (this also covers Error throwables, not only Exception).
            restore_error_handler();
        }
    }
    // Record the error instead of letting PHP print it.
    function onError($errno, $errstr, $errfile, $errline) {
        $this->errors[] = array($errno, $errstr, $errfile, $errline);
        return true; // tell PHP the error was handled
    }
    function ok() {
        return count($this->errors) === 0;
    }
    function errors() {
        return $this->errors;
    }
}

Usage:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings
$caller->call($fetchResult);
if (!$caller->ok()) {
    var_dump($caller->errors());
}

Call

libxml_use_internal_errors(true);

prior to processing with $xmlDoc->loadHTML().

This tells libxml2 not to send errors and warnings through to PHP. Then, to check for errors and handle them yourself, you can consult libxml_get_last_error() and/or libxml_get_errors() when you're ready:

libxml_use_internal_errors(true);
$dom->loadHTML($html);
$errors = libxml_get_errors();
foreach ($errors as $error) {
    // handle the errors as you wish
}
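As a rough illustration of what "handling" could look like: each entry returned by libxml_get_errors() is a LibXMLError object with public level, code, column, message, file and line properties. The sketch below turns them into readable strings. The helper name describeLibxmlErrors(), the output format and the sample HTML are assumptions made for this example, and the exact messages you get depend on the libxml2 version.

```php
<?php
// Hypothetical helper (not part of PHP or libxml): formats LibXMLError
// objects as one human-readable line each.
function describeLibxmlErrors(array $errors): array
{
    $levels = [
        LIBXML_ERR_WARNING => 'warning',
        LIBXML_ERR_ERROR   => 'error',
        LIBXML_ERR_FATAL   => 'fatal',
    ];
    $lines = [];
    foreach ($errors as $error) {
        $level = $levels[$error->level] ?? 'unknown';
        // libxml messages usually end with a newline, so trim them.
        $lines[] = sprintf(
            '[%s] line %d, column %d: %s',
            $level,
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
    return $lines;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Sample malformed input, chosen only for illustration.
$dom->loadHTML('<div><p>Unclosed paragraph</div></span>');
$messages = describeLibxmlErrors(libxml_get_errors());
libxml_clear_errors();

foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

From here you could log the strings, or filter on $error->level if only fatal problems matter to you.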

To hide the warnings, you have to give special instructions to libxml, which is used internally to perform the parsing:

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

Calling libxml_use_internal_errors(true) indicates that you're going to handle the errors and warnings yourself and you don't want them to mess up the output of your script.

This is not the same as the @ operator. The warnings get collected behind the scenes and afterwards you can retrieve them by using libxml_get_errors() in case you wish to perform logging or return the list of issues to the caller.

Whether or not you use the collected warnings, you should always clear the queue by calling libxml_clear_errors(); otherwise the errors keep accumulating in memory.

Preserving the state

If you have other code that uses libxml it may be worthwhile to make sure your code doesn't alter the global state of the error handling; for this, you can use the return value of libxml_use_internal_errors() to save the previous state.

// modify state
$libxml_previous_state = libxml_use_internal_errors(true);
// parse
$dom->loadHTML($html);
// handle errors
libxml_clear_errors();
// restore
libxml_use_internal_errors($libxml_previous_state);
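If you do this in several places, the save/parse/clear/restore sequence could be wrapped in a small helper. The following is only a sketch: withLibxmlErrorsCaptured() is a made-up name, and it assumes the error buffer is empty when it is called, because libxml_get_errors() returns everything collected so far and libxml_clear_errors() discards it all.

```php
<?php
// Hypothetical helper: runs $callback with libxml's internal error handling
// switched on, then restores whatever setting was active before.
// Returns [callback result, array of LibXMLError objects].
function withLibxmlErrorsCaptured(callable $callback): array
{
    $previous = libxml_use_internal_errors(true);
    try {
        $result = $callback();
        $errors = libxml_get_errors();
    } finally {
        // Runs even if the callback throws, so global state is never left altered.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    return [$result, $errors];
}

$dom = new DOMDocument();
// Sample malformed input, chosen only for illustration.
[$loaded, $errors] = withLibxmlErrorsCaptured(function () use ($dom) {
    return $dom->loadHTML('<p>Hello<p>world</span>');
});
```

The try/finally is a design choice rather than a requirement: it restores the previous setting even when the callback throws.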

Passing the LIBXML_NOWARNING and LIBXML_NOERROR options as the second argument to loadHTML() (the $options parameter, available since PHP 5.4) works perfectly fine too:

$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);