How to get innerHTML of DOMNode?

What function do you use to get the innerHTML of a given DOMNode in the PHP DOM implementation? Can someone give a reliable solution?

Of course outerHTML will do too.


Compare this updated variant with PHP Manual User Note #89718:

<?php
function DOMinnerHTML(DOMNode $element)
{
    $innerHTML = "";
    $children  = $element->childNodes;

    foreach ($children as $child)
    {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }

    return $innerHTML;
}
?>

Example:

<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput       = true;
// loadHTML() parses an HTML string; load() would treat its argument as an XML file path
$dom->loadHTML($html_string);

$domTables = $dom->getElementsByTagName("table");

// Iterate over DOMNodeList (Implements Traversable)
foreach ($domTables as $table)
{
    echo DOMinnerHTML($table);
}
?>

function setnodevalue($doc, $node, $newvalue){
    // Remove all existing children of the node
    while($node->childNodes->length > 0){
        $node->removeChild($node->firstChild);
    }
    if(!empty($newvalue)){
        // The fragment is created by the same document, so it can be appended
        // directly; appendXML() expects well-formed XML markup
        $fragment = $doc->createDocumentFragment();
        $fragment->appendXML(trim($newvalue));
        $node->appendChild($fragment);
    }
}
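
The fragment approach above can be sketched end to end as follows. This is only an illustration: setInnerXml() is a hypothetical name (not part of the DOM extension), and the sample markup is made up. It assumes the new content is well-formed XML, because appendXML() uses the XML parser rather than the HTML parser.

```php
<?php
// Sketch: replace a node's children with parsed markup (a "set innerHTML").
// setInnerXml() is a hypothetical helper name, not a DOM API.
function setInnerXml(DOMNode $node, string $markup): void
{
    // Remove all existing children.
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
    $markup = trim($markup);
    if ($markup === '') {
        return;
    }
    $fragment = $node->ownerDocument->createDocumentFragment();
    // appendXML() parses XML, so the markup must be well-formed
    // (e.g. <br/> rather than <br>); it returns false on failure.
    if ($fragment->appendXML($markup)) {
        $node->appendChild($fragment);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="box"><p>old</p></div>');
$box = $doc->getElementById('box');

setInnerXml($box, '<span>new</span> text');
$result = $doc->saveHTML($box);
echo $result, "\n";
```

Checking for an exact empty string instead of empty() means a value such as "0" is still written to the node.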

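To return the HTML of an element, including its own tag, you can use C14N(). Note that it produces canonical XML rather than HTML, so an empty element such as <br> is written as <br></br>: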

$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach ($x->query('//table') as $table) {
    echo $table->C14N();
}
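
To see how C14N() differs from saveHTML() on the same node, here is a small comparison. The table markup is an invented sample; the point is only that canonical XML writes empty elements as start/end tag pairs, while the HTML serializer keeps void elements such as <br> as they are.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>a<br>b</td></tr></table>');
$table = $dom->getElementsByTagName('table')->item(0);

$canonical = $table->C14N();          // canonical XML of the element
$asHtml    = $dom->saveHTML($table);  // HTML serialization of the same element

echo $canonical, "\n";
echo $asHtml, "\n";
```

If the result is meant to be fed back into an HTML context, saveHTML() with a node argument is usually the closer match.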

A simplified version of Haim Evgi's answer:

<?php

function innerHTML(\DOMElement $element)
{
    $doc = $element->ownerDocument;

    $html = '';

    foreach ($element->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }

    return $html;
}

Example usage:

<?php

$doc = new \DOMDocument();
$doc->loadHTML("<body><div id='foo'><p>This is <b>an <i>example</i></b> paragraph<br>\n\ncontaining newlines.</p><p>This is another paragraph.</p></div></body>");

print innerHTML($doc->getElementById('foo'));

/*
<p>This is <b>an <i>example</i></b> paragraph<br>

containing newlines.</p>
<p>This is another paragraph.</p>
*/

There's no need to set preserveWhiteSpace or formatOutput.

Here is a version in a functional programming style:

function innerHTML($node) {
    return implode(array_map([$node->ownerDocument, "saveHTML"],
                             iterator_to_array($node->childNodes)));
}

Similar to trincot's nice version with array_map and implode, but this time with array_reduce:

return array_reduce(
    iterator_to_array($node->childNodes),
    function ($carry, \DOMNode $child) {
        return $carry . $child->ownerDocument->saveHTML($child);
    },
    '' // initial value, so a node without children yields '' instead of null
);

I still don't understand why there's no reduce() function that accepts arrays and iterators alike.
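
A reduce that works on any iterable is easy to write by hand. The sketch below is an assumption-laden illustration: reduceIterable() and innerHtmlReduced() are hypothetical names, not PHP built-ins, and the sample markup is invented. It avoids the iterator_to_array() copy by looping over the DOMNodeList directly.

```php
<?php
// Hypothetical helper: like array_reduce(), but for arrays and Traversables alike.
function reduceIterable(iterable $items, callable $fn, $initial = null)
{
    $carry = $initial;
    foreach ($items as $item) {
        $carry = $fn($carry, $item);
    }
    return $carry;
}

// Hypothetical wrapper: concatenates the HTML of every child node.
function innerHtmlReduced(DOMNode $node): string
{
    return reduceIterable(
        $node->childNodes,
        function (string $carry, DOMNode $child): string {
            return $carry . $child->ownerDocument->saveHTML($child);
        },
        '' // start from an empty string so childless nodes return ''
    );
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="foo"><p>one</p><p>two</p></div>');
$inner = innerHtmlReduced($doc->getElementById('foo'));
echo $inner, "\n";
```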

Here's another approach, based on this comment by Drupella on php.net, which worked well for my project. It defines innerHTML() by creating a new DOMDocument and importing and appending the target node to it, instead of explicitly iterating over the child nodes.

InnerHTML

Let's define this helper function:

function innerHTML( \DOMNode $n, $include_target_tag = true ) {
    $doc = new \DOMDocument();
    $doc->appendChild( $doc->importNode( $n, true ) );
    $html = trim( $doc->saveHTML() );
    if ( $include_target_tag ) {
        return $html;
    }
    return preg_replace( '@^<' . $n->nodeName .'[^>]*>|</'. $n->nodeName .'>$@', '', $html );
}

where we can include/exclude the outer target tag through the second input argument.

Usage Example

Here we extract the inner HTML for a target tag given by the "first" id attribute:

$html = '<div id="first"><h1>Hello</h1></div><div id="second"><p>World!</p></div>';
$doc  = new \DOMDocument();
$doc->loadHTML( $html );
$node = $doc->getElementById( 'first' );

if ( $node instanceof \DOMNode ) {

    echo innerHTML( $node, true );
    // Output: <div id="first"><h1>Hello</h1></div>

    echo innerHTML( $node, false );
    // Output: <h1>Hello</h1>
}

Live example:

http://sandbox.onlinephpfunctions.com/code/2714ea116aad9957c3c437d46134a1688e9133b8

Old question, but there is a built-in method for this: just pass the target node to DOMDocument::saveHTML().

Full example:

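$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument(); // the constructor takes a version and encoding, not the markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// With * you get the inner HTML without the surrounding div tag;
// without * you get the HTML including the surrounding div tag.
$nodes = $xpath->query('.//div/*');
$innerHtml = '';
foreach ($nodes as $node) {
    // saveHTML() expects a single DOMNode, not a DOMNodeList
    $innerHtml .= $dom->saveHTML($node);
}
var_dump($innerHtml);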

Output: <p>ciao questa è una <b>prova</b>.</p>

For people who want to get the HTML from XPath query, here is my version:

$xpath = new DOMXPath( $my_dom_object );

$DOMNodeList = $xpath->query('//div[contains(@class, "some_custom_class_in_html")]');

if ( $DOMNodeList->count() > 0 ) {
    $page_html = $my_dom_object->saveHTML( $DOMNodeList->item(0) );
}
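
If the query can match more than one element, the same idea extends to a loop. This is a minimal sketch with invented sample markup and an invented class name ("note"); note that saveHTML() with a node argument returns the element including its own tag.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div class="note a">first</div><p>skip</p><div class="note">second</div>');
$xpath = new DOMXPath($dom);

// Collect the HTML of every matching element, not just the first one.
$matches = [];
foreach ($xpath->query('//div[contains(@class, "note")]') as $node) {
    $matches[] = $dom->saveHTML($node);
}

print_r($matches);
```

An empty $matches array then signals that nothing matched, so no separate count() check is needed.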