如何从Java漂亮打印XML ?

我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是字符串。我的输出是字符串

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag>
<nested>hello</nested>
</tag>
</root>
532862 次浏览
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);

注意:根据Java版本的不同,结果可能有所不同。搜索特定于您的平台的解决方案。

由于从String开始,在使用Transformer之前,需要转换到DOM对象(例如Node)。但是,如果您知道XML字符串是有效的,并且您不希望引起将字符串解析为DOM的内存开销,然后在DOM上运行转换以返回字符串—您可以通过字符解析来执行一些老式的字符。在每个</...>字符后插入换行符和空格,保留和缩进计数器(以确定空格的数量),为每个<...>递增,为每个</...>递减。

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML)
throws ParserConfigurationException, SAXException, IOException {


DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource sourceXML = new InputSource(new StringReader(strXML));
Document xmlDoc = db.parse(sourceXML);
Element e = xmlDoc.getDocumentElement();
e.normalize();
return e;
}


public static final void prettyPrint(Node xml, OutputStream out)
throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(xml), new StreamResult(out));
}

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;


import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;


/**
* Pretty-prints xml, supplied as a string.
* <p/>
* eg.
* <code>
* String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
* </code>
*/
public class XmlFormatter {


public XmlFormatter() {
}


public String format(String unformattedXml) {
try {
final Document document = parseXmlFile(unformattedXml);


OutputFormat format = new OutputFormat(document);
format.setLineWidth(65);
format.setIndenting(true);
format.setIndent(2);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);


return out.toString();
} catch (IOException e) {
throw new RuntimeException(e);
}
}


private Document parseXmlFile(String in) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(in));
return db.parse(is);
} catch (ParserConfigurationException e) {
throw new RuntimeException(e);
} catch (SAXException e) {
throw new RuntimeException(e);
} catch (IOException e) {
throw new RuntimeException(e);
}
}


public static void main(String[] args) {
String unformattedXml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
"        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
"        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
"    <Query>\n" +
"        <query:CategorySchemeWhere>\n" +
"   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
"        </query:CategorySchemeWhere>\n" +
"    </Query>\n\n\n\n\n" +
"</QueryMessage>";


System.out.println(new XmlFormatter().format(unformattedXml));
}


}

有一个非常好的命令行XML实用程序叫做xmlstarlet(http://xmlstar.sourceforge.net/),它可以做很多事情,很多人都在使用。

您可以使用Runtime以编程方式执行此程序。然后读入格式化的输出文件。它具有比几行Java代码所能提供的更多选项和更好的错误报告。

下载xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589

我在过去使用org.dom4j.io.OutputFormat.createPrettyPrint ()方法打印过

public String prettyPrint(final String xml){


if (StringUtils.isBlank(xml)) {
throw new RuntimeException("xml was null or blank in prettyPrint()");
}


final StringWriter sw;


try {
final OutputFormat format = OutputFormat.createPrettyPrint();
final org.dom4j.Document document = DocumentHelper.parseText(xml);
sw = new StringWriter();
final XMLWriter writer = new XMLWriter(sw, format);
writer.write(document);
}
catch (Exception e) {
throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
}
return sw.toString();
}

关于“您必须首先构建DOM树”的评论:不,您不需要也不应该这样做。

相反,创建一个StreamSource(new StreamSource(new StringReader(str)),并将其提供给前面提到的标识转换器。这将使用SAX解析器,结果将快得多。 在这种情况下,构建中间树纯粹是开销。

如果使用第三方XML库是可以的,你可以使用比当前好像是 答案建议的简单得多的东西。

它声明输入和输出都应该是字符串,所以这里有一个实用程序方法,用< a href = " http://xom。nu" rel="nofollow noreferrer">XOM库实现:

import nu.xom.*;
import java.io.*;


[...]


public static String format(String xml) throws ParsingException, IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
Serializer serializer = new Serializer(out);
serializer.setIndent(4);  // or whatever you like
serializer.write(new Builder().build(xml, ""));
return out.toString("UTF-8");
}

我测试了它的工作,结果取决于你的JRE版本或类似的东西。要了解如何根据自己的喜好定制输出格式,请查看Serializer API。

这实际上比我想象的要长——需要一些额外的行,因为Serializer想要写入OutputStream。但是请注意,这里很少有用于实际XML处理的代码。

(这个答案是我对XOM的评估的一部分,它是建议作为我的关于最佳Java XML库的问题中的一个选项来取代dom4j。在dom4j中,你可以使用XMLWriterOutputFormat轻松实现这一点。编辑:…如mlo55的回答所示。)

基于这个答案的一个更简单的解决方案:

public static String prettyFormat(String input, int indent) {
try {
Source xmlInput = new StreamSource(new StringReader(input));
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(xmlInput, xmlOutput);
return xmlOutput.getWriter().toString();
} catch (Exception e) {
throw new RuntimeException(e); // simple exception handling, please review it
}
}


public static String prettyFormat(String input) {
return prettyFormat(input, 2);
}

testcase:

prettyFormat("<root><child>aaa</child><child/></root>");

返回:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<child>aaa</child>
<child/>
</root>

//忽略:原始编辑只需要在代码中的类名中缺少s。为了在SO上获得超过6个字符的验证,添加了多余的6个字符

< p >嗯…面对这样的事情,这是一个已知的bug… 只需添加这个OutputProperty ..

transformer.setOutputProperty(OutputPropertiesFactory.S_KEY_INDENT_AMOUNT, "8");

希望这对你有所帮助……

下面是一种使用dom4j的方法:

进口:

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

代码:

String xml = "<your xml='here'/>";
Document doc = DocumentHelper.parseText(xml);
StringWriter sw = new StringWriter();
OutputFormat format = OutputFormat.createPrettyPrint();
XMLWriter xw = new XMLWriter(sw, format);
xw.write(doc);
String result = sw.toString();
Kevin Hakanson说: 但是,如果您知道您的XML字符串是有效的,并且您不想引起将字符串解析为DOM的内存开销,然后在DOM上运行转换以获得字符串—您可以通过字符解析进行一些老式的字符。在每个字符后插入换行符和空格,保持和缩进计数器(以确定空格的数量),您为每个<…>和递减你看到的每一个。" < / p >

同意了。这种方法要快得多,依赖关系也少得多。

示例解决方案:

/**
* XML utils, including formatting.
*/
public class XmlUtils
{
private static XmlFormatter formatter = new XmlFormatter(2, 80);


public static String formatXml(String s)
{
return formatter.format(s, 0);
}


public static String formatXml(String s, int initialIndent)
{
return formatter.format(s, initialIndent);
}


private static class XmlFormatter
{
private int indentNumChars;
private int lineLength;
private boolean singleLine;


public XmlFormatter(int indentNumChars, int lineLength)
{
this.indentNumChars = indentNumChars;
this.lineLength = lineLength;
}


public synchronized String format(String s, int initialIndent)
{
int indent = initialIndent;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++)
{
char currentChar = s.charAt(i);
if (currentChar == '<')
{
char nextChar = s.charAt(i + 1);
if (nextChar == '/')
indent -= indentNumChars;
if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
sb.append(buildWhitespace(indent));
if (nextChar != '?' && nextChar != '!' && nextChar != '/')
indent += indentNumChars;
singleLine = false;  // Reset flag.
}
sb.append(currentChar);
if (currentChar == '>')
{
if (s.charAt(i - 1) == '/')
{
indent -= indentNumChars;
sb.append("\n");
}
else
{
int nextStartElementPos = s.indexOf('<', i);
if (nextStartElementPos > i + 1)
{
String textBetweenElements = s.substring(i + 1, nextStartElementPos);


// If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
if (textBetweenElements.replaceAll("\n", "").length() == 0)
{
sb.append(textBetweenElements + "\n");
}
// Put tags and text on a single line if the text is short.
else if (textBetweenElements.length() <= lineLength * 0.5)
{
sb.append(textBetweenElements);
singleLine = true;
}
// For larger amounts of text, wrap lines to a maximum line length.
else
{
sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
}
i = nextStartElementPos - 1;
}
else
{
sb.append("\n");
}
}
}
}
return sb.toString();
}
}


private static String buildWhitespace(int numChars)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < numChars; i++)
sb.append(" ");
return sb.toString();
}


/**
* Wraps the supplied text to the specified line length.
* @lineLength the maximum length of each line in the returned string (not including indent if specified).
* @indent optional number of whitespace characters to prepend to each line before the text.
* @linePrefix optional string to append to the indent (before the text).
* @returns the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
* indent and prefix applied to each line.
*/
private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
{
if (s == null)
return null;


StringBuilder sb = new StringBuilder();
int lineStartPos = 0;
int lineEndPos;
boolean firstLine = true;
while(lineStartPos < s.length())
{
if (!firstLine)
sb.append("\n");
else
firstLine = false;


if (lineStartPos + lineLength > s.length())
lineEndPos = s.length() - 1;
else
{
lineEndPos = lineStartPos + lineLength - 1;
while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
lineEndPos--;
}
sb.append(buildWhitespace(indent));
if (linePrefix != null)
sb.append(linePrefix);


sb.append(s.substring(lineStartPos, lineEndPos + 1));
lineStartPos = lineEndPos + 1;
}
return sb.toString();
}


// other utils removed for brevity
}

我有同样的问题,我有很大的成功与JTidy (http://jtidy.sourceforge.net/index.html)

例子:

Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(
new ByteArrayInputStream("HTML goes here", null);


OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();

请注意,排名靠前的答案需要使用xerces。

如果您不想添加这个外部依赖,那么您可以简单地使用标准jdk库(实际上是在内部使用xerces构建的)。

注意:jdk 1.5版有一个bug,见http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446,但现在已经解决了。

(注意,如果发生错误,将返回原始文本)

package com.test;


import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;


import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;


import org.xml.sax.InputSource;


public class XmlTest {
public static void main(String[] args) {
XmlTest t = new XmlTest();
System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
}


public String formatXml(String xml){
try{
Transformer serializer= SAXTransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
//serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
//serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
Source xmlSource=new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
StreamResult res =  new StreamResult(new ByteArrayOutputStream());
serializer.transform(xmlSource, res);
return new String(((ByteArrayOutputStream)res.getOutputStream()).toByteArray());
}catch(Exception e){
//TODO log error
return xml;
}
}


}

使用scala:

import xml._
val xml = XML.loadString("<tag><nested>hello</nested></tag>")
val formatted = new PrettyPrinter(150, 2).format(xml)
println(formatted)

如果你依赖scala-library.jar,你也可以在Java中这样做。它是这样的:

import scala.xml.*;


public class FormatXML {
public static void main(String[] args) {
String unformattedXml = "<tag><nested>hello</nested></tag>";
PrettyPrinter pp = new PrettyPrinter(150, 3);
String formatted = pp.format(XML.loadString(unformattedXml), TopScope$.MODULE$);
System.out.println(formatted);
}
}

PrettyPrinter对象由两个整型对象构造,第一个是最大行长,第二个是缩进步骤。

为了将来的参考,这里有一个对我有用的解决方案(感谢@George Hawkins在其中一个答案中发表的评论):

DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
LSOutput output = impl.createLSOutput();
ByteArrayOutputStream out = new ByteArrayOutputStream();
output.setByteStream(out);
writer.write(document, output);
String xmlStr = new String(out.toByteArray());

我发现在Java 1.6.0_32中,漂亮打印XML 字符串< em > < / em >的正常方法(使用带有null或身份xslt的Transformer)不像我希望的那样,如果标记只是用空白分隔,而不是没有分隔文本。我尝试在模板中使用<xsl:strip-space elements="*"/>,但没有用。我发现的最简单的解决方案是使用SAXSource和XML过滤器按我想要的方式剥离空间。由于我的解决方案是用于日志记录,所以我还将其扩展到处理不完整的XML片段。注意,如果使用DOMSource,正常的方法似乎工作得很好,但由于不完整和内存开销,我不想使用这种方法。

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{


@Override
public void ignorableWhitespace(char[] arg0,
int arg1,
int arg2) throws SAXException
{
//Ignore it then...
}


@Override
public void characters( char[] ch,
int start,
int length) throws SAXException
{
if (!new String(ch, start, length).trim().equals(""))
super.characters(ch, start, length);
}
}


public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
{
TransformerFactory transFactory = TransformerFactory.newInstance();
transFactory.setAttribute("indent-number", new Integer(2));
Transformer transformer = transFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StringWriter out = new StringWriter();
XMLReader masterParser = SAXHelper.getSAXParser(true);
XMLFilter parser = new WhitespaceIgnoreFilter();
parser.setParent(masterParser);


if(allowBadlyFormedFragments)
{
transformer.setErrorListener(new ErrorListener()
{
@Override
public void warning(TransformerException exception) throws TransformerException
{
}


@Override
public void fatalError(TransformerException exception) throws TransformerException
{
}


@Override
public void error(TransformerException exception) throws TransformerException
{
}
});
}


try
{
transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
}
catch (TransformerException e)
{
if(e.getCause() != null && e.getCause() instanceof SAXParseException)
{
if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
{
throw e;
}
}
else
{
throw e;
}
}
out.flush();
return out.toString();
}

现在已经是2012年了,Java可以比以前的XML做更多的事情,我想在我公认的答案之外添加一个替代方案。这在Java 6之外没有依赖关系。

import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;


import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;


/**
* Pretty-prints xml, supplied as a string.
* <p/>
* eg.
* <code>
* String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
* </code>
*/
public class XmlFormatter {


public String format(String xml) {


try {
final InputSource src = new InputSource(new StringReader(xml));
final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));


//May need this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");




final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
final LSSerializer writer = impl.createLSSerializer();


writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set this to true if the output needs to be beautified.
writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set this to true if the declaration is needed to be outputted.


return writer.writeToString(document);
} catch (Exception e) {
throw new RuntimeException(e);
}
}


public static void main(String[] args) {
String unformattedXml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
"        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
"        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
"    <Query>\n" +
"        <query:CategorySchemeWhere>\n" +
"   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
"        </query:CategorySchemeWhere>\n" +
"    </Query>\n\n\n\n\n" +
"</QueryMessage>";


System.out.println(new XmlFormatter().format(unformattedXml));
}
}

对于那些寻找快速和肮脏的解决方案的人——它不需要XML是100%有效的。例如,在REST / SOAP日志的情况下(你永远不知道其他人发送了什么;-))

我发现并改进了一个我在网上找到的代码剪辑,我认为这仍然是一个有效的可能的方法:

public static String prettyPrintXMLAsString(String xmlString) {
/* Remove new lines */
final String LINE_BREAK = "\n";
xmlString = xmlString.replaceAll(LINE_BREAK, "");
StringBuffer prettyPrintXml = new StringBuffer();
/* Group the xml tags */
Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
Matcher matcher = pattern.matcher(xmlString);
int tabCount = 0;
while (matcher.find()) {
String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);


if (matcher.group() != null && !matcher.group().trim().equals("")) {
printTabs(tabCount, prettyPrintXml);
if (!str1.equals("") && str3.equals("")) {
++tabCount;
}
if (str1.equals("") && !str3.equals("")) {
--tabCount;
prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
}


prettyPrintXml.append(str1);
prettyPrintXml.append(str2);
prettyPrintXml.append(str3);
if (!str4.equals("")) {
prettyPrintXml.append(LINE_BREAK);
printTabs(tabCount, prettyPrintXml);
prettyPrintXml.append(str4);
}
prettyPrintXml.append(LINE_BREAK);
}
}
return prettyPrintXml.toString();
}


private static void printTabs(int count, StringBuffer stringBuffer) {
for (int i = 0; i < count; i++) {
stringBuffer.append("\t");
}
}


public static void main(String[] args) {
String x = new String(
"<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
System.out.println(prettyPrintXMLAsString(x));
}

输出如下:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<soap:Fault>
<faultcode>soap:Client</faultcode>
<faultstring>INVALID_MESSAGE</faultstring>
<detail>
<ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
<CauseCode>20007</CauseCode>
<CauseText>INVALID_MESSAGE</CauseText>
<DebugInfo>Problems creating SAAJ object model</DebugInfo>
</ns3:XcbSoapFault>
</detail>
</soap:Fault>
</soap:Body>
</soap:Envelope>

如果您确信您有一个有效的XML,那么这个很简单,并且避免了XML DOM树。可能有一些错误,如果你看到任何错误,请评论

public String prettyPrint(String xml) {
if (xml == null || xml.trim().length() == 0) return "";


int stack = 0;
StringBuilder pretty = new StringBuilder();
String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");


for (int i = 0; i < rows.length; i++) {
if (rows[i] == null || rows[i].trim().length() == 0) continue;


String row = rows[i].trim();
if (row.startsWith("<?")) {
// xml version tag
pretty.append(row + "\n");
} else if (row.startsWith("</")) {
// closing tag
String indent = repeatString("    ", --stack);
pretty.append(indent + row + "\n");
} else if (row.startsWith("<")) {
// starting tag
String indent = repeatString("    ", stack++);
pretty.append(indent + row + "\n");
} else {
// tag data
String indent = repeatString("    ", stack);
pretty.append(indent + row + "\n");
}
}


return pretty.toString().trim();
}

milosmns

public static String getPrettyXml(String xml) {
if (xml == null || xml.trim().length() == 0) return "";


int stack = 0;
StringBuilder pretty = new StringBuilder();
String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");


for (int i = 0; i < rows.length; i++) {
if (rows[i] == null || rows[i].trim().length() == 0) continue;


String row = rows[i].trim();
if (row.startsWith("<?")) {
pretty.append(row + "\n");
} else if (row.startsWith("</")) {
String indent = repeatString(--stack);
pretty.append(indent + row + "\n");
} else if (row.startsWith("<") && row.endsWith("/>") == false) {
String indent = repeatString(stack++);
pretty.append(indent + row + "\n");
if (row.endsWith("]]>")) stack--;
} else {
String indent = repeatString(stack);
pretty.append(indent + row + "\n");
}
}


return pretty.toString().trim();
}


private static String repeatString(int stack) {
StringBuilder indent = new StringBuilder();
for (int i = 0; i < stack; i++) {
indent.append(" ");
}
return indent.toString();
}

我在这里找到的针对Java 1.6+的解决方案不会重新格式化已经格式化的代码。对我有效的方法(重新格式化已经格式化的代码)如下。

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;


public class XmlUtils {
public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
return new String(canonXmlBytes);
}


public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
InputSource src = new InputSource(new StringReader(input));
Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
Boolean keepDeclaration = input.startsWith("<?xml");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
return writer.writeToString(document);
}
}

它是在单元测试中比较全字符串xml的好工具。

private void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
String canonicalExpected = prettyFormat(toCanonicalXml(expected));
String canonicalActual = prettyFormat(toCanonicalXml(actual));
assertEquals(canonicalExpected, canonicalActual);
}

作为马克斯codeskraps大卫-milosmns答案的替代,看看我的轻量级,高性能漂亮打印机库:xml-formatter

// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();


StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader


if(prettyPrinter.process(xml, buffer)) {
// valid XML, print buffer
} else {
// invalid XML, print xml
}

有时,就像直接从文件运行模拟SOAP服务时,有一个漂亮的打印机也能处理已经打印好的XML是很好的:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

正如一些人评论的那样,漂亮打印只是一种以更适合人类阅读的形式表示XML的方法——严格来说,XML数据中不应该有空格。

该库用于日志记录的漂亮打印,还包括用于过滤(子树移除/匿名化)和漂亮打印CDATA和Text节点中的XML的函数。

我看到一个答案使用Scala,所以这里是Groovy中的另一个,以防有人觉得有趣。默认缩进为2步,XmlNodePrinter构造函数也可以传递另一个值。

def xml = "<tag><nested>hello</nested></tag>"
def stringWriter = new StringWriter()
def node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node)
println stringWriter.toString()

如果groovy jar在类路径中,则使用Java

  String xml = "<tag><nested>hello</nested></tag>";
StringWriter stringWriter = new StringWriter();
Node node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node);
System.out.println(stringWriter.toString());

使用jdom2: http://www.jdom.org/

import java.io.StringReader;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;


String prettyXml = new XMLOutputter(Format.getPrettyFormat()).
outputString(new SAXBuilder().build(new StringReader(uglyXml)));

上面所有的解决方案都不适合我,然后我发现了这个http://myshittycode.com/2014/02/10/java-properly-indenting-xml-string/

线索就是用XPath删除空格

    String xml = "<root>" +
"\n   " +
"\n<name>Coco Puff</name>" +
"\n        <total>10</total>    </root>";


try {
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));


XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
document,
XPathConstants.NODESET);


for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
node.getParentNode().removeChild(node);
}


Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");


StringWriter stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);


transformer.transform(new DOMSource(document), streamResult);


System.out.println(stringWriter.toString());
}
catch (Exception e) {
e.printStackTrace();
}

只是另一种适合我们的解决方法

import java.io.StringWriter;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;


**
* Pretty Print XML String
*
* @param inputXmlString
* @return
*/
public static String prettyPrintXml(String xml) {


final StringWriter sw;


try {
final OutputFormat format = OutputFormat.createPrettyPrint();
final org.dom4j.Document document = DocumentHelper.parseText(xml);
sw = new StringWriter();
final XMLWriter writer = new XMLWriter(sw, format);
writer.write(document);
}
catch (Exception e) {
throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
}
return sw.toString();
}

如果你不需要缩进那么多,但一些换行,这可能是足够的简单regex…

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

代码很好,而不是因为缺少缩进而导致的结果。


(对于有缩进的解,请参见其他答案。)

试试这个:

 try
{
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = null;
transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(element),
new StreamResult(buffer));
String str = buffer.toString();
System.out.println("XML INSIDE IS #########################################"+str);
return element;
}
catch (TransformerConfigurationException e)
{
e.printStackTrace();
}
catch (TransformerException e)
{
e.printStackTrace();
}

下面的代码工作得很好

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;


String formattedXml1 = prettyFormat("<root><child>aaa</child><child/></root>");


public static String prettyFormat(String input) {
return prettyFormat(input, "2");
}


public static String prettyFormat(String input, String indent) {
Source xmlInput = new StreamSource(new StringReader(input));
StringWriter stringWriter = new StringWriter();
try {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", indent);
transformer.transform(xmlInput, new StreamResult(stringWriter));


String pretty = stringWriter.toString();
pretty = pretty.replace("\r\n", "\n");
return pretty;
} catch (Exception e) {
throw new RuntimeException(e);
}
}

在提出我自己的解决方案之前,我应该先看看这一页!不管怎样,我使用Java递归来解析xml页面。此代码是完全自包含的,不依赖于第三方库。也. .它使用递归!

// you call this method passing in the xml text
public static void prettyPrint(String text){
prettyPrint(text, 0);
}


// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
boolean foundTagStart = false;
StringBuilder tagChars = new StringBuilder();
String startTag = "";
String endTag = "";
String[] chars = xmlText.split("");
// find the next start tag
for(String ch : chars){
if(ch.equalsIgnoreCase("<")){
tagChars.append(ch);
foundTagStart = true;
} else if(ch.equalsIgnoreCase(">") && foundTagStart){
startTag = tagChars.append(ch).toString();
String tempTag = startTag;
endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
break;
} else if(foundTagStart){
tagChars.append(ch);
}
}
// once start and end tag are calculated, print start tag, then content, then end tag
if(foundTagStart){
int startIndex = xmlText.indexOf(startTag);
int endIndex = xmlText.indexOf(endTag);
// handle if matching tags NOT found
if((startIndex < 0) || (endIndex < 0)){
if(startIndex < 0) {
// no start tag found
return;
} else {
// start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
printTabs(index);
System.out.println(startTag);
// move on to the next tag
// NOTE: "index" (not index+1) because next tag is on same level as this one
prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
return;
}
// handle when matching tags found
} else {
String content = xmlText.substring(startIndex+startTag.length(), endIndex);
boolean isTagContainsTags = content.contains("<"); // content contains tags
printTabs(index);
if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
System.out.println(startTag);
prettyPrint(content, index+1); // "index+1" because "content" is nested
printTabs(index);
} else {
System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
System.out.print(content);
}
System.out.println(endTag);
int nextIndex = endIndex + endTag.length();
if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
}
}
} else {
System.out.print(xmlText);
}
}


private static void printTabs(int counter){
while(counter-- > 0){
System.out.print("\t");
}
}

我把它们混合在一起,写了一个小程序。它从xml文件中读取并打印出来。而不是xzy给出你的文件路径。

    public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
prettyPrint(doc);


}


private static String prettyPrint(Document document)
throws TransformerException {
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
DOMSource source = new DOMSource(document);
StringWriter strWriter = new StringWriter();
StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
System.out.println(strWriter.getBuffer().toString());


return strWriter.getBuffer().toString();


}

Underscore-java有静态方法U.formatXml(string)生活的例子

import com.github.underscore.U;


public class MyClass {
public static void main(String args[]) {
String xml = "<tag><nested>hello</nested></tag>";


System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
}
}

输出:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag>
<nested>hello</nested>
</tag>
</root>

我总是使用下面的函数:

public static String prettyPrintXml(String xmlStringToBeFormatted) {
String formattedXmlString = null;
try {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setValidating(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xmlStringToBeFormatted));
Document document = documentBuilder.parse(inputSource);


Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");


StreamResult streamResult = new StreamResult(new StringWriter());
DOMSource dOMSource = new DOMSource(document);
transformer.transform(dOMSource, streamResult);
formattedXmlString = streamResult.getWriter().toString().trim();
} catch (Exception ex) {
StringWriter sw = new StringWriter();
ex.printStackTrace(new PrintWriter(sw));
System.err.println(sw.toString());
}
return formattedXmlString;
}

我试图实现类似的东西,但没有任何外部依赖。应用程序已经在使用DOM来格式化xml了!

下面是我的示例片段

public void formatXML(final String unformattedXML) {
final int length = unformattedXML.length();
final int indentSpace = 3;
final StringBuilder newString = new StringBuilder(length + length / 10);
final char space = ' ';
int i = 0;
int indentCount = 0;
char currentChar = unformattedXML.charAt(i++);
char previousChar = currentChar;
boolean nodeStarted = true;
newString.append(currentChar);
for (; i < length - 1;) {
currentChar = unformattedXML.charAt(i++);
if(((int) currentChar < 33) && !nodeStarted) {
continue;
}
switch (currentChar) {
case '<':
if ('>' == previousChar && '/' != unformattedXML.charAt(i - 1) && '/' != unformattedXML.charAt(i) && '!' != unformattedXML.charAt(i)) {
indentCount++;
}
newString.append(System.lineSeparator());
for (int j = indentCount * indentSpace; j > 0; j--) {
newString.append(space);
}
newString.append(currentChar);
nodeStarted = true;
break;
case '>':
newString.append(currentChar);
nodeStarted = false;
break;
case '/':
if ('<' == previousChar || '>' == unformattedXML.charAt(i)) {
indentCount--;
}
newString.append(currentChar);
break;
default:
newString.append(currentChar);
}
previousChar = currentChar;
}
newString.append(unformattedXML.charAt(length - 1));
System.out.println(newString.toString());
}