让 DocumentBuilder.parse 忽略 DTD 引用

当我在这个方法中解析 xml 文件(变量 f)时,我得到一个错误

C: 文档和设置 joe Desktop aicpcudev OnlineModule map.dtd (系统无法找到指定的路径)

我知道我没有 DTD,我也不需要它。如何在忽略 DTD 引用错误的情况下将 File 对象解析为 Document 对象?

private static Document getDoc(File f, String docId) throws Exception{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(f);




return doc;
}
83740 次浏览

I know I do not have the dtd, nor do I need it.

I am suspicious of this statement; does your document contain any entity references? If so, you definitely need the DTD.

Anyway, the usual way of preventing this from happening is using an XML catalog to define a local path for "map.dtd".

here's another user who got the same issue : http://forums.sun.com/thread.jspa?threadID=284209&forumID=34

user ddssot on that post says

myDocumentBuilder.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
throws SAXException, java.io.IOException
{
if (publicId.equals("--myDTDpublicID--"))
// this deactivates the open office DTD
return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
else return null;
}
});

The user further mentions "As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation..."

Hope this helps.

A similar approach to the one suggested by @anjanb

    builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if (systemId.contains("foo.dtd")) {
return new InputSource(new StringReader(""));
} else {
return null;
}
}
});

I found that simply returning an empty InputSource worked just as well?

Try setting features on the DocumentBuilderFactory:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();


dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);


DocumentBuilder db = dbf.newDocumentBuilder();
...

Ultimately, I think the options are specific to the parser implementation. Here is some documentation for Xerces2 if that helps.

I found an issue where the DTD file was in the jar file along with the XML. I solved the issue based on the examples here, as follows: -

DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
if (systemId.contains("doc.dtd")) {
InputStream dtdStream = MyClass.class
.getResourceAsStream("/my/package/doc.dtd");
return new InputSource(dtdStream);
} else {
return null;
}
}
});

Source XML (With DTD)

<!DOCTYPE MYSERVICE SYSTEM "./MYSERVICE.DTD">
<MYACCSERVICE>
<REQ_PAYLOAD>
<ACCOUNT>1234567890</ACCOUNT>
<BRANCH>001</BRANCH>
<CURRENCY>USD</CURRENCY>
<TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
</REQ_PAYLOAD>
</MYACCSERVICE>

Java DOM implementation for accepting above XML as String and removing DTD declaration

public Document removeDTDFromXML(String payload) throws Exception {


System.out.println("### Payload received in XMlDTDRemover: " + payload);


Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {


dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);


DocumentBuilder db = dbf.newDocumentBuilder();


InputSource is = new InputSource();
is.setCharacterStream(new StringReader(payload));
doc = db.parse(is);


} catch (ParserConfigurationException e) {
System.out.println("Parse Error: " + e.getMessage());
return null;
} catch (SAXException e) {
System.out.println("SAX Error: " + e.getMessage());
return null;
} catch (IOException e) {
System.out.println("IO Error: " + e.getMessage());
return null;
}
return doc;


}

Destination XML (Without DTD)

<MYACCSERVICE>
<REQ_PAYLOAD>
<ACCOUNT>1234567890</ACCOUNT>
<BRANCH>001</BRANCH>
<CURRENCY>USD</CURRENCY>
<TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
</REQ_PAYLOAD>
</MYACCSERVICE>

I'm working with sonarqube, and sonarlint for eclipse showed me Untrusted XML should be parsed without resolving external data (squid:S2755)

I managed to solve it using:

    factory = DocumentBuilderFactory.newInstance();


factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);


// If you can't completely disable DTDs, then at least do the following:
// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-general-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-general-entities
// JDK7+ - http://xml.org/sax/features/external-general-entities
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);


// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-parameter-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-parameter-entities
// JDK7+ - http://xml.org/sax/features/external-parameter-entities
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);


// Disable external DTDs as well
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);


// and these as well, per Timothy Morgan's 2014 paper: "XML Schema, DTD, and Entity Attacks"
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);