LoadData-根级别的数据无效

我正在尝试解析 WiX 安装程序中的一些 XML。XML 将是我从 Web 服务器返回的所有错误的对象。我得到了这段代码中问题题名的错误:

XmlDocument xml = new XmlDocument();
try
{
xml.LoadXml(myString);
}
catch (Exception ex)
{
System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
throw ex;
}

myString是这样的(如 text.txt的输出所示)

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>

text.txt出来的时候是这样的:

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>


Data at the root level is invalid. Line 1, position 1.

我需要解析这个 XML,以便查看是否有错误。

288997 次浏览

The issue here was that myString had that header line. Either there was some hidden character at the beginning of the first line or the line itself was causing the error. I sliced off the first line like so:

xml.LoadXml(myString.Substring(myString.IndexOf(Environment.NewLine)));

This solved my problem.

Use Load() method instead, it will solve the problem. See more

The hidden character is probably BOM. The explanation to the problem and the solution can be found here, credits to James Schubert, based on an answer by James Brankin found here.

Though the previous answer does remove the hidden character, it also removes the whole first line. The more precise version would be:

string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (xml.StartsWith(_byteOrderMarkUtf8))
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}

I encountered this problem when fetching an XSLT file from Azure blob and loading it into an XslCompiledTransform object. On my machine the file looked just fine, but after uploading it as a blob and fetching it back, the BOM character was added.

I Think that the problem is about encoding. That's why removing first line(with encoding byte) might solve the problem.

My solution for Data at the root level is invalid. Line 1, position 1. in XDocument.Parse(xmlString) was replacing it with XDocument.Load( new MemoryStream( xmlContentInBytes ) );

I've noticed that my xml string looked ok:

<?xml version="1.0" encoding="utf-8"?>

but in different text editor encoding it looked like this:

?<?xml version="1.0" encoding="utf-8"?>

At the end i did not need the xml string but xml byte[]. If you need to use the string you should look for "invisible" bytes in your string and play with encodings to adjust the xml content for parsing or loading.

Hope it will help

Save your file with different encoding:

File > Save file as... > Save as UTF-8 without signature.

In VS 2017 you find encoding as a dropdown next to Save button.

If your xml is in a string use the following to remove any byte order mark:

        xml = new Regex("\\<\\?xml.*\\?>").Replace(xml, "");

I have found out one of the solutions. For your code this could be as follows -

XmlDocument xml = new XmlDocument();
try
{
// assuming the location of the file is in the current directory
// assuming the file name be loadData.xml
string myString = "./loadData.xml";
xml.Load(myString);
}
catch (Exception ex)
{
System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
throw ex;
}

I've solved this issue by directly editing the byte array. Collect the UTF8 preamble and remove directly the header. Afterward you can transform the byte[]to a string with GetString method, see below. The \r and \t I've removed as well, just as precaution.

XmlDocument configurationXML = new XmlDocument();
List<byte> byteArray = new List<byte>(webRequest.downloadHandler.data);


foreach(byte singleByte in Encoding.UTF8.GetPreamble())
{
byteArray.RemoveAt(byteArray.IndexOf(singleByte));
}
string xml = System.Text.Encoding.UTF8.GetString(byteArray.ToArray());
xml = xml.Replace("\\r", "");
xml = xml.Replace("\\t", "");

if we are using XDocument.Parse(@""). Use @ it resolves the issue.

At first I had problems escaping the "&" character, then diacritics and special letters were shown as question marks and ended up with the issue OP mentioned.

I looked at the answers and I used @Ringo's suggestion to try Load() method as an alternative. That made me realize that I can deal with my response in other ways not just as a string.

using System.IO.Stream instead of string solved all the issues for me.

var response = await this.httpClient.GetAsync(url);
var responseStream = await response.Content.ReadAsStreamAsync();
var xmlDocument = new XmlDocument();
xmlDocument.Load(responseStream);

The cool thing about Load() is that this method automatically detects the string format of the input XML (for example, UTF-8, ANSI, and so on). See more

Main culprit for this error is logic which determines encoding when converting Stream or byte[] array to .NET string.

Using StreamReader created with 2nd constructor parameter detectEncodingFromByteOrderMarks set to true, will determine proper encoding and create string which does not break XmlDocument.LoadXml method.

public string GetXmlString(string url)
{
using var stream = GetResponseStream(url);
using var reader = new StreamReader(stream, true);
return reader.ReadToEnd(); // no exception on `LoadXml`
}

Common mistake would be to just blindly use UTF8 encoding on the stream or byte[]. Code bellow would produce string that looks valid when inspected in Visual Studio debugger, or copy-pasted somewhere, but it will produce the exception when used with Load or LoadXml if file is encoded differently then UTF8 without BOM.

public string GetXmlString(string url)
{
byte[] bytes = GetResponseByteArray(url);
return System.Text.Encoding.UTF8.GetString(bytes); // potentially exception on `LoadXml`
}