Using StringWriter for XML Serialization

I'm currently looking for an easy way to serialize objects (in C# 3).

I googled some examples and came up with something like this:

MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(typeof(MyObject));
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, myObject);
string result = Encoding.UTF8.GetString(memoryStream.ToArray());

After reading this question, I asked myself: why not use StringWriter? It seems much easier.

XmlSerializer ser = new XmlSerializer(typeof(MyObject));
StringWriter writer = new StringWriter();
ser.Serialize(writer, myObject);
serializedValue = writer.ToString();

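Another problem was that the XML generated by the first example could not be written into an XML column of a SQL Server 2005 DB.

The first question is: Is there a reason why I shouldn't use StringWriter to serialize an object when I need it as a string afterwards? I never found a result using StringWriter when googling.

The second is, of course: If you shouldn't do it with StringWriter (for whatever reason), which would be a good and correct way?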


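Addition:

As both answers have already mentioned, I'll go further into the XML-to-DB problem.

When writing to the database I got the following exception:

System.Data.SqlClient.SqlException: XML parsing: line 1, character 38, unable to switch the encoding

for the string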

<?xml version="1.0" encoding="utf-8"?><test/>

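I took the string created by the XmlTextWriter and just put it in as xml there. This did not work (manual insertion into the DB did not work either).

Afterwards I tried manual insertion (just writing INSERT INTO ...) with encoding="utf-16", which also failed. Removing the encoding entirely did work. After that result I switched back to the StringWriter code and voilà, it worked.

Problem: I don't really understand why.

To Christian Hayter: With those tests I'm not sure that I have to use UTF-16 to write to the DB. Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?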


One problem with StringWriter is that by default it doesn't let you set the encoding which it advertises - so you can end up with an XML document advertising its encoding as UTF-16, which means you need to encode it as UTF-16 if you write it to a file. I have a small class to help with that though:

public sealed class StringWriterWithEncoding : StringWriter
{
    public override Encoding Encoding { get; }

    public StringWriterWithEncoding(Encoding encoding)
    {
        Encoding = encoding;
    }
}

Or if you only need UTF-8 (which is all I often need):

public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}
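To see the difference Jon describes, the sketch below serializes the same object once through a plain StringWriter and once through a UTF-8-reporting writer like the one above, and returns the resulting text. The `Sample` type and the `EncodingDemo` helper are hypothetical names made up for this illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical payload type, only for this demo.
public class Sample
{
    public string Name { get; set; }
}

// Same idea as Utf8StringWriter above: a StringWriter that reports UTF-8.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class EncodingDemo
{
    // XmlSerializer asks the TextWriter for its Encoding and writes that name
    // into the XML declaration. The characters still land in a UTF-16 .NET
    // string either way; only the advertised encoding changes.
    public static string Serialize(object value, StringWriter writer)
    {
        using (writer)
        {
            new XmlSerializer(value.GetType()).Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Plain StringWriter: declaration says encoding="utf-16".
    public static string WithPlainWriter() =>
        Serialize(new Sample { Name = "x" }, new StringWriter());

    // Utf8StringWriter: declaration says encoding="utf-8".
    public static string WithUtf8Writer() =>
        Serialize(new Sample { Name = "x" }, new Utf8StringWriter());
}
```

The payload is identical in both cases; what differs is the encoding name written into the declaration, which matters once the string is handed to something that trusts that declaration (a file, or SQL Server).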

As for why you couldn't save your XML to the database - you'll have to give us more details about what happened when you tried, if you want us to be able to diagnose/fix it.

When serialising an XML document to a .NET string, the encoding must be set to UTF-16. Strings are stored as UTF-16 internally, so this is the only encoding that makes sense. If you want to store data in a different encoding, you use a byte array instead.

SQL Server works on a similar principle; any string passed into an xml column must be encoded as UTF-16. SQL Server will reject any string where the XML declaration does not specify UTF-16. If the XML declaration is not present, then the XML standard requires that it default to UTF-8, so SQL Server will reject that as well.

Bearing this in mind, here are some utility methods for doing the conversion.

public static string Serialize<T>(T value)
{
    if (value == null)
    {
        return null;
    }

    XmlSerializer serializer = new XmlSerializer(typeof(T));

    XmlWriterSettings settings = new XmlWriterSettings()
    {
        Encoding = new UnicodeEncoding(false, false), // no BOM in a .NET string
        Indent = false,
        OmitXmlDeclaration = false
    };

    using (StringWriter textWriter = new StringWriter())
    {
        using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, settings))
        {
            serializer.Serialize(xmlWriter, value);
        }
        return textWriter.ToString();
    }
}

public static T Deserialize<T>(string xml)
{
    if (string.IsNullOrEmpty(xml))
    {
        return default(T);
    }

    XmlSerializer serializer = new XmlSerializer(typeof(T));

    XmlReaderSettings settings = new XmlReaderSettings();
    // No settings need modifying here

    using (StringReader textReader = new StringReader(xml))
    {
        using (XmlReader xmlReader = XmlReader.Create(textReader, settings))
        {
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
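Christian's premise that a .NET string is always UTF-16, whatever the XML declaration inside it claims, can be checked directly. The helper names below are hypothetical; the sketch just shows that a string holds UTF-16 code units and that producing UTF-8 is a separate step that yields bytes, not another string.

```csharp
using System.Text;

public static class Utf16Demo
{
    // A .NET string is a sequence of UTF-16 code units (char).
    // U+1F638 lies outside the BMP, so it takes a surrogate pair: two chars.
    public static int CatFaceLength() => "\U0001F638".Length;

    // Encoding.Unicode is UTF-16 little-endian: 'A' (U+0041) becomes 0x41, 0x00.
    public static byte[] LetterAInUtf16Le() => Encoding.Unicode.GetBytes("A");

    // Converting to UTF-8 produces bytes; the same character needs four of them.
    public static int CatFaceUtf8ByteCount() => Encoding.UTF8.GetByteCount("\U0001F638");
}
```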

First of all, beware of finding old examples. You've found one that uses XmlTextWriter, which is deprecated as of .NET 2.0. XmlWriter.Create should be used instead.

Here's an example of serializing an object into an XML column:

public void SerializeToXmlColumn(object obj)
{
    using (var outputStream = new MemoryStream())
    {
        using (var writer = XmlWriter.Create(outputStream))
        {
            var serializer = new XmlSerializer(obj.GetType());
            serializer.Serialize(writer, obj);
        }

        outputStream.Position = 0;
        using (var conn = new SqlConnection(Settings.Default.ConnectionString))
        {
            conn.Open();

            const string INSERT_COMMAND = @"INSERT INTO XmlStore (Data) VALUES (@Data)";
            using (var cmd = new SqlCommand(INSERT_COMMAND, conn))
            {
                using (var reader = XmlReader.Create(outputStream))
                {
                    var xml = new SqlXml(reader);

                    cmd.Parameters.Clear();
                    cmd.Parameters.AddWithValue("@Data", xml);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
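For comparison with the question's first snippet, here is a rough sketch of the same MemoryStream approach rewritten with XmlWriter.Create, returning a string instead of writing to the database. `MyObject` is a hypothetical stand-in with a single property, and dropping the byte order mark via `new UTF8Encoding(false)` is a choice made for this sketch, not something the original code did.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyObject.
public class MyObject
{
    public int Id { get; set; }
}

public static class Utf8Serialization
{
    public static string SerializeToUtf8String<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            // UTF8Encoding(false): no byte order mark, so the decoded
            // string does not start with U+FEFF.
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream())
        {
            // The target is a Stream, so settings.Encoding is honoured here
            // (it is ignored when the target is a TextWriter).
            using (var writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, value);
            }
            // Disposing the writer flushed it; decode the UTF-8 bytes.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Note that the resulting string still declares `encoding="utf-8"`, so according to the later answers it would have to be sent to SQL Server as VARCHAR, or have its declaration omitted.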

public static T DeserializeFromXml<T>(string xml)
{
    T result;
    XmlSerializerFactory serializerFactory = new XmlSerializerFactory();
    XmlSerializer serializer = serializerFactory.CreateSerializer(typeof(T));

    using (StringReader sr3 = new StringReader(xml))
    {
        XmlReaderSettings settings = new XmlReaderSettings()
        {
            CheckCharacters = false // default value is true
        };

        // XmlReader.Create is the factory; XmlTextReader.Create only reached it through inheritance.
        using (XmlReader xr3 = XmlReader.Create(sr3, settings))
        {
            result = (T)serializer.Deserialize(xr3);
        }
    }

    return result;
}
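To see what `CheckCharacters = false` changes, here is a variant of the method above with the flag exposed as a parameter (the class name `LenientDeserialization` is made up for this sketch). With checking disabled, a character reference such as `&#x1;`, which does not denote a legal XML 1.0 character, is accepted rather than rejected by the reader.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class LenientDeserialization
{
    // Same shape as DeserializeFromXml above, with CheckCharacters as a parameter.
    public static T DeserializeFromXml<T>(string xml, bool checkCharacters)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };

        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
```

With `checkCharacters: true` the same input should fail while parsing (XmlSerializer reports it as an InvalidOperationException wrapping the XmlException), which is why the answer turns the check off.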

It may have been covered elsewhere, but simply changing the encoding in the XML declaration of the source to 'utf-16' allows the XML to be inserted into a SQL Server 'xml' data type column.

using (DataSetTableAdapters.SQSTableAdapter tbl_SQS = new DataSetTableAdapters.SQSTableAdapter())
{
    try
    {
        // Quotes inside a verbatim string literal must be doubled.
        bodyXML = @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><test></test>";
        bodyXMLutf16 = bodyXML.Replace("UTF-8", "UTF-16");
        tbl_SQS.Insert(messageID, receiptHandle, md5OfBody, bodyXMLutf16, sourceType);
    }
    catch (System.Data.SqlClient.SqlException ex)
    {
        Console.WriteLine(ex.Message);
        Console.ReadLine();
    }
}

The result is that all of the XML text is inserted into the 'xml' data type field, but the 'header' line (the XML declaration) is removed. What you see in the resulting record is just:

<test></test>

Using the serialization method described in the "Answered" entry is a way of including the original header in the target field, but the result is that the remaining XML text is enclosed in an XML <string></string> tag.

The table adapter in the code is a class automatically built using the Visual Studio 2013 "Add New Data Source" wizard. The five parameters to the Insert method map to fields in a SQL Server table.
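The `<string></string>` wrapper mentioned above is what XmlSerializer produces when it is handed text that already is XML: it sees an ordinary System.String, not an XML tree. A small sketch (the helper name `SerializeXmlText` is hypothetical) shows the markup ending up escaped inside a `<string>` element.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class StringWrappingDemo
{
    public static string SerializeXmlText(string xmlText)
    {
        // The serializer treats the value as plain text, so it writes a
        // <string> root element and escapes '<' and '&' in the content.
        var serializer = new XmlSerializer(typeof(string));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, xmlText);
            return writer.ToString();
        }
    }
}
```

If the goal is to store existing XML text, it can be passed to SQL Server as-is (with a declaration that matches the parameter type) instead of being serialized again.");
System.Diagnostics.Trace.Assert(s.Contains("<string"));
System.Diagnostics.Trace.Assert(s.Contains("&lt;test"));
System.Diagnostics.Trace.Assert(!s.Contains("<test>"));
</test>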

<TL;DR> The problem is rather simple, actually: you are not matching the declared encoding (in the XML declaration) with the datatype of the input parameter. If you manually added <?xml version="1.0" encoding="utf-8"?><test/> to the string, then declaring the SqlParameter to be of type SqlDbType.Xml or SqlDbType.NVarChar would give you the "unable to switch the encoding" error. Then, when inserting manually via T-SQL, since you switched the declared encoding to be utf-16, you were clearly inserting a VARCHAR string (not prefixed with an upper-case "N", hence an 8-bit encoding, such as UTF-8) and not an NVARCHAR string (prefixed with an upper-case "N", hence the 16-bit UTF-16 LE encoding).

The fix should have been as simple as:

  1. In the first case, when adding the declaration stating encoding="utf-8": simply don't add the XML declaration.
  2. In the second case, when adding the declaration stating encoding="utf-16": either
    1. simply don't add the XML declaration, OR
    2. simply add an "N" to the input parameter type: SqlDbType.NVarChar instead of SqlDbType.VarChar :-) (or possibly even switch to using SqlDbType.Xml)

(Detailed response is below)


All of the answers here are over-complicated and unnecessary (regardless of the 121 and 184 up-votes for Christian's and Jon's answers, respectively). They might provide working code, but none of them actually answer the question. The issue is that nobody truly understood the question, which ultimately is about how the XML datatype in SQL Server works. Nothing against those two clearly intelligent people, but this question has little to nothing to do with serializing to XML. Saving XML data into SQL Server is much easier than what is being implied here.

It doesn't really matter how the XML is produced as long as you follow the rules of how to create XML data in SQL Server. I have a more thorough explanation (including working example code to illustrate the points outlined below) in an answer on this question: How to solve “unable to switch the encoding” error when inserting XML into SQL Server, but the basics are:

  1. The XML declaration is optional
  2. The XML datatype stores strings always as UCS-2 / UTF-16 LE
  3. If your XML is UCS-2 / UTF-16 LE, then you:
    1. pass in the data as either NVARCHAR(MAX) or XML / SqlDbType.NVarChar (maxsize = -1) or SqlDbType.Xml, or if using a string literal then it must be prefixed with an upper-case "N".
    2. if specifying the XML declaration, it must be either "UCS-2" or "UTF-16" (no real difference here)
  4. If your XML is 8-bit encoded (e.g. "UTF-8" / "iso-8859-1" / "Windows-1252"), then you:
    1. need to specify the XML declaration IF the encoding is different than the code page specified by the default Collation of the database
    2. you must pass in the data as VARCHAR(MAX) / SqlDbType.VarChar (maxsize = -1), or if using a string literal then it must not be prefixed with an upper-case "N".
    3. Whatever 8-bit encoding is used, the "encoding" noted in the XML declaration must match the actual encoding of the bytes.
    4. The 8-bit encoding will be converted into UTF-16 LE by the XML datatype
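The declaration-to-datatype rule in points 3 and 4 can be sketched as a small helper. This is a hypothetical function, not part of any SQL Server API; it only decides which `SqlDbType` a .NET string should be sent as so that the declared encoding matches, and it does not address the code-page data loss that VARCHAR can cause.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class XmlParameterTypeChooser
{
    // Finds encoding="..." (or '...') inside a leading <?xml ...?> declaration.
    private static readonly Regex DeclaredEncoding = new Regex(
        @"^\s*<\?xml[^>]*\bencoding\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    public static SqlDbType Choose(string xml)
    {
        Match m = DeclaredEncoding.Match(xml);
        if (!m.Success)
        {
            // No declared encoding: NVARCHAR (or XML) keeps the .NET string
            // as UTF-16 LE and avoids any code-page conversion.
            return SqlDbType.NVarChar;
        }

        string encoding = m.Groups[1].Value;
        bool is16Bit = encoding.Equals("utf-16", StringComparison.OrdinalIgnoreCase)
                    || encoding.Equals("ucs-2", StringComparison.OrdinalIgnoreCase);

        // 16-bit declaration -> NVARCHAR; 8-bit declaration (utf-8,
        // windows-1252, iso-8859-1, ...) -> VARCHAR.
        return is16Bit ? SqlDbType.NVarChar : SqlDbType.VarChar;
    }
}
```

Returning VarChar for an 8-bit declaration only avoids the "unable to switch the encoding" error; for non-ASCII content, removing that declaration and sending NVARCHAR is the safer route, as the rest of this answer explains.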

With the points outlined above in mind, and given that strings in .NET are always UTF-16 LE / UCS-2 LE (there is no difference between those in terms of encoding), we can answer your questions:

Is there a reason why I shouldn't use StringWriter to serialize an Object when I need it as a string afterwards?

No, your StringWriter code appears to be just fine (at least I see no issues in my limited testing using the 2nd code block from the question).

Wouldn't setting the encoding to UTF-16 (in the xml tag) work then?

It isn't necessary to provide the XML declaration. When it is missing, the encoding is assumed to be UTF-16 LE if you pass the string into SQL Server as NVARCHAR (i.e. SqlDbType.NVarChar) or XML (i.e. SqlDbType.Xml). The encoding is assumed to be the default 8-bit Code Page if passing in as VARCHAR (i.e. SqlDbType.VarChar). If you have any non-standard-ASCII characters (i.e. values 128 and above) and are passing in as VARCHAR, then you will likely see "?" for BMP characters and "??" for Supplementary Characters as SQL Server will convert the UTF-16 string from .NET into an 8-bit string of the current Database's Code Page before converting it back into UTF-16 / UCS-2. But you shouldn't get any errors.

On the other hand, if you do specify the XML declaration, then you must pass into SQL Server using the matching 8-bit or 16-bit datatype. So if you have a declaration stating that the encoding is either UCS-2 or UTF-16, then you must pass in as SqlDbType.NVarChar or SqlDbType.Xml. Or, if you have a declaration stating that the encoding is one of the 8-bit options (i.e. UTF-8, Windows-1252, iso-8859-1, etc), then you must pass in as SqlDbType.VarChar. Failure to match the declared encoding with the proper 8 or 16 -bit SQL Server datatype will result in the "unable to switch the encoding" error that you were getting.

For example, using your StringWriter-based serialization code, I simply printed the resulting string of the XML and used it in SSMS. As you can see below, the XML declaration is included (because StringWriter does not have an option to OmitXmlDeclaration like XmlWriter does), which poses no problem so long as you pass the string in as the correct SQL Server datatype:

-- Upper-case "N" prefix == NVARCHAR, hence no error:
DECLARE @Xml XML = N'<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';
SELECT @Xml;
-- <string>Test ሴ😸</string>

As you can see, it even handles characters beyond standard ASCII, given that ሴ is BMP Code Point U+1234, and 😸 is Supplementary Character Code Point U+1F638. However, the following:

-- No upper-case "N" prefix on the string literal, hence VARCHAR:
DECLARE @Xml XML = '<?xml version="1.0" encoding="utf-16"?>
<string>Test ሴ😸</string>';

results in the following error:

Msg 9402, Level 16, State 1, Line XXXXX
XML parsing: line 1, character 39, unable to switch the encoding

Ergo, all of that explanation aside, the full solution to your original question is:

You were clearly passing the string in as SqlDbType.VarChar. Switch to SqlDbType.NVarChar and it will work without needing to go through the extra step of removing the XML declaration. This is preferred over keeping SqlDbType.VarChar and removing the XML declaration because this solution will prevent data loss when the XML includes non-standard-ASCII characters. For example:

-- No upper-case "N" prefix on the string literal == VARCHAR, and no XML declaration:
DECLARE @Xml2 XML = '<string>Test ሴ😸</string>';
SELECT @Xml2;
-- <string>Test ???</string>

As you can see, there is no error this time, but now there is data-loss 🙀.
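The silent data loss comes from squeezing UTF-16 text through an 8-bit code page. The sketch below imitates that conversion on the .NET side, under the loud assumption that plain ASCII stands in for the database's real code page (for example Windows-1252); it is an analogy, not what SQL Server literally runs. Characters the target encoding cannot represent are replaced with `?`, and no exception is thrown.

```csharp
using System.Text;

public static class CodePageLossDemo
{
    // Encode the UTF-16 string into an 8-bit encoding and decode it again.
    // ASCII is used only because it is always available; Encoding.ASCII
    // replaces unmappable characters with '?' instead of throwing.
    public static string RoundTripThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

For example, `RoundTripThroughAscii("Test \u1234")` comes back as `"Test ?"`, which mirrors the `<string>Test ???</string>` result above.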

For anyone in need of an F# version of the approved answer:

type private Utf8StringWriter() =
    inherit StringWriter()
    override _.Encoding = System.Text.Encoding.UTF8