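How do you construct a std::string with an embedded null?

If I want to construct a std::string with a line like:

std::string my_string("a\0b");

where I want three characters in the resulting string (a, null, b), I only get one. What is the proper syntax?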


I have no idea why you'd want to do such a thing, but try this:

std::string my_string("a\0b", 3);
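As a rough check of what that call produces, here is a minimal sketch. The function names are made up for illustration. It compares the one-argument constructor, which stops at the first null, with the (pointer, count) constructor, which copies exactly the number of characters you ask for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One-argument constructor: treats the literal as a C string,
// so it stops at the first '\0'.
inline std::string from_c_string() {
    return std::string("a\0b");
}

// (pointer, count) constructor: copies exactly 3 chars, nulls included.
inline std::string from_counted() {
    return std::string("a\0b", 3);
}

// Hypothetical self-check; the asserts document the expected sizes.
inline void demo_counted_constructor() {
    assert(sizeof("a\0b") == 4);          // literal array: a, NUL, b, NUL
    assert(from_c_string().size() == 1);  // just "a"
    assert(from_counted().size() == 3);   // a, NUL, b
    assert(from_counted()[1] == '\0');
}
```

The count must not exceed the number of characters actually available in the array (4 here, counting the literal's own terminator).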

Almost all implementations of std::string keep their buffer null-terminated, so an embedded null will confuse anything that treats the data as a C string, and you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to put a null in the middle of a std::string, you can do:

std::string s("aab");
s.at(1) = '\0';

but if you do, all your friends will laugh at you and you will never find true happiness.

The following will work...

std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');

If you are doing manipulation like you would with a c-style string (array of chars) consider using

std::vector<char>

You have more freedom to treat it like an array, in the same manner you would treat a C string. You can copy it into a string with the iterator-range constructor:

std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr( vec.begin(), vec.end());

and you can use it in many of the same places you can use c-strings

printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';

Naturally, however, you suffer from the same problems as C strings. You may forget your null terminator or write past the allocated space.

Better to use std::vector<char> if this question isn't just for educational purposes.

Since C++14

we have been able to create literal std::string

#include <iostream>
#include <string>


int main()
{
    using namespace std::string_literals;

    std::string s = "pl-\0-op"s;    // <- Notice the "s" at the end
                                    // This is a std::string literal not
                                    // a C-String literal.
    std::cout << s << "\n";
}
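Printing such a string sends the null byte to the stream, and most terminals show nothing for it, so the output can look as if the character were missing. A minimal sketch of a helper that makes the nulls visible follows. The name show_nuls is hypothetical, not a standard function. It uses the counted constructor rather than the "s" literal, so it also compiles as C++11.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of `in` where every NUL byte is
// replaced by the two visible characters '\' and '0'.
inline std::string show_nuls(const std::string& in) {
    std::string out;
    for (char c : in) {            // range-for needs C++11
        if (c == '\0') {
            out += "\\0";          // backslash followed by the digit 0
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical self-check using the same 7-character content as above.
inline void demo_show_nuls() {
    const std::string s("pl-\0-op", 7);
    assert(s.size() == 7);
    assert(show_nuls(s) == "pl-\\0-op");
}
```

Writing std::cout << show_nuls(s) then prints the escaped form instead of the raw byte.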

Before C++14

The problem is that the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated, and thus copying stops when it reaches the \0 character.

To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:

std::string   x("pq\0rs");   // Two characters because input assumed to be C-String
std::string   y("pq\0rs",5); // 5 Characters as the input is now a char array with 5 characters.

Note: a C++ std::string does NOT rely on \0 to mark its end (as suggested in other posts); it stores its length separately. However, you can extract a pointer to an internal buffer that contains a C-String with the method c_str().
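To illustrate the difference between the stored length and what a C API sees through c_str(), here is a small sketch. The function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as std::string sees it: the stored size, embedded NULs included.
inline std::size_t string_length(const std::string& s) {
    return s.size();
}

// Length as a C API sees it: strlen stops at the first NUL in the buffer.
inline std::size_t c_api_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Hypothetical self-check on the 5-character string from above.
inline void demo_lengths() {
    const std::string x("pq\0rs", 5);
    assert(string_length(x) == 5);
    assert(c_api_length(x) == 2);
    // c_str() guarantees a terminator after the 5 stored characters.
    assert(x.c_str()[5] == '\0');
}
```

So the string itself is intact, but anything that takes a const char* without a length will only see "pq".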

Also check out Doug T's answer below about using a vector<char>.

Also check out RiaD for a C++14 solution.

I know it has been a long time since this question was asked, but anyone who is having a similar problem might be interested in the following code.

CComBSTR(20,"mystring1\0mystring2\0")

The question "What new capabilities do user-defined literals add to C++?" presents an elegant answer. Define:

std::string operator "" _s(const char* str, size_t n)
{
    return std::string(str, n);
}

then you can create your string this way:

std::string my_string("a\0b"_s);

or even so:

auto my_string = "a\0b"_s;

There's an "old style" way:

#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string

then you can define

std::string my_string(S("a\0b"));

You'll have to be careful with this. If you replace 'b' with a digit from 0 to 7, you will silently create the wrong string using most methods, because the digits after \0 are read as part of a single octal escape (up to three digits). See: Rules for C++ string literals escape character.

For example, I dropped this innocent looking snippet in the middle of a program

// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
    std::cerr << c;
    // 'Q' is way cooler than '\0' or '0'
    c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
    std::cerr << c;
}
std::cerr << "\n";

Here is what this program output for me:

Entering loop.
Entering loop.


vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ

That was my first print statement printed twice, then several non-printing characters, a newline, and something from internal memory, which I had just overwritten (and then printed, showing that it had been overwritten). Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't report any improper memory access. In other words, none of the tools I tried detected it.

You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
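To see why the literal above is shorter than it looks, here is a short sketch of how the compiler counts the characters. The function names are hypothetical. It relies on two language rules: an octal escape takes up to three digits, and adjacent string literals are concatenated only after each one's escapes have been processed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof on a string literal counts every char plus the trailing NUL.
inline void demo_octal_escape() {
    // "\00" is ONE octal escape (value 0), so the array is {NUL, NUL}.
    assert(sizeof("\00") == 2);
    // Splitting the literal ends the escape early: {NUL, '0', NUL}.
    assert(sizeof("\0" "0") == 3);
    // Three octal digits is the maximum: "\0000" is {NUL, '0', NUL}.
    assert(sizeof("\0000") == 3);
}

// Builds NUL followed by the digit '0' without the escape swallowing it.
inline std::string nul_then_zero() {
    return std::string("\0" "0", 2);
}
```

Writing the escape with all three digits ("\0000") or splitting the literal both keep the digit as a separate character.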

Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
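A minimal sketch of the initializer-list form, with function names made up for the example. Each element is its own character literal, so there is no length to get wrong and no escape that can absorb a following digit.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// a, NUL, b: the size comes from the number of list elements.
inline std::string make_a_nul_b() {
    return std::string({'a', '\0', 'b'});
}

// NUL followed by the digit '0': each is a separate char literal,
// so the octal-escape pitfall cannot occur.
inline std::string make_nul_then_zero() {
    return std::string({'\0', '0'});
}

// Hypothetical self-check of the resulting sizes and contents.
inline void demo_initializer_list() {
    assert(make_a_nul_b().size() == 3);
    assert(make_a_nul_b()[1] == '\0');
    assert(make_nul_then_zero().size() == 2);
    assert(make_nul_then_zero()[1] == '0');
}
```

The cost is verbosity: long strings become tedious to write one character at a time.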

Since C++14 you can use std::string literals:

using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3

anonym's answer is excellent, but there's a non-macro solution in C++98 as well:

template <size_t N>
std::string RawString(const char (&ch)[N])
{
    return std::string(ch, N-1);  // Again, exclude trailing `null`
}

With this function, RawString(/* literal */) will produce the same string as S(/* literal */):

std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;

Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:

std::string s = S("a\0b"); // ERROR!

...so it might be preferable to use:

#define S(s) std::string(s, sizeof s - 1)

Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.