What are the mechanics of short string optimization in libc++?

This answer gives a nice high-level overview of short string optimization (SSO). However, I would like to know in more detail how it works in practice, specifically in the libc++ implementation:

How short does the string have to be in order to qualify for SSO? Does this depend on the target architecture?
How does the implementation distinguish between short and long strings when accessing the string data? Is it as simple as m_size <= 16 or is it a flag that is part of some other member variable? (I imagine that m_size or part of it might also be used to store string data).

I asked this question specifically for libc++ because I know that it uses SSO, this is even mentioned on the libc++ home page.

Here are some observations after looking at the source:

libc++ can be compiled with two slightly different memory layouts for the string class, this is governed by the _LIBCPP_ALTERNATE_STRING_LAYOUT flag. Both of the layouts also distinguish between little-endian and big-endian machines which leaves us with a total of 4 different variants. I will assume the "normal" layout and little-endian in what follows.

Assuming further that size_type is 4 bytes and that value_type is 1 byte, this is what the first 4 bytes of a string would look like in memory:

// short string: (s)ize and 3 bytes of char (d)ata
sssssss0;dddddddd;dddddddd;dddddddd
^- is_long = 0


// long string: (c)apacity
ccccccc1;cccccccc;cccccccc;cccccccc
^- is_long = 1

Since the size of the short string is in the upper 7 bits, it needs to be shifted when accessing it:

size_type __get_short_size() const {
return __r_.first().__s.__size_ >> 1;
}

Similarly, the getter and setter for the capacity of a long string uses __long_mask to work around the is_long bit.

I am still looking for an answer to my first question, i.e. what value would __min_cap, the capacity of short strings, take for different architectures?

Other standard library implementations

This answer gives a nice overview of std::string memory layouts in other standard library implementations.

51465

template <...> class basic_string { /* many many things */ struct __long { size_type __cap_; size_type __size_; pointer __data_; }; enum {__short_mask = 0x01}; enum {__long_mask = 0x1ul}; enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ? (sizeof(__long) - 1)/sizeof(value_type) : 2}; struct __short { union { unsigned char __size_; value_type __lx; }; value_type __data_[__min_cap]; }; union __ulx{__long __lx; __short __lxx;}; enum {__n_words = sizeof(__ulx) / sizeof(size_type)}; struct __raw { size_type __words[__n_words]; }; struct __rep { union { __long __l; __short __s; __raw __r; }; }; __compressed_pair<__rep, allocator_type> __r_; }; // basic_string

小开

最佳答案

Libc + + basic_string被设计为在所有架构上都有一个 sizeof3个单词，其中 sizeof(word) == sizeof(void*)。您已经正确地剖析了 long/short 标志和 size 字段的短格式。

what value would __min_cap, the capacity of short strings, take for different architectures?

在这个简短的句子中，有三个单词需要处理:

1位指向长/短标志。
7比特的大小。
假设 char为1字节，则尾随 null (libc + + 将始终在数据后面存储尾随 null)。

剩下3个单词减去2个字节来存储一个短字符串(即没有分配的最大 capacity())。

在一台32位机器上，短字符串中可以容纳10个字符。 sizeof (string)是12。

在一台64位机器上，短字符串可以容纳22个字符。 sizeof (string)是24。

一个主要的设计目标是最小化 sizeof(string)，同时使内部缓冲区尽可能大。其基本原理是加快移动施工和移动分配。sizeof越大，在移动构造或移动任务中需要移动的单词就越多。

长格式需要至少3个字来存储数据指针，大小和容量。因此，我把短语限制在这三个词上。有人建议，4个单词的大小可能会有更好的性能。我还没有测试那个设计选择。

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
size_type __cap_;
size_type __size_;
pointer   __data_;
};

to:

struct __long
{
pointer   __data_;
size_type __size_;
size_type __cap_;
};

这个变化的动机是相信将 __data_放在第一位将会有一些性能优势，因为更好的一致性。试图衡量性能优势，但很难衡量。它不会使性能变差，而且可能会稍微好一点。

使用旗帜时要小心。它是一个不同的 ABI，如果意外地与使用不同的 _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT设置编译的 libc + + std::string混合，将会产生运行时错误。

I recommend this flag only be changed by a vendor of libc++.