What are the mechanics of short string optimization in libc++?

This answer gives a nice high-level overview of short string optimization (SSO). However, I would like to know in more detail how it works in practice, specifically in the libc++ implementation:

  • How short does the string have to be in order to qualify for SSO? Does this depend on the target architecture?

  • How does the implementation distinguish between short and long strings when accessing the string data? Is it as simple as m_size <= 16 or is it a flag that is part of some other member variable? (I imagine that m_size or part of it might also be used to store string data).

I asked this question specifically for libc++ because I know that it uses SSO, this is even mentioned on the libc++ home page.

Here are some observations after looking at the source:

libc++ can be compiled with two slightly different memory layouts for the string class, this is governed by the _LIBCPP_ALTERNATE_STRING_LAYOUT flag. Both of the layouts also distinguish between little-endian and big-endian machines which leaves us with a total of 4 different variants. I will assume the "normal" layout and little-endian in what follows.

Assuming further that size_type is 4 bytes and that value_type is 1 byte, this is what the first 4 bytes of a string would look like in memory:

// short string: (s)ize and 3 bytes of char (d)ata
sssssss0;dddddddd;dddddddd;dddddddd
^- is_long = 0


// long string: (c)apacity
ccccccc1;cccccccc;cccccccc;cccccccc
^- is_long = 1

Since the size of the short string is in the upper 7 bits, it needs to be shifted when accessing it:

size_type __get_short_size() const {
return __r_.first().__s.__size_ >> 1;
}

Similarly, the getter and setter for the capacity of a long string uses __long_mask to work around the is_long bit.

I am still looking for an answer to my first question, i.e. what value would __min_cap, the capacity of short strings, take for different architectures?

Other standard library implementations

This answer gives a nice overview of std::string memory layouts in other standard library implementations.

51465 次浏览

Libc + + 实现有点复杂,我将忽略它的替代设计,假设一个小端计算机:

template <...>
class basic_string {
/* many many things */


struct __long
{
size_type __cap_;
size_type __size_;
pointer   __data_;
};


enum {__short_mask = 0x01};
enum {__long_mask  = 0x1ul};


enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?
(sizeof(__long) - 1)/sizeof(value_type) : 2};


struct __short
{
union
{
unsigned char __size_;
value_type __lx;
};
value_type __data_[__min_cap];
};


union __ulx{__long __lx; __short __lxx;};


enum {__n_words = sizeof(__ulx) / sizeof(size_type)};


struct __raw
{
size_type __words[__n_words];
};


struct __rep
{
union
{
__long  __l;
__short __s;
__raw   __r;
};
};


__compressed_pair<__rep, allocator_type> __r_;
}; // basic_string

注意: __compressed_pair本质上是为 Empty Base Optimization(又名 template <T1, T2> struct __compressed_pair: T1, T2 {};)优化的一对; 从各方面来说,你可以认为它是一个普通的对。因为 std::allocator是无状态的,所以它是空的,所以它的重要性就出现了。

好吧,这是相当原始的,所以让我们检查的机制!在内部,许多函数将调用 __get_pointer()__get_pointer()本身调用 __is_long来确定字符串是使用 __long还是 __short表示:

bool __is_long() const _NOEXCEPT
{ return bool(__r_.first().__s.__size_ & __short_mask); }


// __r_.first() -> __rep const&
//     .__s     -> __short const&
//     .__size_ -> unsigned char

To be honest, I am not too sure this is Standard C++ (I know the initial subsequence provision in union but do not know how it meshes with an anonymous union and aliasing thrown together), but a Standard Library is allowed to take advantage of implementation defined behavior anyway.

Libc + + basic_string被设计为在所有架构上都有一个 sizeof3个单词,其中 sizeof(word) == sizeof(void*)。您已经正确地剖析了 long/short 标志和 size 字段的短格式。

what value would __min_cap, the capacity of short strings, take for different architectures?

在这个简短的句子中,有三个单词需要处理:

  • 1位指向长/短标志。
  • 7比特的大小。
  • 假设 char为1字节,则尾随 null (libc + + 将始终在数据后面存储尾随 null)。

剩下3个单词减去2个字节来存储一个短字符串(即没有分配的最大 capacity())。

在一台32位机器上,短字符串中可以容纳10个字符。 sizeof (string)是12。

在一台64位机器上,短字符串可以容纳22个字符。 sizeof (string)是24。

一个主要的设计目标是最小化 sizeof(string),同时使内部缓冲区尽可能大。其基本原理是加快移动施工和移动分配。sizeof越大,在移动构造或移动任务中需要移动的单词就越多。

长格式需要至少3个字来存储数据指针,大小和容量。因此,我把短语限制在这三个词上。有人建议,4个单词的大小可能会有更好的性能。我还没有测试那个设计选择。

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
size_type __cap_;
size_type __size_;
pointer   __data_;
};

to:

struct __long
{
pointer   __data_;
size_type __size_;
size_type __cap_;
};

这个变化的动机是相信将 __data_放在第一位将会有一些性能优势,因为更好的一致性。试图衡量性能优势,但很难衡量。它不会使性能变差,而且可能会稍微好一点。

使用旗帜时要小心。它是一个不同的 ABI,如果意外地与使用不同的 _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT设置编译的 libc + + std::string混合,将会产生运行时错误。

I recommend this flag only be changed by a vendor of libc++.