How to efficiently get a `string_view` for a substring of `std::string`

Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to achieve this more elegantly:

std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"

Worse, the naive approach is a pitfall that leaves v as a dangling reference to a temporary:

std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!

I seem to remember that something like this might be added to the standard library to return a substring as a view:

auto v(s.substr_view(6, 5));

I can come up with the following workarounds:

std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
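// or even "worse" (three statements, because remove_prefix and remove_suffix return void and cannot be chained):
std::string_view w(s); w.remove_prefix(6); w.remove_suffix(1);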

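Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using an alias to simply make things less verbose.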

using sv = std::string_view;
sv(s).substr(6, 5);

There's the free-function route, but unless you also provide overloads for std::string it's a snake-pit.

#include <string>
#include <string_view>

std::string_view sub_string(
    std::string_view s,
    std::size_t p,
    std::size_t n = std::string_view::npos)
{
    return s.substr(p, n);
}

int main()
{
    using namespace std::literals;

    auto source = "foobar"s;

    // this is fine and elegant...
    auto bar = sub_string(source, 3);

    // but uh-oh...
    bar = sub_string("foobar"s, 3);
}

IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.

Update

Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault timebomb...

#include <string>
#include <string_view>

std::string_view sub_string(std::string_view s,
                            std::size_t p,
                            std::size_t n = std::string_view::npos)
{
    return s.substr(p, n);
}

std::string sub_string(std::string&& s,
                       std::size_t p,
                       std::size_t n = std::string::npos)
{
    return s.substr(p, n);
}

std::string sub_string(std::string const& s,
                       std::size_t p,
                       std::size_t n = std::string::npos)
{
    return s.substr(p, n);
}

int main()
{
    using namespace std::literals;

    auto source = "foobar"s;
    auto bar = sub_string(std::string_view(source), 3);

    // but uh-oh...
    bar = sub_string("foobar"s, 3);
}

The compiler found nothing to warn about here. I am certain that a code review would not either.

I've said it before and I'll say it again, in case anyone on the C++ committee is watching: allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring C++ into disrepute.

Update

Having raised this (to me) rather alarming property of string_view on the cpporg message board, my concerns have been met with indifference.

The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.

There is of course no compiler help to catch times when this happens by accident (for example through template expansion).

As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.
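
To make the "copyable pointer" point concrete, here is a minimal sketch. The names offset_in and demo_view_is_a_pointer are made up for this illustration. It shows that a view holds only an address inside the owning string's buffer plus a length, that copying the view copies that address, and that changes to the owner are visible through the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: offset of view `v` inside the buffer of `owner`.
// Precondition: `v` was created from `owner`, which is still alive and
// has not reallocated since.
std::ptrdiff_t offset_in(const std::string& owner, std::string_view v) {
    return v.data() - owner.data();
}

void demo_view_is_a_pointer() {
    std::string s = "hello world!";
    std::string_view v = std::string_view(s).substr(6, 5);
    std::string_view copy = v;        // copies pointer + length, not characters

    assert(v == "world");
    assert(offset_in(s, v) == 6);     // the view points into s's own buffer
    assert(copy.data() == v.data());  // both views share that buffer

    s[6] = 'W';                       // in-place edit, no reallocation
    assert(v == "World");             // the view observes the owner's state
}
```

Once s is destroyed or reallocates, both v and copy dangle, exactly like raw pointers would.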

Thus code like this:

auto s = get_something().get_suffix();

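is safe when get_suffix() returns a std::string (either by value or by reference), because s is then an independent copy,

but is UB as soon as s is used if get_suffix() is ever refactored to return a std::string_view into the temporary returned by get_something().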

Which in my humble view means that any user code that stores returned strings using auto will break if the libraries they are calling are ever refactored to return std::string_view in place of std::string const&.

So from now on, at least for me, "almost always auto" will have to become, "almost always auto, except when it's strings".
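
One way to follow that rule is to name std::string explicitly at the call site, so the characters are copied while the temporary is still alive. The sketch below is only an illustration: Something, its text member, and the bodies of get_something() and get_suffix() are invented stand-ins for the library code discussed above.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the library type.
struct Something {
    std::string text;
    // The "refactored" accessor: returns a view into this object's state.
    std::string_view get_suffix() const {
        return std::string_view(text).substr(3);
    }
};

// Hypothetical factory returning a temporary.
Something get_something() { return Something{"foobar"}; }

std::string owned_suffix() {
    // The std::string is constructed before the temporary Something is
    // destroyed at the end of the full expression, so s owns its characters.
    auto s = std::string(get_something().get_suffix());
    return s;
}
```

With plain `auto s = get_something().get_suffix();` the deduced type would be std::string_view and s would dangle once the statement ends.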

You can use the conversion operator from std::string to std::string_view:

std::string s = "hello world!";
std::string_view v = std::string_view(s).substr(6, 5);
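
A small sketch of how that expression behaves at the edges, assuming hypothetical helper names view_of and throws_out_of_range: string_view::substr clamps a count that runs past the end, accepts a position equal to size(), and throws std::out_of_range for a position beyond it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper wrapping the one-liner above.
// The returned view is valid only while `s` is alive and unmodified.
std::string_view view_of(const std::string& s, std::size_t pos, std::size_t n) {
    return std::string_view(s).substr(pos, n);
}

// Reports whether substr(pos) on a view of `s` throws std::out_of_range.
bool throws_out_of_range(const std::string& s, std::size_t pos) {
    try {
        (void)std::string_view(s).substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For s = "hello world!", view_of(s, 6, 5) is "world", view_of(s, 6, 100) is clamped to "world!", and a position of 13 throws.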

This is how you can efficiently create a sub-string string_view.

#include <algorithm>   // std::min
#include <cstddef>     // std::size_t
#include <iostream>
#include <limits>      // std::numeric_limits
#include <string>
#include <string_view>

inline std::string_view substr_view(const std::string& source, std::size_t offset = 0,
                                    std::string_view::size_type count =
                                        std::numeric_limits<std::string_view::size_type>::max()) {
    if (offset < source.size())
        return std::string_view(source.data() + offset,
                                std::min(source.size() - offset, count));
    return {};
}

int main() {
    using namespace std::literals;  // needed for the "..."s literal below

    // OK: the temporary std::string lives until the end of the full expression
    std::cout << substr_view("abcd", 3, 11) << "\n";

    std::string s {"0123456789"};
    std::cout << substr_view(s, 3, 2) << "\n";

    // be cautious about lifetime, as illustrated at https://en.cppreference.com/w/cpp/string/basic_string_view
    std::string_view bad = substr_view("0123456789"s, 3, 2); // "bad" holds a dangling pointer
    std::cout << bad << "\n"; // possible access violation

    return 0;
}
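
The lifetime caveat in the last lines of main can be partly pushed onto the compiler for a helper you own, by deleting the overload that temporaries would bind to. This is a sketch of that general technique, not part of the answer above: substr_view_checked and the accepts trait are hypothetical names. It also rejects harmless uses of a temporary within a single expression, so it trades convenience for safety.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Same logic as substr_view above, under a hypothetical name.
inline std::string_view substr_view_checked(
    const std::string& source, std::size_t offset = 0,
    std::size_t count = std::string_view::npos) {
    if (offset >= source.size()) return {};
    return std::string_view(source.data() + offset,
                            std::min(source.size() - offset, count));
}

// Rvalue std::string arguments prefer this overload; because it is
// deleted, such calls fail to compile instead of returning a dangling view.
std::string_view substr_view_checked(
    const std::string&&, std::size_t = 0,
    std::size_t = std::string_view::npos) = delete;

// Detection trait (illustration only): is the call well-formed for S?
template <class S, class = void>
struct accepts : std::false_type {};
template <class S>
struct accepts<S, std::void_t<decltype(substr_view_checked(
                      std::declval<S>(), std::size_t{3}))>> : std::true_type {};

static_assert(accepts<std::string&>::value);        // named string: allowed
static_assert(accepts<const std::string&>::value);  // const lvalue: allowed
static_assert(!accepts<std::string>::value);        // temporary: rejected
```

With this in place, the `"0123456789"s` call from the example would be a compile error rather than a possible access violation. A view can still dangle if the named string is destroyed or modified later.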

I realize that the question is about C++17, but it's worth noting that C++20 introduced a string_view constructor that accepts two iterators to char (or whatever the base type is) which allows writing

std::string_view v{ s.begin() +6, s.begin()+6 +5 };

Not sure if there is a cleaner syntax, but it's not difficult to

#define RANGE(_container,_start,_length) (_container).begin() + (_start), (_container).begin() + (_start) + (_length)

for a final

std::string_view v{ RANGE(s,6,5) };

PS: I called RANGE's first parameter _container instead of _string for a reason: the macro can be used with any Container (or class supporting at least begin() and end()), even as part of a function call like

auto pisPosition= std::find( RANGE(myDoubleVector,11,23), std::numbers::pi );

PPS: When possible, prefer C++20's actual ranges library to this poor person's solution.