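How do I iterate over the words of a string?

How do I iterate over the words of a string that is made up of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution: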

    #include <iostream>
    #include <sstream>
    #include <string>

    using namespace std;

    int main() {
        string s = "Somewhere down the road";
        istringstream iss(s);

        do {
            string subs;
            iss >> subs;
            cout << "Substring: " << subs << endl;
        } while (iss);
    }

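The STL does not have such a method available yet.

However, you can either use C's strtok() function by way of the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):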

    void Tokenize(const string& str,
                  vector<string>& tokens,
                  const string& delimiters = " ")
    {
        // Skip delimiters at beginning.
        string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        // Find first "non-delimiter".
        string::size_type pos     = str.find_first_of(delimiters, lastPos);

        while (string::npos != pos || string::npos != lastPos)
        {
            // Found a token, add it to the vector.
            tokens.push_back(str.substr(lastPos, pos - lastPos));
            // Skip delimiters.  Note the "not_of"
            lastPos = str.find_first_not_of(delimiters, pos);
            // Find next "non-delimiter"
            pos = str.find_first_of(delimiters, lastPos);
        }
    }

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

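And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf are both faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "elegance over performance" deal.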

This is my favorite way to iterate through a string. You can do whatever you want with each word.

    string line = "a line of text to iterate through";
    string word;

    istringstream iss(line, istringstream::in);

    while( iss >> word )
    {
        // Do something on `word` here...
    }

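Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're just looking for a different way of doing things, though, you can use std::string::find()/std::string::find_first_of() and std::string::substr().

Here's an example: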

    #include <iostream>
    #include <string>

    int main()
    {
        std::string s("Somewhere down the road");
        std::string::size_type prev_pos = 0, pos = 0;

        while( (pos = s.find(' ', pos)) != std::string::npos )
        {
            std::string substring( s.substr(prev_pos, pos-prev_pos) );

            std::cout << substring << '\n';

            prev_pos = ++pos;
        }

        std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
        std::cout << substring << '\n';

        return 0;
    }

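For a ridiculously large and probably redundant version, try a lot of for loops.

    // `sequence` is the std::string to split; it is consumed as words are extracted.
    string stringlist[10];
    int count = 0;

    for (int i = 0; i < sequence.length(); i++)
    {
        if (sequence[i] == ' ')
        {
            stringlist[count] = sequence.substr(0, i);
            sequence.erase(0, i+1);
            i = 0;
            count++;
        }
        else if (i == sequence.length()-1)  // Last word
        {
            stringlist[count] = sequence.substr(0, i+1);
        }
    }

It isn't pretty, but by and large (barring punctuation and a slew of other bugs) it works!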

I like the following because it puts the results into a vector, supports a string as the delimiter, and gives control over keeping empty values. But it doesn't look as good.

    #include <iostream>
    #include <string>
    #include <vector>
    #include <algorithm>
    #include <iterator>
    using namespace std;

    vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
        vector<string> result;
        if (delim.empty()) {
            result.push_back(s);
            return result;
        }
        string::const_iterator substart = s.begin(), subend;
        while (true) {
            // Find the next occurrence of the whole delimiter string
            subend = search(substart, s.end(), delim.begin(), delim.end());
            string temp(substart, subend);
            if (keep_empty || !temp.empty()) {
                result.push_back(temp);
            }
            if (subend == s.end()) {
                break;
            }
            substart = subend + delim.size();
        }
        return result;
    }

    int main() {
        const vector<string> words = split("So close no matter how far", " ");
        copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    }

Of course, Boost has a split() that works partially like that. And if by "whitespace" you really do mean any type of whitespace, using Boost's split with is_any_of() works great.

This is similar to the Stack Overflow question "How do I tokenize a string in C++?". It requires the Boost external library.

    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    using namespace std;
    using namespace boost;

    int main(int argc, char** argv)
    {
        string text = "token  test\tstring";

        char_separator<char> sep(" \t");
        tokenizer<char_separator<char>> tokens(text, sep);
        for (const string& t : tokens)
        {
            cout << t << "." << endl;
        }
    }

I use this to split a string by a delimiter. The first overload puts the results in a pre-constructed vector, the second returns a new vector.

    #include <string>
    #include <sstream>
    #include <vector>
    #include <iterator>

    template <typename Out>
    void split(const std::string &s, char delim, Out result) {
        std::istringstream iss(s);
        std::string item;
        while (std::getline(iss, item, delim)) {
            *result++ = item;
        }
    }

    std::vector<std::string> split(const std::string &s, char delim) {
        std::vector<std::string> elems;
        split(s, delim, std::back_inserter(elems));
        return elems;
    }

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

    std::vector<std::string> x = split("one:two::three", ':');
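If the empty item is not wanted, one option is to drop empty tokens as they are read. The following is only a minimal sketch: the helper name `split_skip_empty` is an assumption and not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): same getline loop as
// split() above, but tokens of length zero are not stored.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this variant, `split_skip_empty("one:two::three", ':')` should yield the three items `one`, `two` and `three`.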

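A possible solution using Boost might be:

    #include <boost/algorithm/string.hpp>
    std::vector<std::string> strs;
    boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function, it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.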

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities.

    #include <iostream>
    #include <string>
    #include <sstream>
    #include <algorithm>
    #include <iterator>

    int main() {
        using namespace std;
        string sentence = "And I feel fine...";
        istringstream iss(sentence);
        copy(istream_iterator<string>(iss),
             istream_iterator<string>(),
             ostream_iterator<string>(cout, "\n"));
    }

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

    vector<string> tokens;
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         back_inserter(tokens));

... or create the vector directly:

    vector<string> tokens{istream_iterator<string>{iss},
                          istream_iterator<string>{}};

For those who are not comfortable sacrificing all efficiency for code size, and who see "efficient" as a kind of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):

    template < class ContainerT >
    void tokenize(const std::string& str, ContainerT& tokens,
                  const std::string& delimiters = " ", bool trimEmpty = false)
    {
        std::string::size_type pos, lastPos = 0, length = str.length();

        using value_type = typename ContainerT::value_type;
        using size_type  = typename ContainerT::size_type;

        while(lastPos < length + 1)
        {
            pos = str.find_first_of(delimiters, lastPos);
            if(pos == std::string::npos)
            {
                pos = length;
            }

            if(pos != lastPos || !trimEmpty)
                tokens.push_back(value_type(str.data()+lastPos,
                      (size_type)pos-lastPos ));

            lastPos = pos + 1;
        }
    }

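I usually choose std::vector<std::string> as the type of my second parameter (ContainerT)... but list<> is way faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does not do any copies, for incredible speed increases.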
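As a rough illustration of the non-copying idea, `std::string_view` (C++17) can stand in for the `subString` class, since it can be constructed from a pointer and a length just like the `value_type(...)` call in `tokenize`. This is a sketch under that assumption, not the author's own `subString`; the wrapper name `words_as_views` is hypothetical, and `tokenize` is repeated only so the block compiles on its own.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Same algorithm as tokenize() above, repeated so this block is self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false) {
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    std::string::size_type pos, lastPos = 0, length = str.length();
    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        static_cast<size_type>(pos - lastPos)));
        lastPos = pos + 1;
    }
}

// Assumption: std::string_view plays the role of the non-copying "subString".
// The views point into `text`, so `text` must outlive the returned vector.
inline std::vector<std::string_view> words_as_views(const std::string& text) {
    std::vector<std::string_view> views;
    tokenize(text, views, " ", true);  // trimEmpty = true skips empty tokens
    return views;
}
```

No characters are copied here; each token is just a pointer and a length into the original string, which is why the input has to stay alive for as long as the tokens are used.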

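It's more than twice as fast as the fastest tokenizer on this page and almost 5 times faster than some others. Also, with the perfect parameter types you can eliminate all string and list copies for additional speed increases.

Additionally, it does not do the (extremely inefficient) return of the result, but instead takes the token container by reference, which also allows you to build up tokens using multiple calls if you wish.

Lastly, it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest is optional. It does not use streams or the Boost library, but it is flexible enough to accept some of these foreign types naturally.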

Here's another way of doing it...

    void split_string(string text, vector<string>& words)
    {
        int i = 0;
        char ch;
        string word;

        // text[text.size()] is '\0', which ends the loop
        while (ch = text[i++])
        {
            if (isspace(ch))
            {
                if (!word.empty())
                {
                    words.push_back(word);
                }
                word = "";
            }
            else
            {
                word += ch;
            }
        }
        if (!word.empty())
        {
            words.push_back(word);
        }
    }

Yet another flexible and fast way:

    template<typename Operator>
    void tokenize(Operator& op, const char* input, const char* delimiters) {
        const char* s = input;
        const char* e = s;
        while (*e != 0) {
            e = s;
            // Advance e to the next delimiter or the end of the input
            while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
            if (e - s > 0) {
                op(s, e - s);
            }
            s = e + 1;
        }
    }

To use it with a vector of strings (Edit: since someone pointed out not to inherit from STL classes... hrmf ;) ):

    template<class ContainerType>
    class Appender {
    public:
        Appender(ContainerType& container) : container_(container) {;}
        void operator() (const char* s, unsigned length) {
            container_.push_back(std::string(s, length));
        }
    private:
        ContainerType& container_;
    };

    std::vector<std::string> strVector;
    Appender<std::vector<std::string> > v(strVector);
    tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, for example to just count words:

    class WordCounter {
    public:
        WordCounter() : noOfWords(0) {}
        void operator() (const char*, unsigned) {
            ++noOfWords;
        }
        unsigned noOfWords;
    };

    WordCounter wc;
    tokenize(wc, "A number of words to be counted", " \t");
    ASSERT( wc.noOfWords == 7 );

Limited only by imagination ;)

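There is a function named strtok; the code below uses its reentrant variant, strtok_r.

    #include <cstring>
    #include <string>
    #include <vector>
    using namespace std;

    // Note: strtok_r modifies the buffer that `str` points to.
    vector<string> split(char* str, const char* delim)
    {
        char* saveptr;
        char* token = strtok_r(str, delim, &saveptr);

        vector<string> result;

        while (token != NULL)
        {
            result.push_back(token);
            token = strtok_r(NULL, delim, &saveptr);
        }
        return result;
    }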

I use this simpleton because our String class is "special" (i.e. non-standard):

    void splitString(const String &s, const String &delim, std::vector<String> &result) {
        const int l = delim.length();
        int f = 0;
        int i = s.indexOf(delim, f);
        while (i >= 0) {
            String token( i-f > 0 ? s.substring(f, i-f) : "");
            result.push_back(token);
            f = i + l;
            i = s.indexOf(delim, f);
        }
        String token = s.substring(f);
        result.push_back(token);
    }

Loop on getline with ' ' as the delimiter.
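A minimal sketch of what that loop might look like, assuming the input is already in a `std::string`; the function name `split_on_space` is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads space-delimited tokens with std::getline. Unlike operator>>,
// consecutive spaces produce empty tokens, and tabs or newlines are not
// treated as separators.
std::vector<std::string> split_on_space(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream iss(line);
    std::string word;
    while (std::getline(iss, word, ' ')) {
        words.push_back(word);
    }
    return words;
}
```

The difference from `iss >> word` matters mainly when the input can contain runs of spaces or other whitespace characters.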

I've been using the one from Boost so far, but I needed something that doesn't depend on it, so I came up with this:

    static void Split(std::vector<std::string>& lst, const std::string& input,
                      const std::string& separators, bool remove_empty = true)
    {
        std::ostringstream word;
        for (size_t n = 0; n < input.size(); ++n)
        {
            if (std::string::npos == separators.find(input[n]))
                word << input[n];
            else
            {
                if (!word.str().empty() || !remove_empty)
                    lst.push_back(word.str());
                word.str("");
            }
        }
        if (!word.str().empty() || !remove_empty)
            lst.push_back(word.str());
    }

A good point is that you can pass more than one character in separators.

    #include <vector>
    #include <string>
    #include <sstream>

    int main()
    {
        std::string str("Split me by whitespaces");
        std::string buf;                 // Have a buffer string
        std::stringstream ss(str);       // Insert the string into a stream

        std::vector<std::string> tokens; // Create vector to hold our words

        while (ss >> buf)
            tokens.push_back(buf);

        return 0;
    }

If you like to use Boost, but want to use a whole string as the delimiter (instead of single characters as in most of the previously proposed solutions), you can use the boost_split_iterator.

Example code including a convenient template:

    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>
    #include <boost/algorithm/string.hpp>

    template<typename _OutputIterator>
    inline void split(
        const std::string& str,
        const std::string& delim,
        _OutputIterator result)
    {
        using namespace boost::algorithm;
        typedef split_iterator<std::string::const_iterator> It;

        for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
                iter!=It();
                ++iter)
        {
            *(result++) = boost::copy_range<std::string>(*iter);
        }
    }

    int main(int argc, char* argv[])
    {
        using namespace std;

        vector<string> splitted;
        split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

        // or directly to console, for example
        split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
        return 0;
    }

Here's a much better way of doing this. It can take any character, and doesn't split lines unless you want it to. No special libraries are needed (well, besides std, but who really considers that an extra library), no pointers, no references, and it's static. Just simple plain C++.

    #pragma once
    #include <string>
    #include <vector>
    #include <sstream>
    using namespace std;

    class Helpers
    {
    public:
        static vector<string> split(string s, char delim)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            if (s.size() == 0 || delim == 0)
                return elems;
            for(char c : s)
            {
                if(c == delim)
                {
                    elems.push_back(temp.str());
                    temp = stringstream(stringstream::in | stringstream::out);
                }
                else
                    temp << c;
            }
            if (temp.str().size() > 0)
                elems.push_back(temp.str());
            return elems;
        }

        //Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
        //split at the following letters, a, b, c we would make delims="abc".
        static vector<string> split(string s, string delims)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            bool found;
            if(s.size() == 0 || delims.size() == 0)
                return elems;
            for(char c : s)
            {
                found = false;
                for(char d : delims)
                {
                    if (c == d)
                    {
                        elems.push_back(temp.str());
                        temp = stringstream(stringstream::in | stringstream::out);
                        found = true;
                        break;
                    }
                }
                if(!found)
                    temp << c;
            }
            if(temp.str().size() > 0)
                elems.push_back(temp.str());
            return elems;
        }
    };

I like to use the Boost/regex methods for this task, since they provide maximum flexibility for specifying the splitting criteria.

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>

    int main() {
        std::string line("A:::line::to:split");
        const boost::regex re(":+"); // one or more colons

        // -1 means find inverse matches aka split
        boost::sregex_token_iterator tokens(line.begin(), line.end(), re, -1);
        boost::sregex_token_iterator end;

        for (; tokens != end; ++tokens)
            std::cout << *tokens << std::endl;
    }

What about this:

    #include <string>
    #include <vector>

    using namespace std;

    vector<string> split(string str, const char delim) {
        vector<string> v;
        string tmp;

        for (string::const_iterator i = str.begin(); i != str.end(); ++i) {
            if (*i != delim) {
                tmp += *i;
            } else {
                v.push_back(tmp);
                tmp = "";
            }
        }
        v.push_back(tmp);  // last token; avoids dereferencing str.end()

        return v;
    }

Here is my version, based on Kev's source:

    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    void split(vector<string> &result, string str, char delim ) {
        string tmp;
        string::iterator i;
        result.clear();

        for (i = str.begin(); i != str.end(); ++i) {
            if ((const char)*i != delim) {
                tmp += *i;
            } else {
                result.push_back(tmp);
                tmp = "";
            }
        }
        result.push_back(tmp);  // last token; avoids dereferencing str.end()
    }

Afterwards, call the function and do something with the result:

    vector<string> hosts;
    split(hosts, "192.168.1.2,192.168.1.3", ',');
    for( size_t i = 0; i < hosts.size(); i++){
        cout <<  "Connecting host : " << hosts.at(i) << "..." << endl;
    }

If you need to parse a string by non-space symbols, stringstream can be handy:

    string s = "Name:JAck; Spouse:Susan; ...";
    string dummy, name, spouse;

    istringstream iss(s);
    getline(iss, dummy, ':');
    getline(iss, name, ';');
    getline(iss, dummy, ':');
    getline(iss, spouse, ';');
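When the number of fields is not known in advance, the same pair of getline calls can be placed in a loop. The sketch below assumes the input follows the `Key:Value; Key:Value;` shape of the example; the helper name `parse_pairs` and the choice of `std::map` are illustrative assumptions.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses "Key:Value; Key:Value; ..." by alternating
// getline on ':' and ';'. Leading spaces before each key are trimmed.
std::map<std::string, std::string> parse_pairs(const std::string& s) {
    std::map<std::string, std::string> result;
    std::istringstream iss(s);
    std::string key, value;
    while (std::getline(iss, key, ':') && std::getline(iss, value, ';')) {
        key.erase(0, key.find_first_not_of(' '));
        result[key] = value;
    }
    return result;
}
```

The loop stops as soon as either a key or a value can no longer be read, so a dangling key without a value is silently ignored.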

Here's another solution. It's compact and reasonably efficient:

    std::vector<std::string> split(const std::string &text, char sep) {
        std::vector<std::string> tokens;
        std::size_t start = 0, end = 0;
        while ((end = text.find(sep, start)) != std::string::npos) {
            tokens.push_back(text.substr(start, end - start));
            start = end + 1;
        }
        tokens.push_back(text.substr(start));
        return tokens;
    }

It can easily be templatised to handle string separators, wide strings, etc.
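One way such a templated version might look is sketched below. The name `split_generic` and the handling of an empty separator (return the input unsplit) are assumptions made for this sketch, not part of the answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a templated variant (the name is hypothetical). It works for
// std::string, std::wstring, etc., and takes a whole string as the separator.
template <class CharT>
std::vector<std::basic_string<CharT>> split_generic(
        const std::basic_string<CharT>& text,
        const std::basic_string<CharT>& sep) {
    using String = std::basic_string<CharT>;
    std::vector<String> tokens;
    if (sep.empty()) {  // assumption: an empty separator means "do not split"
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != String::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // skip the whole separator
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Because the character type is deduced from the arguments, both arguments need to be string objects of the same type, for example `split_generic(std::wstring(L"x,y"), std::wstring(L","))`.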

Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty strings.

It can also be easily expanded to skip empty tokens:

    std::vector<std::string> split(const std::string &text, char sep) {
        std::vector<std::string> tokens;
        std::size_t start = 0, end = 0;
        while ((end = text.find(sep, start)) != std::string::npos) {
            if (end != start) {
                tokens.push_back(text.substr(start, end - start));
            }
            start = end + 1;
        }
        // Only add the remainder if it is non-empty
        if (start < text.size()) {
            tokens.push_back(text.substr(start));
        }
        return tokens;
    }

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

    std::vector<std::string> split(const std::string& text, const std::string& delims)
    {
        std::vector<std::string> tokens;
        std::size_t start = text.find_first_not_of(delims), end = 0;

        while((end = text.find_first_of(delims, start)) != std::string::npos)
        {
            tokens.push_back(text.substr(start, end - start));
            start = text.find_first_not_of(delims, end);
        }
        if(start != std::string::npos)
            tokens.push_back(text.substr(start));

        return tokens;
    }

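Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper-case characters.

    #include <string>
    #include <list>
    #include <locale> // std::isupper

    template<class String>
    const std::list<String> split_camel_case_string(const String &s)
    {
        std::list<String> R;
        String w;
        const std::locale loc;  // copy of the current global locale

        for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
            if (std::isupper(*i, loc)) {
                if (w.length()) {
                    R.push_back(w);
                    w.clear();
                }
            }
            w += *i;
        }

        if (w.length())
            R.push_back(w);
        return R;
    }

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale, it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note that std::isupper should really be passed as a function template argument. Then a more generalized form of this function can also split at delimiters like ",", ";" or " ".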
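A rough sketch of that generalization follows. The function name `split_at`, the `keep_boundary` flag and the two predicate helpers are all assumptions introduced for illustration; the flag is needed because a camel-case boundary character belongs to the next word, while an ordinary delimiter should be dropped.

```cpp
#include <cctype>
#include <list>
#include <string>

// Hypothetical generalization: `is_boundary` decides where a new token starts.
// keep_boundary = true keeps the boundary character as the first character of
// the next token (camel case); false drops it (ordinary delimiters).
template <class String, class Pred>
std::list<String> split_at(const String& s, Pred is_boundary, bool keep_boundary) {
    std::list<String> result;
    String word;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (is_boundary(*i)) {
            if (!word.empty()) {
                result.push_back(word);
                word.clear();
            }
            if (!keep_boundary) continue;
        }
        word += *i;
    }
    if (!word.empty()) result.push_back(word);
    return result;
}

// Example predicates. The unsigned char cast avoids undefined behaviour in
// std::isupper for negative char values; this version uses the C locale
// function from <cctype>, not the locale-aware overload.
inline bool is_upper_char(char c) {
    return std::isupper(static_cast<unsigned char>(c)) != 0;
}
inline bool is_list_delim(char c) {
    return c == ',' || c == ';' || c == ' ';
}
```

With these helpers, `split_at(std::string("AQueryTrades"), is_upper_char, true)` covers the camel-case case, and `split_at(std::string("x,y; z"), is_list_delim, false)` covers the delimiter case.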

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

    #include <iostream>
    #include <algorithm>
    #include <cstring>
    #include <vector>
    #include <string>

    using namespace std;

    char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
    char seps[]   = " ,\t\n";
    char *token;

    int main()
    {
        vector<string> vec_String_Lines;
        token = strtok( one_line_string, seps );

        cout << "Extracting and storing data in a vector..\n\n\n";

        while( token != NULL )
        {
            vec_String_Lines.push_back(token);
            token = strtok( NULL, seps );
        }
        cout << "Displaying end result in vector line storage..\n\n";

        for ( size_t i = 0; i < vec_String_Lines.size(); ++i)
            cout << vec_String_Lines[i] << "\n";
        cout << "\n\n\n";

        return 0;
    }

A quick version that uses vector as the base class, giving full access to all of its operators:

    // Split string into parts.
    class Split : public std::vector<std::string>
    {
    public:
        Split(const std::string& str, const char* delimList)
        {
            size_t lastPos = 0;
            size_t pos = str.find_first_of(delimList);

            while (pos != std::string::npos)
            {
                if (pos != lastPos)
                    push_back(str.substr(lastPos, pos-lastPos));
                lastPos = pos + 1;
                pos = str.find_first_of(delimList, lastPos);
            }
            if (lastPos < str.length())
                push_back(str.substr(lastPos, pos-lastPos));
        }
    };

Example used to populate an STL set:

    std::set<std::string> words;
    Split split("Hello,World", ",");
    words.insert(split.begin(), split.end());

I use the following:

    void split(string in, vector<string>& parts, char separator) {
        string::iterator  ts, curr;
        ts = curr = in.begin();
        for(; curr <= in.end(); curr++ ) {
            if( (curr == in.end() || *curr == separator) && curr > ts )
                parts.push_back( string( ts, curr ));
            if( curr == in.end() )
                break;
            if( *curr == separator ) ts = curr + 1;
        }
    }

PlasmaHH, I forgot to include the extra check (curr > ts) for removing tokens with whitespace.

Here's a split function that:

  • is generic
  • uses standard C++ (no Boost)
  • accepts multiple delimiters
  • ignores empty tokens (can easily be changed)

    template<typename T>
    vector<T>
    split(const T & str, const T & delimiters) {
        vector<T> v;
        typename T::size_type start = 0;
        auto pos = str.find_first_of(delimiters, start);
        while(pos != T::npos) {
            if(pos != start) // ignore empty tokens
                v.emplace_back(str, start, pos - start);
            start = pos + 1;
            pos = str.find_first_of(delimiters, start);
        }
        if(start < str.length()) // ignore trailing delimiter
            v.emplace_back(str, start, str.length() - start); // add what's left of the string
        return v;
    }

Example usage:

    vector<string> v = split<string>("Hello, there; World", ";,");
    vector<wstring> w = split<wstring>(L"Hello, there; World", L";,");
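The remark that ignoring empty tokens "can easily be changed" could be realized as in the sketch below: the two emptiness checks are simply removed. The name `split_keep_empty` is hypothetical, chosen so it does not clash with the function above.

```cpp
#include <string>
#include <vector>

// Hypothetical variant that keeps empty tokens, including one for a
// trailing delimiter.
template <typename T>
std::vector<T> split_keep_empty(const T& str, const T& delimiters) {
    std::vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while (pos != T::npos) {
        v.emplace_back(str, start, pos - start);  // may be an empty token
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    v.emplace_back(str, start, str.length() - start);  // remainder, possibly empty
    return v;
}
```

For instance, `split_keep_empty<std::string>("a,,b;", ",;")` produces four tokens, two of which are empty.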

My code is:

    #include <list>
    #include <string>

    template<class StringType = std::string, class ContainerType = std::list<StringType> >
    class DSplitString : public ContainerType
    {
    public:
        explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
        {
            size_t iPos = 0;
            size_t iPos_char = 0;
            while (StringType::npos != (iPos_char = strString.find(cChar, iPos)))
            {
                StringType strTemp = strString.substr(iPos, iPos_char - iPos);
                if ((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
                    this->push_back(strTemp);
                iPos = iPos_char + 1;
            }
            // Remainder after the last delimiter
            StringType strLast = strString.substr(iPos);
            if (!strLast.empty() || !bSkipEmptyParts)
                this->push_back(strLast);
        }

        explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
        {
            size_t iPos = 0;
            size_t iPos_char = 0;
            while (StringType::npos != (iPos_char = strString.find(strSub, iPos)))
            {
                StringType strTemp = strString.substr(iPos, iPos_char - iPos);
                if ((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
                    this->push_back(strTemp);
                iPos = iPos_char + strSub.length();
            }
            // Remainder after the last delimiter
            StringType strLast = strString.substr(iPos);
            if (!strLast.empty() || !bSkipEmptyParts)
                this->push_back(strLast);
        }
    };

Example:

    #include <iostream>
    #include <string>

    int main()
    {
        DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
        for (const std::string& var : aa)
        {
            std::cout << var << std::endl;
        }
        std::cin.get();
        return 0;
    }

I use the following code:

    namespace Core
    {
        typedef std::wstring String;

        void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
        {
            if (splitter.empty())
            {
                throw std::invalid_argument("splitter must not be empty"); // for example
            }

            std::list<Core::String> lines;

            Core::String::size_type offset = 0;

            for (;;)
            {
                Core::String::size_type splitterPos = input.find(splitter, offset);

                if (splitterPos != Core::String::npos)
                {
                    lines.push_back(input.substr(offset, splitterPos - offset));
                    offset = splitterPos + splitter.size();
                }
                else
                {
                    lines.push_back(input.substr(offset));
                    break;
                }
            }

            lines.swap(output);
        }
    }

    // gtest:

    class SplitStringTest: public testing::Test
    {
    };

    TEST_F(SplitStringTest, EmptyStringAndSplitter)
    {
        std::list<Core::String> result;
        ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
    }

    TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
    {
        std::list<Core::String> result;
        ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
    }

    TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
    {
        std::list<Core::String> result;
        Core::SplitString(Core::String(), Core::String(L","), result);
        ASSERT_EQ(1, result.size());
        ASSERT_EQ(Core::String(), *result.begin());
    }

    TEST_F(SplitStringTest, OneCharSplitter)
    {
        std::list<Core::String> result;

        Core::SplitString(L"x,y", L",", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"x", *result.begin());
        ASSERT_EQ(L"y", *result.rbegin());

        Core::SplitString(L",xy", L",", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(Core::String(), *result.begin());
        ASSERT_EQ(L"xy", *result.rbegin());

        Core::SplitString(L"xy,", L",", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"xy", *result.begin());
        ASSERT_EQ(Core::String(), *result.rbegin());
    }

    TEST_F(SplitStringTest, TwoCharsSplitter)
    {
        std::list<Core::String> result;

        Core::SplitString(L"x,.y,z", L",.", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"x", *result.begin());
        ASSERT_EQ(L"y,z", *result.rbegin());

        Core::SplitString(L"x,,y,z", L",,", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"x", *result.begin());
        ASSERT_EQ(L"y,z", *result.rbegin());
    }

    TEST_F(SplitStringTest, RecursiveSplitter)
    {
        std::list<Core::String> result;

        Core::SplitString(L",,,", L",,", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(Core::String(), *result.begin());
        ASSERT_EQ(L",", *result.rbegin());

        Core::SplitString(L",.,.,", L",.,", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(Core::String(), *result.begin());
        ASSERT_EQ(L".,", *result.rbegin());

        Core::SplitString(L"x,.,.,y", L",.,", result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"x", *result.begin());
        ASSERT_EQ(L".,y", *result.rbegin());

        Core::SplitString(L",.,,.,", L",.,", result);
        ASSERT_EQ(3, result.size());
        ASSERT_EQ(Core::String(), *result.begin());
        ASSERT_EQ(Core::String(), *(++result.begin()));
        ASSERT_EQ(Core::String(), *result.rbegin());
    }

    TEST_F(SplitStringTest, NullTerminators)
    {
        std::list<Core::String> result;

        Core::SplitString(L"xy", Core::String(L"\0", 1), result);
        ASSERT_EQ(1, result.size());
        ASSERT_EQ(L"xy", *result.begin());

        Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
        ASSERT_EQ(2, result.size());
        ASSERT_EQ(L"x", *result.begin());
        ASSERT_EQ(L"y", *result.rbegin());
    }

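I made this because I needed an easy way to split strings and C-style strings... Hopefully someone else can find it useful as well. It also doesn't rely on tokens, and you can use fields as delimiters, which is another key feature I needed.

I'm sure there are improvements that can be made to further improve its elegance, and please do make them by all means.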

StringSplitter.hpp:

    #include <vector>
    #include <iostream>
    #include <string.h>

    using namespace std;

    class StringSplit
    {
    private:
        void copy_fragment(char*, char*, char*);
        void copy_fragment(char*, char*, char);
        bool match_fragment(char*, char*, int);
        int untilnextdelim(char*, char);
        int untilnextdelim(char*, char*);
        void assimilate(char*, char);
        void assimilate(char*, char*);
        bool string_contains(char*, char*);
        long calc_string_size(char*);
        void copy_string(char*, char*);

    public:
        vector<char*> split_cstr(char);
        vector<char*> split_cstr(char*);
        vector<string> split_string(char);
        vector<string> split_string(char*);
        char* String;
        bool do_string;
        bool keep_empty;
        vector<char*> Container;
        vector<string> ContainerS;

        // Both flags start out false so they are never read uninitialized.
        StringSplit(char * in) : String(in), do_string(false), keep_empty(false)
        {
        }

        StringSplit(string in) : do_string(true), keep_empty(false)
        {
            size_t len = calc_string_size((char*)in.c_str());
            String = new char[len + 1];
            memset(String, 0, len + 1);
            copy_string(String, (char*)in.c_str());
        }

        ~StringSplit()
        {
            for (int i = 0; i < Container.size(); i++)
            {
                if (Container[i] != NULL)
                {
                    delete[] Container[i];
                }
            }
            if (do_string)
            {
                delete[] String;
            }
        }
    };

StringSplitter.cpp:

    #include <string.h>
    #include <iostream>
    #include <vector>
    #include "StringSplitter.hpp"

    using namespace std;

    void StringSplit::assimilate(char*src, char delim)
    {
        int until = untilnextdelim(src, delim);
        if (until > 0)
        {
            char * temp = new char[until + 1];
            memset(temp, 0, until + 1);
            copy_fragment(temp, src, delim);
            if (keep_empty || *temp != 0)
            {
                if (!do_string)
                {
                    Container.push_back(temp);
                }
                else
                {
                    string x = temp;
                    ContainerS.push_back(x);
                    delete[] temp;  // the std::string holds its own copy
                }
            }
            else
            {
                delete[] temp;
            }
        }
    }

    void StringSplit::assimilate(char*src, char* delim)
    {
        int until = untilnextdelim(src, delim);
        if (until > 0)
        {
            char * temp = new char[until + 1];
            memset(temp, 0, until + 1);
            copy_fragment(temp, src, delim);
            if (keep_empty || *temp != 0)
            {
                if (!do_string)
                {
                    Container.push_back(temp);
                }
                else
                {
                    string x = temp;
                    ContainerS.push_back(x);
                    delete[] temp;  // the std::string holds its own copy
                }
            }
            else
            {
                delete[] temp;
            }
        }
    }

    long StringSplit::calc_string_size(char* _in)
    {
        long i = 0;
        while (*_in++)
        {
            i++;
        }
        return i;
    }

    bool StringSplit::string_contains(char* haystack, char* needle)
    {
        size_t len = calc_string_size(needle);
        size_t lenh = calc_string_size(haystack);
        while (lenh--)
        {
            if (match_fragment(haystack + lenh, needle, len))
            {
                return true;
            }
        }
        return false;
    }

    bool StringSplit::match_fragment(char* _src, char* cmp, int len)
    {
        while (len--)
        {
            if (*(_src + len) != *(cmp + len))
            {
                return false;
            }
        }
        return true;
    }

    int StringSplit::untilnextdelim(char* _in, char delim)
    {
        size_t len = calc_string_size(_in);
        if (*_in == delim)
        {
            _in += 1;
            return len - 1;
        }

        int c = 0;
        while (*(_in + c) != delim && c < len)
        {
            c++;
        }

        return c;
    }

    int StringSplit::untilnextdelim(char* _in, char* delim)
    {
        int s = calc_string_size(delim);
        int c = 1 + s;

        if (!string_contains(_in, delim))
        {
            return calc_string_size(_in);
        }
        else if (match_fragment(_in, delim, s))
        {
            _in += s;
            return calc_string_size(_in);
        }

        while (!match_fragment(_in + c, delim, s))
        {
            c++;
        }

        return c;
    }

    void StringSplit::copy_fragment(char* dest, char* src, char delim)
    {
        if (*src == delim)
        {
            src++;
        }

        int c = 0;
        while (*(src + c) != delim && *(src + c))
        {
            *(dest + c) = *(src + c);
            c++;
        }
        *(dest + c) = 0;
    }

    void StringSplit::copy_string(char* dest, char* src)
    {
        int i = 0;
        while (*(src + i))
        {
            *(dest + i) = *(src + i);
            i++;
        }
    }

    void StringSplit::copy_fragment(char* dest, char* src, char* delim)
    {
        size_t len = calc_string_size(delim);
        size_t lens = calc_string_size(src);

        if (match_fragment(src, delim, len))
        {
            src += len;
            lens -= len;
        }

        int c = 0;
        while (!match_fragment(src + c, delim, len) && (c < lens))
        {
            *(dest + c) = *(src + c);
            c++;
        }
        *(dest + c) = 0;
    }

    vector<char*> StringSplit::split_cstr(char Delimiter)
    {
        int i = 0;
        while (*String)
        {
            if (*String != Delimiter && i == 0)
            {
                assimilate(String, Delimiter);
            }
            if (*String == Delimiter)
            {
                assimilate(String, Delimiter);
            }
            i++;
            String++;
        }

        String -= i;
        delete[] String;

        return Container;
    }

    vector<string> StringSplit::split_string(char Delimiter)
    {
        do_string = true;

        int i = 0;
        while (*String)
        {
            if (*String != Delimiter && i == 0)
            {
                assimilate(String, Delimiter);
            }
            if (*String == Delimiter)
            {
                assimilate(String, Delimiter);
            }
            i++;
            String++;
        }

        String -= i;
        delete[] String;

        return ContainerS;
    }

    vector<char*> StringSplit::split_cstr(char* Delimiter)
    {
        int i = 0;
        size_t LenDelim = calc_string_size(Delimiter);

        while (*String)
        {
            if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
            {
                assimilate(String, Delimiter);
            }
            if (match_fragment(String, Delimiter, LenDelim))
            {
                assimilate(String, Delimiter);
            }
            i++;
            String++;
        }

        String -= i;
        delete[] String;

        return Container;
    }

    vector<string> StringSplit::split_string(char* Delimiter)
    {
        do_string = true;
        int i = 0;
        size_t LenDelim = calc_string_size(Delimiter);

        while (*String)
        {
            if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
            {
                assimilate(String, Delimiter);
            }
            if (match_fragment(String, Delimiter, LenDelim))
            {
                assimilate(String, Delimiter);
            }
            i++;
            String++;
        }

        String -= i;
        delete[] String;

        return ContainerS;
    }

Examples:

    int main(int argc, char*argv[])
    {
        StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
        vector<char*> Split = ss.split_cstr(":CUT:");

        for (int i = 0; i < Split.size(); i++)
        {
            cout << Split[i] << endl;
        }

        return 0;
    }

Will output:

    This
    is
    an
    example
    cstring

    int main(int argc, char*argv[])
    {
        StringSplit ss = "This:is:an:example:cstring";
        vector<char*> Split = ss.split_cstr(':');

        for (int i = 0; i < Split.size(); i++)
        {
            cout << Split[i] << endl;
        }

        return 0;
    }

    int main(int argc, char*argv[])
    {
        string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
        StringSplit ss = mystring;
        vector<string> Split = ss.split_string("[SPLIT]");

        for (int i = 0; i < Split.size(); i++)
        {
            cout << Split[i] << endl;
        }

        return 0;
    }

    int main(int argc, char*argv[])
    {
        string mystring = "This|is|an|example|string";
        StringSplit ss = mystring;
        vector<string> Split = ss.split_string('|');

        for (int i = 0; i < Split.size(); i++)
        {
            cout << Split[i] << endl;
        }

        return 0;
    }

To keep empty entries (by default, empty entries are excluded):

    StringSplit ss = mystring;
    ss.keep_empty = true;
    vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method, where splitting a string is as easy as:

    String[] Split =
        "Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"}, StringSplitOptions.None);

    foreach(String X in Split)
    {
        Console.Write(X);
    }

I hope someone else finds this as useful as I do.

I have a two-line solution to this problem:

    char sep = ' ';
    std::string s="1 This is an example";

    for(size_t p=0, q=0; p!=s.npos; p=q)
        std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then, instead of printing, you can put the pieces in a vector.
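Collecting into a vector might look like the sketch below, which keeps the same find/substr walk but names the intermediate position for readability. The function name `split_to_vector` is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same walk as the two-liner above, collecting into a vector instead of
// printing. Like the original, it assumes the string does not start with
// the separator.
std::vector<std::string> split_to_vector(const std::string& s, char sep) {
    std::vector<std::string> out;
    for (std::size_t p = 0, q = 0; p != std::string::npos; p = q) {
        // After the first token, p sits on a separator, so skip one character.
        const std::size_t begin = p + (p != 0);
        q = s.find(sep, p + 1);
        // When q is npos the count is huge and substr clamps it to the end.
        out.push_back(s.substr(begin, q - begin));
    }
    return out;
}
```

Since nothing is filtered, two adjacent separators produce an empty string in the result.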

Here is a function I wrote that helps me a lot. It helped me when implementing a protocol for WebSockets.

    #include <iostream>
    #include <vector>
    #include <sstream>
    #include <string>
    using namespace std;

    vector<string> split ( string input , string split_id ) {
        vector<string> result;
        int i = 0;
        bool add;
        string temp;
        stringstream ss;
        size_t found;
        string real;
        int r = 0;
        while ( i != input.length() ) {
            add = false;
            ss << input.at(i);
            temp = ss.str();
            found = temp.find(split_id);
            if ( found != string::npos ) {
                add = true;
                real.append ( temp , 0 , found );
            } else if ( r > 0 &&  ( i+1 ) == input.length() ) {
                add = true;
                real.append ( temp , 0 , found );
            }
            if ( add ) {
                result.push_back(real);
                ss.str(string());
                ss.clear();
                temp.clear();
                real.clear();
                r = 0;
            }
            i++;
            r++;
        }
        return result;
    }

    int main() {
        string s = "S,o,m,e,w,h,e,r,e, down the road \n In a really big C++ house.  \n  Lives a little old lady.   \n   That no one ever knew.    \n    She comes outside.     \n     In the very hot sun.      \n\n\n\n\n\n\n\n   And throws C++ at us.    \n    The End.  FIN.";
        vector < string > Token;
        Token = split ( s , "," );
        for ( int i = 0 ; i < Token.size(); i++)
            cout << Token.at(i) << endl;
        cout << endl << Token.size();
        int a;
        cin >> a;
        return a;
    }

I wrote the following piece of code. You can specify the delimiter, which can be a string. The result is similar to Java's String.split, with empty strings included in the result.

For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows:

    0  <len:0>
    1 PICK <len:4>
    2 ANY <len:3>
    3 TWO: <len:4>
    4  <len:0>

Code:

    vector <string> split(const string& str, const string& delimiter = " ") {
        vector <string> tokens;

        string::size_type lastPos = 0;
        string::size_type pos = str.find(delimiter, lastPos);

        while (string::npos != pos) {
            // Found a token, add it to the vector.
            cout << str.substr(lastPos, pos - lastPos) << endl;
            tokens.push_back(str.substr(lastPos, pos - lastPos));
            lastPos = pos + delimiter.size();
            pos = str.find(delimiter, lastPos);
        }

        tokens.push_back(str.substr(lastPos, str.size() - lastPos));
        return tokens;
    }

Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea.)

    #include <regex>
    #include <string>
    #include <vector>

    using namespace std;

    vector<string> split(const string& s)
    {
        regex r("\\w+"); // regex matches whole words (greedy, so no fragment words)
        sregex_iterator rit(s.begin(), s.end(), r);
        sregex_iterator rend; // iterators to iterate through the words
        vector<string> result;
        for (; rit != rend; ++rit)
            result.push_back(rit->str()); // iterates through the matches to fill the vector
        return result;
    }

    void splitString(string str, char delim, string array[], const int arraySize)
    {
        string::size_type delimPosition = 0, subStrStart = 0;

        // Stop when the array is full or the last piece has been stored.
        for (int index = 0; index < arraySize && delimPosition != string::npos; index++)
        {
            delimPosition = str.find(delim, subStrStart);
            // substr clamps the count, so for the last piece this takes the rest of the string.
            array[index] = str.substr(subStrStart, delimPosition - subStrStart);
            subStrStart = delimPosition + 1;
        }
    }

Boost! :-)

    #include <boost/algorithm/string/split.hpp>
    #include <boost/algorithm/string.hpp>
    #include <iostream>
    #include <vector>

    using namespace std;
    using namespace boost;

    int main(int argc, char**argv) {
        typedef vector < string > list_type;

        list_type list;
        string line;

        line = "Somewhere down the road";
        split(list, line, is_any_of(" "));

        for(int i = 0; i < list.size(); i++)
        {
            cout << list[i] << endl;
        }

        return 0;
    }

This example gives the output:

    Somewhere
    down
    the
    road

I believe no one has posted this solution yet. Rather than using delimiters directly, it basically does the same as boost::split(), i.e., it lets you pass a predicate that returns true if a char is a delimiter and false otherwise. I think this gives the programmer a lot more control, and the great thing is that you don't need Boost.

    template <class Container, class String, class Predicate>
    void split(Container& output, const String& input,
               const Predicate& pred, bool trimEmpty = false) {
        auto it = begin(input);
        auto itLast = it;
        while (it = find_if(it, end(input), pred), it != end(input)) {
            if (not (trimEmpty and it == itLast)) {
                output.emplace_back(itLast, it);
            }
            ++it;
            itLast = it;
        }
        // Emit whatever follows the last delimiter
        if (not (trimEmpty and itLast == end(input))) {
            output.emplace_back(itLast, end(input));
        }
    }

Then you can use it like this:

    struct Delim {
        bool operator()(char c) {
            return not isalpha(c);
        }
    };

    int main() {
        string s("#include<iostream>\n"
                 "int main() { std::cout << \"Hello world!\" << std::endl; }");

        vector<string> v;

        split(v, s, Delim(), true);
        /* Which is also the same as */
        split(v, s, [](char c) { return not isalpha(c); }, true);

        for (const auto& i : v) {
            cout << i << endl;
        }
    }

I just wrote a fine example of how to split a char array by a symbol, placing each array of chars (the words separated by your symbol) into a vector. For simplicity I made the vector hold std::string.

I hope this helps and is readable to you.

    #include <cstring>
    #include <vector>
    #include <string>
    #include <iostream>

    void push(std::vector<std::string> &WORDS, std::string &TMP){
        WORDS.push_back(TMP);
        TMP = "";
    }

    std::vector<std::string> mySplit(char STRING[]){
        std::vector<std::string> words;
        std::string s;
        for(unsigned short i = 0; i < strlen(STRING); i++){
            if(STRING[i] != ' '){
                s += STRING[i];
            }else{
                push(words, s);
            }
        }
        push(words, s); //Used to get last split
        return words;
    }

    int main(){
        char string[] = "My awesome string.";
        std::cout << mySplit(string)[2];
        std::cin.get();
        return 0;
    }

我的实现可以是另一种解决方案:

std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator){std::vector<std::wstring> Lines;size_t stSearchPos = 0;size_t stFoundPos;while (stSearchPos < String.size() - 1){stFoundPos = String.find(Seperator, stSearchPos);stFoundPos = (stFoundPos == std::string::npos) ? String.size() : stFoundPos;Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));stSearchPos = stFoundPos + Seperator.size();}return Lines;}

测试代码:

std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");std::wcout << L"The string: " << MyString << std::endl;for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it){std::wcout << *it << L"<---" << std::endl;}std::wcout << std::endl;MyString = L"this,time,a,comma separated,string";std::wcout << L"The string: " << MyString << std::endl;Parts = IniFile::SplitString(MyString, L",");for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it){std::wcout << *it << L"<---" << std::endl;}

Test code output:

The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---

The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---

#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;

vector<string> split(const string &s, char delim) {
    vector<string> elems;
    stringstream ss(s);
    string item;
    while (getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

int main() {
    vector<string> x = split("thi is an sample test",' ');
    unsigned int i;
    for(i=0;i<x.size();i++)
        cout<<i<<":"<<x[i]<<endl;
    return 0;
}

LazyStringSplitter:

#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
    string text; // own a copy of the input so the iterators below stay valid
    string::const_iterator start, finish;
    unordered_set<char> chop;

public:
    // Empty constructor
    LazyStringSplitter() : start(text.begin()), finish(text.end()) {}

    LazyStringSplitter(const string &cstr, const string &delims)
        : text(cstr), start(text.begin()), finish(text.end()),
          chop(delims.begin(), delims.end())
    {}

    void operator()(const string &cstr, const string &delims)
    {
        chop.insert(delims.begin(), delims.end());
        text = cstr;
        start = text.begin();
        finish = text.end();
    }

    bool empty() const { return (start >= finish); }

    string next()
    {
        // return an empty string if we ran out of characters
        if (empty())
            return string("");

        auto runner = find_if(start, finish, [&](char c) {
            return chop.count(c) == 1;
        });

        // construct the next string
        string ret(start, runner);
        // step over the delimiter, but never past the end
        start = (runner == finish) ? finish : runner + 1;

        // never return an empty string while characters remain
        return !ret.empty() ? ret : next();
    }
};
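  • I call this LazyStringSplitter for one reason: it does not split the string in one go.
  • In essence it behaves like a Python generator.
  • It exposes a method called next, which returns the next string split off from the original.
  • I used unordered_set from the C++11 STL, so that looking up delimiters is that much faster.
  • And here is how it works.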

Test program

#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}

Output

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

The next planned improvement is to implement begin and end methods so that the following becomes possible:

vector<string> split_string(splitter.begin(), splitter.end());

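I have rolled my own using strtok and used Boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.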

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }

    {   // parsing a string into a vector of floats with other separators
        // besides spaces
        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }

    {   // parsing a string into specific variables
        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }

    return 0;
}

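The toolkit is far more flexible than this simple example shows, but its utility in parsing a string into useful elements is incredible.

I was looking for a way to split a string by a separator of any length, so I started writing one from scratch, as the existing solutions didn't suit me.

Here is my little algorithm, using only the STL: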

//use like this
//std::vector<std::wstring> vec = Split<std::wstring> (L"Hello##world##!", L"##");

template <typename valueType>
static std::vector <valueType> Split (valueType text, const valueType& delimiter)
{
    std::vector <valueType> tokens;
    size_t pos = 0;
    valueType token;

    while ((pos = text.find(delimiter)) != valueType::npos)
    {
        token = text.substr(0, pos);
        tokens.push_back (token);
        text.erase(0, pos + delimiter.length());
    }
    tokens.push_back (text);

    return tokens;
}

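It can be used with a separator of any length and form, as far as I have tested. Instantiate it with either the string or the wstring type.

All the algorithm does is search for the delimiter, take the part of the string up to the delimiter, delete the delimiter, and search again until the delimiter is no longer found.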

Of course, you can use any number of whitespace characters as the delimiter.
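To make the find / take-prefix / erase loop described above concrete, here is a minimal self-contained sketch. The name split_by and the guard for an empty delimiter are assumptions added for illustration; they are not part of the answer's code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper illustrating the find / substr / erase loop.
// Works for std::string and std::wstring alike.
template <typename StringT>
std::vector<StringT> split_by(StringT text, const StringT& delimiter)
{
    std::vector<StringT> tokens;
    if (delimiter.empty()) {      // assumption: an empty delimiter means "do not split"
        tokens.push_back(text);
        return tokens;
    }
    typename StringT::size_type pos = 0;
    while ((pos = text.find(delimiter)) != StringT::npos) {
        tokens.push_back(text.substr(0, pos));      // part before the delimiter
        text.erase(0, pos + delimiter.length());    // drop that part and the delimiter
    }
    tokens.push_back(text);                         // whatever remains is the last token
    return tokens;
}
```

Called as split_by<std::string>("Hello##world##!", "##"), this yields three tokens. Note that two delimiters in a row produce an empty token between them, so splitting on a single space does not merge runs of spaces.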

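I hope it helps.

No Boost, no string streams, just the standard C library cooperating with std::string and std::list: C library functions for easy analysis, C++ data types for easy memory management.

Whitespace is considered to be any combination of newlines, tabs and spaces. The set of whitespace characters is established by the wschars variable.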

#include <string>
#include <list>
#include <iostream>
#include <cstring>

using namespace std;

const char *wschars = "\t\n ";

list<string> split(const string &str)
{
    const char *cstr = str.c_str();
    list<string> out;

    while (*cstr) {                     // while remaining string not empty
        size_t toklen;
        cstr += strspn(cstr, wschars);    // skip leading whitespace
        toklen = strcspn(cstr, wschars);  // figure out token length
        if (toklen)                       // if we have a token, add to list
            out.push_back(string(cstr, toklen));
        cstr += toklen;                   // skip over token
    }

    // ran out of string; return list
    return out;
}

int main(int argc, char **argv)
{
    if (argc < 2)                       // argv[1] is required
        return 1;
    list<string> li = split(argv[1]);
    for (list<string>::iterator i = li.begin(); i != li.end(); i++)
        cout << "{" << *i << "}" << endl;
    return 0;
}

Runs:

$ ./split ""
$ ./split "a"
{a}
$ ./split " a "
{a}
$ ./split " a b"
{a}
{b}
$ ./split " a b c"
{a}
{b}
{c}
$ ./split " a b c d  "
{a}
{b}
{c}
{d}

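A tail-recursive version of split (itself split into two functions). All destructive manipulation of variables is gone, apart from pushing strings onto the list!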

void split_rec(const char *cstr, list<string> &li)
{
    if (*cstr) {
        const size_t leadsp = strspn(cstr, wschars);
        const size_t toklen = strcspn(cstr + leadsp, wschars);

        if (toklen)
            li.push_back(string(cstr + leadsp, toklen));

        split_rec(cstr + leadsp + toklen, li);
    }
}

list<string> split(const string &str)
{
    list<string> out;
    split_rec(str.c_str(), out);
    return out;
}

Here's a simple solution that uses only the standard regex library

#include <regex>
#include <string>
#include <vector>

// std:: is spelled out in the signature because the using-directive
// only takes effect inside the function body.
std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }

    return result;
}

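The regex argument allows checking for multiple delimiters (spaces, commas, etc.).

I usually only split on spaces and commas, so I also have this default function: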

std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

"[\\s,]+" checks for whitespace (\\s) and commas (,).

Note, if you want to split a wstring instead of a string:

  • change all std::regex to std::wregex
  • change all sregex_token_iterator to wsregex_token_iterator

Note that you might also want to take the string argument by reference, depending on your compiler.
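As a sketch of the wide-character variant described in the two bullet points, the function below applies both substitutions and also takes its arguments by const reference. The name TokenizeWide is hypothetical and chosen only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-character counterpart of the Tokenize function above.
std::vector<std::wstring> TokenizeWide(const std::wstring& str, const std::wregex& re)
{
    std::vector<std::wstring> result;
    // -1 selects the text between matches, i.e. the tokens themselves.
    std::wsregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::wsregex_token_iterator reg_end;
    for (; it != reg_end; ++it) {
        if (it->length() != 0)   // a leading separator yields an empty first token
            result.push_back(it->str());
    }
    return result;
}
```

For example, TokenizeWide(L" one, two,three ", std::wregex(L"[\\s,]+")) produces the three tokens one, two and three.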

Here is my approach, cut and split:

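string cut (string& str, const string& del)
{
    string f = str;
    const string::size_type pos = str.find(del);

    if (pos != string::npos)
    {
        f = str.substr(0, pos);
        str = str.substr(pos + del.length());
    }
    else
    {
        str.clear();   // no delimiter left: the remainder is the last piece
    }

    return f;
}

vector<string> split (const string& in, const string& del=" ")
{
    vector<string> out;   // "vector<string> out();" would declare a function
    if (del.empty())      // an empty delimiter cannot split anything
    {
        out.push_back(in);
        return out;
    }
    string t = in;

    while (!t.empty())
        out.push_back(cut(t,del));

    return out;
}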

By the way, if there is something I can do to optimize this...

// adapted from a "regular" csv parse
std::string stringIn = "my csv  is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (std::size_t i = 0; i < stringIn.size(); i++) {
    if (stringIn[i] == ' ') {   // compare with a char literal, not the string literal " "
        commaSeparated.push_back("");
        commaCounter++;
    } else {
        commaSeparated.at(commaCounter) += stringIn[i];
    }
}

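In the end you will have a vector of strings, each element being one space-separated piece of the sentence. The only resource needed beyond std::string is std::vector (which I figure is acceptable, since std::string is already involved).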

Empty strings are saved as separate items.
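To show what "empty strings are saved as separate items" means in practice, here is a small self-contained sketch of the same character-by-character loop wrapped in a function. The name split_keep_empty and the sep parameter are assumptions introduced for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical wrapper around the loop above: every separator starts a new
// item, so two separators in a row leave an empty item between them.
std::vector<std::string> split_keep_empty(const std::string& in, char sep = ' ')
{
    std::vector<std::string> parts(1);   // start with one empty item
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == sep)
            parts.push_back("");         // open the next item
        else
            parts.back() += in[i];       // extend the current item
    }
    return parts;
}
```

With the input "my csv  is" (two spaces before "is") the result has four items, the third of which is empty. An empty input still yields one empty item.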

Here's my version

#include <string>
#include <vector>

inline std::vector<std::string> Split(const std::string &str, const std::string &delim = " ")
{
    std::vector<std::string> tokens;
    if (str.size() > 0)
    {
        if (delim.size() > 0)
        {
            std::string::size_type currPos = 0, prevPos = 0;
            while ((currPos = str.find(delim, prevPos)) != std::string::npos)
            {
                std::string item = str.substr(prevPos, currPos - prevPos);
                if (item.size() > 0)
                {
                    tokens.push_back(item);
                }
                // Advance past the whole delimiter, not just one character.
                prevPos = currPos + delim.size();
            }
            // Add the trailing token only if it is non-empty.
            if (prevPos < str.size())
            {
                tokens.push_back(str.substr(prevPos));
            }
        }
        else
        {
            tokens.push_back(str);
        }
    }
    return tokens;
}

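It works with multi-character delimiters. It prevents empty tokens from getting into the results. It relies only on standard headers. It returns the string as one single token when you provide no delimiter. It also returns an empty result if the string is empty. Sadly, it is inefficient because of the huge std::vector copy, unless you are compiling with C++11, which should use move semantics. In C++11, this code should be fast.

Here is my solution using the C++11 STL. It should be reasonably efficient: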

#include <vector>
#include <string>
#include <cctype>
#include <cstring>
#include <iostream>
#include <algorithm>
#include <functional>

std::vector<std::string> split(const std::string& s)
{
    std::vector<std::string> v;

    const auto end = s.end();
    auto to = s.begin();
    decltype(to) from;

    // The lambdas take unsigned char because std::isspace is undefined
    // for negative char values.
    while((from = std::find_if(to, end,
        [](unsigned char c){ return !std::isspace(c); })) != end)
    {
        to = std::find_if(from, end, [](unsigned char c){ return std::isspace(c); });
        v.emplace_back(from, to);
    }

    return v;
}

int main()
{
    std::string s = "this is the string  to  split";

    auto v = split(s);

    for(auto&& s: v)
        std::cout << s << '\n';
}

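Output:

this
is
the
string
to
split

I know I'm very late to the party here, but I was thinking about the most elegant way of doing this when you are given a range of delimiters rather than whitespace, using nothing more than the standard library.

Here are my thoughts:

To split words into a string vector by a sequence of delimiters: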

template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), not_in(delimiters));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), is_in(delimiters));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

To split the other way, by providing a sequence of valid characters:

template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), is_in(valid_chars));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), not_in(valid_chars));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

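is_in and not_in are defined as follows (they need to be declared before the two function templates above so that name lookup can find them):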

namespace detail {
    template<class Container>
    struct is_in {
        is_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) != end(_charset);
        }

        const Container& _charset;
    };

    template<class Container>
    struct not_in {
        not_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) == end(_charset);
        }

        const Container& _charset;
    };
}

template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
    return detail::not_in<Container>(c);
}

template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
    return detail::is_in<Container>(c);
}

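When dealing with whitespace as separator, the obvious answer of using std::istream_iterator<T> has already been given and voted up a lot. Of course, elements may not be separated by whitespace but by some other delimiter instead. I didn't spot any answer that simply redefines what whitespace means to be that delimiter and then uses the conventional approach.

The way to change what streams consider whitespace is to change the stream's std::locale using std::istream::imbue() with a std::ctype<char> facet that carries its own definition of whitespace (it can be done for std::ctype<wchar_t> too, but that is slightly different because std::ctype<char> is table-driven while std::ctype<wchar_t> is driven by virtual functions).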

#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <locale>

struct whitespace_mask {
    std::ctype_base::mask mask_table[std::ctype<char>::table_size];
    whitespace_mask(std::string const& spaces) {
        std::ctype_base::mask* table = this->mask_table;
        std::ctype_base::mask const* tab
            = std::use_facet<std::ctype<char>>(std::locale()).table();
        for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
            table[i] = tab[i] & ~std::ctype_base::space;
        }
        std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
            table[c] |= std::ctype_base::space;
        });
    }
};
class whitespace_facet
    : private whitespace_mask
    , public std::ctype<char> {
public:
    whitespace_facet(std::string const& spaces)
        : whitespace_mask(spaces)
        , std::ctype<char>(this->mask_table) {
    }
};

struct whitespace {
    std::string spaces;
    whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
    std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
    in.imbue(loc);
    return in;
}
// everything above would probably go into a utility library...

int main() {
    std::istringstream in("a, b, c, d, e");
    std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    std::istringstream pipes("a b c|  d |e     e");
    std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

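Most of the code is for packaging up a general-purpose tool providing soft delimiters: multiple delimiters in a row are merged. There is no way to produce an empty sequence. When different delimiters are needed within one stream, you would probably use differently set up streams sharing a common stream buffer: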

void f(std::istream& in) {
    std::istream pipes(in.rdbuf());
    pipes >> whitespace("|");
    std::istream comma(in.rdbuf());
    comma >> whitespace(",");

    std::string s0, s1;
    if (pipes >> s0 >> std::ws   // read up to first pipe and ignore sequence of pipes
        && comma >> s1 >> std::ws) { // read up to first comma and ignore commas
        // ...
    }
}

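Thank you @Jairo Abdiel Toribio Cisneros. It works for me, but your function returns some empty elements. So, to get a result without empty elements, I edited it as follows: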

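std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    // Stop at end() instead of running one step past it: the end iterator
    // must be neither dereferenced nor incremented.
    for(std::string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if(*i != *delim) {
            tmp += *i;
        } else {
            if (tmp.length() > 0) {
                v.push_back(tmp);
            }
            tmp = "";
        }
    }
    if (tmp.length() > 0) {   // flush the last token
        v.push_back(tmp);
    }

    return v;
}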

Usage:

std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());

We can use strtok in C++:

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
    char *pch = strtok (str,";,");
    while (pch != NULL)
    {
        cout<<pch<<"\n";
        pch = strtok (NULL, ";,");
    }
    return 0;
}

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>

int main() {
    using namespace std;
    string sentence = "10 20 30 40 5 6 7 8";
    istringstream iss(sentence);

    vector<string> tokens;
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         back_inserter(tokens));

    // Loop over the actual token count rather than a hard-coded n = 8.
    for (size_t i = 0; i < tokens.size(); i++) {
        cout << tokens.at(i);
    }
}

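As a hobbyist, this is the first solution that came to my mind. I'm kind of curious why I haven't seen a similar solution here yet. Is there something fundamentally wrong with how I did it?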

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &s, const std::string &delims)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
        auto pos2 = s.find_first_of(delims, pos);
        result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
        pos = pos2;
    }
    return result;
}

int main()
{
    std::string text{"And then I said: \"I don't get it, why would you even do that!?\""};
    std::string delims{" :;\".,?!"};
    auto words = split(text, delims);
    std::cout << "\nSentence:\n  " << text << "\n\nWords:";
    for (const auto &w : words) {
        std::cout << "\n  " << w;
    }
    return 0;
}

http://cpp.sh/7wmzy

Here's my entry:

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
      ForwardIter s_first, ForwardIter s_last)
{
    Container output;

    while (true) {
        auto pos = std::find_first_of(first, last, s_first, s_last);
        output.emplace_back(first, pos);
        if (pos == last) {
            break;
        }

        first = ++pos;
    }

    return output;
}

template <typename Output = std::vector<std::string>,
          typename Input = std::string,
          typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
    using std::cbegin;
    using std::cend;
    return split<Output>(cbegin(input), cend(input),
                         cbegin(delims), cend(delims));
}

// Pass a std::string: a raw literal would be deduced as a char array, and
// its terminating '\0' would end up in the last token.
auto vec = split(std::string("Mary had a little lamb"));

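The first definition is an STL-style generic function taking two pairs of iterators. The second is a convenience function to save you having to do all the begin() and end() calls yourself. You can also specify the output container type as a template parameter if you want to use a list, for example.

What makes it elegant (IMO) is that, unlike most of the other answers, it is not restricted to strings but works with any STL-compatible container. Without any change to the code above, you can say: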

using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});

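This will split the vector v into separate vectors each time a 0 or a 2 is encountered.

(There is also the added bonus that, with strings, this implementation is faster than both the strtok()-based and getline()-based versions, at least on my system.)

If you want to split a string by some characters, you can use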

#include<iostream>
#include<string>
#include<vector>
#include<iterator>
#include<sstream>

using namespace std;

void replaceOtherChars(string &input, vector<char> &dividers)
{
    const char divider = dividers.at(0);
    int replaceIndex = 0;
    vector<char>::iterator it_begin = dividers.begin()+1,
        it_end= dividers.end();
    for(;it_begin!=it_end;++it_begin)
    {
        replaceIndex = 0;
        while(true)
        {
            replaceIndex=input.find_first_of(*it_begin,replaceIndex);
            if(replaceIndex==-1)
                break;
            input.at(replaceIndex)=divider;
        }
    }
}

vector<string> split(string str, vector<char> chars, bool missEmptySpace =true )
{
    vector<string> result;
    const char divider = chars.at(0);
    replaceOtherChars(str,chars);
    stringstream stream;
    stream<<str;
    string temp;
    while(getline(stream,temp,divider))
    {
        if(missEmptySpace && temp.empty())
            continue;
        result.push_back(temp);
    }
    return result;
}

int main()
{
    string str ="milk, pigs.... hot-dogs ";
    vector<char> arr;
    arr.push_back(' '); arr.push_back(','); arr.push_back('.');
    vector<string> result = split(str,arr);
    vector<string>::iterator it_begin= result.begin(),
        it_end= result.end();
    for(;it_begin!=it_end;++it_begin)
    {
        cout<<*it_begin<<endl;
    }
    return 0;
}

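This is an extension of one of the top answers. It now supports setting a maximum number of returned elements, N. The last bit of the string ends up in the Nth element. The MAXELEMENTS parameter is optional; if left at its default of 0, an unlimited number of elements is returned. :-)

.h:

class Myneatclass {
public:
    static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
    static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};

.cpp: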

std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
            std::getline(ss, item);
            elems.push_back(item);
            break;
        }
    }
    return elems;
}

std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
    std::vector<std::string> elems;
    split(s, delim, elems, MAXELEMENTS);
    return elems;
}
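The effect of the element limit can be sketched with a small free function. This is not the class above: split_limited is a hypothetical name, and it uses std::string::find instead of a stream, so details such as the handling of a trailing delimiter may differ from the getline-based code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical find-based sketch: split on delim, but once only one slot is
// left, put the whole remainder (delimiters included) into that last slot.
// max_elements == 0 means "no limit".
std::vector<std::string> split_limited(const std::string& s, char delim,
                                       std::size_t max_elements = 0)
{
    std::vector<std::string> elems;
    std::string::size_type from = 0;
    while (true) {
        const bool last_slot = max_elements > 0 && elems.size() + 1 == max_elements;
        const std::string::size_type pos =
            last_slot ? std::string::npos : s.find(delim, from);
        if (pos == std::string::npos) {
            elems.push_back(s.substr(from));   // remainder of the input
            return elems;
        }
        elems.push_back(s.substr(from, pos - from));
        from = pos + 1;
    }
}
```

For instance, split_limited("a:b:c:d", ':', 2) gives the two elements "a" and "b:c:d", while omitting the limit gives all four.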

Short and elegant

#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}

It can use any string as the delimiter, and it also works with binary data (std::string supports binary data, including nulls).

Usage:

auto a = split("this!!is!!!example!string", "!!");

Output:

this
is
!example!string
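The remark about binary data can be illustrated with a string that contains embedded NUL bytes. The sketch below uses its own stand-in function (split_on, a hypothetical name) so that it is self-contained; it searches with a moving offset instead of repeatedly copying the remainder.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the split function above. std::string::find
// compares by length, so '\0' bytes are treated like any other character.
std::vector<std::string> split_on(const std::string& data, const std::string& token)
{
    std::vector<std::string> output;
    if (token.empty()) {                 // assumption: empty token means "no split"
        output.push_back(data);
        return output;
    }
    std::string::size_type from = 0, pos;
    while ((pos = data.find(token, from)) != std::string::npos) {
        output.push_back(data.substr(from, pos - from));
        from = pos + token.size();
    }
    output.push_back(data.substr(from)); // last piece
    return output;
}
```

The important detail on the caller's side is to build the strings with an explicit length, for example std::string("ab\0cd\0\0ef", 9) and std::string(1, '\0'), because constructing from a plain C string would stop at the first NUL.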

#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    string str = "ABC AABCD CDDD RABC GHTTYU FR";
    str += " "; //dirty hack: adding extra space to the end
    vector<string> v;

    for (int i=0; i<(int)str.size(); i++) {
        int a, b;
        a = i;

        for (int j=i; j<(int)str.size(); j++) {
            if (str[j] == ' ') {
                b = j;
                i = j;
                break;
            }
        }
        v.push_back(str.substr(a, b-a));
    }

    for (int i=0; i<(int)v.size(); i++) {
        cout<<v[i].size()<<" "<<v[i]<<endl;
    }
    return 0;
}

Just for convenience:

template<class V, typename T>
bool in(const V &v, const T &el) {
    return std::find(v.begin(), v.end(), el) != v.end();
}

The actual split based on multiple delimiters:

std::vector<std::string> split(const std::string &s,
                               const std::vector<char> &delims) {
    std::vector<std::string> res;
    auto stuff = [&delims](char c) { return !in(delims, c); };
    auto space = [&delims](char c) { return in(delims, c); };
    auto first = std::find_if(s.begin(), s.end(), stuff);
    while (first != s.end()) {
        auto last = std::find_if(first, s.end(), space);
        res.push_back(std::string(first, last));
        // Resume at last, not last + 1, which would step past s.end()
        // when the string ends with a token.
        first = std::find_if(last, s.end(), stuff);
    }
    return res;
}

Usage:

int main() {
    std::string s = "   aaa,  bb  cc ";
    for (auto el: split(s, {' ', ','}))
        std::cout << el << std::endl;
    return 0;
}

For those who need to split a string with a string delimiter, perhaps you can try my solution below.

std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
    std::vector<size_t> founds;

    if(!search.empty())
    {
        size_t start_pos = 0;

        while (true)
        {
            size_t found_pos = target.find(search, start_pos);

            if(found_pos != std::string::npos)
            {
                founds.push_back(found_pos);
                start_pos = (found_pos + 1);
            }
            else
            {
                break;
            }
        }
    }

    return founds;
}

std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
    std::string sub;
    size_t size = target.length();
    const char* copy = target.c_str();

    for(size_t i = begin_index; i <= end_index; i++)
    {
        if(i >= size)
        {
            break;
        }
        sub += copy[i];
    }

    return sub;
}

std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
    std::vector<std::string> splits;

    if(!delimiter.empty())
    {
        std::vector<size_t> founds = str_pos(delimiter, target);
        size_t founds_size = founds.size();

        if(founds_size > 0)
        {
            size_t search_len = delimiter.length();
            size_t begin_index = 0;

            for(size_t i = 0; i <= founds_size; i++)
            {
                std::string sub;

                if(i != founds_size)
                {
                    size_t pos = founds.at(i);

                    // Only copy when there is text before the delimiter;
                    // pos - 1 would wrap around for pos == 0.
                    if(pos > begin_index)
                    {
                        sub = str_sub_index(begin_index, pos - 1, target);
                    }
                    begin_index = (pos + search_len);
                }
                else
                {
                    sub = str_sub_index(begin_index, (target.length() - 1), target);
                }

                splits.push_back(sub);
            }
        }
    }

    return splits;
}

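These code snippets consist of 3 functions. The bad news is that to use the str_split function you will need the other two. Yes, it is a big chunk of code. But the good news is that the two additional functions work independently and are sometimes useful too... :)

Test the function in a main() block like this: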

int main()
{
    std::string s = "Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";

    std::vector<std::string> split = str_split("world", s);

    for(size_t i = 0; i < split.size(); i++)
    {
        std::cout << split[i] << std::endl;
    }
}

It will produce:

Hello,
! We need to make the
 a better place. Because your
 is also my
, and our children's
.

I believe this is not the most efficient code, but at least it works. Hope it helps.

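Here is my solution to this problem:

vector<string> get_tokens(string str) {
    vector<string> dt;
    stringstream ss;
    string tmp;
    ss << str;
    // Loop on the extraction itself: testing eof() beforehand pushes a stale
    // token when the input ends in whitespace.
    while (ss >> tmp) {
        dt.push_back(tmp);
    }
    return dt;
}

This function returns a vector of strings.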

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main() {
    string s = "foo bar  baz";
    regex e("\\s+");
    regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
    regex_token_iterator<string::iterator> end;
    while (i != end)
        cout << " [" << *i++ << "]";
}

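IMO, this is the closest thing to Python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (the fourth argument of the regex_token_iterator constructor) selects the portions of the sequence that are not matched, using the matches as separators.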
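To see what the fourth constructor argument changes, the following sketch collects the tokens for a given submatch index. The helper name regex_tokens is an assumption made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gather whatever regex_token_iterator yields for the
// given submatch index (-1 = text between matches, 0 = the matches themselves).
std::vector<std::string> regex_tokens(const std::string& s, const std::regex& re, int submatch)
{
    std::vector<std::string> out;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch), end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the string "foo bar  baz" and the pattern \s+, index -1 yields the fields foo, bar and baz, whereas index 0 yields the two separators (one space, then two spaces).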

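Here's my take on this. I had to process the input string word by word, which could have been done by using spaces to count the words, but I felt that would be tedious and that I should split the words into a vector instead.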

#include<iostream>
#include<vector>
#include<string>
#include<stdio.h>
using namespace std;
int main()
{
    int x = getchar();   // int, so that EOF can be told apart from a character
    string s = "";
    vector<string> q;
    while(x != '\n' && x != EOF)
    {
        if(x == ' ')
        {
            q.push_back(s);
            s = "";
            x = getchar();
            continue;
        }
        s = s + (char)x;
        x = getchar();
    }
    q.push_back(s);
    for(int i = 0; i<(int)q.size(); i++)
        cout<<q[i]<<" ";
    return 0;
}

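  1. It does not take care of multiple spaces.
  2. If the last word is not immediately followed by a newline character, the whitespace between the last word's final character and the newline is included.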
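Both limitations go away if the line is read as a whole and then split with operator>>, which skips any run of whitespace. The sketch below is one possible way to do that; read_words is a hypothetical name and it takes the stream as a parameter so it can be fed from std::cin or from a string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant: read one line and split it on runs of whitespace.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string line;
    if (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string word;
        while (iss >> word)      // operator>> skips leading whitespace of any length
            words.push_back(word);
    }
    return words;
}
```

Calling read_words(std::cin) on the input line "  two   spaces here  " returns just the three words, with no empty entries and no trailing blanks.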

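Based on Galik's answer I made this. It is mostly here so I don't have to keep writing it over and over. It is crazy that C++ still doesn't have a native split function. Features:

  • Should be very fast.
  • Easy to understand (I think).
  • Merges empty sections.
  • Trivial to use several delimiters (e.g. "\r\n").
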
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    using namespace std;

    vector<string> v;

    // Start of an element.
    size_t elemStart = 0;

    // We start searching from the end of the previous element, which
    // initially is the start of the string.
    size_t elemEnd = 0;

    // Find the first non-delim, i.e. the start of an element, after the end of the previous element.
    while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
    {
        // Find the first delim, i.e. the end of the element (or if this fails it is the end of the string).
        elemEnd = s.find_first_of(delims, elemStart);
        // Add it.
        v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
    }
    // When there are no more non-spaces, we are done.

    return v;
}

My general implementation for string and u32string ~, using the signature of boost::algorithm::split.

#include <algorithm>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           UnaryPredicate predicate)
{
    using ST = std::basic_string<CharT>;
    using std::swap;
    std::vector<ST> tmp_result;
    auto iter = s.cbegin(),
         end_iter = s.cend();
    while (true)
    {
        /**
         * edge case: empty str -> push an empty str and exit.
         */
        auto find_iter = find_if(iter, end_iter, predicate);
        tmp_result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter;
    }
    swap(tmp_result, split_result);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const std::basic_string<CharT>& char_candidate)
{
    std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
                                            char_candidate.cend());
    auto predicate = [&candidate_set](const CharT& c) {
        return candidate_set.count(c) > 0U;
    };
    return split(split_result, s, predicate);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const CharT* literals)
{
    return split(split_result, s, std::basic_string<CharT>(literals));
}

Yes, I looked through all 30 examples.

I couldn't find a version of split that works with a multi-character delimiter, so here's mine:

#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &str, const string &delim)
{
    const auto delim_pos = str.find(delim);

    if (delim_pos == string::npos)
        return {str};

    vector<string> ret{str.substr(0, delim_pos)};
    auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);

    ret.insert(ret.end(), tail.begin(), tail.end());

    return ret;
}

Probably not the most efficient of implementations, but it's a very straightforward recursive solution, using only <string> and <vector>.

Ah, it's written in C++11, but there's nothing special about this code, so you could easily adapt it to C++98.
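One way such a C++98 adaptation might look is sketched below: auto and brace initialization are replaced by explicit types and push_back. The name split98 and the guard for an empty delimiter are assumptions added here, not part of the original answer.

```cpp
#include <string>
#include <vector>

// Hypothetical C++98-compatible rewrite of the recursive split above.
std::vector<std::string> split98(const std::string& str, const std::string& delim)
{
    std::vector<std::string> ret;
    // Assumption: an empty delimiter is treated as "not found".
    const std::string::size_type delim_pos =
        delim.empty() ? std::string::npos : str.find(delim);

    if (delim_pos == std::string::npos) {
        ret.push_back(str);              // no delimiter left: the rest is one token
        return ret;
    }

    ret.push_back(str.substr(0, delim_pos));
    const std::vector<std::string> tail =
        split98(str.substr(delim_pos + delim.size()), delim);
    ret.insert(ret.end(), tail.begin(), tail.end());
    return ret;
}
```

Like the original, it keeps empty tokens, so split98("a<>b<><>c", "<>") returns four elements with an empty string in third position. The recursion depth grows with the number of tokens, which may matter for very long inputs.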

Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.

#include <string>
#include <vector>

// A template parameter replaces the original `auto` parameter, which is
// only valid from C++20 on; a char or a string both work as delimiter.
template <typename Delimiter>
std::vector <std::string> split(const std::string &input, const Delimiter &delimiter, bool skipEmpty=true) {
    /*
    Splits a string at each delimiter and returns these strings as a string vector.
    If the delimiter is not found then nothing is returned.
    If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
    */
    bool delimiterFound = false;
    // size_type instead of int, so the comparison with npos is reliable
    std::string::size_type pos=0, pPos=0;
    std::vector <std::string> result;
    while (true) {
        pos = input.find(delimiter,pPos);
        if (pos != std::string::npos) {
            if (skipEmpty==false or pos-pPos > 0) // if empty values are to be kept or not
                result.push_back(input.substr(pPos,pos-pPos));
            delimiterFound = true;
        } else {
            if (pPos < input.length() and delimiterFound) {
                if (skipEmpty==false or input.length()-pPos > 0) // if empty values are to be kept or not
                    result.push_back(input.substr(pPos,input.length()-pPos));
            }
            break;
        }
        pPos = pos+1;
    }
    return result;
}

#include <iostream>
#include <string>
#include <deque>

std::deque<std::string> split(
    const std::string& line,
    std::string::value_type delimiter,
    bool skipEmpty = false
) {
    std::deque<std::string> parts{};

    if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
        parts.push_back({});
    }

    for (const std::string::value_type& c : line) {
        if (
            (
                c == delimiter
                &&
                (skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
            )
            ||
            (c != delimiter && parts.empty())
        ) {
            parts.push_back({});
        }

        if (c != delimiter) {
            parts.back().push_back(c);
        }
    }

    if (skipEmpty && !parts.empty() && parts.back().empty()) {
        parts.pop_back();
    }

    return parts;
}

void test(const std::string& line) {
    std::cout << line << std::endl;

    std::cout << "skipEmpty=0 |";
    for (const std::string& part : split(line, ':')) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << "skipEmpty=1 |";
    for (const std::string& part : split(line, ':', true)) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << std::endl;
}

int main() {
    test("foo:bar:::baz");
    test("");
    test("foo");
    test(":");
    test("::");
    test(":foo");
    test("::foo");
    test(":foo:");
    test(":foo::");

    return 0;
}

Output:

foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|


skipEmpty=0 |
skipEmpty=1 |

foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|

:
skipEmpty=0 |||
skipEmpty=1 |

::
skipEmpty=0 ||||
skipEmpty=1 |

:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|

::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|

:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|

:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|

This answer takes a string and puts it into a vector of strings. It uses the Boost library.

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

Using std::string_view and Eric Niebler's range-v3 library:

https://wandbox.org/permlink/kW5lwRCL1pxjp2pW

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"

int main() {
    std::string s = "Somewhere down the range v3 library";
    ranges::for_each(s
        |   ranges::view::split(' ')
        |   ranges::view::transform([](auto &&sub) {
                return std::string_view(&*sub.begin(), ranges::distance(sub));
            }),
        [](auto s) {std::cout << "Substring: " << s << "\n";}
    );
}

By using a range-based for loop instead of the ranges::for_each algorithm:

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"

int main()
{
    std::string str = "Somewhere down the range v3 library";
    for (auto s : str | ranges::view::split(' ')
                      | ranges::view::transform([](auto&& sub) { return std::string_view(&*sub.begin(), ranges::distance(sub)); }
                      ))
    {
        std::cout << "Substring: " << s << "\n";
    }
}

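I have a very different approach from the other solutions. It offers a lot of value in ways the other solutions lack, but of course it also has its own downsides.

First, this problem can be solved with one loop, no additional memory, and by considering only four logical cases. Conceptually, we are interested in boundaries. Our code should reflect that: let's iterate through the string and look at two characters at a time, bearing in mind that we have special cases at the start and end of the string.

The downside is that we have to write the implementation, which is somewhat verbose, but mostly convenient boilerplate.

The upside is that we wrote the implementation, so it is very easy to customize it to specific needs, such as distinguishing left and right word boundaries, using any set of delimiters, or handling other cases such as non-boundary or erroneous positions.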

#include <iostream>
#include <string>

#include <cctype>

using namespace std;

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

// Brace initialization replaces the C-style compound literals with
// designated initializers, which are not standard C++.
boundary_t maybe_word_boundary(const string &str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return boundary_t{E_BOUNDARY_TYPE_ERROR, 0};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        }
        return boundary_t{E_BOUNDARY_TYPE_NONE, 0};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

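As you can see, the code is very simple to understand and fine-tune, and the actual usage of the code is very short and simple. Using C++ should not stop us from writing the simplest and most readily customized code possible, even if that means not using the STL. I would think this is an instance of what Linus Torvalds might call "taste", since we have eliminated all the logic we don't need while writing in a style that naturally allows more cases to be handled when the need arises.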

What could improve this code might be the use of enum class, accepting a function pointer to is_word_char in maybe_word_boundary instead of invoking is_word_char directly, and passing a lambda.
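A rough sketch of those suggested improvements follows. It is not the answer's code: it uses enum class and a callable parameter as proposed, but it classifies the gap between two neighbouring characters rather than a character position, which is a simplification chosen for this example. All names (BoundaryType, boundary_at, words_between_boundaries) are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class BoundaryType { None, Left, Right };

// Classify the gap in front of index `gap` (0 .. str.size()).
// is_word is any callable taking a char; positions outside the string
// count as non-word characters.
template <typename Pred>
BoundaryType boundary_at(const std::string& str, std::size_t gap, Pred is_word)
{
    const bool before = gap > 0 && is_word(str[gap - 1]);
    const bool after = gap < str.size() && is_word(str[gap]);
    if (!before && after) return BoundaryType::Left;    // a word starts here
    if (before && !after) return BoundaryType::Right;   // a word ends here
    return BoundaryType::None;
}

// Collect the words lying between each Left boundary and the next Right one.
template <typename Pred>
std::vector<std::string> words_between_boundaries(const std::string& str, Pred is_word)
{
    std::vector<std::string> words;
    std::size_t left = 0;
    for (std::size_t gap = 0; gap <= str.size(); ++gap) {
        const BoundaryType b = boundary_at(str, gap, is_word);
        if (b == BoundaryType::Left)
            left = gap;
        else if (b == BoundaryType::Right)
            words.push_back(str.substr(left, gap - left));
    }
    return words;
}
```

A caller passes the word predicate as a lambda, for example [](char c) { return !std::isspace(static_cast<unsigned char>(c)); }, and can swap in any other definition of a word character without touching the boundary logic.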

A C++17 version without any memory allocation (except possibly for std::function):

#include <cctype>
#include <functional>
#include <iostream>
#include <string_view>

void iter_words(const std::string_view& input, const std::function<void(std::string_view)>& process_word) {

    auto itr = input.begin();

    auto consume_whitespace = [&]() {
        for(; itr != input.end(); ++itr) {
            if(!isspace(*itr))
                return;
        }
    };

    auto consume_letters = [&]() {
        for(; itr != input.end(); ++itr) {
            if(isspace(*itr))
                return;
        }
    };

    while(true) {
        consume_whitespace();
        if(itr == input.end())
            return;
        auto word_start = itr - input.begin();
        consume_letters();
        auto word_end = itr - input.begin();
        process_word(input.substr(word_start, word_end - word_start));
    }
}

int main() {
    iter_words("foo bar", [](std::string_view sv) {
        std::cout << "Got word: " <<  sv << '\n';
    });
    return 0;
}
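If even the possible std::function allocation is unwanted, the callback can be taken as a template parameter instead. The following is a sketch under that assumption; for_each_word is a hypothetical name, and it indexes into the view rather than using iterators. It requires C++17 for std::string_view.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical variant of iter_words: the callable is a template parameter,
// so no type erasure (and hence no std::function) is involved.
template <typename Callback>
void for_each_word(std::string_view input, Callback&& process_word)
{
    std::size_t i = 0;
    while (i < input.size()) {
        // Skip whitespace before the next word.
        while (i < input.size() && std::isspace(static_cast<unsigned char>(input[i])))
            ++i;
        const std::size_t word_start = i;
        // Advance to the end of the word.
        while (i < input.size() && !std::isspace(static_cast<unsigned char>(input[i])))
            ++i;
        if (i > word_start)
            process_word(input.substr(word_start, i - word_start));
    }
}
```

Each word is handed to the callback as a std::string_view into the original buffer, so the buffer has to outlive any views the callback stores.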

C++20 finally blesses us with a split function. Or rather, a range adapter. Godbolt link.

#include <iostream>
#include <ranges>
#include <string_view>

namespace ranges = std::ranges;
namespace views = std::views;

using str = std::string_view;

constexpr auto view =
    "Multiple words"
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return {
            &*r.begin(),
            static_cast<str::size_type>(ranges::distance(r))
        };
    });

auto main() -> int {
    for (str &&sv : view) {
        std::cout << sv << '\n';
    }
}

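Everyone answered for predefined string input. I think this answer will help someone who needs to handle scanned input.

I used a tokens vector to hold the string tokens. It is optional.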

#include <bits/stdc++.h>

using namespace std ;
int main()
{
    string str, token ;
    getline(cin, str) ; // get the string as input
    istringstream ss(str); // insert the string into tokenizer

    vector<string> tokens; // vector tokens holds the tokens

    while (ss >> token) tokens.push_back(token); // splits the tokens
    for(auto x : tokens) cout << x << endl ; // prints the tokens

    return 0;
}

Sample input:

port city international university

Sample output:

port
city
international
university

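Note that by default this only works with whitespace as the delimiter. To use a custom delimiter, you have to modify the code. Let the delimiter be ','. Then use

char delimiter = ',';
while (getline(ss, token, delimiter)) tokens.push_back(token);

instead of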

while (ss >> token) tokens.push_back(token);
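Putting the comma variant together, a minimal self-contained sketch could look like this. The function name `split_on` is hypothetical. One difference worth knowing: unlike `ss >> token`, `std::getline` keeps empty fields between adjacent delimiters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on a single delimiter character.
// Empty fields between adjacent delimiters are kept as empty strings.
std::vector<std::string> split_on(const std::string& line, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream ss(line);
    std::string token;
    while (std::getline(ss, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

For example, `split_on("port,city,international,university", ',')` yields the same four tokens as the sample above.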

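Although several answers already provide C++20 solutions, some changes have been made since they were posted and applied to C++20 as Defect Reports. As a result, the solution is a bit shorter and nicer: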

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;
using str = std::string_view;

constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto splitByWords(str input) {
    return input
        | views::split(' ')
        | views::transform([](auto &&r) -> str {
            return {r.begin(), r.end()};
        });
}

auto main() -> int {
    for (str &&word : splitByWords(text)) {
        std::cout << word << '\n';
    }
}

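As of today, it is still available only on the trunk branch of GCC (Godbolt link). It is based on two changes: P1391, which adds an iterator constructor to std::string_view, and P2210, a DR that fixes std::views::split to preserve the range type.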

In C++23 no transform boilerplate is needed, because P1989 adds a range constructor to std::string_view:

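#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;

constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto main() -> int {
    for (auto&& part : text | views::split(' ')) {
        // The range constructor of std::string_view is explicit in the
        // published C++23 standard (P2499), so the conversion is spelled out.
        std::string_view word(part);
        std::cout << word << '\n';
    }
}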

Godbolt link

There is a much simpler way to do this!

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> splitby(const std::string& input, char splitter) {
    std::vector<std::string> result = {};
    std::string locresult = "";
    for (unsigned int i = 0; i < input.size(); i++) {
        if (input.at(i) != splitter) {
            locresult += input.at(i);
        } else {
            result.push_back(locresult);
            locresult = "";
        }
    }
    // Whatever follows the last splitter is the final token.
    result.push_back(locresult);
    return result;
}

void printvector(const std::vector<std::string>& v) {
    std::cout << '{';
    for (unsigned int i = 0; i < v.size(); i++) {
        if (i < v.size() - 1) {
            std::cout << '"' << v.at(i) << "\",";
        } else {
            std::cout << '"' << v.at(i) << "\"";
        }
    }
    std::cout << "}\n";
}

A minimal solution is a function that takes a std::string and a set of delimiter characters (as a std::string) as input, and returns a std::vector of std::strings.

#include <string>
#include <vector>

std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
    using ssize_t = std::string::size_type;
    const ssize_t str_ln = str.length();
    ssize_t last_pos = 0;

    // container for the extracted tokens
    std::vector<std::string> tokens;

    while (last_pos < str_ln) {
        // find the position of the next delimiter
        ssize_t pos = str.find_first_of(delimiters, last_pos);

        // if no delimiters found, set the position to the length of string
        if (pos == std::string::npos)
            pos = str_ln;

        // if the substring is nonempty, store it in the container
        if (pos != last_pos)
            tokens.emplace_back(str.substr(last_pos, pos - last_pos));

        // scan past the previous substring
        last_pos = pos + 1;
    }

    return tokens;
}

Usage example:

#include <iostream>

int main()
{
    std::string input_str = "one + two * (three - four)!!---! ";
    const char* delimiters = "! +- (*)";
    std::vector<std::string> tokens = tokenize(input_str, delimiters);

    std::cout << "input = '" << input_str << "'\n"
              << "delimiters = '" << delimiters << "'\n"
              << "nr of tokens found = " << tokens.size() << std::endl;
    for (const std::string& tk : tokens) {
        std::cout << "token = '" << tk << "'\n";
    }

    return 0;
}

I can't believe how complicated these answers are. Why has nobody suggested something as simple as this?

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string input = "This is a sentence to read";
    std::istringstream ss(input);
    std::string token;

    while (std::getline(ss, token, ' ')) {
        std::cout << token << std::endl;
    }
}

Here is yet another way: continuation-passing style, zero allocation, with function-based delimiting.

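#include <iterator>

void split(auto&& data, auto&& splitter, auto&& operation) {
    using std::begin;
    using std::end;
    auto prev = begin(data);
    while (prev != end(data)) {
        // `first` is the start of the token; `prev` stays the loop cursor,
        // so it must not be redeclared by the structured binding.
        auto [first, next] = splitter(prev, end(data));
        operation(first, next);
        prev = next;
    }
}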

Now we can write specific split functions on top of this.

#include <string_view>
#include <utility>

auto anyOfSplitter(auto delimiters) {
    return [delimiters](auto begin, auto end) {
        // Skip any leading delimiter characters.
        while (begin != end && 0 == std::string_view(begin, end).find_first_of(delimiters)) {
            ++begin;
        }
        auto view = std::string_view(begin, end);
        auto next = view.find_first_of(delimiters);
        if (next != view.npos)
            return std::make_pair(begin, begin + next);
        else
            return std::make_pair(begin, end);
    };
}

We can now produce a traditional std string split like this:

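#include <string>
#include <string_view>
#include <vector>

// std::basic_string_view<C> is the templated form; std::string_view is its char alias.
template<class C>
auto traditional_any_of_split(std::basic_string_view<C> str, std::basic_string_view<C> delim) {
    std::vector<std::basic_string<C>> retval;
    split(str, anyOfSplitter(delim), [&](auto s, auto f) {
        retval.emplace_back(s, f);
    });
    return retval;
}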

Or we can use find

auto findSplitter(auto delimiter) {
    return [delimiter](auto begin, auto end) {
        // Skip any leading occurrences of the whole delimiter string.
        while (begin != end && 0 == std::string_view(begin, end).find(delimiter)) {
            begin += delimiter.size();
        }
        auto view = std::string_view(begin, end);
        auto next = view.find(delimiter);
        if (next != view.npos)
            return std::make_pair(begin, begin + next);
        else
            return std::make_pair(begin, end);
    };
}

template<class C>
auto traditional_find_split(std::basic_string_view<C> str, std::basic_string_view<C> delim) {
    std::vector<std::basic_string<C>> retval;
    split(str, findSplitter(delim), [&](auto s, auto f) {
        retval.emplace_back(s, f);
    });
    return retval;
}

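just by swapping out the splitter part.

Both of these allocate a buffer for the return values. We could switch the return value to string views, at the cost of managing lifetimes manually.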

We can also take a continuation and pass it the string views one at a time, which avoids allocating even the vector of views.
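One possible shape for that continuation-based variant is sketched below. It is an illustration rather than part of the code above: `for_each_token` is a hypothetical name, it works on `char` views only, and it skips empty tokens. The views handed to the callback point into the caller's string, so they are only valid while that string is alive.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: calls `operation` with one std::string_view per token.
// No container is allocated; each view refers to the caller's buffer.
template <class Operation>
void for_each_token(std::string_view str, std::string_view delimiters,
                    Operation&& operation) {
    std::size_t pos = 0;
    while (true) {
        // Skip any run of delimiters before the next token.
        pos = str.find_first_not_of(delimiters, pos);
        if (pos == std::string_view::npos) return;
        // The token ends at the next delimiter or at the end of the input.
        std::size_t end = str.find_first_of(delimiters, pos);
        if (end == std::string_view::npos) end = str.size();
        operation(str.substr(pos, end - pos));
        pos = end;
    }
}
```

A call such as `for_each_token(text, " ", [](std::string_view word) { /* use word */ });` visits each word without building any intermediate container.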

This can be extended with an abort option, so that we can stop after reading the first few strings.
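A sketch of such an abort option, assuming the callback signals "stop" by returning `false`. The name `for_each_token_until` and the `bool` convention are hypothetical choices for illustration, not something the code above defines.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: a continuation-based split whose callback returns
// false to stop early. Returns true if the whole input was consumed.
template <class Operation>
bool for_each_token_until(std::string_view str, std::string_view delimiters,
                          Operation&& operation) {
    std::size_t pos = 0;
    while ((pos = str.find_first_not_of(delimiters, pos)) != std::string_view::npos) {
        std::size_t end = str.find_first_of(delimiters, pos);
        if (end == std::string_view::npos) end = str.size();
        if (!operation(str.substr(pos, end - pos))) return false;  // aborted by the callback
        pos = end;
    }
    return true;
}
```

For instance, a callback that counts tokens and returns `false` once it has seen two will never cause the rest of the input to be scanned.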