The history behind the definition of "string"

I only started thinking about this recently, but I'm not sure why we call strings "strings". I am a .NET programmer, but I believe the concept of strings exists in almost every programming language.

Outside of programming, I don't think I've heard the word "string" used to describe words or letters. A quick search for "define: string" turns up a bunch of definitions that have nothing to do with letters, words, or anything resembling the programming concept.

My guess is that, back in the day, strings were really just character arrays of a particular length, usually with a delimiting character at the end. But I don't see a natural transition from "character array" to "string".

Can anyone offer some insight into why we call strings "strings"?

It's called a string because it's actually an array of char type elements.

That being said, the characters are "strung together" via this array, which turns them into a "string".
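(Not part of the original answer, just an illustrative sketch in C.) The idea that a string is "characters strung together in an array" is easiest to see in C, where a string literal is nothing more than a char array with a terminating '\0':

```c
#include <stdio.h>

int main(void) {
    /* Illustration only: an explicit char array with a terminating '\0'
       and the literal "string" are the same kind of object in memory. */
    char strung[] = {'s', 't', 'r', 'i', 'n', 'g', '\0'};

    printf("%s\n", strung);   /* prints: string */
    printf("%s\n", "string"); /* the literal is just such an array */
    return 0;
}
```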

My assumption has always been that the programming term originated from the following definition of the word "string" (from Merriam-Webster):

(1): a series of things arranged in or as if in a line <a string of cars> <a string of names>

(2): a sequence of like items (as bits, characters, or words)

Since a string in programming is simply an ordered sequence of characters, referring to this as a "string of characters" (or simply "string") seems like the most probable origin.

I suspect it's because string originally meant just a sequence of data values: "I'll just string these together" etc. These values didn't have to be characters. One very common use for this general concept happened to be a sequence of characters, and this took over as the general meaning of the word.

From this reference:

The 1971 OED (p. 3097) quotes an 1891 Century Dictionary on a source in the Milwaukee Sentinel of 11 Jan. 1898 (section 3, p. 1) to the effect that this is a compositor's term. Printers would paste up the text that they had generated in a long strip of characters. (Presumably, they were paid by the foot, not by the word!) The quote says that it was not unusual for compositors to create more than 1500 (characters?) per hour.

The word was originally used to differentiate between a set of values to which the particular order of elements doesn't matter (for instance, a set of random samples of measurements) and another that could only have its meaning preserved when the order is also preserved. Originally a string could be a set of any kind of values, but since in the post-mainframe era a string of characters is by far the most common kind, the fact that the values are characters became a "default".

From searching through the ACM bibliography, it seems the word "string" acquired its meaning in computer science during the 1960s. In the beginning, a string was a general kind of sequence or list; see, e.g., A command language for handling strings of symbols from 1958.

This article explicitly mentions "character strings" in 1964.

Unfortunately I can't access the full texts, which are behind a toll booth.

The earliest reference I could find in computing is from March 1963's METEOR: A LISP Interpreter for String Transformations by Daniel G. Bobrow at MIT's AI Labs.

However, definition 15d. in the Oxford English Dictionary is:

Computing A linear sequence of records or data.

... and with a first quotation from a 1956 Journal of the Association for Computing Machinery:

Areas are set aside for shuttling strings of control fields back and forth until a completely sorted sequence is obtained.

This use naturally follows on from definition 15c.:

Math., etc. A sequence of symbols or linguistic elements in a definite order.

... and first used in Clarence Irving Lewis and Cooper Harold Langford's Symbolic Logic (1932):

Propositions are not strings of marks, or series of sounds, except incidentally.

This in turn follows on from many other, much earlier definitions for things in a line.

I had guessed that "string" was in use by mathematicians long before its adoption in programming languages. Turing machines effectively operate on strings. Turing may not have used the term, but it is used everywhere in automata textbooks, going back decades.

The earliest reference I could find was a fragment in Google books of a 1944 article "Recursively enumerable sets of positive integers and their decision problems" by logician Emil Post in Bulletin of the AMS. Fortunately, AMS provides online archives of complete articles free for download. Here is a link: http://www.ams.org/journals/bull/1944-50-05/S0002-9904-1944-08111-1/S0002-9904-1944-08111-1.pdf

I think there is little doubt that he is using "string" in the conventional sense used in computer science. On p. 286: "For working purposes, we introduce the letter b, and consider "strings" of 1's and b's such as 11b1bb1. An operation on such strings such as "b1bP produces P1bb1" we term a normal operation. This particular normal operation is applicable only to strings starting with b1b, and the derived string is then obtained from the given string by first removing the initial b1b, and then tacking on 1bb1 at the end. Thus b1bb becomes b1bb1."
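To make Post's example concrete, here is a small sketch (my own, not from the paper) of that particular normal operation in C: if the string starts with b1b, strip that prefix and tack 1bb1 onto the end, so b1bb becomes b1bb1.

```c
#include <stdio.h>
#include <string.h>

/* Post's example normal operation: "b1bP produces P1bb1".
   Applicable only to strings starting with "b1b"; the derived string
   drops that prefix and appends "1bb1". Returns 1 if it was applied. */
static int normal_op(const char *in, char *out, size_t outsize) {
    const char *prefix = "b1b";
    const char *suffix = "1bb1";
    size_t plen = strlen(prefix);

    if (strncmp(in, prefix, plen) != 0)
        return 0;                        /* operation not applicable */
    snprintf(out, outsize, "%s%s", in + plen, suffix);
    return 1;
}

int main(void) {
    char out[64];
    if (normal_op("b1bb", out, sizeof out))
        printf("%s\n", out);             /* prints: b1bb1 */
    return 0;
}
```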

In a talk about JavaScript history, Douglas Crockford says "Nobody knows" and offers a few alternatives: http://www.youtube.com/watch?v=RO1Wnu-xKoY#t=2989

I found the report that, allegedly, contains the first reference ever in computing history to a series of characters as a "string". I think it's in the top right column of page 4 of the PDF, which is numbered 47:

http://web.eecs.umich.edu/~bchandra/courses/papers/Naure_Algol60.pdf

A string is a sequence of discrete objects (usually char).

Given that, I would venture a guess that it may have to do with the "string of pearls" metaphor: each bead on the string is a single character.