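How to understand symbols in Ruby

Despite reading "Understanding Ruby Symbols", I'm still confused by how the data is represented in memory when symbols are used. If a symbol appears in two different objects but exists at a single memory location, how can it contain different values? I would expect the same memory location to contain the same value.

Here is a quote from the link:

Unlike strings, symbols of the same name are initialized and exist in memory only once during a session of Ruby

I don't understand how it manages to tell apart the values contained in the same memory location.

Consider this example: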

patient1 = { :ruby => "red" }
patient2 = { :ruby => "programming" }


patient1.each_key {|key| puts key.object_id.to_s}
3918094
patient2.each_key {|key| puts key.object_id.to_s}
3918094

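patient1 and patient2 are both hashes, that's fine. :ruby however is a symbol. If we were to output the following:

patient1.each_key {|key| puts key.to_s}

Then what will be output? "red", or "programming"?

Forgetting hashes for a second, I'm thinking a symbol is a pointer to a value. The questions I have are:

  • Can I assign a value to a symbol?
  • Is a symbol just a pointer to a variable with a value in it?
  • If symbols are global, does that mean a symbol always points to one thing?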

Consider this:

x = :sym
y = :sym
(x.__id__ == y.__id__ ) && ( :sym.__id__ == x.__id__) # => true


x = "string"
y = "string"
(x.__id__ == y.__id__ ) || ( "string".__id__ == x.__id__) # => false

So, however you create a symbol object, as long as its contents are the same, it will refer to the same object in memory. This is not a problem because a symbol is an immutable object. Strings are mutable.
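As a minimal sketch of that difference (the variable names are purely illustrative, and String.new is used here so the result does not depend on whether string literals happen to be frozen in your setup):

```ruby
# Two ways of obtaining the "same" symbol yield the very same object.
sym_a = :sym
sym_b = "sym".to_sym
symbols_identical = sym_a.equal?(sym_b)   # equal? compares object identity

# Two strings with the same contents are equal, but are separate objects.
str_a = String.new("string")
str_b = String.new("string")
strings_equal     = (str_a == str_b)
strings_identical = str_a.equal?(str_b)

# Strings can be changed in place; only str_a is affected.
str_a << "!"

puts symbols_identical   # true
puts strings_equal       # true
puts strings_identical   # false
puts str_a               # string!
puts str_b               # string
```

Because a symbol can never be changed in place, sharing one object between every use of :sym is safe; sharing one mutable string between unrelated variables would not be.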


(In response to the comment below)

In the original article, the value is not being stored in a symbol, it is being stored in a hash. Consider this:

hash1 = { "string" => "value"}
hash2 = { "string" => "value"}

This creates six objects in memory -- four string objects and two hash objects.

hash1 = { :symbol => "value"}
hash2 = { :symbol => "value"}

This only creates five objects in memory -- one symbol, two strings and two hash objects.

patient1 = { :ruby => "red" }
patient2 = { :ruby => "programming" }


patient1.each_key {|key| puts key.object_id.to_s}
3918094
patient2.each_key {|key| puts key.object_id.to_s}
3918094

patient1 and patient2 are both hashes, that's fine. :ruby however is a symbol. If we were to output the following:

patient1.each_key {|key| puts key.to_s}

Then what will be output? "red", or "programming"?

Neither, of course. The output will be ruby. Which, BTW, you could have found out in less time than it took you to type the question, by simply typing it into IRB instead.

Why would it be red or programming? Symbols always evaluate to themselves. The value of the symbol :ruby is the symbol :ruby itself and the string representation of the symbol :ruby is the string value "ruby".

[BTW: puts always converts its arguments to strings, anyway. There's no need to call to_s on it.]

The symbol :ruby does not contain "red" or "programming". The symbol :ruby is just the symbol :ruby. It is your hashes, patient1 and patient2 that each contain those values, in each case pointed to by the same key.
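A short sketch of this, reusing the hashes from the question (key1 and key2 are names introduced only for this illustration):

```ruby
patient1 = { :ruby => "red" }
patient2 = { :ruby => "programming" }

key1 = patient1.keys.first
key2 = patient2.keys.first

# Both hashes use the very same symbol object as their key...
puts key1.equal?(key2)   # true
puts key1.inspect        # :ruby

# ...but each hash stores its own value under that key.
puts patient1[:ruby]     # red
puts patient2[:ruby]     # programming
```

The values live in the two hashes; the symbol is only what you hand to each hash to ask for them.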

Think about it this way: you go into the living room on Christmas morning and see two boxes, each with a tag that says "Kezzer" on it. One has socks in it, and the other has coal. You're not going to get confused and ask how "Kezzer" can contain both socks and coal, even though it is the same name. Because the name isn't containing the (crappy) presents. It's just pointing at them. Similarly, :ruby doesn't contain the values in your hash, it just points at them.

I was able to grok symbols when I thought of it like this. A Ruby string is an object that has a bunch of methods and properties. People like to use strings for keys, and when a string is used as a key, all those extra methods go unused. So they made symbols, which you can think of as strings with all the functionality removed, except what is needed to be a good key.

Just think of symbols as constant strings.

patient1.each_key {|key| puts key.to_s}

Then what will be output? "red", or "programming"?

Neither, it will output "ruby".

You're confusing symbols and hashes. They aren't related, but they're useful together. The symbol in question is :ruby; it has nothing to do with the values in the hash, and its internal integer representation will always be the same, and its "value" (when converted to a string) will always be "ruby".

You might be presuming that the declaration you've made defines the value of a Symbol to be something other than what it is. In fact, a Symbol is just an "interned" String value that remains constant. Because symbols are stored using a simple integer identifier, they are frequently used in place of strings: that is more efficient than managing a large number of variable-length strings.

Take the case of your example:

patient1 = { :ruby => "red" }

This should be read as: "declare a variable patient1 and define it to be a Hash, and in this store the value 'red' under the key (symbol 'ruby')"

Another way of writing this is:

patient1 = Hash.new
patient1[:ruby] = 'red'


puts patient1[:ruby]
# 'red'

As you are making an assignment it is hardly surprising that the result you get back is identical to what you assigned it with in the first place.

The Symbol concept can be a little confusing as it's not a feature of most other languages.

Each String object is distinct even if the values are identical:

[ "foo", "foo", "foo", "bar", "bar", "bar" ].each do |v|
  puts v.inspect + ' ' + v.object_id.to_s
end


# "foo" 2148099960
# "foo" 2148099940
# "foo" 2148099920
# "bar" 2148099900
# "bar" 2148099880
# "bar" 2148099860

Every Symbol with the same value refers to the same object:

[ :foo, :foo, :foo, :bar, :bar, :bar ].each do |v|
  puts v.inspect + ' ' + v.object_id.to_s
end


# :foo 228508
# :foo 228508
# :foo 228508
# :bar 228668
# :bar 228668
# :bar 228668

Converting strings to symbols maps identical values to the same unique Symbol:

[ "foo", "foo", "foo", "bar", "bar", "bar" ].each do |v|
  v = v.to_sym
  puts v.inspect + ' ' + v.object_id.to_s
end


# :foo 228508
# :foo 228508
# :foo 228508
# :bar 228668
# :bar 228668
# :bar 228668

Likewise, converting from Symbol to String creates a distinct string every time:

[ :foo, :foo, :foo, :bar, :bar, :bar ].each do |v|
  v = v.to_s
  puts v.inspect + ' ' + v.object_id.to_s
end


# "foo" 2148097820
# "foo" 2148097700
# "foo" 2148097580
# "bar" 2148097460
# "bar" 2148097340
# "bar" 2148097220

You can think of Symbol values as being drawn from an internal Hash table and you can see all values that have been encoded to Symbols using a simple method call:

Symbol.all_symbols


# => [:RUBY_PATCHLEVEL, :vi_editing_mode, :Separator, :TkLSHFT, :one?, :setuid?, :auto_indent_mode, :setregid, :back, :Fail, :RET, :member?, :TkOp, :AP_NAME, :readbyte, :suspend_context, :oct, :store, :WNOHANG, :@seek, :autoload, :rest, :IN_INPUT, :close_read, :type, :filename_quote_characters=, ...

As you define new symbols either by the colon-notation or by using .to_sym this table will grow.
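A rough way to watch the table grow, assuming the name below has not already been used as a symbol anywhere in your session (the name itself is made up for this sketch):

```ruby
# A name that is assumed not to exist as a symbol yet.
name = "demo_symbol_created_at_runtime"

# Look for it by name first, without creating the symbol.
known_before = Symbol.all_symbols.any? { |s| s.to_s == name }

# Converting the string creates (or looks up) the symbol.
new_symbol = name.to_sym

known_after = Symbol.all_symbols.include?(new_symbol)

puts known_before   # false
puts known_after    # true
```

The check before the conversion compares by string on purpose: writing the symbol literal anywhere in the file would already register it when the file is parsed.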

I would recommend reading the Wikipedia article on hash tables - I think it will help you get a sense of what {:ruby => "red"} really means.

Another exercise that might help your understanding of the situation: consider {1 => "red"}. Semantically, this doesn't mean "set the value of 1 to "red"", which is impossible in Ruby. Rather, it means "create a Hash object, and store the value "red" for the key 1".

Symbols are not pointers. They do not contain values. Symbols simply are. :ruby is the symbol :ruby and that's all there is to it. It doesn't contain a value, it doesn't do anything, it just exists as the symbol :ruby. The symbol :ruby is a value just like the number 1 is. It doesn't point to another value any more than the number 1 does.
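To make the comparison with the number 1 concrete, here is a small sketch (number_key and symbol_key are illustrative names only):

```ruby
# A symbol is used as a hash key exactly like any other value, such as 1.
number_key = { 1 => "red" }
symbol_key = { :ruby => "red" }

puts number_key[1]       # red
puts symbol_key[:ruby]   # red

# Both keys simply evaluate to themselves; neither "contains" the string "red".
puts 1.inspect           # 1
puts :ruby.inspect       # :ruby
puts(:ruby == :ruby)     # true
puts :ruby.class         # Symbol
```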

In short

Symbols solve the problem of creating human readable, immutable representations that also have the benefit of being simpler for the runtime to lookup than strings. Think of it like a name or label that can be reused.

Why :red is better than "red"

In dynamic object oriented languages you create complex, nested data structures with readable references. The hash is a common use case where you map values to unique keys — unique, at least, to each instance. You can't have more than one "red" key per hash.

However, it would be more processor-efficient to use a numeric index instead of string keys. So symbols were introduced as a compromise between speed and readability. Symbols are much cheaper to resolve than the equivalent string. Being both human-readable and easy for the runtime to resolve, symbols are an ideal addition to a dynamic language.

Benefits

Since symbols are immutable they can be shared across the runtime. If two hash instances have a common lexicographic or semantic need for a red item the symbol :red would use roughly half the memory that the string "red" would have required for two hashes.

Since :red always resolves back to the same location in memory it can be reused across a hundred hash instances with almost no increase in memory, whereas using "red" will add a memory cost since each hash instance would need to store the mutable string upon creation.

I'm not sure how Ruby actually implements symbols and strings, but a symbol clearly carries less overhead in the runtime since it's a fixed representation. Plus, a symbol takes one less character to type than a quoted string, and less typing is the eternal pursuit of true Rubyists.

Summary

With a symbol like :red you get the readability of string representation with less overhead due to the cost of string comparison operations and the need to store each string instance in memory.

One easy way to wrap your head around this is to think, "what if I were using a string rather than a symbol?"

patient1 = { "ruby" => "red" }
patient2 = { "ruby" => "programming" }

It isn't confusing at all, right? You're using "ruby" as a key in a hash.

"ruby" is a string literal, so that is the value. The memory address, or pointer, is not available to you. Every time you invoke "ruby", you are creating a new instance of it, that is, creating a new memory cell containing the same value - "ruby".

The hash then goes, "What's my key value? Oh, it's "ruby"," and maps that value to "red" or "programming". In other words, :ruby doesn't dereference to "red" or "programming". The hash maps :ruby to "red" or "programming".

Compare that to using symbols:

patient1 = { :ruby => "red" }
patient2 = { :ruby => "programming" }

The value of :ruby is also "ruby", effectively.

Why? Because symbols are essentially string constants. Constants don't have multiple instances. It's the same memory address. And a memory address has a certain value, once dereferenced. For symbols, the pointer name is the symbol, and the dereferenced value is a string, which matches the symbol name, in this case, "ruby".

When used in a hash, the symbol itself is the key; you are not using the string "ruby" (in a symbol-keyed hash, looking up "ruby" finds nothing). The hash looks up the key :ruby, and the value is "red" or "programming", depending on how you defined the hash.

The paradigm shift and take-home concept is that a symbol's value is a completely separate concept from a value mapped to by a hash, given a key of that hash.
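One detail worth checking for yourself is how a symbol key and a string key relate inside a hash. A small sketch (the variable names are illustrative):

```ruby
patient = { :ruby => "red" }

by_symbol           = patient[:ruby]
by_string           = patient["ruby"]
by_converted_string = patient["ruby".to_sym]

puts by_symbol.inspect             # "red"
puts by_string.inspect             # nil
puts by_converted_string.inspect   # "red"

# The symbol and the string are different values, even though the
# symbol's string representation matches.
puts(:ruby == "ruby")              # false
puts(:ruby.to_s == "ruby")         # true
```

So :ruby and "ruby" are distinct keys: a hash keyed by the symbol is only found by the symbol, unless you convert explicitly with to_sym or to_s.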

I'm new to Ruby, but I think (hope?) this is a simple way to look at it...

A symbol is not a variable or a constant. It doesn't stand in for, or point to, a value. A symbol IS a value.

All it is, is a string without the object overhead. The text and only the text.

So, this:

"hellobuddy"

Is the same as this:

:hellobuddy

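Except you can't change it in place: there is no :hellobuddy.upcase!, for example (and in older Ruby versions such as 1.8 there was no :hellobuddy.upcase either). It's the string value and ONLY the string value.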

Likewise, this:

greeting =>"hellobuddy"

Is the same as this:

greeting => :hellobuddy

But, again, without the string object overhead.
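A quick way to probe where the two differ, as a sketch (text and symbol are names made up for this example):

```ruby
text   = "hellobuddy"
symbol = :hellobuddy

same_text      = (symbol.to_s == text)          # same characters
same_value     = (symbol == text)               # but not the same value
string_mutates = text.respond_to?(:upcase!)     # strings have in-place methods
symbol_mutates = symbol.respond_to?(:upcase!)   # symbols do not

puts same_text             # true
puts same_value            # false
puts string_mutates        # true
puts symbol_mutates        # false
puts symbol.to_s.upcase    # HELLOBUDDY
```

When you do need the full string toolkit, to_s hands you an ordinary string built from the symbol's text.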