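Animated icon in email subject

I know about Data URIs, in which base64-encoded data can be used inline, for example for images. Today I received a spam email whose subject contained an animated (GIF) icon:

[image]

Here it is with only the icon:

[image]

So the only thing that crossed my mind was Data URIs, and whether Gmail allows some emoticons to be inserted in the subject. I opened the full detailed version of the email and located the subject line, shown in the image below:

[image]

So the GIF comes from the encoded string =?UTF-8?B?876Urg==?=, which resembles the Data URI scheme, but I couldn't get the icon out of it. Here is the HTML source of the element:

[image]

Long story short, there are lots of emoticons served from https://mail.google.com/mail/e/XXX, where XXX is a hexadecimal number. Either this is undocumented or I can't find the documentation. If this is about Data URIs, how can they be included in a Gmail email subject? (I forwarded that email to a Yahoo email account and saw [?] instead of the icon.) If not, how is that encoded string parsed?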


#Short description:

They are referred to internally as goomoji, and they appear to be a non-standard extension built on private-use Unicode code points, which are transmitted as ordinary UTF-8. When Gmail encounters one of these characters, it replaces it with the corresponding icon. I wasn't able to find any documentation on them, but I was able to reverse engineer the format.


#What are these icons?

Those icons are actually the icons that appear under the "Insert emoticons" panel.

Gmail Insert Emoticons

While I don't see the 52E icon in the list, there are several others that follow the same convention.

Note that there are also some icons whose names are prefixed, such as gtalk.03C. I was not able to determine if or how these icons can be used in this manner.


#What is this Data URI thing?

It's not actually a Data URI, though it does share some similarities. It's a special syntax for encoding non-ASCII characters in email headers such as the subject, defined in RFC 2047 (where it is called an "encoded word"). Basically, it works like this.

=?charset?encoding?data?=

So, in our example string, we have the following data.

=?UTF-8?B?876Urg==?=
  • charset = UTF-8
  • encoding = B (means base64)
  • data = 876Urg==
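
As a rough illustration of the three fields above, here is a minimal Python sketch that splits an encoded word and decodes its payload. The function name `parse_encoded_word` is made up for this example, and it only handles a single well-formed encoded word, not a full header.

```python
import base64
import quopri


def parse_encoded_word(word):
    """Split one RFC 2047 encoded word into (charset, encoding, raw bytes).

    Hypothetical helper for illustration only; it assumes a single,
    well-formed encoded word of the form =?charset?encoding?data?=
    """
    if not (word.startswith('=?') and word.endswith('?=')):
        raise ValueError('not an encoded word: %r' % word)
    # Drop the leading "=?" and trailing "?=", then split the three fields.
    charset, encoding, data = word[2:-2].split('?')
    if encoding.upper() == 'B':
        # "B" means the data is base64.
        raw = base64.b64decode(data)
    elif encoding.upper() == 'Q':
        # "Q" is a quoted-printable variant used in headers.
        raw = quopri.decodestring(data.encode('ascii'), header=True)
    else:
        raise ValueError('unknown encoding: %r' % encoding)
    return charset, encoding, raw


print(parse_encoded_word('=?UTF-8?B?876Urg==?='))
# ('UTF-8', 'B', b'\xf3\xbe\x94\xae')
```

For real mail handling, the standard library's `email.header.decode_header` does this job more robustly; the manual split is only meant to make the structure visible.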

#So, how does it work?

We know that somehow, 876Urg== means the icon 52E, but how?

If we base64 decode 876Urg==, we get 0xf3be94ae. This looks like the following in binary:

11110011 10111110 10010100 10101110

These bits are consistent with a 4-byte UTF-8 encoded character.

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

So the relevant bits are the following:

     011   111110   010100   101110

Or when aligned:

00001111 11100101 00101110

In hexadecimal, these bytes are the following:

FE52E

As you can see, except for the FE prefix, which is presumably there to distinguish the goomoji icons from other characters, it matches the 52E in the icon URL. Some testing confirms that this holds true for other icons.
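
The bit manipulation above can be checked with a few lines of Python. This is only a sketch of the manual 4-byte UTF-8 decoding described here; the helper name `utf8_4byte_to_code_point` is invented for the example, and it skips the validation a real decoder would do.

```python
import unicodedata


def utf8_4byte_to_code_point(raw):
    """Manually decode a 4-byte UTF-8 sequence (illustrative helper).

    Layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    """
    b0, b1, b2, b3 = raw
    return (
        ((b0 & 0x07) << 18)    # 3 payload bits from the lead byte
        | ((b1 & 0x3F) << 12)  # 6 payload bits from each continuation byte
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )


raw = bytes.fromhex('f3be94ae')
code_point = utf8_4byte_to_code_point(raw)
print(format(code_point, 'X'))                   # FE52E
print(code_point == ord(raw.decode('utf-8')))    # True: matches Python's own decoder
print(unicodedata.category(chr(code_point)))     # Co (private use)
```

The `Co` category indicates that U+FE52E is a private-use code point, which would fit with other mail clients not knowing how to render it.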


#Sounds like a lot of work, is there a converter?:

This can of course be scripted. I created the following Python code for my testing. These functions can convert the base64 encoded string to and from the short hex string found in the URL. Note, this code is written for Python 3, and is not Python 2 compatible.

###Conversion functions:

import base64


def goomoji_decode(code):
    # Base64 decode to the raw UTF-8 bytes.
    binary = base64.b64decode(code)
    # UTF-8 decode to a single character.
    decoded = binary.decode('utf8')
    # Get the Unicode code point of that character.
    value = ord(decoded)
    # Hex encode, trim the 'FE' prefix, and uppercase.
    return format(value, 'x')[2:].upper()


def goomoji_encode(code):
    # Add the 'FE' prefix and parse as a hexadecimal code point.
    value = int('FE' + code, 16)
    # Convert the code point to a character.
    encoded = chr(value)
    # Encode the character as UTF-8 bytes.
    binary = bytearray(encoded, 'utf8')
    # Base64 encode and return the result as a string.
    return base64.b64encode(binary).decode('utf-8')

###Examples:

print(goomoji_decode('876Urg=='))
print(goomoji_encode('52E'))

###Output:

52E
876Urg==

And, of course, finding an icon's URL simply requires creating a new draft in Gmail, inserting the icon you want, and using your browser's DOM inspector.

DOM Inspector
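
Going the other way, from a received subject header to icon URLs, could look something like the sketch below. It relies on the URL pattern quoted in the question and on the guess that goomoji occupy U+FE000 to U+FEFFF (the "FE" prefix plus three hex digits); both are assumptions based on the observations above, not documented behaviour, and `goomoji_urls` is a hypothetical helper name.

```python
from email.header import decode_header

# URL pattern as quoted in the question; not an officially documented endpoint.
GOOMOJI_URL = 'https://mail.google.com/mail/e/{}'


def goomoji_urls(subject_header):
    """Return icon URLs for goomoji-looking characters in a raw Subject value."""
    urls = []
    for chunk, charset in decode_header(subject_header):
        # decode_header yields bytes for encoded words and str for plain headers.
        if isinstance(chunk, bytes):
            text = chunk.decode(charset or 'ascii', errors='replace')
        else:
            text = chunk
        for ch in text:
            cp = ord(ch)
            # Assumption: the 'FE' prefix plus three hex digits marks a goomoji.
            if 0xFE000 <= cp <= 0xFEFFF:
                urls.append(GOOMOJI_URL.format(format(cp, 'X')[2:]))
    return urls


print(goomoji_urls('=?UTF-8?B?876Urg==?='))
# ['https://mail.google.com/mail/e/52E']
```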

If you use the correct hex code point (e.g. fe4f4 for 'pile of poo') and it is correctly encoded within the subject line header, whether as base64 (see @AlexanderOMara's answer) or as quoted-printable (=?utf-8?Q?=F3=BE=93=B4?=), then Gmail will automatically parse it and replace it with the corresponding emoji.
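
To make the two encodings concrete, here is a small Python sketch that builds the encoded word for a given code point in either form. The function name `goomoji_encoded_word` is invented for this example, and the Q branch simply escapes every byte, which is enough for these characters but is not a general quoted-printable encoder.

```python
import base64


def goomoji_encoded_word(code_point_hex, encoding='B'):
    """Build an RFC 2047 encoded word for one code point (illustrative helper)."""
    # Code point -> character -> UTF-8 bytes.
    raw = chr(int(code_point_hex, 16)).encode('utf-8')
    if encoding == 'B':
        data = base64.b64encode(raw).decode('ascii')
    elif encoding == 'Q':
        # Escape every byte as =XX; fine here because all bytes are non-ASCII.
        data = ''.join('={:02X}'.format(b) for b in raw)
    else:
        raise ValueError('encoding must be "B" or "Q"')
    return '=?utf-8?{}?{}?='.format(encoding, data)


print(goomoji_encoded_word('fe4f4', 'Q'))  # =?utf-8?Q?=F3=BE=93=B4?=
print(goomoji_encoded_word('fe52e', 'B'))  # =?utf-8?B?876Urg==?=
```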

Here's a Gmail emoji list for copying and pasting into subject lines - or email bodies. Animated emojis, which will grab even more attention in the inbox, are placed on a yellow background:

Gmail emojis on emailmarketingtipps.de

Many thanks to Alexander O'Mara for such a well-researched answer about the goomoji-tagged HTML images!

I just wanted to add three things:

  • There are still many, many emoji (and other Unicode sequences that render as pictures) that spammers and other marketers are starting to use in email subject lines and that Gmail does not convert to HTML images. In some browsers these show up bold and colored, which is almost as bad as animation. Browsers could also choose to animate these, but I don't know if any do. These Unicode sequences get displayed by the browser as Unicode text, so the exact appearance (color or not, animated or not, ...) depends on what text rendering system the browser is using. The appearance of a given Unicode emoji also depends on any Unicode variation selectors and emoji modifiers that appear near it in the Unicode code point sequence. Unlike the image-based emoji spam, these sequences can be copied and pasted out of the browser and into other apps as Unicode text.

  • I hope the many marketers reading this StackOverflow question will just say no. It is a horrible idea to include these sequences in your email subject lines and it will immediately tarnish you and your brand as lowlife spammers. It is not worth the "attention" your email will get.

  • Of course the first question coming to everyone's mind is: "how do I get rid of these things?" Fortunately there is this open-source Greasemonkey/Tampermonkey/Violentmonkey userscript:

Gmail Subject Line Emoji Roach Motel

This userscript eliminates both the HTML-image type (thanks to the awesome work of Alexander O'Mara) and the pure-Unicode type.

For the latter type, the userscript includes a regular expression designed to capture the Unicode sequences likely to be abused by marketers. The regex looks like this in modern JavaScript (the `u` flag comes from ES6, while the `\p{...}` property escapes were standardized later, in ES2018; the userscript translates this to a widely supported pre-ES6 regex using the amazing ES6 Regex Transpiler):

var re = /(\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F|[\u{2100}-\u{2BFF}\u{E000}-\u{F8FF}\u{1D000}-\u{1F5FF}\u{1F650}-\u{1FA6F}\u{F0000}-\u{FFFFF}\u{100000}-\u{10FFFF}])\s*/gu


// which includes the Unicode Emoji pattern from
//   https://github.com/tc39/proposal-regexp-unicode-property-escapes
// plus also these blocks frequently used for spammy emojis
// (see https://en.wikipedia.org/wiki/Unicode_block ):
//   U+2100..U+2BFF     Arrows, Dingbats, Box Drawing, ...
//   U+E000..U+F8FF     Private Use Area (gmail generates them for some emoji)
//   U+1D000..U+1F5FF   Musical Symbols, Playing Cards (sigh), Pictographs, ...
//   U+1F650..U+1FA6F   Ornamental Dingbats, Transport and Map symbols, ...
//   U+F0000..U+FFFFF   Supplementary Private Use Area-A
//   U+100000..U+10FFFF Supplementary Private Use Area-B
// plus any space AFTER the discovered emoji spam
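
For comparison with the Python code earlier in this thread, here is a rough Python sketch of the same idea. Python's built-in `re` module has no `\p{Emoji...}` property escapes, so this version covers only the explicit block ranges listed above and is therefore not equivalent to the userscript's regex. The names `SPAMMY_RANGES` and `strip_spammy_symbols` are made up for the example.

```python
import re

# Only the explicit block ranges from the userscript regex; the
# \p{Emoji...} parts are omitted because the stdlib re module lacks them.
SPAMMY_RANGES = re.compile(
    '['
    '\u2100-\u2BFF'          # Arrows, Dingbats, Box Drawing, ...
    '\uE000-\uF8FF'          # Private Use Area
    '\U0001D000-\U0001F5FF'  # Musical Symbols, Playing Cards, Pictographs, ...
    '\U0001F650-\U0001FA6F'  # Ornamental Dingbats, Transport and Map, ...
    '\U000F0000-\U000FFFFF'  # Supplementary Private Use Area-A
    '\U00100000-\U0010FFFF'  # Supplementary Private Use Area-B
    ']\\s*'                  # plus any whitespace after the symbol
)


def strip_spammy_symbols(subject):
    """Remove characters in the listed ranges, plus trailing whitespace."""
    return SPAMMY_RANGES.sub('', subject)


print(strip_spammy_symbols('Big sale \U000FE52E today'))  # Big sale today
```

Because U+FE52E falls inside Supplementary Private Use Area-A, the goomoji code point from the first answer is caught by the fifth range.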