Email series: Basics
About this series
I am planning to write a few posts describing email technology and caveats with a focus on receiving email from the perspective of a product development team.
Why focus on receiving email
Email is widely used for purposes such as marketing, promotions, newsletters, or kinda-reliable notification delivery. Many articles on the internet already describe how to base products on this sending capability, and I don’t have much to add in that space.
Why develop products based on email
Because of their long history and use of open protocols, emails remain one of the few (perhaps the only) open standards allowing free communication between users anywhere on the Internet. This means that virtually every internet user has an email account, and that practically every technology stack and platform has support for dealing with email messages. As such, email remains a widespread common denominator that can help in many situations.
However, ultimately emails are a poor substrate for serious implementations and it is also the intent of this series to demonstrate why.
Emails as a document and their structure
From a technical point of view, an email is essentially a multiline string with an internal structure:
- Headers, such as From, To, Subject, Date and Message-ID
- Body, very commonly split into parts as in a message text part (commonly in both plain text and HTML) and attachment parts
We could refer to this format as a raw email. From the examples at RFC 5322, this is the most basic raw email:
From: John Doe <firstname.lastname@example.org>
To: Mary Smith <email@example.com>
Subject: Saying Hello
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <firstname.lastname@example.org>

This is a message just to say hello.
So, "Hello".
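As a rough sketch of how such a raw email can be read programmatically, the snippet below feeds the same message to Python's standard email package. The variable names are my own, and the addresses are the placeholder ones from the example above.

```python
from email import message_from_string
from email.policy import default

# The raw email from the example: headers, one blank line, then the body.
RAW = (
    "From: John Doe <firstname.lastname@example.org>\n"
    "To: Mary Smith <email@example.com>\n"
    "Subject: Saying Hello\n"
    "Date: Fri, 21 Nov 1997 09:55:06 -0600\n"
    "Message-ID: <firstname.lastname@example.org>\n"
    "\n"
    "This is a message just to say hello.\n"
    'So, "Hello".\n'
)

# policy=default selects the modern API, which returns structured header objects.
msg = message_from_string(RAW, policy=default)

subject = str(msg["Subject"])
sender = msg["From"].addresses[0]  # display name and address, already split
sent_at = msg["Date"].datetime     # timezone-aware datetime
body = msg.get_content()           # the text after the first blank line

print(subject, sender.display_name, sender.addr_spec, sent_at.isoformat(), sep="\n")
print(body)
```

Everything before the first blank line is parsed as headers and everything after it as the body, which is all the structure this basic message has.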
Notoriously difficult to validate, email addresses have a surprisingly deep feature set, including subaddressing and even comments (!!!). Regardless, for the most part everyone is familiar with the basic structure of local-part@domain.
Fun fact! Email addresses didn’t always have this format. For example, there is such a thing as a UUCP bang path address, which doesn’t have @ at all but instead specifies an explicit routing path of nodes separated by ! characters.
The first email ever received in Uruguay: pic.twitter.com/05vQptqmCU (Alvaro, @alvrod, December 8, 2020)
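To make the address features mentioned above a bit more concrete, here is a minimal sketch using parseaddr from Python's standard library. The helper split_subaddress is hypothetical, and it assumes the common "+" convention for subaddressing, which individual mail systems are free to ignore or replace.

```python
from email.utils import parseaddr


def split_subaddress(addr_spec, separator="+"):
    """Split 'local+tag@domain' into (local, tag, domain); tag is '' if absent."""
    local, _, domain = addr_spec.rpartition("@")
    base, _, tag = local.partition(separator)
    return base, tag, domain


# A display name followed by an address that carries a subaddress tag.
name, addr = parseaddr("Mary Smith <mary+newsletters@example.com>")
base, tag, domain = split_subaddress(addr)

# Old-style form with a comment in parentheses; parseaddr reports it as the name.
comment_name, comment_addr = parseaddr("jdoe@example.org (John Doe)")

print(name, addr, (base, tag, domain))
print(comment_name, comment_addr)
```

Note that this only takes an address apart. It makes no attempt to validate it, which is the genuinely hard part.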
A more realistic email
MIME-Version: 1.0
Date: Fri, 18 Dec 2020 10:10:10 +0100
Message-ID: <CABf2nMZJ-su9ntLF2ugzy=hPFR5+kuauwr9NyQ2q4R-KtA0EZg@mail.gmail.com>
Subject: hello
From: Alvaro Rodriguez <email@example.com>
To: firstname.lastname@example.org
Content-Type: multipart/alternative; boundary="00000000000010b1f705b6c1e5a2"

--00000000000010b1f705b6c1e5a2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi!

--=20
=C3=81lvaro Rodr=C3=ADguez
---
email@example.com
@alvrod <http://twitter.com/alvrod>

--00000000000010b1f705b6c1e5a2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi!<br clear=3D"all"><div><br>-- <br><div dir=3D"ltr" clas=
s=3D"gmail_signature" data-smartmail=3D"gmail_signature"><div dir=3D"ltr"><=
div><div dir=3D"ltr"><div>=C3=81lvaro Rodr=C3=ADguez<br>---<br><a href=3D"m=
ailto:firstname.lastname@example.org" target=3D"_blank">email@example.com.=
com</a><br><a href=3D"http://twitter.com/alvrod" target=3D"_blank">@alvrod<=
/a><br></div></div></div></div></div></div></div>

--00000000000010b1f705b6c1e5a2--
There are some additional elements in there, and to finish our first post in the series let’s quickly unpack what is going on in this fuller example.
- Content-Type, in this case multipart/alternative, meaning: multiple parts with a text/plain body and an alternative text/html body (giving the recipient the ability to choose which one to read, depending on device capabilities or personal choice). Using multipart also allows adding attachments, each with its own MIME type.
- Content-Transfer-Encoding, describing how to use US-ASCII to encode content that is definitely not US-ASCII, typically used as base64 for attached files or quoted-printable for internationalized text or US-ASCII encoded HTML as in this example.
- Content-Disposition, to support options for rendering: show the content inline (for example for images) or as an attachment, where the user is expected to open or download it separately.
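The headers described above can be observed by parsing a multipart message with Python's standard email package. This is only a sketch: the message is a shortened stand-in for the fuller example, and the boundary value "example-boundary" is made up for illustration.

```python
from email import message_from_string
from email.policy import default

# Shortened stand-in for the fuller example; the boundary value is invented.
RAW = (
    "MIME-Version: 1.0\n"
    "Subject: hello\n"
    'Content-Type: multipart/alternative; boundary="example-boundary"\n'
    "\n"
    "--example-boundary\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Hi! =C3=81lvaro\n"
    "--example-boundary\n"
    'Content-Type: text/html; charset="UTF-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    '<div dir=3D"ltr">Hi! =C3=81lvaro</div>\n'
    "--example-boundary--\n"
)

msg = message_from_string(RAW, policy=default)

# Collect (content type, transfer encoding, disposition, decoded text) per part.
parts = []
for part in msg.iter_parts():
    parts.append((
        part.get_content_type(),
        str(part["Content-Transfer-Encoding"]),
        part.get_content_disposition(),  # None here: no Content-Disposition header
        part.get_content().strip(),      # quoted-printable and UTF-8 already decoded
    ))

# Pick a body the way a reader might: prefer plain text, fall back to HTML.
preferred = msg.get_body(preferencelist=("plain", "html"))

for entry in parts:
    print(entry)
print(preferred.get_content_type())
```

The decoded text shows what quoted-printable was hiding: =C3=81 is the UTF-8 encoding of Á, and =3D is a literal equals sign.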
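And lastly, what about those funny looking lines like --00000000000010b1f705b6c1e5a2--? As part of the MIME Content-Type header, a “boundary” is provided to help the recipient parse the parts. Any string that is unique and not otherwise present in the body of the email can be used. A line made of two hyphens followed by the boundary indicates that a new part is starting, and the same line with two more hyphens at the end closes the last part. Each part may have its own headers, such as Content-Type and Content-Transfer-Encoding.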
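To see the boundary at work from the sending side, the following sketch builds a small multipart/alternative message with Python's standard library and counts the delimiter lines in the serialized result. The library picks the boundary string itself here; the message contents are placeholders of my own.

```python
from email import message_from_string
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "hello"
msg.set_content("Hi!")                             # text/plain body
msg.add_alternative("<p>Hi!</p>", subtype="html")  # msg becomes multipart/alternative

raw = msg.as_string()

# Read the boundary back from the serialized Content-Type header.
boundary = message_from_string(raw).get_boundary()

lines = raw.splitlines()
part_starts = lines.count("--" + boundary)     # one delimiter line per part
closing = lines.count("--" + boundary + "--")  # a single closing delimiter

print(boundary, part_starts, closing)
```

With two parts there are two opening delimiter lines and one closing delimiter, matching the layout of the fuller example.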
Message identifiers can be generated by the email client or by the first server processing the email, and need to be globally unique. To help with this, they use a subset of the email address format, so that each host may use its own scheme to identify messages.
These identifiers can be used to connect emails together in different ways, such as using the In-Reply-To and References headers to link a reply to the message it answers.
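As an illustration of connecting emails through their identifiers, the sketch below assumes the In-Reply-To and References headers defined in RFC 5322 and generates identifiers with make_msgid from Python's standard library. The domains are placeholders and find_parent is a hypothetical helper, not a complete threading algorithm.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# make_msgid builds "<unique-part@domain>"; the domains here are placeholders.
original_id = make_msgid(domain="example.org")
reply_id = make_msgid(domain="example.com")

original = EmailMessage()
original["Message-ID"] = original_id
original["Subject"] = "hello"
original.set_content("Hi!")

reply = EmailMessage()
reply["Message-ID"] = reply_id
reply["In-Reply-To"] = original_id  # the message being answered
reply["References"] = original_id   # the chain of ancestors, oldest first
reply["Subject"] = "Re: hello"
reply.set_content("Hello back!")


def find_parent(message, inbox):
    """Return the message in inbox that `message` replies to, or None."""
    wanted = str(message.get("In-Reply-To", "")).strip()
    for candidate in inbox:
        if wanted and str(candidate["Message-ID"]).strip() == wanted:
            return candidate
    return None


parent = find_parent(reply, [original, reply])
print(parent["Subject"] if parent is not None else "no parent found")
```

Real inboxes are messier: replies can arrive before their parents, and some clients omit or mangle these headers, so the lookup above should be read as the happy path only.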
From La increible historia del Instituto de Computacion en 24 emails