Email series: Basics
About this series
I am planning to write a few posts describing email technology and its caveats, with a focus on receiving email from the perspective of a product development team.
Why focus on receiving email
Email is widely sent for purposes such as marketing, promotions, newsletters, or kinda-reliable notification delivery. Many articles on the internet already describe how to build products on top of sending email, and I don’t have much to add in that space.
Why develop products based on email
Because of its long history and use of open protocols, email remains one of the few (perhaps the only) open standards allowing free communication between users anywhere on the Internet. In practice, virtually every internet user has an email account, and every technology stack and platform can handle email messages. As such, email remains a widely shared common denominator that is useful in many situations.
However, email is ultimately a poor substrate for serious implementations, and this series also aims to demonstrate why.
Email basics
Emails as a document and their structure
From a technical point of view, an email is essentially a multiline string with an internal structure:
- Headers, such as From, Reply-To, To, Cc, Bcc, and Subject
- Body, very commonly split into parts: a message text part (often in both plain text and HTML) and attachment parts
Email examples
We could refer to this format as a raw email. Here is about the most basic raw email possible, taken from the examples in RFC 5322 (headers first, then an empty line, then the body):
From: John Doe <jdoe@machine.example>
To: Mary Smith <mary@example.net>
Subject: Saying Hello
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <1234@local.machine.example>

This is a message just to say hello.
So, "Hello".
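To see this structure from code, here is a minimal sketch using Python’s standard-library email package (Python is only an illustrative choice; the series does not prescribe a stack). It parses the RFC 5322 example above, with the empty line that ends the headers written out explicitly, and reads a few headers as structured values.

```python
from email import message_from_string
from email.policy import default

# The RFC 5322 example; the empty line ends the headers and starts the body.
RAW = (
    "From: John Doe <jdoe@machine.example>\n"
    "To: Mary Smith <mary@example.net>\n"
    "Subject: Saying Hello\n"
    "Date: Fri, 21 Nov 1997 09:55:06 -0600\n"
    "Message-ID: <1234@local.machine.example>\n"
    "\n"
    "This is a message just to say hello.\n"
    'So, "Hello".\n'
)

# policy=default returns an EmailMessage whose headers are parsed objects
msg = message_from_string(RAW, policy=default)

for name, value in msg.items():
    print(f"{name}: {value}")

sender = msg["From"].addresses[0]
print(sender.display_name, sender.addr_spec)  # John Doe jdoe@machine.example
print(msg["Date"].datetime.isoformat())       # 1997-11-21T09:55:06-06:00
print(msg.get_content(), end="")              # the body, as plain text
```

Note that even this tiny message is not "just a string" once parsed: the Date header becomes a timezone-aware datetime and the From header a list of addresses.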
Email addresses
Notoriously difficult to validate, email addresses have a surprisingly deep feature set, including subaddressing and even comments (!!!). Regardless, for the most part everyone is familiar with the basic structure of user@domain.
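As a small illustration, Python’s email.utils.parseaddr separates a display name from the address, and for the legacy comment form it returns the comment as the name. Subaddressing is not part of RFC 5322 itself, so the split_subaddress helper below is hypothetical and assumes the common + separator, which not every provider uses.

```python
from email.utils import parseaddr
from typing import Optional, Tuple


def split_subaddress(addr_spec: str, separator: str = "+") -> Tuple[str, Optional[str], str]:
    """Split 'user+tag@domain' into (user, tag, domain).

    Hypothetical helper: '+' is a common convention, not part of RFC 5322,
    and some providers use another separator or none at all.
    """
    local, _, domain = addr_spec.rpartition("@")  # the domain never contains '@'
    user, found, tag = local.partition(separator)
    return user, (tag if found else None), domain


# Display-name form and legacy comment form both come back as (name, address)
print(parseaddr("Mary Smith <mary@example.net>"))
print(parseaddr("jdoe@machine.example (John Doe)"))
print(split_subaddress("mary+newsletters@example.net"))
print(split_subaddress("mary@example.net"))
```

For a receiving product, the practical takeaway is that the string in a To header and the mailbox it identifies are not the same thing; parse before comparing.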
Fun fact! Email addresses didn’t always have this format: for example, there is such a thing as a UUCP bang path address, which doesn’t have @ at all but instead specifies an explicit routing path of nodes separated by !.
Embedded tweet by Alvaro (@alvrod), December 8, 2020: “the first email ever received in Uruguay” pic.twitter.com/05vQptqmCU
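A bang path reads left to right as the route itself: every element before the last is a host the message is relayed through, and the last element is the user on the final host. The hypothetical parse_bang_path helper below just makes that reading explicit. With user@domain, by contrast, the sender names only the destination and routing is left to DNS (MX records).

```python
from typing import List, Tuple


def parse_bang_path(path: str) -> Tuple[List[str], str]:
    """Split a UUCP bang path such as 'hostA!hostB!user' into (hops, user).

    Hypothetical helper: every element before the last is a host to relay
    through, in order, starting from a neighbor of the sending machine.
    """
    *hops, user = path.split("!")
    return hops, user


hops, user = parse_bang_path("hostA!hostB!hostC!user")
for i, host in enumerate(hops, start=1):
    print(f"hop {i}: {host}")
print(f"deliver to: {user}")
```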
A more realistic email
MIME-Version: 1.0
Date: Fri, 18 Dec 2020 10:10:10 +0100
Message-ID: <CABf2nMZJ-su9ntLF2ugzy=hPFR5+kuauwr9NyQ2q4R-KtA0EZg@mail.gmail.com>
Subject: hello
From: Alvaro Rodriguez <alvaro@alvrod.com>
To: test@alvrod.com
Content-Type: multipart/alternative; boundary="00000000000010b1f705b6c1e5a2"

--00000000000010b1f705b6c1e5a2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi!
--=20
=C3=81lvaro Rodr=C3=ADguez
---
alvaro@alvrod.com
@alvrod <http://twitter.com/alvrod>
--00000000000010b1f705b6c1e5a2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi!<br clear=3D"all"><div><br>-- <br><div dir=3D"ltr" clas=
s=3D"gmail_signature" data-smartmail=3D"gmail_signature"><div dir=3D"ltr"><=
div><div dir=3D"ltr"><div>=C3=81lvaro Rodr=C3=ADguez<br>---<br><a href=3D"m=
ailto:alvaro@alvrod.com" target=3D"_blank">alvaro@alvrod.=
com</a><br><a href=3D"http://twitter.com/alvrod" target=3D"_blank">@alvrod<=
/a><br></div></div></div></div></div></div></div>
--00000000000010b1f705b6c1e5a2--
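Parsing this message with Python’s email package shows how the pieces fit together. This is a sketch on a trimmed copy of the message (signature lines dropped, empty lines after each header block written out): the parser finds the boundary, yields the two alternatives, and undoes both the quoted-printable transfer encoding and the UTF-8 charset. The quopri call at the end performs the quoted-printable step by hand on one line.

```python
import quopri
from email import message_from_string
from email.policy import default

# Trimmed copy of the message above; empty lines end each header block.
RAW = """\
MIME-Version: 1.0
Subject: hello
From: Alvaro Rodriguez <alvaro@alvrod.com>
To: test@alvrod.com
Content-Type: multipart/alternative; boundary="00000000000010b1f705b6c1e5a2"

--00000000000010b1f705b6c1e5a2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi!
--=20
=C3=81lvaro Rodr=C3=ADguez

--00000000000010b1f705b6c1e5a2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi!<br>-- <br>=C3=81lvaro Rodr=C3=ADguez</div>

--00000000000010b1f705b6c1e5a2--
"""

msg = message_from_string(RAW, policy=default)
print(msg.get_content_type(), msg.get_boundary())
for part in msg.iter_parts():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])

# get_body picks the requested alternative; get_content decodes QP, then UTF-8
plain = msg.get_body(preferencelist=("plain",)).get_content()
html = msg.get_body(preferencelist=("html",)).get_content()
print(plain)

# The same decoding by hand: quoted-printable -> raw bytes -> UTF-8 text
print(quopri.decodestring(b"=C3=81lvaro Rodr=C3=ADguez").decode("utf-8"))
```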
Email parts
This fuller example has a few additional elements. To finish the first post in the series, let’s quickly unpack what is going on.
- Content-Type, in this case multipart/alternative, meaning multiple parts with a text/plain body and an alternative text/html body (giving the recipient the ability to choose which one to read, depending on device capabilities or personal choice). Using multipart also allows adding attachments, each with its own MIME type (typically by wrapping the body in a multipart/mixed part).
- Content-Transfer-Encoding, describing how to represent content that is definitely not US-ASCII using only US-ASCII characters: typically base64 for attached files, or quoted-printable for internationalized text or HTML as in this example.
- Content-Disposition (not present in this example), supporting rendering options: show the content inline (for example, images) or as an attachment that the user is expected to open or download separately.
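Going the other way, here is a sketch of how Python’s EmailMessage API assembles a similar message and sets these three headers. The attachment bytes and the report.bin filename are made up for illustration, and cte="quoted-printable" is passed explicitly because otherwise the library chooses the transfer encoding on its own.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Alvaro Rodriguez <alvaro@alvrod.com>"
msg["To"] = "test@alvrod.com"
msg["Subject"] = "hello"

# text/plain body; quoted-printable is requested instead of left to the library
msg.set_content("Hi!\n-- \nÁlvaro Rodríguez\n", cte="quoted-printable")

# Converts the message to multipart/alternative and appends a text/html alternative
msg.add_alternative(
    '<div dir="ltr">Hi!<br>-- <br>Álvaro Rodríguez</div>\n',
    subtype="html",
    cte="quoted-printable",
)

# Converts it again, to multipart/mixed. add_attachment marks the new part with
# Content-Disposition: attachment, and bytes are base64-encoded by default.
# (Hypothetical data and filename.)
msg.add_attachment(
    b"\x00\x01\x02 binary data",
    maintype="application",
    subtype="octet-stream",
    filename="report.bin",
)

for part in msg.walk():
    print(
        part.get_content_type(),
        part["Content-Transfer-Encoding"],
        part.get_content_disposition(),
    )
```

The resulting tree is multipart/mixed containing the multipart/alternative body plus the attachment, which is the usual shape of a real-world message with attachments.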
And lastly, what about those funny-looking lines like --00000000000010b1f705b6c1e5a2--? As part of the MIME Content-Type header, a “boundary” is provided to help the recipient parse the parts. Any string that is unique and not otherwise present in the body of the email can be used. Each part starts after a line consisting of -- followed by the boundary, and the whole multipart body is closed by the same line with an extra -- at the end. Each part may have its own Content-Type and Content-Transfer-Encoding headers.
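The boundary mechanism is simple enough to sketch by hand. The split_multipart_body function below is hypothetical and deliberately simplified: it skips the preamble and epilogue, does not preserve the original line endings, and does no header parsing or decoding, so real code should still rely on a proper MIME parser. The boundary b1 in the usage example is made up.

```python
from typing import List, Optional


def split_multipart_body(body: str, boundary: str) -> List[str]:
    """Split the body of a multipart message into its raw parts.

    Hypothetical, simplified sketch: skips preamble and epilogue, rejoins
    lines with newlines, and does not parse part headers or decode content.
    """
    delimiter = "--" + boundary          # starts a new part
    close_delimiter = delimiter + "--"   # ends the whole multipart body
    parts: List[str] = []
    current: Optional[List[str]] = None  # None until the first delimiter is seen

    for line in body.splitlines():
        marker = line.rstrip()  # whitespace may follow a boundary line
        if marker == close_delimiter:
            break
        if marker == delimiter:
            if current is not None:
                parts.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)

    if current is not None:
        parts.append("\n".join(current))
    return parts


body = (
    "This preamble is ignored.\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--b1\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--b1--\n"
    "This epilogue is ignored too.\n"
)
for part in split_multipart_body(body, "b1"):
    print(repr(part))
```

Each resulting part has the same shape as a small email of its own: headers, an empty line, then content.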
Message-ID
Message identifiers can be generated by the email client or by the first server processing the email, and they need to be globally unique. To help with this, they use a subset of the email address format (<left@right>), so that each host can put its own domain on the right-hand side and use its own scheme to keep the left-hand side unique.
These identifiers can be used to connect emails together in different ways, for example through the In-Reply-To, References, or Resent-Message-ID headers.
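As a sketch of how these headers connect messages, the hypothetical make_reply helper below follows a simplified reading of RFC 5322 section 3.6.4: In-Reply-To carries the parent’s Message-ID, and References carries the parent’s References (or, failing that, its In-Reply-To) followed by the parent’s Message-ID. New identifiers come from Python’s email.utils.make_msgid; the domain values passed in are assumptions standing in for your own host.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default
from email.utils import make_msgid


def make_reply(original: EmailMessage, sender: str, body: str, domain: str) -> EmailMessage:
    """Build a reply threaded onto `original` (hypothetical helper).

    Simplified from RFC 5322 section 3.6.4: In-Reply-To is the parent's
    Message-ID; References is the parent's References (or, failing that,
    its In-Reply-To) followed by the parent's Message-ID.
    """
    reply = EmailMessage()
    reply["From"] = sender
    reply["To"] = str(original.get("Reply-To") or original["From"])
    subject = str(original.get("Subject", ""))
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    # make_msgid builds "<unique-left-part@domain>"; `domain` stands in for your host
    reply["Message-ID"] = make_msgid(domain=domain)

    parent_id = original.get("Message-ID")
    if parent_id:
        parent_id = str(parent_id)
        reply["In-Reply-To"] = parent_id
        previous = original.get("References") or original.get("In-Reply-To")
        reply["References"] = f"{str(previous)} {parent_id}" if previous else parent_id
    reply.set_content(body)
    return reply


original = message_from_string(
    "From: John Doe <jdoe@machine.example>\n"
    "To: Mary Smith <mary@example.net>\n"
    "Subject: Saying Hello\n"
    "Message-ID: <1234@local.machine.example>\n"
    "\n"
    "This is a message just to say hello.\n",
    policy=default,
)
reply = make_reply(original, "Mary Smith <mary@example.net>", "Hi John!\n", "example.net")
answer = make_reply(reply, "John Doe <jdoe@machine.example>", "Hi Mary!\n", "machine.example")
print(reply["In-Reply-To"], reply["References"])
print(answer["References"])
```

Each reply extends the References chain by one identifier, which is what lets a receiving system reconstruct a whole thread from any single message in it.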