Our email contains years of important personal information: key contacts, versions of documents, discussions around important projects or deals. It's a datasource that is too often ignored by developers, and the brave ones who don't ignore it are in for a bumpy ride dealing with the tedious details of arcane protocols.
The presentation will be about the potential use cases for email data, the various ways to access it, the common pitfalls and the different tools targeted at this.
Email as a datasource for applications
1. Email as a datasource for apps
Bruno Morency
bruno@context.io
@brunomorency
2. What this presentation will be about
• Overview of the technologies that make email
• How your apps can fit in that picture
• An intro to IMAP and message bodies with common pitfalls
• Overview of Context.IO
10. • Message transport, nothing to do with content
• Defines the envelope (sender and recipients)
• Does not define the message headers
• Chain from client to recipient’s server
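The split between envelope and headers can be sketched in Python, assuming the transport protocol described on this slide is SMTP and that the standard `smtplib` and `email` modules are used. All addresses are made up for illustration. The point is that the envelope recipients passed to the transport do not have to match what the `To` header says.

```python
from email.message import EmailMessage


def build_message_and_envelope():
    """Build a message whose headers differ from its transport envelope."""
    msg = EmailMessage()
    # Headers: part of the message content, shown by mail clients.
    msg["From"] = "Alice <alice@example.com>"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Quarterly numbers"
    msg.set_content("Hi Bob, numbers are on their way.")

    # Envelope: what the transport actually uses for routing.
    # carol is a recipient here but appears in no header (a blind copy).
    envelope_sender = "bounces@example.com"
    envelope_recipients = ["bob@example.com", "carol@example.com"]
    return msg, envelope_sender, envelope_recipients


def send(smtp, msg, envelope_sender, envelope_recipients):
    """Send through an already connected smtplib.SMTP instance."""
    return smtp.send_message(
        msg, from_addr=envelope_sender, to_addrs=envelope_recipients
    )
```

`send` is not called here because it needs a live server; `build_message_and_envelope` alone shows the mismatch between the two layers.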
11. DKIM, SPF: Standards for sender signatures and to prevent sender spoofing
12.
13. • Complement spam filters
• Opens the message and checks headers to decide if it will deliver it to the inbox
• As a receiver, it’s one more way to block spam.
• As a sender, it’s a tool you must master to avoid ending up in the spam folder
• Email deliverability is an industry by itself
14. IMAP: Protocol to allow a client to access and manipulate emails on a receiving server.
15.
16. • All messages and their folder organization are on the server
• Clients poll to learn about new messages that arrive or actions made through other clients
• While it doesn’t send messages, clients usually store sent messages through it
17. POP: Protocol to allow a client to retrieve emails from a receiving server.
18.
19. • The server only serves as a temporary buffer for received messages
• Classification and message state is purely a client-side concept
• Many clients can access the same account but can’t coordinate anything
20. RFC-822, MIME, Multipart: Standards defining headers and the actual body of the message
31. Me: “App Developer, meet IMAP. IMAP,
meet App Developer.”
IMAP: “I don’t give a sh*t about you, App
Developer. Go away!”
32. 1. Connect to the IMAP server and authenticate
> openssl s_client -crlf -connect imap.gmail.com:993
[ a few lines of SSL and server info ]
* OK Gimap ready for requests from 123.14.12.20 zw8i38638oab.180
a001 LOGIN username password
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE
a001 OK username authenticated (Success)
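The untagged CAPABILITY line returned at login is worth reading, since it says which extensions (IDLE, XLIST, the Gmail extensions, and so on) the server offers. Here is a minimal sketch of parsing it; the function names are illustrative and not part of any library.

```python
def parse_capabilities(line):
    """Return the set of capability names from an untagged CAPABILITY line."""
    tokens = line.strip().split()
    if tokens[:2] != ["*", "CAPABILITY"]:
        raise ValueError("not a CAPABILITY response: %r" % line)
    # Capability names are case-insensitive, so normalise to upper case.
    return {token.upper() for token in tokens[2:]}


def supports(capabilities, name):
    """True if the server advertised the given capability."""
    return name.upper() in capabilities


line = ("* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST "
        "CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE")
caps = parse_capabilities(line)
print(supports(caps, "IDLE"), supports(caps, "THREAD=REFERENCES"))
```

When using Python's `imaplib`, the same information is exposed as the `capabilities` attribute of the connection object, so hand parsing is only needed when talking to the socket directly.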
36. 4. FLAG a message as read
a015 STORE 81 +FLAGS (\Seen)
* 81 FETCH (FLAGS ())
a015 OK Success
37. 4. CLOSE the mailbox and LOGOUT the account
a023 CLOSE
a023 OK Returned to authenticated state. (Success)
a024 LOGOUT
* BYE LOGOUT Requested
a024 OK LOGOUT completed. (Success)
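The same flag, close and logout steps can be sketched with Python's standard `imaplib`. The helper name is hypothetical, and `conn` is assumed to be an already authenticated `imaplib.IMAP4_SSL` object (for example `imaplib.IMAP4_SSL("imap.gmail.com")` followed by `login`). The "read" state is the `\Seen` system flag.

```python
def mark_seen_and_logout(conn, mailbox, seq_num):
    """Select a mailbox, flag one message as read, then end the session.

    conn is assumed to be an authenticated imaplib connection.
    Returns the status string of the STORE command ("OK" on success).
    """
    conn.select(mailbox)
    # Equivalent to: a015 STORE <seq_num> +FLAGS (\Seen)
    status, _data = conn.store(str(seq_num), "+FLAGS", "\\Seen")
    conn.close()    # back to the authenticated state
    conn.logout()   # ends the session
    return status
```

Because the connection is passed in, the function can be exercised with a stand-in object in tests without touching a real account.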
39. Pitfall #1: Identifying messages
• There is no persistent primary key you can rely on to retrieve a message
• Message Sequence Number
• Unique Identifier
40. Sequence Number
• Ascending and contiguous sequence. If the mailbox says 11 exist, you can fetch messages with seq. nb. 1 to 11
• They can (and will) be reassigned during a session.
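Why sequence numbers get reassigned is easy to see with a toy model. This is not IMAP client code, only a simulation in which a mailbox is a list of UIDs and the sequence number of a message is its position in that list (starting at 1). The UID values are made up.

```python
def seq_to_uid(mailbox_uids, seq_num):
    """Look up the message currently sitting at a sequence number."""
    return mailbox_uids[seq_num - 1]


def expunge(mailbox_uids, seq_num):
    """Return the mailbox after the message at seq_num is removed.

    Every message after it slides down by one sequence number.
    """
    return mailbox_uids[:seq_num - 1] + mailbox_uids[seq_num:]


mailbox = [101, 105, 120, 121]          # sequence numbers 1..4
before = seq_to_uid(mailbox, 3)         # the message with UID 120
mailbox = expunge(mailbox, 2)           # another client deletes seq 2
after = seq_to_uid(mailbox, 3)          # seq 3 is now a different message
print(before, after)
```

A client that cached "message 3" before the expunge would now be pointing at the wrong message, which is why sequence numbers are only safe for short-lived operations.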
41. Unique Identifier (aka UID)
• 32-bit value uniquely identifying a message within a mailbox.
• Ascending but not necessarily incremental nor contiguous.
• If you move a message to another mailbox, it will get a new UID in that new mailbox
• Changes if the mailbox UIDVALIDITY changes
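One way to live with these rules is to key any client-side cache on the mailbox's UIDVALIDITY and throw the cache away when that value changes. The class below is a hypothetical sketch of that idea, not an API from any library.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxCache:
    """Client-side cache for one mailbox, valid for one UIDVALIDITY value."""
    name: str
    uidvalidity: int
    messages: dict = field(default_factory=dict)  # uid -> cached data

    def sync_validity(self, server_uidvalidity):
        """Compare with the value reported on SELECT.

        Returns True if the cache had to be discarded because every
        previously known UID may now refer to a different message.
        """
        if server_uidvalidity == self.uidvalidity:
            return False
        self.messages.clear()
        self.uidvalidity = server_uidvalidity
        return True


cache = MailboxCache("INBOX", uidvalidity=3)
cache.messages[42] = {"subject": "Hello"}
print(cache.sync_validity(3), cache.sync_validity(4), cache.messages)
```

The effective identifier of a message is therefore the triple (mailbox, UIDVALIDITY, UID) rather than the UID alone.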
42. Pitfall #2: Special-use folders (or lack thereof)
• Only the INBOX mailbox has a special meaning.
• Everything else has the meaning the client wants it to have (which may not be in English)
• Gmail has XLIST which adds mailbox attributes (Inbox, Sent, Starred, ...)
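Finding the "Sent" folder by attribute rather than by name could look like the sketch below. It assumes list-response lines of the shape `(attributes) "delimiter" "name"`, which is how `imaplib` hands back LIST data once decoded to text; the sample lines, including the French folder name in its encoded form, are invented for illustration.

```python
import re

# (attributes) "delimiter" name
LIST_LINE = re.compile(
    r'\((?P<attrs>[^)]*)\)\s+(?P<delim>"[^"]*"|NIL)\s+(?P<name>.+)'
)


def parse_list_line(line):
    """Split one LIST/XLIST response line into (attributes, mailbox name)."""
    match = LIST_LINE.match(line.strip())
    if match is None:
        raise ValueError("unexpected LIST line: %r" % line)
    attrs = match.group("attrs").split()
    name = match.group("name").strip().strip('"')
    return attrs, name


def find_special(lines, attribute):
    """Return the first mailbox carrying the attribute, or None."""
    for line in lines:
        attrs, name = parse_list_line(line)
        if attribute in attrs:
            return name
    return None


lines = [
    '(\\HasNoChildren \\Inbox) "/" "Inbox"',
    '(\\HasNoChildren \\Sent) "/" "[Gmail]/Messages envoy&AOk-s"',
]
print(find_special(lines, "\\Sent"))
```

The returned name is still in the server's encoded form, which is what later SELECT commands expect anyway.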
43. Pitfall #3: No data until you select a mailbox
• Anything that searches or fetches messages is done within the context of a mailbox
• Can’t get account-wide list of messages
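In practice an account-wide list has to be assembled by selecting each mailbox in turn. A rough sketch with `imaplib`-style calls follows; `conn` is assumed to be an authenticated `imaplib` connection and the helper name is hypothetical.

```python
def list_all_uids(conn, mailbox_names):
    """Collect (mailbox, uid) pairs by visiting every mailbox one by one."""
    result = []
    for name in mailbox_names:
        # Names containing spaces must be passed already quoted to imaplib.
        status, _ = conn.select(name, readonly=True)
        if status != "OK":
            continue  # mailbox missing or not selectable
        status, data = conn.uid("SEARCH", None, "ALL")
        if status != "OK" or not data or not data[0]:
            continue  # empty mailbox or failed search
        for uid in data[0].split():
            result.append((name, int(uid)))
    return result
```

The pair (mailbox, uid) is kept together because a UID means nothing outside the mailbox it came from.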
44. Pitfall #4: Threads
• Threading is an extension (THREAD) that isn't widely available and even then, restricted to a single mailbox
• Gmail's thread ID (X-GM-THRID) to the rescue
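On Gmail, the thread ID is exposed as the `X-GM-THRID` fetch attribute, so threads can be rebuilt by grouping on it. The sketch below assumes fetch response lines already decoded to text, for instance from something like `conn.uid("FETCH", "1:*", "(X-GM-THRID)")`; the sample lines and IDs are invented.

```python
import re
from collections import defaultdict

THRID = re.compile(r"X-GM-THRID (\d+)")
UID = re.compile(r"\bUID (\d+)")


def group_by_thread(fetch_lines):
    """Map each Gmail thread ID to the UIDs of its messages."""
    threads = defaultdict(list)
    for line in fetch_lines:
        thrid = THRID.search(line)
        uid = UID.search(line)
        if thrid and uid:
            threads[int(thrid.group(1))].append(int(uid.group(1)))
    return dict(threads)


lines = [
    "1 (X-GM-THRID 1278455344230334865 UID 11)",
    "2 (X-GM-THRID 1266894439832287888 UID 12)",
    "3 (X-GM-THRID 1278455344230334865 UID 15)",
]
print(group_by_thread(lines))
```

The UIDs are still scoped to the selected mailbox, so grouping across mailboxes needs the mailbox name stored alongside each UID.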
45. Pitfall #5: Attachment?
• You need to get and parse the body structure
• As far as IMAP is concerned, an attachment is the same thing as any other MIME part
46. Pitfall #6: Deleting messages
• Setting the Deleted flag marks the message for deletion but it’s still there
• EXPUNGE will remove all messages with Deleted flag from the currently selected mailbox
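The two-step nature of deletion can be illustrated with a toy in-memory mailbox. This is a simulation of the semantics only, not code that talks to a server, and the class is hypothetical.

```python
class ToyMailbox:
    """In-memory stand-in for a selected mailbox: uid -> set of flags."""

    def __init__(self, uids):
        self.flags = {uid: set() for uid in uids}

    def mark_deleted(self, uid):
        """Like STORE +FLAGS (\\Deleted): the message stays in place."""
        self.flags[uid].add("\\Deleted")

    def expunge(self):
        """Remove every message flagged \\Deleted, whoever flagged it."""
        gone = [uid for uid, flags in self.flags.items() if "\\Deleted" in flags]
        for uid in gone:
            del self.flags[uid]
        return gone


box = ToyMailbox([10, 11, 12])
box.mark_deleted(11)
print(sorted(box.flags))   # 11 is still listed
box.mark_deleted(12)
print(box.expunge())       # both flagged messages go at once
```

With `imaplib`, the corresponding calls would be a STORE with `+FLAGS` and `\Deleted` followed by `expunge()`, and the same "all flagged messages" behaviour applies.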
47. Pitfall #7: Keeping up with deleted messages
• Purging the client-side message list is a PITA.
• Server won't tell you which messages were deleted, you just have to figure out that some have been and find which ones were.
• It's the same if you want to keep track of Seen flag.
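Figuring out what disappeared usually comes down to comparing the UIDs you cached with the UIDs the server reports now (for example from a `UID SEARCH ALL`). A minimal sketch of that comparison, with hypothetical function names and made-up UIDs:

```python
def diff_uids(cached_uids, server_uids):
    """Return (deleted, added) UIDs between a cached and a fresh listing."""
    cached, server = set(cached_uids), set(server_uids)
    deleted = sorted(cached - server)   # known locally, gone on the server
    added = sorted(server - cached)     # new on the server
    return deleted, added


def diff_seen(cached_seen, server_seen):
    """Same idea for the Seen flag: UIDs whose read state changed."""
    return sorted(set(cached_seen) ^ set(server_seen))


print(diff_uids([1, 2, 3, 7], [1, 3, 7, 9, 10]))
print(diff_seen([1, 2], [2, 3]))
```

This only holds while the mailbox UIDVALIDITY is unchanged; otherwise the cached UIDs cannot be compared with the new ones at all.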
48.
49. The joys of parsing email messages
Yé! I fetched a message! Now what do I do?
51. A message with an attachment
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=_MYBOUNDARY_

--_MYBOUNDARY_
Content-Type: text/plain

This is the body of the message.
--_MYBOUNDARY_
Content-Type: image/jpeg; name="IMG_713.jpg"
Content-Disposition: attachment; filename="IMG_713.jpg"; size=6379099;
Content-Transfer-Encoding: base64

/9j/4AAQSkZJRgABAgAAZABkAAD/7AARRHVja3kAAQAEAAAAZA+4AJkFkb2JlAGTAAAAAAQMA
AwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMD8IAEQgAegG1AwERAI
RAQMRAfEASMAAQACAwEBAQEBAAAAAAAAAAAHCAUGCQQDAgEKAQEAAgIDAQAAAAAAAAAAAAAAB
gcFCAEDBAIQAAEEAgEBBgQGAQUAAAAAAAUCAwQGAQcAEjBQERMUFRBgFhcgQHAhNAhBMSIjJD
URAAIC==
--_MYBOUNDARY_--
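A message shaped like this one can be taken apart with Python's standard `email` package. The sketch below uses a shortened stand-in message (the base64 payload is just the word "hello", not a real JPEG) and lists each leaf MIME part with its type, disposition and filename.

```python
from email import message_from_string, policy

RAW = "\n".join([
    "MIME-Version: 1.0",
    "Content-Type: multipart/mixed; boundary=_MYBOUNDARY_",
    "",
    "--_MYBOUNDARY_",
    "Content-Type: text/plain",
    "",
    "This is the body of the message.",
    "--_MYBOUNDARY_",
    'Content-Type: image/jpeg; name="IMG_713.jpg"',
    'Content-Disposition: attachment; filename="IMG_713.jpg"',
    "Content-Transfer-Encoding: base64",
    "",
    "aGVsbG8=",  # placeholder payload, decodes to b"hello"
    "--_MYBOUNDARY_--",
    "",
])


def list_leaf_parts(raw):
    """Return (content type, disposition, filename) for each non-multipart part."""
    msg = message_from_string(raw, policy=policy.default)
    return [
        (part.get_content_type(), part.get_content_disposition(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()
    ]


def attachment_bytes(raw):
    """Map attachment filenames to their decoded bytes."""
    msg = message_from_string(raw, policy=policy.default)
    return {
        part.get_filename(): part.get_payload(decode=True)
        for part in msg.iter_attachments()
    }


print(list_leaf_parts(RAW))
```

`get_payload(decode=True)` undoes the base64 transfer encoding, so the caller gets the original bytes rather than the encoded text.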
52. A message with alternative parts
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=_MYBOUNDARY_

--_MYBOUNDARY_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello! Here’s a message with *rich* text
--_MYBOUNDARY_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html><body>Hello! Here’s a message with <b>rich</b> text</body></html>
--_MYBOUNDARY_--
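With alternative parts, the application has to pick which rendition it wants. Python's `email` package offers `get_body` for this; the sketch below uses an ASCII-only stand-in for the message above and a hypothetical helper name.

```python
from email import message_from_string, policy

RAW = "\n".join([
    "MIME-Version: 1.0",
    "Content-Type: multipart/alternative; boundary=_MYBOUNDARY_",
    "",
    "--_MYBOUNDARY_",
    'Content-Type: text/plain; charset="us-ascii"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Hello! Here's a message with *rich* text",
    "--_MYBOUNDARY_",
    'Content-Type: text/html; charset="us-ascii"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<html><body>Hello! Here's a message with <b>rich</b> text</body></html>",
    "--_MYBOUNDARY_--",
    "",
])


def pick_body(raw, prefer=("html", "plain")):
    """Return (content type, decoded text) of the preferred rendition."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=prefer)
    if part is None:
        return None
    # get_content() undoes the transfer encoding and the charset.
    return part.get_content_type(), part.get_content().strip()


print(pick_body(RAW, prefer=("plain",)))
```

Passing `("plain",)` suits indexing or search, while `("html", "plain")` suits display with a plain-text fallback.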
53. Pitfall #1: Message-ID is optional
• Great to track messages but spec says it's optional.
... and it’s not always there.
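One possible workaround, offered here only as an assumption and not as something the talk prescribes, is to fall back to a synthetic key built from other headers when Message-ID is missing. The choice of Date, From and Subject and the `synthetic-` prefix are arbitrary.

```python
import hashlib
from email import message_from_string


def message_key(raw):
    """Return the Message-ID if present, else a hash of a few other headers."""
    msg = message_from_string(raw)
    message_id = msg["Message-ID"]
    if message_id and message_id.strip():
        return message_id.strip()
    # Fallback: not guaranteed unique, but stable for the same message.
    basis = "\n".join(msg.get(name) or "" for name in ("Date", "From", "Subject"))
    return "synthetic-" + hashlib.sha1(basis.encode("utf-8")).hexdigest()


with_id = "Message-ID: <abc@example.com>\nFrom: a@example.com\nSubject: Hi\n\nbody"
without_id = ("Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
              "From: a@example.com\nSubject: Hi\n\nbody")
print(message_key(with_id))
print(message_key(without_id)[:10])
```

Two different messages with identical Date, From and Subject would collide under this scheme, so it is a tracking aid rather than a true primary key.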
54. Pitfall #2: In-Reply-To, References
• Refers to Message-ID of other emails
• Very useful to rebuild threads
... until an Outlook user jumps in and replaces it with their own Thread-Topic and Thread-Index headers
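Rebuilding threads from these headers can be sketched as following parent links until a message with no known parent is reached. This is a deliberately simplified approach, assuming each message is given as a dict with hypothetical keys `id`, `in_reply_to` and `references`; it does nothing for messages whose headers were stripped.

```python
from collections import defaultdict


def build_threads(messages):
    """Group Message-IDs by thread root, using In-Reply-To then References."""
    parent = {}
    for msg in messages:
        in_reply_to = (msg.get("in_reply_to") or "").strip()
        references = (msg.get("references") or "").split()
        if in_reply_to:
            parent[msg["id"]] = in_reply_to
        elif references:
            parent[msg["id"]] = references[-1]  # last reference is the direct parent

    def root_of(message_id):
        seen = set()  # guards against reference loops
        while message_id in parent and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    for msg in messages:
        threads[root_of(msg["id"])].append(msg["id"])
    return dict(threads)


mails = [
    {"id": "<a@x>"},
    {"id": "<b@x>", "in_reply_to": "<a@x>"},
    {"id": "<c@x>", "references": "<a@x> <b@x>"},
    {"id": "<d@x>"},
]
print(build_threads(mails))
```

A reply that lost both headers ends up as its own thread here, which is exactly the failure mode the slide describes.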
55. Pitfall #3: Attachments are what you decide them to be
• Content-Disposition tells you attachment or inline. Should a signature image be considered as a file attachment?
• TNEF attachments
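Since the decision is left to the application, it helps to write the policy down in one place. The rules below are one arbitrary choice made for illustration: TNEF containers (`application/ms-tnef`) get their own label, inline parts that carry a Content-ID are treated as embedded content such as signature images, and anything else with an attachment disposition or a filename counts as a file.

```python
from email import message_from_string, policy


def classify_part(part):
    """Label a MIME part as 'tnef', 'file', 'embedded' or 'body'."""
    if part.get_content_type() == "application/ms-tnef":
        return "tnef"  # Outlook container, needs separate unpacking
    disposition = part.get_content_disposition()
    if disposition == "attachment":
        return "file"
    if disposition == "inline" and part.get("Content-ID"):
        return "embedded"  # referenced from the HTML body, e.g. a signature image
    if part.get_filename():
        return "file"
    return "body"


def parse_part(raw):
    """Helper for the examples: parse a single part from text."""
    return message_from_string(raw, policy=policy.default)


signature = parse_part(
    'Content-Type: image/png; name="sig.png"\n'
    'Content-Disposition: inline; filename="sig.png"\n'
    "Content-ID: <sig@example.com>\n\nxxx"
)
print(classify_part(signature))
```

Changing the order of the checks changes the answer for parts like the signature image, which has both a filename and an inline disposition.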