Autogenerating a Book Series From Three Years of iMessages

I am frequently annoyed at the things that I can’t remember. And when I’m trying to remember the details of something, I often turn to my text messages—thanks to big improvements recently, it is now quite fast to search my whole iMessage history on my phone, provided that I can remember some verbatim part of the message I’m looking for. And often, once I’m in the past, I want to look around: text messages from ages ago provide surprisingly interesting insights into the past.

But iMessage isn’t set up well for this casual browsing: when you try to scroll away from a search result, loading is very slow, and the interface provides no way to jump to a specific date. I’d really like to be able to “flip through” my messages and stop at a random place for a view into that moment in time. Apple doesn’t provide a way to do that, so, I thought, why not build it myself? I thought it’d be great to enable this “flipping through messages” in the most literal way possible: by creating a physical book of my biggest conversation.

Is it possible?

In order to do anything at all with the messages, I needed to get them out of my phone and onto my computer. I’d looked many times for a way to do this with Signal, so I wasn’t sure what I’d find, but I was pleased that it seemed relatively straightforward to pull messages off an iPhone (even easier if your messages are already on a Mac). According to the very helpful iPhone Wiki, all I had to do was grab sms.db from a backup of my phone, and I’d have a SQLite database that I could do whatever I liked with.

Querying my texts with SQL

This simplicity seemed a bit too good to be true: for some reason I had expected a proprietary format that would be a pain to reverse-engineer. So I had to see it for myself. I took a standard backup on my Mac in Finder (that was a trip: the “plugged-in iPhone” UI has barely changed since I used iTunes to sync music to my iPod touch in seventh grade). While the backup format is really not complicated, browsing the backup folder was intimidating at first, because an ls in the root directory yields a bunch of directories, each named after a single hex byte:

/.../00008120-001854410CEB401E >>> ls
00  0e	1c  2a	38  46	54  62	70  7e	8c  9a	a8  b6	c4  d2	e0  ee	fc
01  0f	1d  2b	39  47	55  63	71  7f	8d  9b	a9  b7	c5  d3	e1  ef	fd
02  10	1e  2c	3a  48	56  64	72  80	8e  9c	aa  b8	c6  d4	e2  f0	fe
03  11	1f  2d	3b  49	57  65	73  81	8f  9d	ab  b9	c7  d5	e3  f1	ff
04  12	20  2e	3c  4a	58  66	74  82	90  9e	ac  ba	c8  d6	e4  f2	Info.plist
05  13	21  2f	3d  4b	59  67	75  83	91  9f	ad  bb	c9  d7	e5  f3	Manifest.db
06  14	22  30	3e  4c	5a  68	76  84	92  a0	ae  bc	ca  d8	e6  f4	Manifest.db-shm
07  15	23  31	3f  4d	5b  69	77  85	93  a1	af  bd	cb  d9	e7  f5	Manifest.db-wal
08  16	24  32	40  4e	5c  6a	78  86	94  a2	b0  be	cc  da	e8  f6	Manifest.plist
09  17	25  33	41  4f	5d  6b	79  87	95  a3	b1  bf	cd  db	e9  f7	Status.plist
0a  18	26  34	42  50	5e  6c	7a  88	96  a4	b2  c0	ce  dc	ea  f8
0b  19	27  35	43  51	5f  6d	7b  89	97  a5	b3  c1	cf  dd	eb  f9
0c  1a	28  36	44  52	60  6e	7c  8a	98  a6	b4  c2	d0  de	ec  fa
0d  1b	29  37	45  53	61  6f	7d  8b	99  a7	b5  c3	d1  df	ed  fb

Entering one of these directories yields a bunch of files starting with the hex byte after which the directory was named:

/.../00008120-001854410CEB401E >>> cd 3d
/.../00008120-001854410CEB401E/3d >>> ls
3d0292d3fe90e1e22c247403c0e9105ea0f9ff44      3d8830b71e98aae80b6eaf8bdd5500d79ce74946
3d02fe309afa7de839822d6f1b8433aa90090d17      3d88cdc16ff2b5231e5ea4b52271ee195a6f4b96
3d072c4fca5db4a5678fa10b137435f757e98492      3d8a425d70f4049417e855d273c44d8199de30c9
3d0739c90579fa907246d5c21bd8d8ebaa2d9d6b      3d8a43a1921f504bb4393250f75b24bfc2c5cedb
3d0798b3cc4d2f5ad347ffb8bc5a0f9d8c82cfb9      3d8a7c0460aadabf1b7fc9adea9e6a2a6e7bc73b
3d07a0adc5c5c22dc525ccd3a93fb05a50ef1ac5      3d8b6ad12c7617b3d783790a457b0aa19b193b68
3d0880f091c51ddc145e17c78d8e6f9a3e7e20c8      3d8b82abe05a9d697102d8b665c9d499e07492ea
3d093e92cf03abf3650411e09a647630a1e0c478      3d8ba897240ad32580bf8dfd00db8f181658cdfd
3d095e908ff898be3b3ffd64a75db959a58ac70a      3d8bc227d67ec4944df8e75291102367034d7214
3d09d5dcd5a9bdad67a80cd83201a9e1fb75aada      3d8c722f1d92f7cd6f90c936c14f60f51aad128b
3d0abb83123be82abf43ce20118e72fea06023c5      3d8ca6eeabeb1c01fae05bb20f08dedf734cfd04
3d0b246304c42d2ab1eb1892d629fcdfde689cb7      3d8d0c6b1bf7946c6bef91d60cccb32207b7bc01
3d0bb5f49e6f0e31348ef8feb9a38d4ce71f5ec7      3d8fd2fbcaf3079a683a8e486ecde8875f0a591d
3d0c1283936c45fec533a507b78558b5aa3159fa      3d8ff93bd94b3ea14edc77d1e677cf4ee4306e4e
3d0cb8e28462780bb9af1440e297ecd8224c70ff      3d90ea8bfbf62feda080cd0ccbd12fa5c8673993
3d0ce10de5f69606c52882215b99ebab259dc194      3d932638fe8ed669725b7a143c6a8b02b8959923
3d0d7e5fb2ce288813306e4d4636395e047a3d28      3d93c92679aa9d398331e27fdeed64b5094e68d1
...

Looking at these with a nice file explorer that looks at magic bytes to determine filetypes (I use Thunar) helps make some sense of it, since it can show that these cryptic names really are just regular old images and other files. But really even that is unnecessary since the iPhone Wiki told us that the filename for the sms.db file that we’re looking for is 3d0d7e5fb2ce288813306e4d4636395e047a3d28. Copying this to my home directory:

$ cp 3d0d7e5fb2ce288813306e4d4636395e047a3d28 ~/imessages.db
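Those cryptic names aren't random. As far as I understand it (this is my assumption, not something stated in the backup itself), each file is named after the SHA-1 hash of its domain and original path joined with a dash, and filed under a directory named after the first two hex characters. A minimal Python sketch of that rule, with `backup_file_path` being a name I made up for illustration:

```python
import hashlib
from pathlib import Path


def backup_file_path(backup_root: str, domain: str, relative_path: str) -> Path:
    """Guess where a file lives inside an unencrypted iPhone backup.

    Assumption: the on-disk name is SHA-1("<domain>-<relative path>"),
    stored in a subdirectory named after the first two hex characters.
    """
    file_id = hashlib.sha1(f"{domain}-{relative_path}".encode("utf-8")).hexdigest()
    return Path(backup_root) / file_id[:2] / file_id
```

Calling `backup_file_path(".", "HomeDomain", "Library/SMS/sms.db")` should point at the same 3d/3d0d7e5f... file named above, assuming the domain and path are what I think they are.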

And opening it up with the sqlite3 CLI we can actually see some tables!

~ >>> sqlite3 imessages.db
SQLite version 3.44.2 2023-11-24 11:41:44
Enter ".help" for usage hints.
sqlite> .tables
_SqliteDatabaseProperties              message
attachment                             message_attachment_join
chat                                   message_processing_task
chat_handle_join                       recoverable_message_part
chat_message_join                      sync_deleted_attachments
chat_recoverable_message_join          sync_deleted_chats
deleted_messages                       sync_deleted_messages
handle                                 unsynced_removed_recoverable_messages
kvtable
sqlite>

The schema requires a couple of joins to extract an actual conversation, but without too much trouble we can start to pull out messages (in this case from CVS spamming me):

sqlite> select
        message.ROWID, message.date, message.text, message.is_from_me from message
        inner join chat_message_join on message_id=message.ROWID
        inner join chat on chat.ROWID=chat_message_join.chat_id
        where chat.chat_identifier='28732'
        order by date asc;
278125|694030292385607040||0
278327|694647875648848000||0

...

314056|726702453329793024||0
314412|727316171079934976|CVS ExtraCare: 20% off one full-price item, just because. Tap the link to send to card: c.cvs.com/B0kjBMbNM|0
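The same query is easy to run from a script, which also makes those giant date values readable. Here is a rough Python sketch using only the standard library. It assumes that message.date counts nanoseconds since 2001-01-01 UTC (that seems to match my data, but I haven't seen it documented), and `fetch_conversation` and `apple_ns_to_datetime` are names I made up:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are nanoseconds since 2001-01-01 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

QUERY = """
    SELECT message.ROWID, message.date, message.text, message.is_from_me
    FROM message
    INNER JOIN chat_message_join ON chat_message_join.message_id = message.ROWID
    INNER JOIN chat ON chat.ROWID = chat_message_join.chat_id
    WHERE chat.chat_identifier = ?
    ORDER BY message.date ASC
"""


def apple_ns_to_datetime(ns: int) -> datetime:
    # Integer microseconds avoid float rounding on 18-digit timestamps.
    return APPLE_EPOCH + timedelta(microseconds=ns // 1000)


def fetch_conversation(conn: sqlite3.Connection, chat_identifier: str):
    """Return (rowid, sent_at, text, is_from_me) tuples, oldest first."""
    rows = conn.execute(QUERY, (chat_identifier,)).fetchall()
    return [
        (rowid, apple_ns_to_datetime(date), text, bool(is_from_me))
        for rowid, date, text, is_from_me in rows
    ]
```

With `conn = sqlite3.connect("imessages.db")`, `fetch_conversation(conn, "28732")` would return the same rows as the CLI session, with text set to None wherever the text column is empty.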

We got one, but a lot of blank ones too: many of the messages are missing! It turns out that for some messages, the message data is stored as an encoded NSMutableAttributedString binary blob in the message.attributedBody column instead of in message.text. With a bit of wrangling to get the binary data out of the SQLite CLI, we can look at one of these missing messages and see that the data is indeed there:

~ >>> sqlite3 imessages.db "select hex(attributedBody) from message where ROWID=278125;"   \
| cut -d\' -f2   \
| xxd -r -p      \
| xxd -g1
00000000: 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03  ..streamtyped...
00000010: 84 01 40 84 84 84 19 4e 53 4d 75 74 61 62 6c 65  ..@....NSMutable
00000020: 41 74 74 72 69 62 75 74 65 64 53 74 72 69 6e 67  AttributedString
00000030: 00 84 84 12 4e 53 41 74 74 72 69 62 75 74 65 64  ....NSAttributed
00000040: 53 74 72 69 6e 67 00 84 84 08 4e 53 4f 62 6a 65  String....NSObje
00000050: 63 74 00 85 92 84 84 84 0f 4e 53 4d 75 74 61 62  ct.......NSMutab
00000060: 6c 65 53 74 72 69 6e 67 01 84 84 08 4e 53 53 74  leString....NSSt
00000070: 72 69 6e 67 01 95 84 01 2b 81 f3 00 43 56 53 20  ring....+...CVS
00000080: 45 78 74 72 61 43 61 72 65 3a 20 24 32 20 6f 66  ExtraCare: $2 of
00000090: 66 20 79 6f 75 72 20 70 75 72 63 68 61 73 65 2c  f your purchase,
000000a0: 20 6a 75 73 74 20 66 6f 72 20 79 6f 75 21 20 49   just for you! I
000000b0: 6e 20 73 74 6f 72 65 20 6f 72 20 6f 6e 6c 69 6e  n store or onlin
000000c0: 65 2e 20 54 61 70 20 74 68 65 20 6c 69 6e 6b 20  e. Tap the link
000000d0: 74 6f 20 73 65 6e 64 20 64 65 61 6c 20 74 6f 20  to send deal to
000000e0: 63 61 72 64 3a 20 63 2e 63 76 73 2e 63 6f 6d 2f  card: c.cvs.com/

Luckily, we don’t need to implement the parsing for this binary format ourselves. There’s a great imessage-database crate that does exactly this: ingests an iMessage database and outputs the data in nice Rust data structures. Out of the box, it comes with a binary (imessage-exporter) to generate text or HTML versions of your conversations—so really quite similar to my goal.
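To get a feel for what such a parser has to deal with, here is a crude heuristic in Python that is based only on the hex dump above. It is emphatically not how imessage-database works, and it will break on blobs that don't look like this one. It assumes the text follows the first NSString marker after five bytes of framing (01 95 84 01 2b in the dump), with a length that is either a single byte or, when that byte is 0x81, a two-byte little-endian value (81 f3 00 would then mean 243 bytes). The function name is mine:

```python
from typing import Optional


def text_from_attributed_body(blob: bytes) -> Optional[str]:
    """Heuristically pull the message text out of an attributedBody blob."""
    marker = b"NSString"
    start = blob.find(marker)
    if start == -1:
        return None
    # Skip the marker plus the 5 framing bytes seen in the dump (assumption).
    pos = start + len(marker) + 5
    if pos >= len(blob):
        return None
    length = blob[pos]
    pos += 1
    if length == 0x81:
        # Assumption: 0x81 announces a 2-byte little-endian length.
        length = int.from_bytes(blob[pos:pos + 2], "little")
        pos += 2
    return blob[pos:pos + length].decode("utf-8", errors="replace")
```

A real parser has to understand the whole typedstream structure, which is exactly the work I was happy to hand off to a library.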

With just a couple of tweaks to the SQL statement the library uses to fetch messages, I’m able to narrow down the query to just a single conversation. But for this project I want to make a nicely formatted physical book that I can hold in my hand and flip through—the HTML and text formats that the project ships with won’t quite work for this.

Generating LaTeX

I am a huge fan of LaTeX due to the beautiful documents it can be convinced to produce, and since leaving school I have been itching to generate some more pretty PDFs. And since LaTeX’s text-based source code makes it perfect for templating and autogeneration, it seems like a great choice. I’ll generate my book by spitting out LaTeX code for every text message in the conversation.

Thanks to the imessage-database library it’s pretty easy to iterate through all the messages in the conversation, so I start by generating LaTeX code for each message. My first approach at this LaTeX generation is quite simple: align left if the message is from me and right otherwise, insert some text indicating an attachment where images are sent, and skip things like reactions and replies that I don’t want to bother rendering. This initial approach works well, and after splitting the text up into chapters based on date and a bit of visual tweaking, I’m satisfied.
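My actual generator is written in Rust on top of imessage-database, but the idea fits in a few lines of any language. Here is an illustrative Python sketch of the per-message step, not a copy of my code: the flushleft/flushright environments, the "[attachment]" placeholder, and the function names are all assumptions for the sake of the example. The one non-negotiable part is escaping LaTeX's special characters, since text messages are full of them.

```python
# Characters that LaTeX treats specially, mapped to safe replacements.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    # One pass over the characters, so replacements are never re-escaped.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def message_to_latex(text, is_from_me: bool, has_attachment: bool = False) -> str:
    """Render one message: left-aligned if it is mine, right-aligned otherwise."""
    env = "flushleft" if is_from_me else "flushright"
    body = escape_latex(text or "")
    if has_attachment:
        # Placeholder text standing in for an image (hypothetical wording).
        body = (body + " " if body else "") + r"\emph{[attachment]}"
    return "\\begin{" + env + "}\n" + body + "\n\\end{" + env + "}\n"
```

For example, `message_to_latex("50% off", False)` produces a flushright block containing `50\% off`.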

But there’s one major problem: LaTeX, at least with the default pdfLaTeX engine, doesn’t handle arbitrary Unicode like emoji. Of course, this means that as soon as I extend the rendering window enough to include an emoji, the LaTeX compiler explodes. Simply stripping emojis out of the source text works, but is hardly a tolerable solution. After all, emojis are integral to modern communication.

After a bit of research, it looks like XeLaTeX is the key: it adds support for Unicode and system fonts to LaTeX. Switching to XeLaTeX proves quite straightforward, and by defining an \emojifont command that selects an emoji font and wrapping every emoji in {\emojifont X} in my generated LaTeX source, the output renders successfully with emojis inline. But I don’t want to pay for every page of my book to be printed in color when I print it. Luckily, Google’s Noto Emoji font has a great set of simple black-and-white emojis that are perfect for this purpose. I’m quite happy with the way these emojis look in print:

Three messages including an array of black-and-white emojis printed on a white page.
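The wrapping step can be sketched in Python like this. The code point ranges below are a rough approximation of "emoji" that I picked for illustration, not an exhaustive list, and the preamble assumes XeLaTeX with the fontspec package and a font installed under the name "Noto Emoji". Consecutive emojis (including variation selectors and zero-width joiners) get wrapped as one run.

```python
import re

# Assumed preamble for XeLaTeX; the font name depends on what is installed.
PREAMBLE = r"""\usepackage{fontspec}
\newfontfamily\emojifont{Noto Emoji}
"""

# Approximate emoji ranges (an assumption, not the full Unicode definition).
_EMOJI = "\U0001F000-\U0001FAFF\u2600-\u27BF"
# A run starts with an emoji and may continue with emoji, VS-16, or ZWJ.
EMOJI_RUN = re.compile(f"[{_EMOJI}](?:[{_EMOJI}\uFE0F\u200D])*")


def wrap_emoji(text: str) -> str:
    """Wrap each run of emoji in {\\emojifont ...} so XeLaTeX switches fonts."""
    return EMOJI_RUN.sub(lambda m: "{\\emojifont " + m.group(0) + "}", text)
```

This has to run after any LaTeX escaping of the message text, otherwise the braces it adds would themselves get escaped.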

After a couple extra niceties like a header that tracks the current date (with a LaTeX command that sets \markright with every message), I’m ready to put it all together.
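Putting the chapter splitting and the date header together might look like the following Python sketch. It makes two assumptions that aren't necessarily what my code does: one chapter per calendar month, and plain ISO dates in the header. `render_book` is a made-up name, and each message body is expected to be LaTeX that has already been generated.

```python
from datetime import datetime
from itertools import groupby


def render_book(messages) -> str:
    """messages: iterable of (datetime, latex_body) pairs, sorted by time."""
    out = []
    by_month = groupby(messages, key=lambda m: (m[0].year, m[0].month))
    for (year, month), group in by_month:
        # Assumption: one chapter per calendar month.
        out.append("\\chapter{" + f"{year}-{month:02d}" + "}")
        for sent_at, body in group:
            # Update the running header so each page shows the current date.
            out.append("\\markright{" + sent_at.date().isoformat() + "}")
            out.append(body)
    return "\n".join(out)
```

For the \markright text to actually show up, the page style has to print the right mark in the header, which is the case for the standard headings style and configurable with packages like fancyhdr.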

When I finally compile all three years of messages that I want to be able to flip through, I’m surprised to find that the compiler dumps out well over a thousand pages of messages when I put them into a standard 6" x 9" page size. Since it’s exactly three years of messages anyway, though, there’s an easy solution: I split the opus into three volumes to get the size of each one down to something printable.
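One simple way to do that split, sketched in Python under the assumption that each volume covers one year counted from the first message (the helper names are made up, and the exact cut points are a matter of taste):

```python
from datetime import date


def volume_index(start: date, when: date) -> int:
    """Zero-based volume number: whole years elapsed since `start`."""
    years = when.year - start.year
    # Before this year's anniversary of the first message: not a full year yet.
    if (when.month, when.day) < (start.month, start.day):
        years -= 1
    return years


def split_into_volumes(messages):
    """messages: list of (date, latex_body) pairs, sorted by date."""
    if not messages:
        return []
    start = messages[0][0]
    volumes = {}
    for sent_on, body in messages:
        volumes.setdefault(volume_index(start, sent_on), []).append(body)
    return [volumes[i] for i in sorted(volumes)]
```

Each resulting list can then be compiled into its own PDF.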

Ordering

When I decided to try to do this, I really wanted to end up with a physical book in my hand. So I had to figure out how to get these books printed. And to my surprise, printing a paperback book is quite cheap. After reviewing a bunch of options, Barnes and Noble Press seems like the best one. It’s somewhat more expensive than some of the others, like Lulu and Amazon KDP, but most services are targeted at people who are trying to sell their books. B&N Press is too, but their story for personal books seems better than the others, as you don’t need to “publish” your book to get it printed. And the price is still quite reasonable: I was able to print all three volumes, around 1300 pages total, for $30 including shipping.

Before I can order books from my LaTeX-generated PDFs, the website tells me that the last step is to create covers. Upon uploading the body pages to B&N Press, the site generates the dimensions required for the cover. Given these, I threw together a cover for each of the three volumes in Inkscape, which the website accepted without complaint.

The B&N Press website is not perfect: it is generally very slow, and while I was trying to place my order the checkout page was broken and wouldn’t show up for over 24 hours. But after that was fixed, ordering worked.

And sure enough, after a couple weeks’ wait, I had three actual books in hand. I flip through them regularly, and it is so much easier to revisit old conversations this way than trying to do so on my phone.

Create your own

The source code is in rough shape, and I haven’t packaged it as a cargo binary, but there’s not much of it. If you want to take a look or try for yourself, it’s available at https://github.com/bkettle/message-book.

A stack of three paperback books, with pictures of a couple on the cover.