TL;DR: This post looks at Apple CloudKit within the context of Apple Notes to help you understand how Apple stores data in iCloud and why you may be surprised to find CloudKit data shoved into a random database field.
Background
I recently ran into an odd bug in Apple Cloud Notes Parser. As I was testing what information appeared when users shared notes back and forth, decrypting notes resulted in a crash every time. The odd thing was I really hadn’t changed my test corpus much, just moved existing notes into albums shared via iCloud. The stack trace from that crash began a long trip down an interesting rabbit hole, as it indicated we were trying to decrypt something using an initialization vector that was nil.
The Hunt Begins
Because we are failing when we try to rebuild the table in this encrypted note, my first stop was looking at the end of the debug_log.txt file.
From the log file, we can see the last note I was trying to read was note #84 and it specifically died trying to deal with the table kept in ZICCLOUDSYNCINGOBJECT.ZIDENTIFIER='C93C5BE6-9212-46F6-939F-331844B9FEA0'. The next step is to check on the encryption variables we are expecting from that row in NoteStore.sqlite.
| iv | tag | salt | iterations | wrapped_key |
|---|---|---|---|---|
| NULL | NULL | C15DFD1B4D95A86EA3A16334528F619A | 20000 | 7EDB7DD3B1531D9E34E4A34F9D86D77AF92170BD96894A87 |
Sure enough, the initialization vector is null, and what’s odd about that result is you can’t decrypt the object without knowing the iv as a starting point. I had run into a similar issue when dealing with other embedded objects as they sometimes had values in other columns I had not expected. In this case, however, I was certain that I was accurately pulling the values from the right columns to decrypt a table because other tables were decrypting just fine. I also was certain that the values had to exist, because I was looking at the table on my iPhone in all its glory, so somehow Apple Notes knew what the iv was.
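The kind of check described above can be sketched in a few lines. This is a minimal sketch in Python (not the parser's Ruby), and the column names in `CRYPTO_COLUMNS` are assumptions for illustration, as is the helper name `missing_crypto_fields`; verify them against the schema of your own NoteStore.sqlite.

```python
import sqlite3

# Assumed column names, not taken from this post: confirm them with
# `PRAGMA table_info(ZICCLOUDSYNCINGOBJECT)` on your own database.
CRYPTO_COLUMNS = {
    "iv": "ZCRYPTOINITIALIZATIONVECTOR",
    "tag": "ZCRYPTOTAG",
    "salt": "ZCRYPTOSALT",
    "iterations": "ZCRYPTOITERATIONCOUNT",
    "wrapped_key": "ZCRYPTOWRAPPEDKEY",
}


def missing_crypto_fields(conn, identifier):
    """Return the names of the crypto fields that are NULL for one row."""
    select_list = ", ".join(
        f"{column} AS {name}" for name, column in CRYPTO_COLUMNS.items()
    )
    row = conn.execute(
        f"SELECT {select_list} FROM ZICCLOUDSYNCINGOBJECT WHERE ZIDENTIFIER = ?",
        (identifier,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no row with ZIDENTIFIER={identifier}")
    # Dicts keep insertion order, so names line up with the selected columns.
    return [name for name, value in zip(CRYPTO_COLUMNS, row) if value is None]
```

Run against a row like the one above, a helper of this shape would report the iv and tag as missing while the salt, iterations, and wrapped key are present.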
Forza Bruta
At this point, I fell back to my favorite solution, brute force. I had already looked at all of the columns which were related to either of these values, since there is fairly consistent naming between the columns that hold the cryptographic variables, and they were all null. That meant that either I was looking in completely the wrong place, or it was embedded within something inside a different blob on that row. This was somewhat the situation I created SQLite Miner for and I ran it against the example NoteStore.sqlite to see what might turn up [1].
While there are a few false hits on protobufs, there are some interesting binary plists in columns I had not investigated yet. Specifically, the ZSERVERSHAREDATA, ZUNAPPLIEDENCRYPTEDRECORD, ZUSERSPECIFICSERVERRECORDDATA, and ZSERVERRECORDDATA columns stood out, as they all held binary plists and had suggestive names. Since the table we were looking at before has a ZICCLOUDSYNCINGOBJECT.Z_PK of 82, we can check whether any of the exports from those columns hold our missing data.
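The brute-force idea can be illustrated without SQLite Miner itself. The following is a minimal Python sketch, not SQLite Miner's implementation: it walks every column of every table and reports blobs that begin with the binary plist magic bytes `bplist00`. The function name `find_binary_plists` is made up for this example, and the sketch assumes ordinary rowid tables (a `WITHOUT ROWID` table would need its own key).

```python
import sqlite3

BPLIST_MAGIC = b"bplist00"  # first eight bytes of a binary property list


def find_binary_plists(conn):
    """Yield (table, column, rowid, size) for each blob starting with the magic."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info([{table}])")]
        for column in columns:
            query = (
                f"SELECT rowid, [{column}] FROM [{table}] "
                f"WHERE typeof([{column}]) = 'blob'"
            )
            for rowid, blob in conn.execute(query):
                if blob.startswith(BPLIST_MAGIC):
                    yield table, column, rowid, len(blob)
```

Filtering the results to a single rowid is then enough to see which columns on the troublesome record hold a plist.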
Digging In
That’s interesting! We have two blobs that were binary plists on our troublesome record, but what do they hold? It turns out, a significant amount of information. Here is what the first one looks like:
While this doesn’t look very nice, you can already see some interesting values, such as “Mark’s iPhone” and the ZIDENTIFIER of the table we were looking into, C93C5BE6-9212-46F6-939F-331844B9FEA0. This led to the first hurdle, understanding and parsing NSKeyedArchives.
The First Hurdle: NSKeyedArchives
Because I do not write code for any Apple platform, I’m often playing from behind slightly on recognizing obvious answers. In this case, I’m sure someone who regularly writes for the iOS platform is slapping their head saying “That’s easy!” but it took me a minute. With the keyword “NSKeyedArchiver” near the top of the file, I googled that and came across the Apple Developer Documentation for the structure as the first result. A bit further in the results I also came across a great entry from mac4n6 in 2016 (showing how late I am to this game). While that article gave me enough knowledge to know what I wanted to do, it revolved around using Xcode or plutil to view or convert files, not actual interaction with the object itself.
Don’t Reinvent The Wheel
Apple Cloud Notes Parser is written in Ruby, so I debated writing my own Ruby classes to include in the codebase, but first decided to look for an existing gem. As it turns out, keyed_archive [2] existed as a gem, but had not been updated in 7 years. During that time, the underlying formats had been updated and the gem did not fare well when trying to parse this example. It also did not allow for loading data directly from a variable, only from disk, and I did not want to have to save all the NSKeyedArchives on disk as temporary files. Thankfully, in open source every problem is just a pull request away from a solution, so I did just that and ended up a new maintainer of the code on GitHub and RubyGems.
Once keyed_archive supported current formats, I added a method to unpack the archive and spit out the pairing of keys to values, KeyedArchive#unpacked_top(). With that and a helper script I was able to get a much better look into the structure.
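To show what unpacking an archive involves, here is a minimal Python sketch using the standard library's `plistlib` (Python 3.8 or later for `plistlib.UID`). It is not the keyed_archive gem's code and `unpack_keyed_archive` is a name made up for this example. It relies on the general NSKeyedArchive layout: a plist with `$archiver`, `$top`, and `$objects` keys, where values are UID references into the `$objects` array and the string `$null` stands for nil.

```python
import plistlib


def unpack_keyed_archive(data):
    """Resolve an NSKeyedArchive's $top entries into plain Python objects."""
    archive = plistlib.loads(data)
    if archive.get("$archiver") != "NSKeyedArchiver":
        raise ValueError("not an NSKeyedArchive")
    objects = archive["$objects"]

    def resolve(value, in_progress):
        if isinstance(value, plistlib.UID):
            index = value.data
            if index in in_progress:  # guard against reference cycles
                return f"<cycle to object {index}>"
            target = objects[index]
            if target == "$null":
                return None
            return resolve(target, in_progress | {index})
        if isinstance(value, dict):
            return {key: resolve(item, in_progress) for key, item in value.items()}
        if isinstance(value, list):
            return [resolve(item, in_progress) for item in value]
        return value  # strings, numbers, bytes, dates pass through unchanged

    return {key: resolve(value, frozenset()) for key, value in archive["$top"].items()}
```

The result is an ordinary nested dictionary, which is far easier to read and search than the raw list of objects and indexes.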
With clearer output, we can see that the ZIDENTIFIER we were looking for is in the RecordID section and “Mark’s iPhone” is identified as ModifiedByDevice. There are other potentially interesting keys in here, such as ParentReference, which includes a UUID that happens to be the ZIDENTIFIER for the actual note this table was in: 7D35FAE7-E05A-49D9-B05A-B372E211FEC6.
Getting to the Bottom of it
Now that we have a good way of viewing the NSKeyedArchive, we can look at the other potentially interesting blob, ZUNAPPLIEDENCRYPTEDRECORD.
We see a lot more of the ZICCLOUDSYNCINGOBJECT values in here, including our encryption variables! Specifically, as you look at the ValueStore section, we can find these values under RecordValues:
With these values, we can now decrypt the mergeable data, which is also found in the same section, in the EncryptedValues object. Once decrypted, this is a JSON object which contains the encrypted mergeable data that represents the table:
I need to do more testing to know why values would be stored in the actual columns versus the ZUNAPPLIEDENCRYPTEDRECORD column. However, once I accounted for this possibility and checked this column for data, Apple Cloud Notes Parser stopped erroring out. I had fixed my original problem but still had one significant issue: I had no idea what this was.
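One way to account for that possibility is a per-field fallback: use the column value when it is present, otherwise look in the unpacked record. The Python sketch below is an illustration of that idea, not the parser's actual code. The entries in `RECORD_VALUE_KEYS` are hypothetical names for the items under RecordValues, and `crypto_settings` is a made-up helper.

```python
REQUIRED = ("iv", "tag", "salt", "iterations", "wrapped_key")

# Hypothetical key names inside the unpacked RecordValues section;
# check them against your own unpacked archive.
RECORD_VALUE_KEYS = {
    "iv": "CryptoInitializationVector",
    "tag": "CryptoTag",
    "salt": "CryptoSalt",
    "iterations": "CryptoIterationCount",
    "wrapped_key": "CryptoWrappedKey",
}


def crypto_settings(column_values, record_values):
    """Prefer the table columns; fall back to the unapplied encrypted record."""
    settings = {}
    for name in REQUIRED:
        value = column_values.get(name)
        if value is None and record_values is not None:
            value = record_values.get(RECORD_VALUE_KEYS[name])
        if value is None:
            raise ValueError(f"no {name} found in columns or unapplied record")
        settings[name] = value
    return settings
```

Raising when a value is missing from both places keeps the failure close to its cause, rather than surfacing later as a nil initialization vector deep inside decryption.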
The Second Hurdle: “CK”
The abbreviation “CK” appeared a lot of times in these NSKeyedArchives, such as CKRecordZoneID, CKRecordID, and CKReference. Googling for any of these quickly brings you to the Apple Developer Documentation for CloudKit (“CK”). CloudKit is the mechanism to store data in iCloud for any application that wants to take advantage of that backend. Because Apple Notes has allowed users to share data since iOS 9, anything that can go into a shared note must be able to take advantage of CloudKit for that purpose. This includes notes, folders, and attachments.
CloudKit Records
The heart of CloudKit is an individual record, represented by the CKRecord class. A CKRecord is a dictionary of key-value pairs, which is what we are seeing in the ZUNAPPLIEDENCRYPTEDRECORD example, hence the term “RECORD”. Apple suggests creating a record type for each different type of information you need. These objects support a variety of field types and at a minimum, the initializer needs a RecordType and a RecordID, which we can see in our example:
Here we see the RecordType is a simple String containing “Attachment”. This makes sense because the example comes from an embedded object. The RecordID is more complex and appears to be another dictionary, telling us the class of the object is a CKRecordID with a RecordName corresponding to our ZIDENTIFIER of C93C5BE6-9212-46F6-939F-331844B9FEA0. We also can see a ZoneID of type CKRecordZoneID.
CloudKit Zones
In CloudKit, CKRecordZones are how the specific application can organize its records. In our example, the ZoneName is “Notes” and the ownerName is the String “__defaultOwner__” indicating it is the currently logged in iCloud user. That means we are looking at record C93C5BE6-9212-46F6-939F-331844B9FEA0 inside of the currently logged in user’s “Notes” zone. It is important to note that in all of this, we are not getting above the Zone level; however, the overall CloudKit architecture also has the concept of Containers and Databases above this [3]. In other words, there might be another app that could have the same RecordID and ZoneID for the same user, but in a different database and application.
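The relationship between a record, its ID, and its zone can be modeled with a few small types. This is a minimal Python sketch for illustration only: `ZoneID`, `RecordID`, and `describe` are made-up stand-ins, not CloudKit's classes, and they carry only the fields discussed above.

```python
from dataclasses import dataclass

DEFAULT_OWNER = "__defaultOwner__"  # marks the currently logged in iCloud user


@dataclass(frozen=True)
class ZoneID:
    zone_name: str
    owner_name: str

    @property
    def owned_by_current_user(self):
        return self.owner_name == DEFAULT_OWNER


@dataclass(frozen=True)
class RecordID:
    record_name: str
    zone_id: ZoneID


def describe(record_type, record_id):
    """Summarize where a record lives and who owns its zone."""
    zone = record_id.zone_id
    owner = "the logged-in iCloud user" if zone.owned_by_current_user else zone.owner_name
    return (
        f"{record_type} {record_id.record_name} "
        f"in zone '{zone.zone_name}' owned by {owner}"
    )
```

For the example record, `describe("Attachment", RecordID("C93C5BE6-9212-46F6-939F-331844B9FEA0", ZoneID("Notes", "__defaultOwner__")))` would describe an Attachment in the logged-in user's Notes zone.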
CloudKit Sharing Data
Since the CloudKit record has to remember which user the note actually belongs to, does that mean we can get more data from iCloud about users who share notes? This question (sans the CloudKit-specific aspect) was what I was testing when the table originally broke, so let’s see what is in a shared note’s record. If you will permit some handwavy magic, I already know from my dataset that the folder which was created by another account to test sharing is ZICCLOUDSYNCINGOBJECT.Z_PK=138 in my database [4]. This query will pull out pertinent details for it:
| UUID | Owner | Zone Owner | List As | Folder Name |
|---|---|---|---|---|
| 57B3EC94-2846-4E16-8E17-D8B1E8B8B5EA | 9 | _b291456806da1837211e8a60c9abe865b | 1_iCloud | shared_testing |
In the output, we see some interesting tidbits about the folder named shared_testing. To start with, the “Owner” of the folder is listed as 9, which when queried as a Z_PK in the same table corresponds to my iCloud account. Similarly, the “List As” value matches how my iCloud account is displayed: 1_iCloud. That would ordinarily lead me to believe that the user of this Notes database created this folder and is responsible for its contents.
However, the “Zone Owner” gives the iCloud ID, as recorded by Notes, of the account that actually created the folder, in this case _b291456806da1837211e8a60c9abe865b. This is the actual folder creator, and because iCloud sharing rules are strict, you know for sure that this user intentionally shared the folder with the owner of this Notes database and that the sharing request was accepted. That seems meaningful if there was something incriminating in that folder, but somewhat useless as _b291456806da1837211e8a60c9abe865b is fairly anonymous at this point.
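Since the zone owner is the telling field, one useful triage step is listing every row whose zone owner is someone other than the local account. The Python sketch below assumes that locally owned rows carry the `__defaultOwner__` value seen earlier, and that the zone owner and title live in columns named `ZZONEOWNERNAME` and `ZTITLE2`. Both column names are assumptions exposed as parameters, and `records_from_other_owners` is a made-up helper.

```python
import sqlite3

DEFAULT_OWNER = "__defaultOwner__"


def records_from_other_owners(
    conn,
    zone_owner_column="ZZONEOWNERNAME",  # assumed column name
    title_column="ZTITLE2",              # assumed column name
):
    """Return (Z_PK, ZIDENTIFIER, zone owner, title) for rows owned by another account."""
    query = (
        f"SELECT Z_PK, ZIDENTIFIER, [{zone_owner_column}], [{title_column}] "
        "FROM ZICCLOUDSYNCINGOBJECT "
        f"WHERE [{zone_owner_column}] IS NOT NULL "
        f"AND [{zone_owner_column}] != ? "
        "ORDER BY Z_PK"
    )
    return conn.execute(query, (DEFAULT_OWNER,)).fetchall()
```

Each returned row is a candidate for the deeper look at the share data described in the following sections.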
ZSERVERRECORDDATA
Digging into the ZSERVERRECORDDATA object shows not much more information than we find within the columns above, save the fact that the “ModifiedByDevice” field contains the hostname of the device used by _b291456806da1837211e8a60c9abe865b.
ZSERVERSHAREDATA
Digging into the ZSERVERSHAREDATA object is much different, however, because these share objects are how CloudKit tracks who can see what. Within them is information about the users involved (and a lot more, I had to rip this down a ton to make it legible):
Even after snipping the output down considerably, there is a lot there of interest. To begin with, every time you see “potentially interesting binary”, there was binary data I was removing since I wanted to sanitize the data. I am not yet completely sure of what is present in them, but figure I eventually have to just cut this research, ship what I have, and dive in later. Beyond that, you can see that personal information is being presented about the users who have access.
CloudKit Shared Account Information
For example, all of the data I have filled in on my iCloud account and the other test account can be found in the UserIdentity sections:
So now, instead of just knowing that _b291456806da1837211e8a60c9abe865b created a folder and shared it with our user, we know the name being used and the email account tied to their iCloud account. Potentially, we also have a phone number to correlate with other databases in the forensic image. Finally, based on the ZSERVERRECORDDATA contents, we know the hostname of at least a device used by that other user [5].
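Once an archive has been unpacked into nested dictionaries and lists, pulling out every section with a given name is a short recursive walk. This Python sketch makes no assumption about the layout of the share data beyond it being nested dicts and lists; `find_values` is a made-up helper, and the only key name taken from this post is UserIdentity.

```python
def find_values(node, wanted_key):
    """Yield every value stored under wanted_key anywhere in a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted_key:
                yield value
            # Keep descending: the same key may appear again further down.
            yield from find_values(value, wanted_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, wanted_key)
```

Calling `list(find_values(unpacked, "UserIdentity"))` on an unpacked share would collect each participant's identity section regardless of how deeply it is nested.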
Much Improved Analysis
Putting that all together, I would be able to say completely confidently that the iCloud user identified by forza_bruta@ciofecaforensics.com created the folder called shared_testing and shared that folder with the iCloud user identified by notta_cuppa@ciofecaforensics.com, who explicitly accepted the sharing request. Further, the iCloud user identified by forza_bruta@ciofecaforensics.com asserted that their name was Forza Bruta to iCloud and would be identified throughout this database by _b291456806da1837211e8a60c9abe865b. The iCloud user identified by forza_bruta@ciofecaforensics.com likely had a device named Forza's iPhone 12 which probably modified this shared folder (note the weasel wording here, I’m not yet confident I understand all of this).
Apple Cloud Notes Parser Changes
This discovery led to a surprising number of changes to the Apple Cloud Notes Parser. This will sound dumb, since “cloud” is in the name, but I didn’t realize how pervasive these changes would be when I started in on them a month ago. Because all of these objects can be added to a Note saved to iCloud, everything has to support it. To fix this, I created a class called AppleCloudKitRecord which most of the other classes are children of. It handles the CloudKit data parsing for all of the other objects uniformly and most of the CSV and HTML output tries to bring the pertinent information forward.
Obviously, now that we know where the variables are hidden, I’ve also fixed decryption for objects with ZUNAPPLIEDENCRYPTEDRECORD data. Bigger picture, we’ve also started bringing keyed_archive up to date, which was critical to making any of the above happen quickly.
But Who Cares?
This has been a long journey and it would be pretty easy to say “Criminals likely won’t share notes” and walk away from this topic. I strongly suspect a greater appreciation for CloudKit will make things far more apparent, since almost any iOS application can use CloudKit. In this case Notes just ended up being the vehicle to start the research.
For example, I snagged one of Josh Hickman’s test images, and did just a simple grep for CKRecordZone, which I thought wouldn’t have too many false positives other than things which were actually CloudKit related. Unsurprisingly, there are a lot of Protected Cloud Storage (PCS) files, however there also are other files I wouldn’t have expected, such as Safari’s bookmark syncing and Apple’s HomeKit stuff.
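A search like that grep can be sketched in Python for cases where you want to post-process the hits. This is a rough equivalent under stated assumptions, not the command I ran: `files_containing` is a made-up helper that looks for the raw ASCII bytes `CKRecordZone`, reading each file in chunks and carrying a small overlap so a match split across two chunks is still found.

```python
import os


def files_containing(root, needle=b"CKRecordZone", chunk_size=1 << 20):
    """Yield paths under root whose raw bytes contain needle."""
    overlap = len(needle) - 1
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            try:
                with open(path, "rb") as handle:
                    tail = b""
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        window = tail + chunk
                        if needle in window:
                            yield path
                            break
                        # Keep the end of this window in case the needle
                        # straddles the boundary with the next chunk.
                        tail = window[-overlap:] if overlap else b""
            except OSError:
                continue  # unreadable file; skip it
```

Like the grep, this only finds the string where it is stored as plain bytes, so compressed or otherwise encoded files would be missed.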
Conclusion
Never, ever, let yourself close “just one more bug report” after 10:00 PM, it will inevitably lead to a rabbit hole which chews up a month of time and leads to many more rabbit holes to dig down. In the case of CloudKit, trying to figure out why the encryption variables I was expecting were buried away in another column led to my having to add a whole bunch of extra checks to Apple Cloud Notes Parser, start handling CloudKit data, and realize there might be a lot more data sitting around than we realize. It is certainly worth considering if an application in question is supported by CloudKit and, if so, digging into what data is available as a result.
Finally, this tool and research are entirely supported by my free time. This update, for example, was probably around 30-40 hours of work between the research, dumping, diving, discovering, developing, and documenting. If it is meaningful to you, please consider supporting it through GitHub sponsors or Ko-Fi. I have plans for this, but need support to reach them, especially if I build more of a CloudKit lab.
Footnotes
[1] Some of the output from these commands has been edited slightly for readability on the page, in particular to ensure blocks don’t need a scrollbar.
[2] The GitHub repo is here and the Ruby gem is hosted here.
[4] I usually don’t like editing the data I present, but as the accounts in question are not entirely test data, I feel the need to remove certain aspects of personal information. I may not edit this consistently, where that happens I apologize in advance.
[5] I’m still testing exactly how and why the device hostname gets filled in. So far it seems like there can be false positives, but at a minimum you know this hostname is used by that account.