It didn’t take long for hackers to crack the Siri protocol and discover exactly how Siri works. The beautiful thing about the hack was that it used a technique Moxie Marlinspike talked about some years back. Once all of Siri’s internet traffic was forced through a packet sniffer, it became apparent that Siri communicates over HTTPS with a server at Apple.
By creating a certificate that looks like Apple’s but is signed by a fake CA, and installing that CA onto the iPhone, it was simple to trick the iPhone into communicating directly with a server of the hackers’ own. Once this was in place, our hacker friends could get to work on figuring out what information the iPhone sends for each Siri command.
The task actually proved to be more awkward than one would expect. To begin with, Apple makes use of its own proprietary HTTP method, called ACE, in order to communicate. On top of this, the body of the HTTP message is a binary blob, which makes it hard to tell what is going on inside the communication. Finally, the headers of the HTTP message contain a unique ID which appears to be tied to the iPhone making the request, most likely to identify the device and prevent unauthorized devices from making use of the service.
After some very intelligent guesswork, it was possible to work out exactly what sort of content gets sent to Apple every time you use Siri. It seems that the binary blob is compressed using the zlib compression library, and ultimately it simply contains a large plist with all of the data that Apple’s servers need in order to process a Siri request. Of course, the information sent in this plist will vary depending on the communication.
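To make that layering concrete, here is a minimal Python sketch of wrapping a plist in zlib and unwrapping it again. It only illustrates the "zlib around a plist" idea described above; it is not Apple’s actual wire format, which may add framing that isn’t covered here. The function names, the dictionary keys, and the choice of a binary (rather than XML) plist are all assumptions made for the example.

```python
import plistlib
import zlib


def pack_plist(payload: dict) -> bytes:
    """Serialize a dict to a plist and zlib-compress it.

    Assumption: a binary plist is used. The real encoding is not
    specified in this article.
    """
    raw = plistlib.dumps(payload, fmt=plistlib.FMT_BINARY)
    return zlib.compress(raw)


def unpack_plist(blob: bytes) -> dict:
    """Reverse of pack_plist: zlib-decompress, then parse the plist.

    plistlib.loads detects XML vs. binary plists on its own.
    """
    return plistlib.loads(zlib.decompress(blob))


if __name__ == "__main__":
    # Hypothetical payload: these keys are invented for illustration.
    request = {"class": "ExampleRequest", "properties": {"text": "hello"}}
    blob = pack_plist(request)
    print(len(blob), "compressed bytes")
    print(unpack_plist(blob))
```

Running it on a sniffed blob would look the same as the `unpack_plist` call, assuming the blob really is nothing more than a zlib stream around a plist.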
Generally, when you make a Siri request, all of the magic happens outside of your precious iPhone 4S. The audio content is recorded and then compressed using the Speex codec, which was developed for VoIP communications. This is then bundled up and sent to Apple. Apple’s server farm performs the voice recognition on the audio recording and returns the text along with confidence scores and timestamps for each word. More than likely, other data such as your GPS co-ordinates is also sent to Apple for processing.
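As a rough picture of what a client could do with a reply like that, here is a small Python sketch that models per-word results. The class name, field names, the 0.0 to 1.0 confidence scale, and the sample values are all invented for illustration; the real reply format isn’t described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedWord:
    # All fields are hypothetical stand-ins for "text, confidence
    # score and timestamp for each word".
    text: str
    confidence: float  # assumed scale: 0.0 (unsure) to 1.0 (certain)
    start: float       # assumed unit: seconds from start of recording


def transcript(words: List[RecognizedWord]) -> str:
    """Join the words in time order into a single string."""
    ordered = sorted(words, key=lambda w: w.start)
    return " ".join(w.text for w in ordered)


def uncertain(words: List[RecognizedWord], threshold: float = 0.5) -> List[str]:
    """Return the words whose confidence falls below the threshold."""
    return [w.text for w in words if w.confidence < threshold]


if __name__ == "__main__":
    # Made-up sample values, not captured from a real Siri reply.
    words = [
        RecognizedWord("mom", 0.4, 0.6),
        RecognizedWord("call", 0.9, 0.1),
    ]
    print(transcript(words))
    print(uncertain(words))
```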
The hackers at Applidium who broke the protocol have published their tools on GitHub. What is really cool about their work is that it is possible to record an audio sample on a non-iPhone device, compress it using the Speex codec, and send it off to Apple for processing. Of course, you need your iPhone 4S’ unique identifier in order to do this, but once you’ve got it, you can rig up your old Apple desktop or PC to interface with Siri and do whatever you need it to do.
On the other hand, if you’re like me, the idea that every text message or email you send using Siri routes all of its content through Apple’s servers first will send a shiver down your spine. I stopped using Gmail for my personal and company email a long time ago.