iPhone as 3D controller

One of the more interesting aspects of Apple's iPhone, from a device-evolution standpoint, is the combination of a very advanced operating system with a rich array of physical sensors covering the visual, the aural, the spatial, acceleration, proximity and, with the newly announced 3GS, compass orientation.

This, combined with an easy-to-use development environment, makes it an attractive device for exploring physical control of software applications.

I had used a SpaceNavigator from 3Dconnexion before to control 3D applications. Using it the other day led me to the idea that my iPhone, by way of its acceleration sensor (and, as of the iPhone 3GS, its compass), might be used as a 3D controller for 3D modeling software.

I’d gotten in on the iPhone developer program early, so I’ve been able to comb through the Cocoa Touch framework (which provides the APIs I need on the iPhone for this) and develop towards a working application. Since then, I’ve spent a couple of rainy Sundays working on the app. This article details the design and current state of the application, which works on iPhone and Mac but isn’t available in the App Store yet, for reasons I’ll detail later on.

Concept

The idea is simple enough: an object (or view) in a 3D modeler or viewer follows the rotational movements an iPhone makes by way of someone holding, and rotating, that iPhone. Think of it as a technological form of telekinesis: as you rotate your hand, a 3D model on screen rotates as well, mimicking the motion.

In a working scenario, modeling could be done by moving/clicking a mouse with the right hand while holding/rotating your iPhone with your left hand. Without touching a keyboard, you’d be able to rotate your model and work on it at the same time: a major timesaver (and repetitive strain preventer) for 3D artists.

Object Interaction

Since the concept is basically to connect your hand directly to an on-screen 3D model, the UI is the physical phone itself. For interaction purposes, a brick would serve just as well, although it would be heavier to hold.

However, since an iPhone is more than a brick (insert joke about hacked iPhones), we can extend the conceptual interaction model to use some of the other nice things the iPhone has on board.

If we hardwired a 3D model’s rotation axes to the phone’s rotation axes, two impracticalities would arise:

  1. If you lay down the phone, the object lies down, too. This is not useful.
  2. The rotational extent of the object is tied to that of your wrist, assuming we create a one-to-one translation of rotational axes. This, again, is impractical: what if you want to rotate the object more than, say, 100 degrees?

Luckily, the iPhone offers some other sensors. For example, the whole screen is, at its simplest, one big button. The solution for much better rotation control is this: have the user touch and hold the screen anywhere to ‘activate’ rotation. Not only does this avoid the lying-down problem, it also lets us rotate an object by any amount we wish by ‘hauling’ it – hold, rotate the phone (the object rotates, too), let go, rotate the phone back (the object stays in place), hold again, and rotate the phone again (the object rotates further).
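In code, that hold-to-rotate ‘hauling’ boils down to a small clutch mechanism: remember the phone’s attitude when the finger goes down, apply only the deltas while it stays down, and ignore motion entirely once it lifts. Here’s a minimal sketch (in Swift, for readability – the names are illustrative, not the app’s actual code):

    // Sketch of the 'hold to rotate' clutch, assuming pitch/roll samples
    // arrive as radians from the orientation sensor.
    struct RotationClutch {
        private(set) var accumulatedPitch: Double = 0
        private(set) var accumulatedRoll: Double = 0
        private var reference: (pitch: Double, roll: Double)?

        // Touch down: remember the phone's current attitude as the zero
        // point for this 'grab'.
        mutating func touchDown(pitch: Double, roll: Double) {
            reference = (pitch, roll)
        }

        // Every new sensor sample while the finger is down: only the delta
        // since the last sample is applied to the model.
        mutating func update(pitch: Double, roll: Double) {
            guard let ref = reference else { return }   // finger up: ignore motion
            accumulatedPitch += pitch - ref.pitch
            accumulatedRoll  += roll  - ref.roll
            reference = (pitch, roll)
        }

        // Touch up: the model keeps its pose, and the hand is free to swing
        // back for the next 'grab'.
        mutating func touchUp() {
            reference = nil
        }
    }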

Ergonomics and UI design

A natural way to hold the iPhone when you use it to ‘indicate rotation’ is the same way most people hold it when using it with one hand. People (I’m assuming a right-handed user for explanation purposes) naturally hold the phone so that its bottom-left side rests against the bit of your palm that crosses over into the base of the thumb – the ‘thenar space’, in medical terms.

This posture is almost the same as the one you see in Apple’s marketing shots, but with the corner tucked a bit further into the palm for more vertical stability. This is shown below, with our 3D remote’s user interface shown as well:

For regular use of this application there’s no need to look at the screen, so you can hold the iPhone with your wrist in a neutral position: slightly tilted forward and to the right. Notice how the natural way of touching the screen, while holding the phone in this position, is with the thumb (try it). As we’ve seen, the thumb joins the hand at the bottom left of the device. If you move the thumb naturally, i.e. without flexing it, it sweeps a curved arc over the screen as shown here:

With this in mind, the ‘zoom bow’ you see in the UI was designed to conform to that arc, implicitly defining a gesture for zooming in and out of the 3D scene (something done very often during 3D modeling) in a way that minimizes physical strain. To make sure the user knows he or she is ‘zoom-gesturing’, both an audio cue and a visual cue are used (the visual cue is shown next to the gesture – the bow is highlighted).
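Geometrically, the bow is just a band of radii around the thumb’s pivot near the bottom-left corner of the screen, so recognizing the gesture amounts to a distance check plus an angle along the arc. A rough Swift sketch (the pivot position, radii, and zoom mapping are all assumptions, not measured values from the app):

    import Foundation
    import CoreGraphics

    // The zoom bow as a band of radii around the thumb's pivot point.
    // Touches inside the band are zoom gestures; the change in angle along
    // the arc between successive touch events drives zoom in/out.
    struct ZoomBow {
        let pivot: CGPoint        // approximate thumb joint, near the bottom-left corner
        let innerRadius: CGFloat
        let outerRadius: CGFloat

        func contains(_ touch: CGPoint) -> Bool {
            let dx = touch.x - pivot.x
            let dy = touch.y - pivot.y
            let distance = (dx * dx + dy * dy).squareRoot()
            return distance >= innerRadius && distance <= outerRadius
        }

        // Angular position of the touch along the bow, in radians.
        func arcAngle(of touch: CGPoint) -> Double {
            return atan2(Double(touch.y - pivot.y), Double(touch.x - pivot.x))
        }
    }

Successive deltas of arcAngle would then be mapped to a zoom factor, with the highlighted bow and the audio cue triggered whenever contains returns true.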

Putting the bow in place still leaves a significant portion of screen real estate ‘unused’. More importantly, on the iPhone 1G and 3G we have no way of detecting yaw on the device – only pitch and roll can be detected properly, because orientation is read from a gravity-based sensor. (Read this short article for the difference between pitch, yaw, and roll. Also note that on the iPhone 3GS we could use the compass – I haven’t checked the SDK on this point yet.)
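For reference, recovering pitch and roll from a gravity reading is a couple of atan2 calls – and the same math shows why yaw is out of reach: rotating the phone around the gravity vector doesn’t change the reading at all. A sketch (sign conventions vary between frameworks, so treat this as illustrative):

    import Foundation

    // Pitch and roll from a gravity vector (ax, ay, az) in the device's
    // coordinate frame. Rotation about the gravity vector itself (yaw)
    // leaves (ax, ay, az) unchanged, which is why a purely gravity-based
    // sensor can never report it.
    func attitude(ax: Double, ay: Double, az: Double) -> (pitch: Double, roll: Double) {
        let pitch = atan2(-ax, (ay * ay + az * az).squareRoot())
        let roll  = atan2(ay, az)
        return (pitch, roll)
    }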

Since we’d want to add yaw rotation in some way other than through the orientation sensors, we happen to have a very nice analogue in another iconic Apple product: the iPod. Below you’ll see an image of the iPod’s click wheel for comparison, an image depicting a ‘touch wheel’-style yaw gesture, and the visual feedback for this gesture (which is accompanied by an audible cue):
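The yaw gesture itself is easy to express: track the angle of the touch point around the wheel’s centre and feed the change in that angle, per touch event, into the model’s yaw. A minimal Swift sketch (names and the wrap-around handling are illustrative):

    import Foundation
    import CoreGraphics

    // iPod-style touch wheel: the change in angle of the finger around the
    // wheel's centre, per touch event, becomes a yaw increment for the model.
    struct YawWheel {
        let center: CGPoint
        private var lastAngle: Double?

        mutating func touchMoved(to point: CGPoint) -> Double {
            let angle = atan2(Double(point.y - center.y), Double(point.x - center.x))
            defer { lastAngle = angle }
            guard let last = lastAngle else { return 0 }
            // Wrap the delta into (-pi, pi] so crossing the +/-pi boundary
            // doesn't produce a huge spurious rotation.
            var delta = angle - last
            if delta > Double.pi { delta -= 2 * Double.pi }
            if delta < -Double.pi { delta += 2 * Double.pi }
            return delta   // add this to the model's yaw
        }

        mutating func touchEnded() { lastAngle = nil }
    }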

Summarizing, with some proper design we’ve managed to keep the UI exceedingly straightforward: hold anywhere to rotate (pitch/roll), use the zoom gesture to zoom, use the yaw gesture to yaw.

Interface finishing touches (pun intended)

In keeping with the theme of simplicity, we need only two settings to make things usable:

  1. An option to disable sound, once the user’s muscle memory allows him or her to consistently make correct gestures without looking at the device (even well-designed UI sounds can get annoying at some point).
  2. A setting to flip/mirror the interface for all the lefties out there (the wheel stays in the same place, but the zoom bow flips over).

Building the app and ‘connecting’ the iPhone to a 3D model

After our fun with the UI design, we’re going to get the rotation and other gesture information from the phone all the way into the 3D model environment. In the technical and practical sense, this is the most involved part of the project.

As a quick note, building the apps on the iPhone and the Mac was a breeze. The APIs are very good on both platforms, but it’s the Xcode environment with its iPhone simulator that made this fun and effective rather than a pain (like the time, long ago, when I did some Java coding for earlier mobile devices - brrr).

Since no 3D modeling app I’ve seen out there has any recognizable form of ‘open standard for outside realtime control of 3D models’, let alone for non-standard hardware devices, we have to create a software proxy between our app and the 3D modeling app. And since the 3D modeling app lives on the computer, we also need a proxy between the computer and the device. So the chain of information flow is as follows:

  1. Hand motions, sensed by
  2. iPhone’s hardware sensors (rotation/acceleration/screen/compass), interpreted by
  3. an iPhone app that prepares and publishes the information using its own built-in
  4. motion information network server, communicating (via WiFi) with
  5. a Mac app with a network client that retrieves the motion information, then uses
  6. OS-wide messaging to reach
  7. A 3D app plugin that’s hosted by a 3D app. The plugin applies the info to the correct
  8. 3D Model.

That’s quite a bit of complexity. Luckily, it’s mostly behind-the-scenes complexity, so if we work hard we can still make this a good user experience.

Specifically, we have to build one (1) iPhone app, one (1) Mac app, and a number of plugins – one for each 3D application we want to support.
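The one thing the iPhone app and the Mac app really have to agree on is the shape of the motion messages crossing the network (steps 3–5 above). The actual protocol isn’t published here, so the following Swift sketch is purely an illustration of what such a message could look like:

    import Foundation

    // Hypothetical wire format shared by the phone-side sender and the
    // Mac-side receiver: the current gesture state, timestamped, encoded
    // as a small packet. Field names and the JSON encoding are assumptions.
    struct MotionMessage: Codable {
        var pitchDelta: Double   // radians since the previous message
        var rollDelta: Double
        var yawDelta: Double     // from the touch-wheel gesture
        var zoomDelta: Double    // from the zoom-bow gesture
        var holding: Bool        // finger down, i.e. rotation is 'clutched in'
        var timestamp: TimeInterval
    }

    // Encode on the phone, decode on the Mac. JSON keeps the sketch short;
    // a compact binary layout would be friendlier to a high-frequency stream.
    let message = MotionMessage(pitchDelta: 0.01, rollDelta: 0, yawDelta: 0,
                                zoomDelta: 0, holding: true,
                                timestamp: Date().timeIntervalSince1970)
    let packet = try? JSONEncoder().encode(message)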

Note: WiFi was the only option at the time of building the application – with iPhone OS 3.0, it should be possible to connect over Bluetooth or USB. Whether those can be used for iPhone-to-Mac communication is surprisingly non-obvious, and still under investigation (I hope I can say that under the dev program NDA).

The Mac app

The Mac app’s job is basically to make the administrative stuff as hassle-free as humanly possible, hiding technical details wherever possible in favor of doing extra work behind the scenes. For example, we use Bonjour to connect to the device, which avoids tedious network configuration.
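Discovery on the Mac side can then be as small as a Bonjour browse for whatever service type the iPhone app advertises. A sketch using Foundation’s NetServiceBrowser (the service type name is made up for illustration):

    import Foundation

    // Browse the local network for iPhones advertising the controller
    // service. "_i3dcontrol._tcp." is an invented service type; the real
    // app would use whatever type its phone-side server registers.
    final class PhoneFinder: NSObject, NetServiceBrowserDelegate {
        private let browser = NetServiceBrowser()

        func start() {
            browser.delegate = self
            browser.searchForServices(ofType: "_i3dcontrol._tcp.", inDomain: "local.")
        }

        // Called whenever a phone running the controller app shows up.
        func netServiceBrowser(_ browser: NetServiceBrowser,
                               didFind service: NetService,
                               moreComing: Bool) {
            print("Found controller: \(service.name)")
            // Next (not shown): resolve the service's host and port, then
            // open the connection that carries the motion messages.
        }
    }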

So, having implemented all of this, does WiFi work for conveying this sort of information stream? Well, yes, it works. But it’s far from perfect. Over a fast local LAN I can reliably get around 10 updates per second. More than that becomes challenging – not because of bandwidth, but because of lag. I’ve tested usability by rotating objects in a simple OpenGL window, and it’s not good enough yet. I hope OS 3.0’s Bluetooth, or USB, proves a better alternative. Failing that, proper realtime interpolation of data points would go a long way towards good usability.
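Interpolation here means rendering the model slightly behind real time and blending between the two most recent samples on every display refresh, so a roughly 10 Hz stream still looks like continuous motion. A rough sketch of that idea (not the app’s actual code):

    // Blend between the two most recent samples at the display's refresh
    // rate, trading one packet interval of extra latency for smooth motion.
    struct AngleInterpolator {
        private var previous: (value: Double, time: Double)?
        private var latest: (value: Double, time: Double)?

        // Called whenever a new network sample arrives.
        mutating func push(value: Double, time: Double) {
            previous = latest
            latest = (value, time)
        }

        // Called on every display refresh (e.g. 60 Hz) with the current time.
        func value(at now: Double) -> Double? {
            guard let a = previous, let b = latest, b.time > a.time else { return latest?.value }
            let t = min(max((now - b.time) / (b.time - a.time), 0), 1)
            return a.value + (b.value - a.value) * t   // reach b one interval after it arrived
        }
    }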

Update: after some retooling of the underlying protocol to use native-port TCP packets, it’s now smooth enough to support proper movement translation, reaching about 60 updates per second on a local WiFi (802.11g) network. 802.11n should handle this without breaking a sweat.

The plugins

This is a lot of work, and not straightforward. For a good user experience we need a high rate of display updates – at least 30 per second, preferably 60. To avoid blocking the 3D modeling app’s own UI, this needs to be offered either through a high-frequency timer-based callback structure or through native threading support. No 3D modeling application I’ve seen so far supports either. Most plugin APIs, where usable and present at all, are designed for ‘construction’-style helpers, such as ‘build a staircase with one click’ – not for direct, high-frequency control over what happens in the UI.
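To make the requirement concrete, this is roughly the shape of hook a plugin would need: a 60 Hz timer (or a dedicated thread) that polls the latest controller state and applies it to the active model without blocking the host. Everything below is a hypothetical stand-in – no shipping modeler exposes calls like these:

    import Foundation

    struct ControllerState { var pitch = 0.0; var roll = 0.0; var yaw = 0.0 }

    // Stub: in reality this would read the most recent motion message
    // delivered by the Mac connector app.
    func latestControllerState() -> ControllerState? { return nil }

    // Stub: in reality this would go through the host 3D app's plugin API,
    // which is exactly the part that's missing today.
    func applyToActiveModel(_ state: ControllerState) { }

    // Poll and apply at 60 Hz without blocking the host's UI thread.
    let timer = Timer.scheduledTimer(withTimeInterval: 1.0 / 60.0, repeats: true) { _ in
        if let state = latestControllerState() {
            applyToActiveModel(state)
        }
    }
    timer.tolerance = 1.0 / 240.0   // allow a little timer coalescing
    RunLoop.main.run()              // a real plugin would ride the host's run loop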

The best I’ve been able to do is 1-second callbacks in Google SketchUp - obviously not good enough for actual use. Since Luxology offers a trial of its (really svelte) Modo 3D modeler, I’ve been able to check that out for a little while – but I haven’t found a way, API-wise, to do this well.

What it looks like in practice

Steps to completion

Once some good plugin code is in place for a popular 3D modeling app, there’s still finishing the code, hallway usability testing, performance improvements, QAing and more QAing, marketing messaging, pricing, and the whole actual App Store publishing bonanza.

Next steps would be building an open plugin API so other enthusiasts can connect the proxy/connector app to their favorite 3D modeler. Also, a PC-based connector, and plugins for PC-based 3D modeling applications, would be in order at some point.

At this point, however, proper API support in some Mac 3D apps is the biggest hurdle to getting this thing into the App Store. Anyone who can point me to high-frequency callback support in a 3D modeling app’s plugin API is kindly invited to get in touch – please do leave a message or twitter me at @tacoe.

Monday, June 1, 2009