Building an iPhone app that can see
I want an app I can hold.
Something on my own phone that uses the camera to actually see and understand what it’s looking at.
I’ve built plenty with Claude Code already. Websites, bots, automations. But those all live on a server somewhere. An iPhone app is different ground, and before I could start I had to clear up three things I’d been nodding along to for years without really knowing what they were.
First, Xcode. It’s Apple’s program for building Apple apps, and it’s the only official way to turn code into something that runs on an iPhone and eventually lands in the App Store. It holds the compiler, the fake iPhones you test on, and the signing tools that let Apple trust your app enough to run it. It’s free, and it only works on a Mac. You can write the code anywhere, but at some point everything passes through Xcode.
Second, the simulator. It’s a fake iPhone that runs in a window on my Mac. I hit run and an iPhone shows up on screen with my app inside it. No cable, no real device, instant. It’s how you move fast without picking up your phone every thirty seconds to check one tiny change.
There’s a catch that matters for a camera app. The simulator has no real camera. It can hand your app a canned photo, but it can’t give you a live lens. So anything that actually sees has to be tested on a real iPhone plugged into the Mac. Worth knowing before you lose an hour staring at a black camera screen wondering what you broke.
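One way to avoid that lost hour is to have the app say out loud when there is no camera to use. This is a minimal sketch, not how my project is actually wired: the `CameraSource` enum and `resolveCameraSource()` function are names I made up for illustration, while `AVCaptureDevice.default(for:)` and the `targetEnvironment(simulator)` check come from Apple's SDK.

```swift
import AVFoundation

/// Hypothetical helper type: either a usable camera or a human-readable reason
/// why there isn't one.
enum CameraSource {
    case live(AVCaptureDevice)
    case unavailable(reason: String)
}

/// Decides up front whether a live camera exists, so the UI can show a message
/// instead of a silent black preview.
func resolveCameraSource() -> CameraSource {
    #if targetEnvironment(simulator)
    // Compile-time check: this branch is only built for simulator targets.
    return .unavailable(reason: "Running in the simulator, which has no camera hardware.")
    #else
    // On real hardware, ask for the default video capture device.
    guard let device = AVCaptureDevice.default(for: .video) else {
        return .unavailable(reason: "No video capture device found.")
    }
    return .live(device)
    #endif
}
```

On a real iPhone the app also needs an `NSCameraUsageDescription` entry in its Info.plist before it touches the camera, otherwise iOS terminates it on first access.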
Third, what Claude Code is actually doing in all this. This was the part that surprised me. It doesn’t just write the Swift code. It sets up the whole project, runs the build, boots a simulator, and tells me what broke, all from the terminal. The loop ends up looking like this:
- I tell it what the app does, in plain words.
- It builds a SwiftUI project and writes the code.
- It builds the project and runs it in the simulator to check that it compiles and launches.
- I tell it what’s wrong or what I want next. It edits.
- For the camera, I plug in my iPhone, trust the Mac, turn on Developer Mode on the phone (required since iOS 16), and run it there instead.
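For a sense of scale, the project from the second step does not need to be much at first. A rough sketch, assuming something close to Xcode's default SwiftUI template; `SeeingApp` and `ContentView` are placeholder names, not what my project is actually called:

```swift
import SwiftUI

// Entry point of the app. @main tells Swift this is where the program starts.
@main
struct SeeingApp: App {
    var body: some Scene {
        // WindowGroup hosts the root view of the app.
        WindowGroup {
            ContentView()
        }
    }
}

// Placeholder screen. The camera preview would eventually replace this text.
struct ContentView: View {
    var body: some View {
        Text("Camera goes here")
            .padding()
    }
}
```

If this builds and shows the text in the simulator, the toolchain works, and everything after that is a matter of filling in the view.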
The seeing part has two roads. One is Apple’s own Vision framework, which runs right on the phone and can read text, classify what’s in a frame, scan barcodes, and spot faces, all offline and free. The other is to grab a frame off the camera and send it to a hosted AI model that describes what it sees in plain language. On-device is faster and private. The hosted model is smarter but needs a connection. For a first build I’d get the camera working with Apple’s tools first, then add the hosted model once the basic plumbing holds.
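To make the on-device road concrete, here is roughly what reading text out of a single frame looks like with Vision. Treat it as a sketch under assumptions: `recognizeText(in:)` is a helper name I invented, it takes a `CGImage` you already have (a saved photo or one captured frame), and it assumes a recent SDK (iOS 15 or later) where the request's results are typed.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: runs Apple's on-device text recognizer on one image
/// and returns the best guess for each line of text it finds.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // slower than .fast, usually better
    request.usesLanguageCorrection = true

    // The handler wraps a single image; perform(_:) runs synchronously,
    // so call this off the main thread in a real app.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Each observation is one detected line; take its top candidate string.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Because it only needs an image, this part can be tried in the simulator with a canned photo before any live camera code exists.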
So that’s the map. Xcode is the workshop. The simulator is the test bench. Claude Code is doing most of the actual building while I describe what I want.
I haven’t shipped it yet. Right now it runs in the simulator and I’m about to plug in the phone for the first time. I’ll write the next one from the other side, with whatever I got wrong in here fixed.