Governing Claude: Enforcing Senior Developer Standards on an LLM
How I treated Claude like a reckless Junior Developer to build a production-grade iOS health app with strict physics and medical standards.
I have a background in Physics, Data Science, and Cybersecurity. It’s a strange mix, but a useful one in my day job. I work in a field where data quality and privacy are valued more than gold.
I used to export all my health data from my iPhone and run analysis on it in Python to find actionable health insights. The problem? Every day I collected new data, I had to export it again and rerun the Python scripts. It wasn’t sustainable. So I decided to build my own app: AHIMO (Advanced Health Insights & Monitoring).
The Problem: “StackOverflow Quality”
I know Python and data science, but I didn’t know SwiftUI. I turned to Claude.
Left to its own devices, Claude writes “StackOverflow-quality” code — hacks, hardcoded values, ignored edge cases. It prioritizes “making it work” over “making it correct.” That didn’t work for me. I needed Claude to think like a senior medical device engineer.
The Solution: Strict Governance
I applied the same processes we use at work. The trick was enforcement. I didn’t just prompt the AI — I enforced a governance system.
1. The “Read or Die” Protocol
I used Claude.md (strict rules) and DEVELOPER-HANDBOOK.md (architecture). Passive context isn’t enough. Claude may read Claude.md, but it doesn’t always know which rules are relevant to a given task.
I forced compliance by explicitly invoking rules in prompts, for example:
“As per Claude.md, we must have a single data gateway…”
This ensured the correct constraints were applied every time.
2. Forced Self-Documentation (The “Drift” Trap)
Initially, I required Claude to document every implementation in separate files (CoreDataLogic.md, UIDesign.md, etc.) to verify what it was doing.
In my line of work, traceability is everything. If I don’t know the source, I discard the data. I applied the same logic here.
The Mistake:
Separate documentation files are a maintenance nightmare. As the code evolved, CoreDataLogic.md became outdated almost immediately. I found myself correcting the documentation more than the code, or worse—feeding Claude outdated context.
The Fix: DocC & Inline Documentation
I pivoted to a strict “Code is Truth” policy. Instead of external files, I forced Claude to use Swift’s native documentation standard (///) for every public property and function.
Now, if the code changes, the documentation changes with it in the same commit. This reduced context usage and ensured the “manual” was always compiled directly from the source.
3. The Constraints: The “Pink Elephant” Problem
Psychologically, if you tell someone “Don’t think of a pink elephant,” they immediately picture one. LLMs have a similar quirk. If you list 50 things not to do, the model’s attention mechanism still focuses on those concepts, increasing the probability of them slipping in.
To fix this, Claude.md relies on Contrastive Examples. I don’t just ban bad code; I provide “Golden Samples”—side-by-side comparisons of the prohibited pattern (❌) versus the required pattern (✅). This forces the model to mimic the “Good” structure rather than just trying to avoid the “Bad” one.
Here are the concrete rules I used to force senior-level code.
3.1 Zero Tolerance for “Fake” Data
LLMs love Int.random() to fill gaps. In a health app, a wrong number is a lie. I explicitly forbade it.
From Claude.md:
ABSOLUTE PROHIBITION OF SIMULATED DATA:
❌ NEVER simulate, fake, hardcode, or generate ANY numbers
❌ NEVER use Int.random(), placeholder values, or mock data
✅ ALWAYS trace EVERY number to its authentic source (HealthKit, CoreData)
3.2 The “Biological Day” Logic
Computers reset at midnight. Human bodies do not. Standard code slices sleep sessions in half at 00:00. I forced Claude to think in biological days, and I provided a specific code example to ensure it filters by the event time, not the log time.
From Claude.md:
// ✅ CORRECT - filters by when event happened (startOfDay)
let dayEntries = entries.filter { entry in
guard let eventDate = entry.startOfDay ?? entry.startDate ?? entry.timestamp else { return false }
return eventDate >= startOfDay && eventDate < endOfDay
}
// ❌ WRONG - filters by when entry was logged (timestamp)
let dayEntries = entries.filter { entry in
guard let timestamp = entry.timestamp else { return false }
return timestamp >= startOfDay && timestamp < endOfDay
}
3.3 Banishing Hardcoded Units
Claude loves math like kg * 2.204 inside a View. That’s untraceable and dangerous.
From Claude.md:
// ❌ FORBIDDEN – hardcoded conversions
let pounds = kg * 2.20462
// ✅ REQUIRED – centralized formatting
let formatter = HealthDataFormatter.shared
await formatter.formatValue(70.0, for: .bodyWeight)
3.4 Preventing the “Main Thread” Crash
My app performs complex correlations (e.g. sleep vs HRV). Claude’s default fix is slapping @MainActor everywhere to silence compiler warnings — which freezes the UI. I had to explicitly show it where concurrency belongs.
From Claude.md:
// ❌ CRASHES APP
@MainActor class SomeLayer2Analyzer { }
@MainActor func analyze() async { }
MainActor.run { /* analysis */ }
// ✅ REQUIRED - No @MainActor
class SomeLayer2Analyzer {
func analyze() async -> Result { }
}
3.5 The “Enforcer” (CI/CD: Hybrid Warfare)
Given my cybersecurity background, I didn’t trust the AI to comply 100% of the time. Policies are useless without strict enforcement.
Code standards need automated enforcement. SwiftLint catches style violations and banned patterns—like importing HealthKit in ViewModels, placing @MainActor in background services, or using print() instead of the logger.
However, SwiftLint hit a wall with localization. It couldn’t parse xcstrings JSON or verify if a key actually existed. For this, Python scripts proved to be the superior tool.
Both tools now gate every PR.
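To illustrate the Python side of that gate, here is a minimal sketch of a key-existence check. It assumes the standard .xcstrings JSON layout (a top-level "strings" dictionary) and that code references keys via String(localized:). The function names are mine for illustration, not the project's actual scripts.

```python
import json
import re
from pathlib import Path

def catalog_keys(catalog_path: str) -> set[str]:
    """Collect every key defined in an .xcstrings catalog (a JSON file)."""
    with open(catalog_path, encoding="utf-8") as f:
        return set(json.load(f).get("strings", {}))

def referenced_keys(source_dir: str) -> set[str]:
    """Find keys referenced via String(localized: "...") in Swift sources."""
    pattern = re.compile(r'String\(localized:\s*"([^"]+)"')
    keys: set[str] = set()
    for swift_file in Path(source_dir).rglob("*.swift"):
        keys |= set(pattern.findall(swift_file.read_text(encoding="utf-8")))
    return keys

def missing_keys(catalog_path: str, source_dir: str) -> set[str]:
    """Keys used in code but absent from the catalog: these fail the PR gate."""
    return referenced_keys(source_dir) - catalog_keys(catalog_path)
```

Running this in CI and failing the build when `missing_keys` is non-empty catches the class of bug SwiftLint cannot see.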
3.6 The “Good Enough” Trap (100% Tests)
Claude would run unit tests and respond “We passed 80%, this is good, task completed successfully.” Other times Claude would respond “The bugs in the code are from previous work, not my problem.”
In hobby projects, an 80% test pass rate could be “good enough.” In safety-critical engineering, every failing test is a known failure mode.
Rule: 100% of tests must pass. Always.
No deleting failing tests. No accepting “flaky” ones. This forced the AI to handle edge cases like leap years and daylight-saving transitions.
3.7 The “Lazy Polyglot” (Localization)
AHIMO is fully localized using Xcode String Catalogs (.xcstrings). Claude often commits the sin of the lazy translator: copying English text directly into Spanish or German fields to satisfy the compiler.
I started by adding a visual rule to Claude.md to prevent hardcoded strings:
// ✅ REQUIRED
Text(String(localized: "generic.loading"))
// ❌ FORBIDDEN
Text("Loading...")
However, even with this rule, localization was the #1 violator by a huge margin.
| Violation category | Fix commits |
|---|---|
| Localization | 87 |
| MainActor / threading | 3 |
| Hardcoded metrics | 3 |
| Direct data access | 2 |
The same cycle repeated over and over:
1. New feature ships with hardcoded English strings
2. Follow-up commit: “Localize all X hardcoded strings”
3. Another commit: “Add missing DE/ES translations”
Threading and architecture violations were rare because they cause immediate, visible crashes. Localization bugs are silent until someone switches languages. That silence is why they slip through.
The fix:
I forced Claude to write a Python script (find_untranslated_strings.py) to police itself. The script parses the JSON catalog and compares the first two words of translated strings against the English source. If they match, it flags the entry as a fake translation.
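A minimal sketch of that heuristic, assuming the standard .xcstrings layout (strings → localizations → stringUnit → value) and that the catalog key doubles as the English source text when no explicit "en" entry exists. The names are illustrative, not the repo's actual find_untranslated_strings.py.

```python
import json

def first_two_words(text: str) -> list[str]:
    """Lowercased first two words, used as a cheap fingerprint of a sentence."""
    return text.lower().split()[:2]

def find_fake_translations(catalog_path: str, languages=("de", "es")) -> list[tuple[str, str]]:
    """Flag (key, language) pairs whose 'translation' just repeats the English text.

    Heuristic: if the first two words of the target string equal the first two
    words of the English source, the entry is almost certainly a copy-paste.
    """
    with open(catalog_path, encoding="utf-8") as f:
        catalog = json.load(f)

    flagged = []
    for key, entry in catalog.get("strings", {}).items():
        locs = entry.get("localizations", {})
        # Fall back to the key itself: source-language strings often have no "en" entry.
        source = locs.get("en", {}).get("stringUnit", {}).get("value", key)
        for lang in languages:
            value = locs.get(lang, {}).get("stringUnit", {}).get("value")
            if value and first_two_words(value) == first_two_words(source):
                flagged.append((key, lang))
    return flagged
```

The two-word fingerprint is deliberately crude; it produces a short review list rather than a verdict, which is all a self-policing gate needs.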
If I were starting again, localization checks would be enforced earlier than threading rules.
3.8 The 500k Token Wall
The localization catalog grew to 525,159 tokens — far beyond any context window.
Solution:
Claude wrote a suite of Python scripts (check_missing_translations.py, fix_untranslated_strings.py) that processed translations in batches without ever loading the full file into context.
A context-window limitation became a solvable engineering problem.
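The batching idea can be sketched as follows: Python parses the whole catalog once, but only small slices of it ever reach the model's context. This assumes the same catalog layout as above and is my reconstruction, not the project's actual scripts.

```python
import json
from itertools import islice

def iter_batches(catalog_path: str, batch_size: int = 50):
    """Yield lists of (key, english_source) pairs, batch_size entries at a time.

    The full 500k-token catalog is parsed here in Python; each yielded batch
    is small enough to fit comfortably in a prompt.
    """
    with open(catalog_path, encoding="utf-8") as f:
        strings = json.load(f).get("strings", {})
    items = iter(strings.items())
    while batch := list(islice(items, batch_size)):
        # Fall back to the key as the English source when no "en" entry exists.
        yield [(key, entry.get("localizations", {})
                          .get("en", {})
                          .get("stringUnit", {})
                          .get("value", key))
               for key, entry in batch]
```

Each batch can then be translated, verified, and written back independently, so neither the model nor any single prompt ever needs the whole file.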
The Outcome: A Privacy Fortress
This was the most important result for me.
Most health apps process data in the cloud. I forced AHIMO to use 100% on-device processing.
Security: Because the data never leaves your device, the only way to steal your health data is to physically steal — and unlock — your phone.
No Login: No accounts. No server. I don’t see your data. Advertisers don’t see your data. Your data is yours.
Lessons Learned (From Claude)
At the end, I asked Claude:
“You developed the entire app. What are the lessons learned? Check the history of all commits”
Here are the most valuable ones:
- Single data gateway — enforce from day one
- Hybrid CI/CD enforcement — use SwiftLint for code style and custom bash scripts for complex logic. Use each tool where it shines.
- Centralized type/category checks — avoid scattered if x == .foo
- Mandatory logging and formatting services — no raw print() or hardcoded units
- Localization from the start — retrofitting is painful
- Automated translation QA — scripts catch what humans miss
- Prominent threading rules — deadlocks are brutal to debug
- Short and long documentation — people actually read the short one
Conclusion
Developing AHIMO with Claude wasn’t about letting the AI write code. It was about teaching the AI the constraints of the domain.
By treating the prompt context as a senior architect that enforces medical and engineering standards, I was able to leverage my physics and data science background to build a production-grade iOS app in a language I didn’t know.
Disclaimer: AHIMO is not FDA approved, is not MDR/ISO compliant, and is not Software as a Medical Device.