Live Notes
Architecture for the live notes feature — Deepgram streaming, speaker diarization, Cloudflare Durable Object relay, and AI summarization.
Freemium: Free users get a 30-minute lifetime trial. See Freemium Psychology & Engagement Design for the rationale behind all gate placement and UX decisions.
Overview
Live Notes is available to all users with a 30-minute lifetime free trial. Pro/Ultimate users have unlimited access.
- User configures a SessionConfig (title, template, optional calendar event link) in SessionSetupSheet.
- Negotiates a session with the lucidpal-api (POST /transcription/session).
- Opens a WebSocket through a Cloudflare Durable Object (TranscriptionSession) that bridges the iOS audio stream to Deepgram's streaming API.
- Renders live speaker-diarized transcript segments in TranscriptionView with pause/resume, bookmarking, and speaker renaming.
- On stop, sends the full transcript to POST /transcription/summarize (Gemini) to generate a rich structured summary.
- Saves the result as a NoteItem via NotesStoreProtocol.
- SessionDetailView + SessionDetailViewModel render the saved note and handle calendar push for action items.
Component Map
iOS Client
┌─────────────────────────────────────────────────────┐
│ TranscriptionView │
│ ↕ (observes phase + service via objectWillChange) │
│ TranscriptionViewModel (@MainActor) │
│ ↕ (owns) │
│ DeepgramTranscriptionService (@MainActor) │
│ ├─ AVAudioEngine (mic tap → PCM → Int16/16kHz) │
│ └─ URLSessionWebSocketTask (→ CF Durable Object) │
└─────────────────────────────────────────────────────┘
↕ WSS
┌─────────────────────────────────────────────────────┐
│ lucidpal-api (Cloudflare Workers) │
│ POST /transcription/session (auth + quota check) │
│ GET /transcription/ws (→ DO upgrade) │
│ POST /transcription/summarize (Gemini) │
│ │
│ TranscriptionSession (Durable Object) │
│ ├─ clientSocket ↔ serverSocket (WebSocketPair)│
│ └─ deepgramWs (→ Deepgram nova-2) │
└─────────────────────────────────────────────────────┘
TranscriptionViewModel
@MainActor final class TranscriptionViewModel: ObservableObject
Phase state machine
idle → connecting → recording → summarizing → done
         ↘ error(String)

| Phase | Trigger | Description |
|---|---|---|
| idle | Initial / after stop with empty transcript | Waiting |
| connecting | start() called | service.startRecording() in progress |
| recording | WS open + audio engine running | Live transcription |
| summarizing | stopAndSave() called with non-empty transcript | Gemini call in flight |
| done | Save complete | savedNote is set; view dismisses and opens note editor |
| error(String) | Any failure | Message surfaced to user |
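The table above can be read as a small transition function. The following is a minimal sketch, not the app's actual code: the Phase cases come from the table, while PhaseEvent and nextPhase are hypothetical names introduced only for illustration, and the choice to ignore events that do not apply to the current phase is an assumption.

```swift
// Phases as listed in the table above.
enum Phase: Equatable {
    case idle, connecting, recording, summarizing, done
    case error(String)
}

// Hypothetical event type; the source describes triggers, not an event enum.
enum PhaseEvent {
    case start                          // start() called
    case socketOpened                   // WS open + audio engine running
    case stop(transcriptIsEmpty: Bool)  // stopAndSave() called
    case saved                          // note persisted
    case failed(String)                 // any failure
}

// Pure transition function: returns the phase that follows `phase` on `event`.
func nextPhase(from phase: Phase, on event: PhaseEvent) -> Phase {
    switch (phase, event) {
    case (_, .failed(let message)):
        return .error(message)
    case (.idle, .start):
        return .connecting
    case (.connecting, .socketOpened):
        return .recording
    case (.recording, .stop(let isEmpty)):
        // An empty transcript skips summarization and returns to idle.
        return isEmpty ? .idle : .summarizing
    case (.summarizing, .saved):
        return .done
    default:
        return phase  // assumption: events that do not apply are ignored
    }
}
```

Keeping the transitions in one pure function makes the empty-transcript shortcut (recording back to idle) easy to unit-test without an audio engine or network.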
SessionConfig
SessionConfig is passed from SessionSetupSheet into TranscriptionViewModel and forwarded to DeepgramTranscriptionService. Fields:
| Field | Type | Description |
|---|---|---|
| title | String | Optional user-supplied session title (empty = AI-generated) |
| template | SessionTemplate | Recording context (Meeting, Interview, Brainstorm, Freeform, etc.), passed to /summarize to tune the prompt |
| calendarEventId | String? | EventKit identifier of the linked calendar event; saved to NoteItem.calendarEventId |
Pause / Resume
TranscriptionViewModel.pause() / resume() delegate to DeepgramTranscriptionService.pauseRecording() / resumeRecording(). While paused, the audio engine tap is suspended and the WS send task stops forwarding chunks; the WebSocket connection remains open.
Bookmarks
TranscriptionViewModel.addBookmark() calls DeepgramTranscriptionService.addBookmark(), which appends a BookmarkItem to service.bookmarks:
```swift
struct BookmarkItem: Codable, Identifiable, Sendable {
    let id: UUID
    let timestamp: Int   // elapsedSeconds at the moment of tap
    var label: String?   // optional user label (not set during recording)
}
```

Bookmarks are harvested in stopAndSave() and written to NoteItem.bookmarks. They appear in SessionDetailView with their timestamps.
Speaker renaming
Speaker names are local state in TranscriptionView (speakerNames: [Int: String]). Tapping a speaker label opens a rename alert; the name is stored by speakerIndex and applied to all ChatBubble views for that speaker. Names are not persisted to the NoteItem — the saved transcript uses the original "Speaker N" labels from formattedTranscript.
Service change forwarding
DeepgramTranscriptionService is a nested ObservableObject. TranscriptionViewModel subscribes to service.objectWillChange and re-emits it on its own objectWillChange so TranscriptionView re-renders on every service property change without needing to directly observe the service.
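The forwarding described above is a common Combine pattern. The sketch below shows one way it could look; ServiceSketch, ViewModelSketch, and the forwarding property are hypothetical stand-ins, and the real classes are @MainActor and carry far more state. It requires Combine, so it runs on Apple platforms only.

```swift
import Combine

// Stand-in for DeepgramTranscriptionService (hypothetical, heavily reduced).
final class ServiceSketch: ObservableObject {
    @Published var elapsedSeconds = 0
}

// Stand-in for TranscriptionViewModel (hypothetical, heavily reduced).
final class ViewModelSketch: ObservableObject {
    @Published var phaseName = "idle"
    let service = ServiceSketch()
    private var forwarding: AnyCancellable?

    init() {
        // Re-emit every service change as a change of the view model itself,
        // so a view observing only the view model still re-renders.
        forwarding = service.objectWillChange.sink { [weak self] _ in
            self?.objectWillChange.send()
        }
    }
}
```

The weak capture avoids a retain cycle between the view model and the subscription it owns.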
Summarization
stopAndSave() calls POST /transcription/summarize with the formatted transcript (speaker-prefixed segments joined by \n\n, capped at 50 000 chars). On success the note is saved with source: .voice — no bodyRTF is set, only body (plain text).
DeepgramTranscriptionService
@MainActor final class DeepgramTranscriptionService: NSObject, ObservableObject
Published state
| Property | Type | Description |
|---|---|---|
| isConnecting | Bool | Session negotiation in progress |
| isRecording | Bool | Audio engine running and WS open |
| liveTranscript | String | Accumulated final words (space-joined) |
| partialTranscript | String | Current in-flight phrase; cleared when isFinal |
| speakerSegments | [SpeakerSegment] | Finalized speaker-labelled segments |
| micAmplitude | Float | 0.0–1.0 RMS amplitude (updated per audio buffer) |
| elapsedSeconds | Int | Wall-clock recording duration (1 Hz timer) |
| errorMessage | String? | Last error, if any |
Session negotiation
- POST /transcription/session with Authorization: Bearer <jwtToken>.
- Returns { sessionToken, wsUrl }. The WS URL embeds the session token as a query param: wss://api.lucidpal.app/transcription/ws?session=<token>.
- The session token is stored in KV with a 600 s TTL, encoding { userId, isTrial, limitSecs } (plus deviceId for trial sessions) for the Durable Object.
Error responses:
- 402 → TranscriptionError.subscriptionRequired
- 429 → TranscriptionError.limitReached
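As an illustration of the status mapping above, here is a minimal sketch of how a client might turn the HTTP status and body into either a decoded response or a TranscriptionError. The function parseSessionResponse, the SessionResponse struct, and the unexpectedStatus case are assumptions made for this example; only the two documented error cases and the response field names come from this page.

```swift
import Foundation

enum TranscriptionError: Error, Equatable {
    case subscriptionRequired     // HTTP 402
    case limitReached             // HTTP 429
    case unexpectedStatus(Int)    // hypothetical catch-all, not in the source
}

// Response shape of POST /transcription/session as described in this document.
struct SessionResponse: Decodable, Equatable {
    let sessionToken: String
    let wsUrl: String
    let trialSecondsRemaining: Int?   // only present for trial sessions
}

// Hypothetical helper: maps a status code and body to a result or an error.
func parseSessionResponse(status: Int, body: Data) throws -> SessionResponse {
    switch status {
    case 200..<300:
        return try JSONDecoder().decode(SessionResponse.self, from: body)
    case 402:
        throw TranscriptionError.subscriptionRequired
    case 429:
        throw TranscriptionError.limitReached
    default:
        throw TranscriptionError.unexpectedStatus(status)
    }
}
```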
Audio pipeline
AVAudioEngine.inputNode
  └─ tap (bus 0, 4096 frames, native hardware format)
       └─ AVAudioConverter → Int16, 16 kHz, mono, interleaved
            └─ AsyncStream<(Float amplitude, Data pcmChunk)>
                 └─ sendTask: URLSessionWebSocketTask.send(.data(chunk))

The tap block is built by a nonisolated helper (Self.makeTapBlock) so that it can run on the audio thread without inheriting @MainActor isolation. It yields (amplitude, data) into the AsyncStream; the @MainActor sendTask consumes the stream and forwards chunks to the WebSocket.
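To make the two values carried by the stream concrete, the sketch below computes an RMS amplitude and packs Float samples as 16-bit PCM in plain Swift. It is an illustration only: rmsAmplitude and int16PCM are hypothetical names, and the real pipeline relies on AVAudioConverter, which also resamples to 16 kHz and downmixes to mono, neither of which this sketch attempts.

```swift
import Foundation

/// Root-mean-square level of samples in -1...1, clamped to 0...1.
func rmsAmplitude(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let meanSquare = samples.reduce(Float(0)) { $0 + $1 * $1 } / Float(samples.count)
    return min(1, meanSquare.squareRoot())
}

/// Packs Float samples as little-endian signed 16-bit PCM (no resampling).
func int16PCM(_ samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * 2)
    for sample in samples {
        // Clamp first so the Int16 conversion cannot overflow.
        let clamped = max(-1, min(1, sample))
        let value = Int16((clamped * Float(Int16.max)).rounded())
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    return data
}
```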
WS message handling
Each URLSessionWebSocketTask.Message.string is decoded as TranscriptEvent:
```swift
struct TranscriptEvent: Decodable {
    let type: String
    let transcript: String
    let isFinal: Bool
    let words: [WordEvent]   // word + speaker index + start time
}
```

Non-final: partialTranscript = event.transcript.
Final: append transcript to liveTranscript; group words into speakerSegments — appending to the last segment if the speaker index is unchanged, otherwise creating a new SpeakerSegment.
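The grouping rule for final events can be sketched as follows. The field names on WordEvent (word, speaker, start) are assumptions, since this page only says each word carries a speaker index and a start time, and the free function append(words:to:) is a hypothetical helper rather than the service's real method.

```swift
import Foundation

// Assumed shape; the document only lists "word + speaker index + start time".
struct WordEvent {
    let word: String
    let speaker: Int
    let start: TimeInterval
}

// Reduced copy of the SpeakerSegment model described in this document.
struct SpeakerSegment: Identifiable, Equatable {
    let id: UUID
    let speakerIndex: Int
    var text: String
    let startTime: TimeInterval
}

/// Extends the last segment while the speaker is unchanged,
/// otherwise starts a new segment at the word's start time.
func append(words: [WordEvent], to segments: inout [SpeakerSegment]) {
    for word in words {
        if let last = segments.indices.last,
           segments[last].speakerIndex == word.speaker {
            segments[last].text += " " + word.word
        } else {
            segments.append(SpeakerSegment(id: UUID(),
                                           speakerIndex: word.speaker,
                                           text: word.word,
                                           startTime: word.start))
        }
    }
}
```

Because the check runs per word, a speaker change in the middle of one final event still splits into separate segments.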
formattedTranscript
If speakerSegments is non-empty:
Speaker 1: <text>\n\nSpeaker 2: <text>\n\n...

Otherwise, it falls back to liveTranscript (raw space-joined words).
This formatted string is what gets saved as NoteItem.body and sent to /summarize.
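A minimal sketch of this formatting rule is shown below. Segment is a reduced stand-in for SpeakerSegment, and the function names are hypothetical. The 50,000-character cap is the one stated for the /summarize call; applying it in a separate helper is an assumption about where the truncation happens.

```swift
// Reduced stand-in for SpeakerSegment (hypothetical).
struct Segment {
    let speakerIndex: Int   // 0-based
    let text: String
}

/// Speaker-prefixed segments joined by blank lines, or the raw
/// transcript when no segments were produced.
func formattedTranscript(segments: [Segment], liveTranscript: String) -> String {
    guard !segments.isEmpty else { return liveTranscript }
    return segments
        .map { "Speaker \($0.speakerIndex + 1): \($0.text)" }
        .joined(separator: "\n\n")
}

/// Truncates the transcript to the documented /summarize limit.
func cappedForSummarize(_ transcript: String, maxCharacters: Int = 50_000) -> String {
    String(transcript.prefix(maxCharacters))
}
```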
Error handling
- Mid-session WS close with code 4029 or reason containing "limit" → "Monthly transcription limit reached" message.
- AVAudioSession.interruptionNotification (e.g. phone call) → stopRecording() immediately.
- URLSessionWebSocketDelegate.urlSession(_:webSocketTask:didCloseWith:reason:) → error message if still recording.
SpeakerSegment Model
```swift
struct SpeakerSegment: Identifiable, Codable, Equatable, Sendable {
    let id: UUID
    let speakerIndex: Int         // 0-based Deepgram speaker index
    var text: String              // accumulated words for this segment (mutable)
    let startTime: TimeInterval   // start time of first word in segment
    var speakerLabel: String { "Speaker \(speakerIndex + 1)" }
}
```

Segments are only appended or extended, never removed while recording. The UI renders each segment as a color-coded bubble (speaker index → speakerColors array: indigo/teal/orange/pink).
API Routes (lucidpal-api)
POST /transcription/session
Auth: Bearer JWT (authMiddleware)
Quota check:
Free users (no active Pro/Ultimate subscription):
- Read X-Device-ID header; return 400 if missing.
- Read KV trial_secs:{deviceId}: seconds used so far (permanent key, no expiry).
- Compute remaining = FREE_TRIAL_SECONDS (1800) - usedSeconds.
- Return 402 if remaining <= 0.
- Store session token in KV txn_session:<token> (TTL 600 s) with { userId, deviceId, isTrial: true, limitSecs: remaining }.
- Response includes trialSecondsRemaining: remaining.
Paid users (active Pro/Ultimate):
- Read KV transcription_mins:<userId>:<YYYY-MM>.
- Return 429 if usedMinutes >= plan limit.
- Store session token with { userId, isTrial: false, limitSecs: remainingMins * 60 }.
Response: { sessionToken: string, wsUrl: string, trialSecondsRemaining?: number }
The iOS client stores trialSecondsRemaining in LiveTrialManager via syncFromServer(remainingSeconds:) on successful session start.
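The two quota paths reduce to a small decision function. The sketch below is written in Swift only to match the other snippets on this page; the Worker itself is not shown in this document and is presumably not Swift. QuotaDecision, freeQuota, and paidQuota are hypothetical names, and computing remainingMins as plan limit minus used minutes is an assumption.

```swift
let freeTrialSeconds = 1_800   // FREE_TRIAL_SECONDS from this document

enum QuotaDecision: Equatable {
    case allow(limitSecs: Int, isTrial: Bool)
    case reject(status: Int)
}

/// Free path: 400 without a device ID, 402 once the trial is used up.
func freeQuota(deviceId: String?, usedSeconds: Int) -> QuotaDecision {
    guard let deviceId = deviceId, !deviceId.isEmpty else {
        return .reject(status: 400)
    }
    let remaining = freeTrialSeconds - usedSeconds
    return remaining <= 0
        ? .reject(status: 402)
        : .allow(limitSecs: remaining, isTrial: true)
}

/// Paid path: 429 once the monthly minutes reach the plan limit.
func paidQuota(usedMinutes: Int, planLimitMinutes: Int) -> QuotaDecision {
    guard usedMinutes < planLimitMinutes else { return .reject(status: 429) }
    // Assumption: remainingMins = plan limit - used minutes.
    return .allow(limitSecs: (planLimitMinutes - usedMinutes) * 60, isTrial: false)
}
```

For example, a Pro user who has used 100 of 300 minutes would get a session with limitSecs of 12,000 under these assumptions.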
GET /transcription/ws?session=<token> (Durable Object)
Upgrades to WebSocket. The Durable Object (TranscriptionSession) reads txn_session:<token> from KV, opens a connection to Deepgram (nova-2, diarize=true, interim_results=true, 16kHz linear16), and bridges the two sockets:
iOS ──audio PCM──▶ serverSocket ──▶ deepgramWs ──▶ Deepgram
iOS ◀──TranscriptEvent JSON── serverSocket ◀── deepgramWs

Hard cap enforcement: on each message from the iOS side, if Date.now() - startMs >= limitMs → close client socket with code 4029.
Usage tracking (trackUsage): on close (any reason):
- Trial path (isTrial: true): compute elapsed seconds, add to trial_secs:{deviceId} KV (permanent, no expiry, capped at FREE_TRIAL_SECONDS).
- Paid path (isTrial: false): compute Math.ceil((elapsed ms) / 60_000) and increment transcription_mins:<userId>:<YYYY-MM> (TTL 60 days).
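The hard cap and the two usage formulas above are plain arithmetic, sketched here in Swift to match the rest of this page (the Durable Object itself is not shown in this document). All function names are hypothetical, and flooring elapsed milliseconds to whole seconds on the trial path is an assumption, since the rounding is not specified.

```swift
import Foundation

let freeTrialSeconds = 1_800   // FREE_TRIAL_SECONDS from this document

/// Hard cap: true once the session has run for limitSecs or longer.
func limitReached(startMs: Int, nowMs: Int, limitSecs: Int) -> Bool {
    nowMs - startMs >= limitSecs * 1_000
}

/// Trial path: add elapsed seconds, capped at the lifetime allowance.
/// Assumption: partial seconds are dropped (integer division).
func updatedTrialSeconds(previous: Int, elapsedMs: Int) -> Int {
    min(freeTrialSeconds, previous + elapsedMs / 1_000)
}

/// Paid path: elapsed time rounded up to whole minutes,
/// the equivalent of Math.ceil(elapsedMs / 60_000).
func billedMinutes(elapsedMs: Int) -> Int {
    Int((Double(elapsedMs) / 60_000).rounded(.up))
}
```

Rounding up means a session that lasts one second past a minute boundary is billed as the next full minute.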
POST /transcription/summarize
Auth: Bearer JWT + active Pro/Ultimate subscription check.
Body: { transcript: string (10–50 000 chars), durationSeconds: number (1–7200) }
Model: Gemini 2.5 Flash Lite (gemini-2.5-flash-lite-preview-06-17), maxOutputTokens: 1024, temperature: 0.3.
Prompt: instructs Gemini to return structured JSON only, shaped by the SessionTemplate context. Response schema:
```json
{
  "title": "string",
  "summary": "string",
  "highlights": ["string"],
  "chapters": [{ "startSeconds": 0, "title": "string", "summary": "string" }],
  "decisions": ["string"],
  "actionItems": ["string"],
  "deadlines": ["string"],
  "openQuestions": ["string"],
  "followUpDraft": "string"
}
```

Fallback: if Gemini returns non-200 or non-JSON, responds with { title: "Voice Note", summary: transcript[:500], actionItems: [] }, so the client always gets a saveable result.
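Because the fallback carries only title, summary, and actionItems, a client-side model has to tolerate the other fields being absent. The sketch below shows one way to express that with optional properties; SummaryResponse and Chapter are hypothetical type names, and treating startSeconds as an Int is an assumption based on the example value in the schema.

```swift
import Foundation

struct Chapter: Decodable, Equatable {
    let startSeconds: Int   // assumption: whole seconds
    let title: String
    let summary: String
}

// Fields present in the fallback are required; the rest are optional,
// so both the full Gemini response and the fallback decode cleanly.
struct SummaryResponse: Decodable {
    let title: String
    let summary: String
    let actionItems: [String]
    let highlights: [String]?
    let chapters: [Chapter]?
    let decisions: [String]?
    let deadlines: [String]?
    let openQuestions: [String]?
    let followUpDraft: String?
}
```

Decoding the fallback payload with JSONDecoder then yields nil for the optional fields instead of throwing.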
Usage Limits
| Plan | Limit | KV key pattern |
|---|---|---|
| Free (trial) | 1,800 s lifetime | trial_secs:{deviceId} (permanent) |
| Pro | 300 min / month | transcription_mins:<userId>:<YYYY-MM> |
| Ultimate | 1,200 min / month | same |
| Dev | unlimited | quota check skipped |
Trial seconds are tracked per device (Keychain UUID, survives reinstalls). Pro/Ultimate minutes are tracked per calendar month. The Durable Object writes usage at session close; the /session endpoint reads it at session open to gate new sessions.
Cost note: Deepgram charges ~$0.0043/min, so the 30-minute free trial caps cost at about $0.13 per device lifetime (30 × $0.0043 ≈ $0.129).
Free Trial — iOS Client Components
LiveTrialManager
@MainActor final class LiveTrialManager: ObservableObject
Caches the trial balance locally so the UI can render without a network call.
| Property | Type | Description |
|---|---|---|
| secondsUsed | Int | Persisted in UserDefaults |
| secondsRemaining | Int | max(0, 1800 - secondsUsed) |
| isExhausted | Bool | secondsRemaining == 0 |
| hasShownModal | Bool | First-session modal already shown |
syncFromServer(remainingSeconds:) reconciles with the server balance (called after /session response). Server is always the source of truth.
#if DEBUG: resetForTesting() wipes UserDefaults keys and is surfaced in DevTierDrawer.
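The cached balance and its reconciliation with the server can be sketched as below. TrialBalanceSketch is a hypothetical, reduced stand-in for LiveTrialManager (no @MainActor, no ObservableObject, no modal flag), the UserDefaults key name is invented for the example, and clamping the server value into 0...1800 is an assumption.

```swift
import Foundation

final class TrialBalanceSketch {
    static let trialSeconds = 1_800
    private let defaults: UserDefaults
    private let key = "liveTrial.secondsUsed"   // hypothetical key name

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    // Persisted locally so the UI can render without a network call.
    var secondsUsed: Int {
        get { defaults.integer(forKey: key) }
        set { defaults.set(newValue, forKey: key) }
    }

    var secondsRemaining: Int { max(0, Self.trialSeconds - secondsUsed) }
    var isExhausted: Bool { secondsRemaining == 0 }

    /// The server is the source of truth: overwrite the local cache.
    func syncFromServer(remainingSeconds: Int) {
        let clamped = min(max(remainingSeconds, 0), Self.trialSeconds)
        secondsUsed = Self.trialSeconds - clamped
    }
}
```

Injecting the UserDefaults instance keeps the sketch testable with a throwaway suite.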
DeviceIdentityService
Generates a stable UUID on first launch, stored in Keychain (kSecAttrAccessibleAfterFirstUnlock). Used as X-Device-ID header and as the server-side KV key for trial tracking. See Freemium Psychology for details.
Subscription Gate (PremiumManager)
The Live Notes tab is ungated — all users can enter. The upgrade prompt appears at the Stop & Save action for free users with an exhausted trial. See Freemium Psychology for the full gate placement rationale.
Relationship to NotesStore
Voice notes are saved with the full AI output mapped to NoteItem fields:

| NoteItem field | Source |
|---|---|
| source | .voice |
| body | formatted transcript (speaker-prefixed plain text) |
| durationSeconds | service.elapsedSeconds at stop |
| sessionTemplate | from SessionConfig.template |
| calendarEventId | from SessionConfig.calendarEventId |
| bookmarks | service.bookmarks |
| aiSummary | Gemini summary (or fallback) |
| aiActionItems | Gemini actionItems (or fallback) |
| aiHighlights | Gemini highlights |
| aiChapters | Gemini chapters (mapped to NoteChapter) |
| aiDecisions | Gemini decisions |
| aiDeadlines | Gemini deadlines |
| aiOpenQuestions | Gemini openQuestions |
| aiFollowUpDraft | Gemini followUpDraft |
| bodyRTF | not set (no rich-text editor involved) |
| aiCategory | set later by NoteEnrichmentService in the background (same pipeline as manual notes) |
After notesStore.save(note), TranscriptionViewModel sets savedNote and transitions to .done, dismissing LiveNotesSheet.
SessionDetailViewModel
@MainActor final class SessionDetailViewModel: ObservableObject
Drives the post-session detail view. Responsibilities:
- scheduleActionItem(_ title: String): opens CreateEventSheet pre-filled with the action item title and a default time of tomorrow at 09:00. Requests calendar access first if not already granted.
- createCalendarEvent(...): calls CalendarService.createEvent(...) and adds the item title to calendarAddedItems so the UI shows a checkmark.
- calendarDraft: SiriPendingEvent?: drives the CreateEventSheet sheet presentation.
- calendarAddedItems: Set<String>: tracks which action items have been scheduled (per-session, not persisted).
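The "tomorrow at 09:00" default used by scheduleActionItem can be computed with Calendar alone. This is a sketch under assumptions: defaultActionItemDate is a hypothetical helper name, and the real view model may derive the date differently.

```swift
import Foundation

/// Returns 09:00 on the day after `now` in the given calendar's time zone.
func defaultActionItemDate(now: Date = Date(),
                           calendar: Calendar = .current) -> Date? {
    let startOfToday = calendar.startOfDay(for: now)
    guard let startOfTomorrow = calendar.date(byAdding: .day,
                                              value: 1,
                                              to: startOfToday) else {
        return nil
    }
    // Setting the hour on the start of the day avoids drifting by an
    // hour on daylight-saving transitions, unlike adding 9 * 3600 seconds.
    return calendar.date(bySettingHour: 9, minute: 0, second: 0,
                         of: startOfTomorrow)
}
```

Passing the calendar as a parameter lets tests pin the time zone instead of depending on the device setting.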