
Live Notes

Architecture for the live notes feature — Deepgram streaming, speaker diarization, Cloudflare Durable Object relay, and AI summarization.

Freemium: Free users get a 30-minute lifetime trial. See Freemium Psychology & Engagement Design for the rationale behind all gate placement and UX decisions.


Overview

Live Notes is available to all users with a 30-minute lifetime free trial. Pro/Ultimate users have unlimited access.

  1. The user configures a SessionConfig (title, template, optional calendar event link) in SessionSetupSheet.
  2. The client negotiates a session with the lucidpal-api (POST /transcription/session).
  3. It opens a WebSocket through a Cloudflare Durable Object (TranscriptionSession) that bridges the iOS audio stream to Deepgram's streaming API.
  4. TranscriptionView renders live speaker-diarized transcript segments with pause/resume, bookmarking, and speaker renaming.
  5. On stop, the client sends the full transcript to POST /transcription/summarize (Gemini) to generate a rich structured summary.
  6. The result is saved as a NoteItem via NotesStoreProtocol.
  7. SessionDetailView + SessionDetailViewModel render the saved note and handle calendar push for action items.

Component Map

iOS Client
┌─────────────────────────────────────────────────────┐
│  TranscriptionView                                   │
│    ↕ (observes phase + service via objectWillChange) │
│  TranscriptionViewModel  (@MainActor)                │
│    ↕  (owns)                                         │
│  DeepgramTranscriptionService  (@MainActor)          │
│    ├─ AVAudioEngine  (mic tap → PCM → Int16/16kHz)   │
│    └─ URLSessionWebSocketTask  (→ CF Durable Object) │
└─────────────────────────────────────────────────────┘
              ↕ WSS
┌─────────────────────────────────────────────────────┐
│  lucidpal-api (Cloudflare Workers)                   │
│    POST /transcription/session  (auth + quota check) │
│    GET  /transcription/ws        (→ DO upgrade)      │
│    POST /transcription/summarize (Gemini)            │
│                                                      │
│    TranscriptionSession  (Durable Object)            │
│      ├─ clientSocket  ↔ serverSocket  (WebSocketPair)│
│      └─ deepgramWs  (→ Deepgram nova-2)              │
└─────────────────────────────────────────────────────┘

TranscriptionViewModel

@MainActor final class TranscriptionViewModel: ObservableObject

Phase state machine

idle → connecting → recording → summarizing → done
                             ↘ error(String)
Phase | Trigger | Description
idle | Initial / after stop with empty transcript | Waiting
connecting | start() called | service.startRecording() in progress
recording | WS open + audio engine running | Live transcription
summarizing | stopAndSave() called with non-empty transcript | Gemini call in flight
done | Save complete | savedNote is set; view dismisses and opens note editor
error(String) | Any failure | Message surfaced to user
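The phase machine above can be modeled as a small transition table. A minimal TypeScript sketch follows (the real implementation is Swift; the type names and the exact set of allowed transitions are assumptions inferred from the table, e.g. that recording returns to idle when the transcript is empty at stop):

```typescript
// Hypothetical sketch of the TranscriptionViewModel phase machine.
// Phase names mirror the table above; the transition rules are assumptions.
type Phase =
  | { kind: "idle" }
  | { kind: "connecting" }
  | { kind: "recording" }
  | { kind: "summarizing" }
  | { kind: "done" }
  | { kind: "error"; message: string };

const allowed: Record<string, string[]> = {
  idle: ["connecting"],
  connecting: ["recording", "error"],
  recording: ["summarizing", "idle", "error"], // → idle when transcript is empty at stop
  summarizing: ["done", "error"],
  done: [],
  error: ["idle"], // assumed: user may retry from an error
};

function canTransition(from: Phase, to: Phase): boolean {
  return allowed[from.kind].includes(to.kind);
}
```

This makes illegal jumps (e.g. recording straight to done, skipping the Gemini call) checkable in one place.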

SessionConfig

SessionConfig is passed from SessionSetupSheet into TranscriptionViewModel and forwarded to DeepgramTranscriptionService. Fields:

Field | Type | Description
title | String | Optional user-supplied session title (empty = AI-generated)
template | SessionTemplate | Recording context (Meeting, Interview, Brainstorm, Freeform, etc.) — passed to /summarize to tune the prompt
calendarEventId | String? | EventKit identifier of the linked calendar event — saved to NoteItem.calendarEventId

Pause / Resume

TranscriptionViewModel.pause() / resume() delegate to DeepgramTranscriptionService.pauseRecording() / resumeRecording(). While paused, the audio engine tap is suspended and the WS send task stops forwarding chunks; the WebSocket connection remains open.
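The observable effect of pausing can be sketched as a gate in the chunk-forwarding loop. In the real Swift service the tap itself is suspended so chunks stop arriving; this TypeScript sketch (class and method names are hypothetical) only models the resulting behavior — chunks during a pause are never sent, and the socket object is untouched:

```typescript
// Minimal sketch of the pause gate in the audio-forwarding path (assumed shape).
// While paused, chunks are dropped and nothing is sent; the WebSocket stays open.
class ChunkForwarder {
  private paused = false;
  sent: Uint8Array[] = []; // stand-in for URLSessionWebSocketTask.send(.data(chunk))

  pause() { this.paused = true; }
  resume() { this.paused = false; }

  onChunk(chunk: Uint8Array) {
    if (this.paused) return; // tap suspended → chunk simply not forwarded
    this.sent.push(chunk);
  }
}
```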

Bookmarks

TranscriptionViewModel.addBookmark() calls DeepgramTranscriptionService.addBookmark(), which appends a BookmarkItem to service.bookmarks:

swift
struct BookmarkItem: Codable, Identifiable, Sendable {
    let id: UUID
    let timestamp: Int   // elapsedSeconds at the moment of tap
    var label: String?   // optional user label (not set during recording)
}

Bookmarks are harvested in stopAndSave() and written to NoteItem.bookmarks. They appear in SessionDetailView with their timestamps.

Speaker renaming

Speaker names are local state in TranscriptionView (speakerNames: [Int: String]). Tapping a speaker label opens a rename alert; the name is stored by speakerIndex and applied to all ChatBubble views for that speaker. Names are not persisted to the NoteItem — the saved transcript uses the original "Speaker N" labels from formattedTranscript.
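The label resolution is a simple lookup with a fallback, sketched below in TypeScript (the real state is a Swift dictionary in TranscriptionView; the function name is hypothetical):

```typescript
// Sketch of display-name resolution for a speaker bubble.
// Local-only rename map; falls back to the 1-based "Speaker N" label.
function displayName(speakerNames: Map<number, string>, speakerIndex: number): string {
  return speakerNames.get(speakerIndex) ?? `Speaker ${speakerIndex + 1}`;
}
```

Because the map never touches the saved note, renames apply only for the duration of the live view.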

Service change forwarding

DeepgramTranscriptionService is a nested ObservableObject. TranscriptionViewModel subscribes to service.objectWillChange and re-emits it on its own objectWillChange so TranscriptionView re-renders on every service property change without needing to directly observe the service.

Summarization

stopAndSave() calls POST /transcription/summarize with the formatted transcript (speaker-prefixed segments joined by \n\n, capped at 50 000 chars). On success the note is saved with source: .voice — no bodyRTF is set, only body (plain text).


DeepgramTranscriptionService

@MainActor final class DeepgramTranscriptionService: NSObject, ObservableObject

Published state

Property | Type | Description
isConnecting | Bool | Session negotiation in progress
isRecording | Bool | Audio engine running and WS open
liveTranscript | String | Accumulated final words (space-joined)
partialTranscript | String | Current in-flight phrase; cleared when isFinal
speakerSegments | [SpeakerSegment] | Finalized speaker-labelled segments
micAmplitude | Float | 0.0–1.0 RMS amplitude (updated per audio buffer)
elapsedSeconds | Int | Wall-clock recording duration (1 Hz timer)
errorMessage | String? | Last error, if any

Session negotiation

  1. POST /transcription/session with Authorization: Bearer <jwtToken>.
  2. Returns { sessionToken, wsUrl }. The WS URL embeds the session token as a query param: wss://api.lucidpal.app/transcription/ws?session=<token>.
  3. Session token stored in KV with 600 s TTL, encoding { userId, remainingMins } for the Durable Object.

Error responses:

  • 402 → TranscriptionError.subscriptionRequired
  • 429 → TranscriptionError.limitReached

Audio pipeline

AVAudioEngine.inputNode
  └─ tap (bus 0, 4096 frames, native hardware format)
       └─ AVAudioConverter → Int16, 16 kHz, mono, interleaved
            └─ AsyncStream<(Float amplitude, Data pcmChunk)>
                 └─ sendTask: URLSessionWebSocketTask.send(.data(chunk))

The tap block is nonisolated (Self.makeTapBlock) to avoid actor reentrancy on the audio thread. It yields (amplitude, data) into the AsyncStream; the @MainActor sendTask consumes the stream and forwards chunks to the WebSocket.

WS message handling

Each URLSessionWebSocketTask.Message.string is decoded as TranscriptEvent:

swift
struct TranscriptEvent: Decodable {
    let type: String
    let transcript: String
    let isFinal: Bool
    let words: [WordEvent]        // word + speaker index + start time
}

Non-final: partialTranscript = event.transcript.

Final: append transcript to liveTranscript; group words into speakerSegments — appending to the last segment if the speaker index is unchanged, otherwise creating a new SpeakerSegment.
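The grouping rule for final results can be sketched as follows (TypeScript sketch of the Swift logic; field names follow the TranscriptEvent/WordEvent shapes above, the function name is hypothetical):

```typescript
// Sketch of the final-result grouping step: extend the last segment while the
// speaker index is unchanged, otherwise start a new segment.
interface WordEvent { word: string; speaker: number; start: number; }
interface Segment { speakerIndex: number; text: string; startTime: number; }

function appendWords(segments: Segment[], words: WordEvent[]): Segment[] {
  for (const w of words) {
    const last = segments[segments.length - 1];
    if (last && last.speakerIndex === w.speaker) {
      last.text += " " + w.word;   // same speaker → extend the current segment
    } else {
      segments.push({ speakerIndex: w.speaker, text: w.word, startTime: w.start });
    }
  }
  return segments;
}
```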

formattedTranscript

If speakerSegments is non-empty:

Speaker 1: <text>\n\nSpeaker 2: <text>\n\n...

Otherwise falls back to liveTranscript (raw space-joined words).

This formatted string is what gets saved as NoteItem.body and sent to /summarize.
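As a sketch, the formatting rule is (TypeScript rendering of the Swift computed property; speaker labels are 1-based and segments are joined by blank lines):

```typescript
// Sketch of formattedTranscript: speaker-prefixed segments joined by \n\n,
// falling back to the raw live transcript when no segments exist.
interface Seg { speakerIndex: number; text: string; }

function formattedTranscript(segments: Seg[], liveTranscript: string): string {
  if (segments.length === 0) return liveTranscript;
  return segments
    .map(s => `Speaker ${s.speakerIndex + 1}: ${s.text}`)
    .join("\n\n");
}
```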

Error handling

  • Mid-session WS close with code 4029 or reason containing "limit" → "Monthly transcription limit reached" message.
  • AVAudioSession.interruptionNotification (e.g. phone call) → stopRecording() immediately.
  • URLSessionWebSocketDelegate.urlSession(_:webSocketTask:didCloseWith:reason:) → error message if still recording.
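The close-handling rule from the first bullet can be sketched as a small mapping (the 4029 code and "limit" reason check come from the text; the function shape is an assumption):

```typescript
// Sketch of mid-session close handling: map a limit-related close to the
// user-facing message, return null for everything else.
function closeMessage(code: number, reason: string): string | null {
  if (code === 4029 || reason.toLowerCase().includes("limit")) {
    return "Monthly transcription limit reached";
  }
  return null; // other closes handled by the generic error path
}
```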

SpeakerSegment Model

swift
struct SpeakerSegment: Identifiable, Codable, Equatable, Sendable {
    let id: UUID
    let speakerIndex: Int       // 0-based Deepgram speaker index
    var text: String            // accumulated words for this segment (mutable)
    let startTime: TimeInterval // start time of first word in segment

    var speakerLabel: String { "Speaker \(speakerIndex + 1)" }
}

Segments are only appended or extended — never removed while recording. The UI renders each segment as a color-coded bubble (speaker index → speakerColors array: indigo/teal/orange/pink).


API Routes (lucidpal-api)

POST /transcription/session

Auth: Bearer JWT (authMiddleware)

Quota check:

Free users (no active Pro/Ultimate subscription):

  1. Read X-Device-ID header — return 400 if missing.
  2. Read KV trial_secs:{deviceId} — seconds used so far (permanent key, no expiry).
  3. Compute remaining = FREE_TRIAL_SECONDS (1800) - usedSeconds.
  4. Return 402 if remaining <= 0.
  5. Store session token in KV txn_session:<token> (TTL 600 s) with { userId, deviceId, isTrial: true, limitSecs: remaining }.
  6. Response includes trialSecondsRemaining: remaining.

Paid users (active Pro/Ultimate):

  1. Read KV transcription_mins:<userId>:<YYYY-MM>.
  2. Return 429 if usedMinutes >= plan limit.
  3. Store session token with { userId, isTrial: false, limitSecs: remainingMins * 60 }.

Response: { sessionToken: string, wsUrl: string, trialSecondsRemaining?: number }

The iOS client stores trialSecondsRemaining in LiveTrialManager via syncFromServer(remainingSeconds:) on successful session start.
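The quota decision above can be condensed into one function. This is a hedged sketch of the /transcription/session logic with the KV reads/writes elided (constants and status codes are from the steps above; the function and field names are assumptions):

```typescript
// Sketch of the session-open quota decision. KV lookups (trial_secs:{deviceId},
// transcription_mins:<userId>:<YYYY-MM>) are passed in as plain values.
const FREE_TRIAL_SECONDS = 1800;

interface QuotaResult { status: 200 | 400 | 402 | 429; limitSecs?: number; }

function checkQuota(opts: {
  isPaid: boolean;
  deviceId?: string;       // X-Device-ID header (free users only)
  trialSecsUsed?: number;  // seconds used so far, lifetime
  minutesUsed?: number;    // minutes used this calendar month
  planLimitMins?: number;  // 300 for Pro, 1200 for Ultimate
}): QuotaResult {
  if (!opts.isPaid) {
    if (!opts.deviceId) return { status: 400 };                 // missing device ID
    const remaining = FREE_TRIAL_SECONDS - (opts.trialSecsUsed ?? 0);
    if (remaining <= 0) return { status: 402 };                 // trial exhausted
    return { status: 200, limitSecs: remaining };
  }
  const limit = opts.planLimitMins ?? 0;
  if ((opts.minutesUsed ?? 0) >= limit) return { status: 429 }; // monthly cap hit
  return { status: 200, limitSecs: (limit - (opts.minutesUsed ?? 0)) * 60 };
}
```

The returned limitSecs is what gets embedded in the session token for the Durable Object's hard cap.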


GET /transcription/ws?session=<token> (Durable Object)

Upgrades to WebSocket. The Durable Object (TranscriptionSession) reads txn_session:<token> from KV, opens a connection to Deepgram (nova-2, diarize=true, interim_results=true, 16kHz linear16), and bridges the two sockets:

iOS  ──audio PCM──▶  serverSocket  ──▶  deepgramWs  ──▶  Deepgram
iOS  ◀──TranscriptEvent JSON──  serverSocket  ◀──  deepgramWs

Hard cap enforcement: on each message from the iOS side, if Date.now() - startMs >= limitMs → close client socket with code 4029.
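A minimal sketch of that per-message check (close code 4029 is from the text; the handler shape and names are assumptions):

```typescript
// Sketch of the Durable Object's hard-cap check, run on each client message.
const CLOSE_LIMIT_REACHED = 4029;

interface SessionState { startMs: number; limitSecs: number; }

// Returns the close code to send, or null to keep relaying audio.
function enforceCap(session: SessionState, nowMs: number): number | null {
  const elapsedMs = nowMs - session.startMs;
  return elapsedMs >= session.limitSecs * 1000 ? CLOSE_LIMIT_REACHED : null;
}
```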

Usage tracking (trackUsage): on close (any reason):

  • Trial path (isTrial: true): compute elapsed seconds, add to trial_secs:{deviceId} KV (permanent, no expiry, capped at FREE_TRIAL_SECONDS).
  • Paid path (isTrial: false): compute Math.ceil((elapsed ms) / 60_000) and increment transcription_mins:<userId>:<YYYY-MM> (TTL 60 days).
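The two accounting rules can be sketched as pure functions (KV writes elided; the cap and the Math.ceil rounding are from the bullets above, while the second-rounding on the trial path is an assumption):

```typescript
// Sketch of the usage math at session close.
const FREE_TRIAL_SECONDS = 1800;

// Trial path: new lifetime total, capped at the trial limit.
function trialSecondsAfter(usedSecs: number, elapsedMs: number): number {
  const total = usedSecs + Math.round(elapsedMs / 1000); // rounding is assumed
  return Math.min(total, FREE_TRIAL_SECONDS);
}

// Paid path: minutes to add this month, always rounded up.
function paidMinutesDelta(elapsedMs: number): number {
  return Math.ceil(elapsedMs / 60_000); // even a 1 s session charges a full minute
}
```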

POST /transcription/summarize

Auth: Bearer JWT + active Pro/Ultimate subscription check.

Body: { transcript: string (10–50 000 chars), durationSeconds: number (1–7200) }

Model: Gemini 2.5 Flash Lite (gemini-2.5-flash-lite-preview-06-17), maxOutputTokens: 1024, temperature: 0.3.

Prompt: instructs Gemini to return structured JSON only, shaped by the SessionTemplate context. Response schema:

json
{
  "title": "string",
  "summary": "string",
  "highlights": ["string"],
  "chapters": [{ "startSeconds": 0, "title": "string", "summary": "string" }],
  "decisions": ["string"],
  "actionItems": ["string"],
  "deadlines": ["string"],
  "openQuestions": ["string"],
  "followUpDraft": "string"
}

Fallback: if Gemini returns non-200 or non-JSON, responds with { title: "Voice Note", summary: transcript[:500], actionItems: [] } — the client always gets a saveable result.
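That fallback path can be sketched as follows (the fallback shape is from the text; the function name and parsing details are assumptions):

```typescript
// Sketch of the summarize response handling: any non-200 status or
// unparseable body degrades to a minimal, saveable summary.
interface Summary { title: string; summary: string; actionItems: string[]; }

function parseSummary(status: number, rawBody: string, transcript: string): Summary {
  const fallback: Summary = {
    title: "Voice Note",
    summary: transcript.slice(0, 500), // first 500 chars of the transcript
    actionItems: [],
  };
  if (status !== 200) return fallback;
  try {
    const parsed = JSON.parse(rawBody);
    if (typeof parsed.title === "string") return parsed as Summary;
    return fallback;
  } catch {
    return fallback; // non-JSON model output → still saveable
  }
}
```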


Usage Limits

Plan | Limit | KV key pattern
Free (trial) | 1,800 s lifetime | trial_secs:{deviceId} (permanent)
Pro | 300 min / month | transcription_mins:<userId>:<YYYY-MM>
Ultimate | 1,200 min / month | same
Dev | unlimited | quota check skipped

Trial seconds are tracked per device (Keychain UUID, survives reinstalls). Pro/Ultimate minutes are tracked per calendar month. The Durable Object writes usage at session close; the /session endpoint reads it at session open to gate new sessions.

Cost note: Deepgram charges ~$0.0043/min. The 30-minute free trial caps cost at $0.13 per device lifetime.


Free Trial — iOS Client Components

LiveTrialManager

@MainActor final class LiveTrialManager: ObservableObject

Caches the trial balance locally so the UI can render without a network call.

Property | Type | Description
secondsUsed | Int | Persisted in UserDefaults
secondsRemaining | Int | max(0, 1800 - secondsUsed)
isExhausted | Bool | secondsRemaining == 0
hasShownModal | Bool | First-session modal already shown

syncFromServer(remainingSeconds:) reconciles with the server balance (called after /session response). Server is always the source of truth.
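The reconciliation is a straight overwrite of the local cache from the server value. A TypeScript sketch (the real class is Swift; the clamping to 0…1800 is an assumption):

```typescript
// Sketch of the LiveTrialManager balance logic: server is the source of truth,
// derived values match the property table above.
const TRIAL_LIMIT = 1800;

class TrialBalance {
  secondsUsed = 0; // stands in for the UserDefaults-persisted value

  get secondsRemaining(): number { return Math.max(0, TRIAL_LIMIT - this.secondsUsed); }
  get isExhausted(): boolean { return this.secondsRemaining === 0; }

  // Overwrite the local cache with the authoritative server balance.
  syncFromServer(remainingSeconds: number) {
    const clamped = Math.max(0, Math.min(TRIAL_LIMIT, remainingSeconds));
    this.secondsUsed = TRIAL_LIMIT - clamped;
  }
}
```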

#if DEBUG: resetForTesting() wipes UserDefaults keys and is surfaced in DevTierDrawer.

DeviceIdentityService

Generates a stable UUID on first launch, stored in Keychain (kSecAttrAccessibleAfterFirstUnlock). Used as X-Device-ID header and as the server-side KV key for trial tracking. See Freemium Psychology for details.


Subscription Gate (PremiumManager)

The Live Notes tab is ungated — all users can enter. The upgrade prompt appears at the Stop & Save action for free users with an exhausted trial. See Freemium Psychology for the full gate placement rationale.


Relationship to NotesStore

Voice notes are saved with the full AI output mapped to NoteItem fields:

NoteItem field | Source
source | .voice
body | formatted transcript (speaker-prefixed plain text)
durationSeconds | service.elapsedSeconds at stop
sessionTemplate | from SessionConfig.template
calendarEventId | from SessionConfig.calendarEventId
bookmarks | service.bookmarks
aiSummary | Gemini summary
aiActionItems | Gemini actionItems
aiHighlights | Gemini highlights
aiChapters | Gemini chapters (mapped to NoteChapter)
aiDecisions | Gemini decisions
aiDeadlines | Gemini deadlines
aiOpenQuestions | Gemini openQuestions
aiFollowUpDraft | Gemini followUpDraft
bodyRTF | not set — no rich-text editor involved
aiCategory | set later by NoteEnrichmentService in the background (same pipeline as manual notes)

After notesStore.save(note), TranscriptionViewModel sets savedNote and transitions to .done, dismissing LiveNotesSheet.


SessionDetailViewModel

@MainActor final class SessionDetailViewModel: ObservableObject

Drives the post-session detail view. Responsibilities:

  • scheduleActionItem(_ title: String) — opens CreateEventSheet pre-filled with the action item title and a default time of tomorrow at 09:00. Requests calendar access first if not already granted.
  • createCalendarEvent(...) — calls CalendarService.createEvent(...) and adds the item title to calendarAddedItems so the UI shows a checkmark.
  • calendarDraft: SiriPendingEvent? — drives the CreateEventSheet sheet presentation.
  • calendarAddedItems: Set<String> — tracks which action items have been scheduled (per-session, not persisted).

Internal — not for distribution