Nudge: Building an Ambient AI Coach at the Omi Hackathon
How we turned a wearable mic into a real-time SOP compliance system, and then made it work for families too

What if every human interaction could be a little better? Not through surveillance, but through a gentle nudge.
Try the interactive presentation →
The Problem Nobody Talks About
Every restaurant has an SOP binder. Every call center has a compliance handbook. Every hospital has protocols. And in every one of these workplaces, the same thing happens: the binder collects dust.
It's not that people don't care. It's that real-time compliance is humanly impossible to monitor. A shift manager can't listen to every table simultaneously. A QA lead can't review every support call live. By the time violations surface in weekly reviews, the moment is gone: the customer already left unhappy, the safety protocol was already skipped.
The gap isn't knowledge. It's timing. People know the SOPs. They just need a reminder in the moment, not three days later in a meeting.
The Spark: Omi + Ambient AI
When the Omi Hackathon Bengaluru was announced, the idea clicked immediately. Omi is a wearable AI device: essentially a tiny microphone that clips to your shirt and streams conversation transcripts in real time. What if we could feed those transcripts through an LLM, compare them against SOPs, and generate coaching feedback while the shift is still happening?
Not a surveillance tool. Not a gotcha system. An ambient coach, like having a senior mentor whispering in your ear: "Hey, you forgot to ask about allergies at table 4."
We called it Nudge. The name beat out CueCard, Whispyr, and SOPrano because of its behavioral science roots: people resist being told what to do, but they accept gentle suggestions. The name is the product philosophy: not surveillance, just a nudge.

Architecture: The Big Picture

Before diving into the technical details, here's how all the pieces fit together:
```
                     THE NUDGE PIPELINE

  +------------+      +-----------+      +--------------+
  |   Omi /    |      |  Backend  |      |  Dashboard   |
  |   Facade   |----->|  Server   |----->|  (SSE live)  |
  |   (mic +   |      | (FastAPI) |      |              |
  |   Whisper) |      |           |      |  Manager /   |
  +------------+      +-----+-----+      |  Kiosk view  |
                            |            +--------------+
                      +-----v-----+
                      |  Claude   |
                      |    LLM    |
                      |           |
                      |  Analyze  |
                      |  against  |
                      |   SOPs    |
                      +-----------+
```
The flow:
- A wearable mic (Omi device or our laptop-based facade) captures conversation
- Audio is transcribed to text (Whisper)
- Text hits the backend via webhook
- Backend runs an LLM analysis pipeline against domain-specific SOPs
- Results stream to dashboards in real-time via Server-Sent Events
Simple in theory. The devil, as always, is in the details.
The Facade Strategy: Demo Without Hardware
Here's the first interesting architectural decision. The hackathon was about building apps for the Omi wearable, but we didn't want to be blocked by hardware availability or Bluetooth connectivity issues on demo day. So we built a facade.
The facade is a separate service that mimics the Omi device's webhook behavior, but using the laptop's built-in microphone. From the backend's perspective, it's indistinguishable from a real Omi device.
```
               FACADE vs PRODUCTION

   FACADE (Demo Mode)         PRODUCTION (Omi)
   +----------------+        +----------------+
   |   Laptop Mic   |        |  Omi Wearable  |
   |   (PipeWire    |        | (BLE -> Phone  |
   |   pw-record)   |        |    -> Cloud)   |
   +-------+--------+        +-------+--------+
           |                         |
   +-------v--------+        +-------v--------+
   | Local Whisper  |        |  Omi's Cloud   |
   |  (port 2022)   |        | Transcription  |
   +-------+--------+        +-------+--------+
           |                         |
           +------------+------------+
                        |
                +-------v--------+
                |    Backend     |  <- Same webhook
                |   /webhook/    |     Same format
                |  transcripts   |     Same analysis
                +----------------+
```
The facade uses PipeWire's pw-record (the native Linux audio system) for capture. It records at 16 kHz mono, which is optimal for speech, then sends the audio to a local Whisper server for transcription. The transcribed text gets wrapped in exactly the same webhook payload format that Omi uses and posted to the backend.
> Insight: The facade strategy was a product philosophy, not just a workaround. Building without the hardware first forced clarity about what actually mattered. It's a metaphor for product development: strip away the shiny thing and build the intelligence layer first. 100% of pre-hackathon time went into prompt engineering, domain design, and user experience, the parts that actually differentiate the product. When the Omi device arrived, integration took minutes.
The facade also handles push-to-talk (PTT) with role assignment. During a demo, you press a button, speak as "Waiter" or "Customer", and the facade prefixes the transcript accordingly. This lets a single person simulate a multi-party conversation.
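In spirit, one facade capture cycle looks something like the sketch below. The pw-record flags, the local Whisper `/transcribe` route, the backend URL, and the payload field names are all assumptions for illustration, based on the setup described above:

```python
# Sketch of one facade push-to-talk cycle: record with pw-record, transcribe
# via the local Whisper server, wrap in an Omi-shaped payload, POST it.
# URLs, routes, and field names are illustrative assumptions.
import json
import subprocess
import urllib.request

def build_omi_payload(role: str, text: str) -> dict:
    # Prefixing the PTT role lets one person simulate a multi-party chat
    return {"segments": [{"speaker": role, "text": text}]}

def capture_and_forward(role: str, seconds: int = 5) -> None:
    try:
        # 16 kHz mono (optimal for speech); stop recording after `seconds`
        subprocess.run(
            ["pw-record", "--rate", "16000", "--channels", "1", "/tmp/ptt.wav"],
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        pass  # expected: pw-record records until interrupted

    with open("/tmp/ptt.wav", "rb") as f:
        req = urllib.request.Request("http://localhost:2022/transcribe", data=f.read())
        text = json.loads(urllib.request.urlopen(req).read())["text"]

    body = json.dumps(build_omi_payload(role, text)).encode()
    req = urllib.request.Request(
        "http://localhost:8000/webhook/transcripts",
        data=body,
        headers={"Content-Type": "application/json", "X-Facade-Secret": "demo"},
    )
    urllib.request.urlopen(req)
```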
Deep Dive: The LLM Analysis Pipeline

This is the heart of Nudge. Every transcript that arrives gets run through a multi-stage analysis pipeline, powered by Claude.
The Pipeline Stages
```
                    ANALYSIS PIPELINE

  Transcript arrives
          |
          v
  +------------------+
  | 1. BOUNDARY      |  "Is this a new customer
  |    DETECTION     |   interaction, or continuing?"
  +--------+---------+
           |
           v
  +------------------+------------------+
  | 2. VIOLATIONS    | 3. SENTIMENT     |  <- parallel!
  |    DETECTION     |    ANALYSIS      |
  |                  |                  |
  | Compare against  | Rate staff tone  |
  | domain SOPs      | 0-100 scale      |
  +--------+---------+--------+---------+
           |                  |
           +--------+---------+
                    v
  +---------------------------+
  | 4. COACHING TIPS          |  Only if violations found
  |    (conditional)          |  or sentiment < threshold
  +---------------------------+
           |
           v
  Results -> SSE -> Dashboard
```
A few design decisions worth calling out:
Stages 2 and 3 run in parallel. Violation detection and sentiment analysis are independent; they don't need each other's results. Running them concurrently with asyncio.gather() cuts the wall-clock time nearly in half.
Stage 4 is conditional. If there are zero violations and sentiment is high (the interaction went well), there's nothing to coach on. Skipping this stage saves an LLM call and ~3 seconds per clean interaction. In a typical restaurant shift, maybe 60-70% of interactions are clean; that's a lot of saved compute.
Concurrency is carefully controlled. LLM calls are expensive in both time and memory. A global semaphore limits concurrent Claude invocations, preventing the system from being overwhelmed when multiple transcripts arrive simultaneously (which happens: a busy restaurant might have five conversations going at once).
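The orchestration described above can be sketched roughly like this. The Claude calls are stubbed out, and the function names and the sentiment threshold are illustrative, not the exact Nudge internals:

```python
# Sketch of the stage orchestration: violations and sentiment run
# concurrently under a global semaphore; coaching only runs when there
# is something to coach on. LLM calls are stubs; names are illustrative.
import asyncio

LLM_SEMAPHORE = asyncio.Semaphore(3)   # global cap on concurrent Claude calls
SENTIMENT_THRESHOLD = 70               # assumed threshold for "went well"

async def call_llm(prompt: str) -> dict:
    async with LLM_SEMAPHORE:          # never exceed the concurrency cap
        await asyncio.sleep(0)         # stand-in for the real Claude call
        return {}

async def detect_violations(transcript: str, sops: list) -> list:
    await call_llm("compare transcript against SOPs...")
    return []

async def analyze_sentiment(transcript: str) -> dict:
    await call_llm("rate staff tone 0-100...")
    return {"score": 85}

async def generate_coaching_tips(transcript, violations, sentiment) -> list:
    await call_llm("suggest improvements...")
    return ["tip"]

async def analyze(transcript: str, sops: list) -> dict:
    # Stages 2 and 3 are independent, so run them in parallel
    violations, sentiment = await asyncio.gather(
        detect_violations(transcript, sops),
        analyze_sentiment(transcript),
    )
    tips = []
    # Stage 4 is conditional: skip the LLM call for clean interactions
    if violations or sentiment["score"] < SENTIMENT_THRESHOLD:
        tips = await generate_coaching_tips(transcript, violations, sentiment)
    return {"violations": violations, "sentiment": sentiment, "tips": tips}
```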
Violation Detection: Teaching an LLM to Be a Compliance Auditor
The violation detection stage is where the LLM earns its keep. The prompt engineering here went through many iterations. Some key challenges:
The customer blame problem. Early prompts would flag customer behavior: "Customer didn't say please." That's not an SOP violation. The system had to be explicitly instructed to ONLY analyze staff behavior, never customers.
The severity calibration. Not all violations are equal. Forgetting to upsell dessert is low severity. Forgetting to ask about food allergies is critical. The severity guide in the prompt maps categories to severity levels based on actual restaurant risk:
| Severity | Examples | Score Impact |
|---|---|---|
| Critical | Safety violations, allergen warnings missed, hostile behavior | -25 points |
| High | Hygiene failures, negative team dynamics, profanity | -15 points |
| Medium | Service protocol missed, no table check-back | -8 points |
| Low | Upselling opportunity missed, greeting too brief | -3 points |
The context sensitivity. "Hi, welcome!" is a perfectly fine greeting at a casual restaurant. Early prompts flagged it as "greeting too brief." The system needed to understand that a short friendly greeting satisfies the SOP; you don't need a 30-second welcome speech.
Prompt injection defense. Since the input comes from real speech (transcribed), someone could theoretically say things designed to manipulate the LLM. The pipeline sanitizes transcripts before analysis, stripping patterns that look like prompt injection attempts.
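A sanitizer in this spirit is a small pre-processing pass. The patterns below are illustrative examples of injection-looking phrases, not the actual list Nudge uses:

```python
# Sketch of a transcript sanitizer: strip phrases that look like attempts
# to manipulate the LLM before the transcript reaches the prompt.
# The pattern list is an illustrative assumption.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"</?(system|assistant|instructions?)>",
]

def sanitize_transcript(text: str) -> str:
    for pattern in INJECTION_PATTERNS:
        text = re.sub(pattern, "[removed]", text, flags=re.IGNORECASE)
    return text
```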
Sentiment Analysis: The Human Side
Beyond rule violations, sentiment analysis captures the tone of interactions. A waiter might follow every SOP perfectly but deliver it all with a monotone, disinterested voice. The sentiment score catches that.
The analysis rates staff tone on a 0-100 scale:
- 90-100: Warm, enthusiastic, genuinely engaging
- 70-89: Professional, friendly, good service
- 50-69: Neutral, functional, could be warmer
- 30-49: Curt, rushed, slightly dismissive
- 0-29: Hostile, rude, unprofessional
The system also generates improvement tips: specific rewrites of what the staff actually said, with suggested alternatives. Not "be more friendly" (useless), but "Instead of 'What do you want?', try 'What can I get for you today?'"
Three-Layer Resilience: Never Show "--"
LLM calls fail. Networks hiccup. Models return malformed JSON. And when they do, the dashboard must never show "--" for a sentiment score or silently drop violations. We discovered this the hard way: the analysis run would be marked "completed" even when the LLM returned nothing, leaving the UI permanently broken for that staff member.
The fix is a three-layer defense:
```
   Layer 1                Layer 2               Layer 3
   ACCURATE STATUS        RETRY                 FALLBACK
  +----------------+     +----------------+    +--------------------+
  | LLM returned   |     | Retry once     |    | Synthetic result   |
  | nothing?       |---->| (transient     |--->| score = 75         |
  | Mark FAILED    |     | errors fix     |    | "Estimated"        |
  | not complete   |     | themselves)    |    |                    |
  +----------------+     +----------------+    | UI always shows    |
                                               | a number, never    |
                                               | "--"               |
                                               +--------------------+
```
Layer 1 tracks status accurately: a run that produced no data is marked "failed," not "completed." Layer 2 retries once, because transient errors (rate limits, timeouts) often resolve on the second attempt. Layer 3 injects a synthetic fallback (score 75, the neutral midpoint) so the UI always has something to show. 75 was chosen deliberately: neither alarming nor congratulatory.
Beyond retries, the pipeline uses:
- Exponential backoff: delays of 1s, 2s, 4s between attempts
- Timeout caps: 60 seconds for full analysis, 30 seconds for sentiment
- JSON extraction: a robust parser that handles LLM responses wrapped in markdown code blocks, extra whitespace, or partial outputs
- Deduplication: hash-based dedup prevents the same violation from appearing twice when a transcript gets re-analyzed
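Put together, the resilience wrapper looks roughly like this sketch. The timeout and backoff numbers mirror those above; the function names, exception handling, and fallback shape are illustrative assumptions:

```python
# Sketch of retry-with-fallback around sentiment analysis. An empty result
# counts as a failure (Layer 1); failures are retried with backoff (Layer 2);
# after the last attempt a synthetic score is returned (Layer 3).
import asyncio

FALLBACK_SENTIMENT = {"score": 75, "estimated": True}  # neutral midpoint

async def sentiment_with_resilience(run_llm, transcript: str,
                                    delays=(1, 2, 4)) -> dict:
    for delay in delays:
        try:
            result = await asyncio.wait_for(run_llm(transcript), timeout=30)
            if result:                    # Layer 1: empty result = failure
                return result
        except Exception:                 # timeouts, rate limits, bad JSON
            pass
        await asyncio.sleep(delay)        # Layer 2: back off, then retry
    return FALLBACK_SENTIMENT             # Layer 3: UI never shows "--"
```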
Deep Dive: Domain-as-JSON - One Codebase, Infinite Verticals

This might be the most interesting architectural decision in the entire project. Early on, we realized that "SOP compliance" isn't just a restaurant problem. Call centers have scripts. Hospitals have protocols. Even families have unwritten rules about communication.
The question was: do we build a restaurant SOP tool, or do we build a platform?
We chose platform. And the key enabler is what I call Domain-as-JSON.
The Domain Schema
Every vertical (restaurant, family, sales, support) is defined entirely in a single JSON file. No code changes needed. The JSON contains:
```
               DOMAIN JSON SCHEMA

  +---------------------------------------------+
  | Identity                                    |
  |  - id, name, description                    |
  |  - What domain is this?                     |
  +---------------------------------------------+
  +---------------------------------------------+
  | Roles                                       |
  |  - Who participates? (staff/customer,       |
  |    parent/child, agent/caller...)           |
  |  - How to identify them in transcripts      |
  |  - Display: icons, colors, labels           |
  +---------------------------------------------+
  +---------------------------------------------+
  | Prompts                                     |
  |  - LLM instructions per analysis stage      |
  |  - Boundary detection context               |
  |  - Violation detection rules                |
  |  - Sentiment analysis framing               |
  |  - Coaching persona                         |
  +---------------------------------------------+
  +---------------------------------------------+
  | Metrics & Scoring                           |
  |  - What to measure (customer service,       |
  |    family harmony, sales effectiveness...)  |
  |  - How to score violations                  |
  |  - Category -> metric mapping               |
  +---------------------------------------------+
  +---------------------------------------------+
  | SOPs                                        |
  |  - The actual rules to enforce              |
  |  - Category, severity, description          |
  +---------------------------------------------+
```
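To make the schema concrete, a miniature domain file might look like this. The section names mirror the schema above, but the exact field names and values are illustrative, not the real Nudge schema:

```json
{
  "id": "restaurant",
  "name": "Restaurant",
  "description": "Food service staff compliance",
  "roles": [
    {"id": "staff", "label": "Staff", "icon": "chef-hat", "color": "#e07a5f"},
    {"id": "customer", "label": "Customer", "icon": "user", "color": "#3d405b"}
  ],
  "prompts": {
    "coaching_persona": "You are a supportive hospitality coach...",
    "violation_rules": "Only analyze STAFF behavior, never customers..."
  },
  "metrics": [
    {"id": "customer_service", "label": "Customer Service"}
  ],
  "sops": [
    {
      "id": "allergy-check",
      "category": "safety",
      "severity": "critical",
      "description": "Ask about allergies before taking a food order"
    }
  ]
}
```

In principle, dropping a file like this next to the others is all a new vertical needs.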
Restaurant vs. Family: Same Engine, Different Soul
The beauty of this approach becomes clear when you compare two domains:
Restaurant domain: 20+ SOPs covering greeting, hygiene, safety, upselling, team dynamics. Roles are Staff and Customer. Metrics track "Customer Service" and "Staff Sentiment." A critical violation is serving food without asking about allergies.
Family domain: This is where Nudge stopped being a business tool and became something personal. 10 core SOPs covering listening, gratitude, respect, patience, autonomy, plus 60+ relationship-specific SOPs that activate based on who is speaking to whom. Metrics track "Family Harmony" and "Warmth & Respect." A critical violation is comparing siblings: "Why can't you be more like your sister?"
The family domain isn't generic. It's modeled on a real family navigating a specific life stage: parents learning to treat adult children as peers, adult children practicing patience with aging parents, siblings maintaining trust. The SOPs capture the unique tensions of a household where everyone is an adult but relationships still carry decades of history.
Same analysis engine. Same dashboard. Same scoring system. Completely different soul. The technology that catches a waiter forgetting to ask about allergies also catches a parent dismissing their daughter's career autonomy. The architecture doesn't care about domain semantics; it just knows patterns of human interaction.
The prompts section is where the magic happens. The restaurant domain tells the LLM to be a "hospitality coach." The family domain tells it to be a "family counselor." The violation detection instructions for restaurant focus on service protocol; for family, they focus on emotional intelligence. These aren't just label changes; they fundamentally alter how the LLM interprets the same transcript.
Relationship-Based SOPs: When Context Gets Personal
The domain-as-JSON system has one more trick: relationship-based SOPs. Human relationships aren't binary. A family has spouse dynamics, parent-child dynamics, sibling dynamics, each with fundamentally different norms.
The family domain defines relationship types with directional role pairs. When the system detects that a father is speaking to a daughter, it activates parent_to_child SOPs (empathy, autonomy, no comparison). When siblings are talking, sibling SOPs activate (sharing, no name-calling). The same transcript triggers different rules depending on who is speaking to whom.
This is a generic system β any domain could define relationship types. A management domain might have peer_to_peer vs manager_to_report SOPs. A sales domain might differentiate cold_call from existing_customer.
Why AI Coaching Works: The Rama Framework
During development, we found validation for the approach in an unexpected place: an Economist podcast (Boss Class S3E3) profiling Glowforge's AI sales coach, which reportedly drove 50%+ productivity gains. The framework for where AI coaching succeeds maps perfectly to Nudge:
- Repetitive but not identical: every conversation varies slightly, exactly where LLMs shine
- Tolerable correctness: coaching suggestions, not clinical decisions. Being wrong 20% of the time is fine because a human reviews it
- Human already in the loop: the manager console is a conversation starter, not a verdict
- Avoids the easy button: coaching tips suggest improvements, they don't auto-correct
Nudge hits all four. It's not making medical diagnoses or legal judgments. It's offering suggestions that a shift manager can accept, modify, or ignore. The cost of a false positive is a 30-second conversation; the cost of a missed violation could be an allergic reaction.
Eight Domains and Counting
At the time of the hackathon, we had built eight complete domain definitions:
| Domain | What It Monitors | Key Metric |
|---|---|---|
| Restaurant | Food service staff compliance | Customer Service |
| Family | Household communication quality | Family Harmony |
| Sales | Sales call effectiveness | Deal Progress |
| Support | Customer support quality | Resolution Quality |
| Communication | General workplace communication | Communication Score |
| Drive-Through | Fast food drive-through efficiency | Speed & Accuracy |
| Management | Manager-to-team interactions | Team Morale |
| Onboarding | New employee training quality | Onboarding Score |
Adding a new domain takes about 30 minutes of writing JSON: no backend changes, no deployment, no code review. The entire personality of the system pivots based on a URL parameter: ?domain=family vs ?domain=restaurant.
> Insight: The domain-as-JSON pattern is more than configuration. It's a form of prompt engineering stored as data. Each domain JSON contains the entire "personality" of the system for that vertical: how it interprets speech, what it considers violations, how it coaches. This makes the system genuinely multi-tenant without any tenant-specific code.
Deep Dive: Real-Time Architecture
A compliance tool that shows results 5 minutes later isn't very useful. Nudge needed to be real-time: from spoken word to dashboard alert in seconds, not minutes.
The Webhook-to-SSE Pipeline
```
                 REAL-TIME DATA FLOW

  Mic -> Whisper -> Webhook --+
                              |
                      +-------v------+
                      |   Analysis   |
                      |    Queue     |  <- asyncio.Queue
                      |              |     with semaphore
                      +-------+------+
                              |
                      +-------v------+
                      |   Pipeline   |
                      |   (Claude)   |
                      +-------+------+
                              |
                      +-------v------+
                      |    Redis     |
                      |   Pub/Sub    |
                      +-------+------+
                              |
          +-------------------+---------------+
          |                   |               |
    +-----v-----+      +------v----+    +-----v-----+
    | Dashboard |      |  Manager  |    |   Kiosk   |
    |   (SSE)   |      |   (SSE)   |    |   (SSE)   |
    +-----------+      +-----------+    +-----------+
```
The webhook endpoint accepts transcripts from both the facade and real Omi devices. It authenticates via either a shared secret header (facade) or a device-specific UID parameter (Omi). The transcript gets enqueued for analysis rather than processed synchronously, because we don't want the webhook response to block while Claude thinks.
The analysis queue uses Python's asyncio.Queue with a worker that processes jobs sequentially per staff member but allows concurrent processing across different staff. A semaphore caps the total concurrent LLM calls. This is critical because Claude calls take 5-15 seconds each, and a busy shift might generate 20 transcripts in a minute.
Redis Pub/Sub bridges the gap between the analysis worker and connected dashboards. When the pipeline produces results (a new violation, an updated sentiment score, a coaching tip), it publishes to a Redis channel. Any number of SSE clients can subscribe to these channels.
Server-Sent Events (SSE) deliver updates to the browser. Unlike WebSockets, SSE is unidirectional (server to client), which is exactly what we need: the dashboard doesn't send data back, it just receives updates. SSE also reconnects automatically on network drops, which matters for a tool that runs during a full shift.
The Concurrency Challenge
Here's a lesson learned the hard way: SQLAlchemy async sessions are not safe for concurrent operations within a single transaction.
Early in development, we tried to write violation results and sentiment results to the database concurrently (since they're computed in parallel). This caused intermittent database corruption. The fix: LLM calls run in parallel, but database writes are strictly sequential. The parallel computation saves wall-clock time; the sequential writes ensure data integrity.
```
        CORRECT                          WRONG
  LLM calls:                       LLM calls:
  +----------+----------+          +----------+----------+
  |Violations|Sentiment |          |Violations|Sentiment |
  +----+-----+----+-----+          +----+-----+----+-----+
       |          |                     |          |
  DB writes:                       DB writes:
  +----v-----+                     +----v-----+----v-----+
  |Violations|                     |Violations|Sentiment |
  +----+-----+                     +----------+----------+
       |                                 Corruption!
  +----v-----+
  |Sentiment |
  +----------+
```
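In code, the correct pattern looks roughly like this sketch. The session here stands in for an SQLAlchemy AsyncSession, and the LLM calls are stubs:

```python
# Sketch of "parallel compute, sequential persist": gather the independent
# LLM results concurrently, then touch the shared async session one
# operation at a time. Names are illustrative.
import asyncio

async def detect_violations(transcript: str) -> list:
    await asyncio.sleep(0)          # stand-in for a Claude call
    return ["violation-record"]

async def analyze_sentiment(transcript: str) -> str:
    await asyncio.sleep(0)          # stand-in for a Claude call
    return "sentiment-record"

async def analyze_and_persist(session, transcript: str) -> None:
    # Parallel: the expensive, independent computations
    violations, sentiment = await asyncio.gather(
        detect_violations(transcript),
        analyze_sentiment(transcript),
    )
    # Sequential: one async session must never see concurrent operations
    for record in violations:
        session.add(record)
    await session.flush()
    session.add(sentiment)
    await session.commit()
```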
Deep Dive: The Dashboard
All this backend machinery would be useless without a way to see the results. Nudge has three dashboard views, each designed for a different use case.
The Main Dashboard

The dashboard is a single-page application built with vanilla HTML, CSS, and JavaScript: no React, no Vue, no build step. This was a deliberate choice for hackathon speed: zero toolchain overhead, instant reload, easy to deploy as a static file served by FastAPI.
Key elements:
- Compliance gauges: real-time scores for each metric (customer service, sentiment) displayed as animated circular gauges
- Staff timeline: each staff member's interactions plotted on a timeline, color-coded by violation severity
- Violation feed: live stream of detected violations with severity badges, SOP references, and suggested fixes
- Coaching tips: AI-generated improvement suggestions, specific to what was actually said
- Drill-down panels: click any staff member to see their individual interaction history, violations, and coaching tips
Everything updates live via SSE. No polling, no manual refresh.
The Manager View

The manager view is designed for shift supervisors. It focuses on per-staff performance rather than individual violations. Key feature: the conversation replay. Managers can read the full transcript of any interaction and see exactly where violations were flagged, with the relevant SOP rule highlighted.
The Heart Gauge: Visualizing Warmth
One design detail worth calling out: sentiment isn't shown as a number. It's shown as a heart gauge, an SVG heart that fills from bottom to top, with color interpolated through a 5-stop warmth palette:
| Score Range | Color | Descriptor |
|---|---|---|
| 0-15 | Slate | Cold |
| 16-35 | Mauve | Cool |
| 36-55 | Rose | Settling |
| 56-75 | Coral | Warm |
| 76-100 | Scarlet | Heartfelt / Radiant |
The color interpolation uses smoothstep (t*t*(3-2*t)) for perceptually smooth transitions. The heart pulses gently when the score is above 45. Each skin overrides the heart colors to match its palette. It's a small thing, but it transforms a clinical metric into something that feels human.
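The interpolation logic is small enough to show whole. This sketch uses the smoothstep formula from the text; the hex values and band boundaries follow the table above, but the concrete RGB colors are stand-ins for the real skin palettes:

```python
# Sketch of the heart-gauge color interpolation: smoothstep easing between
# adjacent stops of a 5-stop warmth palette. RGB values are stand-ins.
PALETTE = [  # (upper bound of score range, RGB color)
    (15, (100, 116, 139)),    # slate   - "Cold"
    (35, (183, 110, 156)),    # mauve   - "Cool"
    (55, (244, 114, 182)),    # rose    - "Settling"
    (75, (251, 113, 100)),    # coral   - "Warm"
    (100, (220, 38, 38)),     # scarlet - "Heartfelt / Radiant"
]

def smoothstep(t: float) -> float:
    # Eases in and out: zero slope at t=0 and t=1
    return t * t * (3 - 2 * t)

def heart_color(score: float) -> tuple:
    lo_bound, lo_color = 0, PALETTE[0][1]
    for hi_bound, hi_color in PALETTE:
        if score <= hi_bound:
            span = hi_bound - lo_bound or 1
            t = smoothstep((score - lo_bound) / span)
            return tuple(
                round(a + (b - a) * t) for a, b in zip(lo_color, hi_color)
            )
        lo_bound, lo_color = hi_bound, hi_color
    return PALETTE[-1][1]
```

The same math ports directly to the dashboard's vanilla JavaScript.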
The Kiosk View: The Break Room TV

The kiosk view is designed for a TV mounted in the break room or kitchen. It's a "departure board" skin: giant text, auto-rotating slides, no interaction needed. Readable from 3 meters away.
The kiosk rotates through three slide types, all domain-driven:
- Aggregate: giant compliance ring or heart gauge, staff count, interaction count. "How we're doing together."
- Spotlight: individual staff member highlighted with their avatar, heart gauge, and an encouragement message
- Nudge: the most impactful coaching tip from the shift, "Instead of X, try Y" in large readable text
Each domain configures its own kiosk labels. The restaurant kiosk says "Everyone's on track!" when there are no violations. The family kiosk says "The household is in harmony today!" Same component, different soul.
Multi-Domain in Action
Switching domains transforms the entire experience. Here's the family domain: same dashboard, completely different context:



Notice how everything adapts: the role labels (Staff → Family), the metric names (Customer Service → Family Harmony), the icons, the color scheme, and most importantly, what constitutes a violation. In restaurant mode, not asking about allergies is critical. In family mode, comparing siblings is critical.
Skins: 15 Visual Identities
Beyond domain switching, Nudge supports 15 visual skins, each with its own philosophy and aesthetic identity. This started as "let's add a dark mode" and escalated into a full design system.




Each skin is pure CSS, with no JavaScript changes. The skin system uses CSS custom properties (variables) for colors, fonts, and effects, switched via a data-skin attribute on the body element. A URL parameter (?skin=terminal) or the in-app skin switcher controls which skin is active.
Every skin answers a different "key question": they're not just color swaps but philosophically distinct designs:
| Skin | Philosophy | Key Question |
|---|---|---|
| Clean | Matte control room | Does it feel calm and professional? |
| Terminal | Green-on-black log | Does it feel like reading server logs? |
| Cafe | Earthy, warm, inviting | Does it feel like a cozy menu board? |
| Glass | Frosted translucent | Does it feel like a premium glass UI? |
| Neon | Synthwave arcade | Does it feel like an 80s arcade? |
| Porsche | Precision luxury | Does it feel like a Taycan instrument cluster? |
| Amazon | Deep jungle canopy | Does it feel like the rainforest floor at dawn? |
| Contrast | Maximum readability | Can someone with low vision read everything? |
Domain and skin are orthogonal: any of the 15 skins works with any of the 8 domains. That's 120 visual combinations from one codebase.
> Insight: Building the skins taught us more about CSS than any tutorial. We discovered that display: none on a parent hides ALL children (including helpful "rotate to landscape" messages), that animation keyframes don't replay when DOM elements are recreated by polling, and that pastel colors that look great on dark backgrounds fail WCAG contrast on light backgrounds. Each skin was a crash course in CSS specificity, cascade, and accessibility.
The Demo Inject: Hackathon Secret Weapon
Demo day at a hackathon is stressful. The network might be flaky. The LLM API might be slow. The microphone might pick up the person next to you. You need a reliable fallback.
We built a demo inject system that pre-loads 10 realistic restaurant interactions with fully analyzed results β violations, sentiment scores, coaching tips, everything. One API call resets the database and hydrates it with the snapshot. The dashboard lights up with data instantly, no LLM calls needed, no network dependency.
The snapshots are generated offline: we run real conversations through the full pipeline, then serialize the database state to JSON. On demo day, it's a restore operation: fast, deterministic, and impressive.
Each domain has its own snapshot, so switching to ?domain=family and hitting the demo button gives you a completely different dataset with family-appropriate interactions and violations.
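The restore operation itself is trivially simple, which is the point. In this sketch the database is a plain dict and the snapshot directory layout is an illustrative assumption:

```python
# Sketch of the demo inject: wipe state, then hydrate from a pre-analyzed
# per-domain snapshot generated offline. `db` is a dict standing in for
# the real database layer; the file layout is an assumption.
import json
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")   # one pre-analyzed snapshot per domain

def inject_demo(db: dict, domain: str = "restaurant",
                directory: Path = SNAPSHOT_DIR) -> int:
    snapshot = json.loads((directory / f"{domain}.json").read_text())
    db.clear()                               # reset
    for table, rows in snapshot.items():     # pure restore: no LLM calls,
        db[table] = list(rows)               # no network dependency
    return sum(len(rows) for rows in snapshot.values())
```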
Deployment: Cloudflare Tunnel
For the hackathon demo, we needed the app accessible from a URL: judges might want to try it on their phones, and the Omi device needs a public webhook endpoint.
We chose Cloudflare Tunnel over ngrok for several reasons:
- No interstitial page: ngrok shows a "visit this site?" warning page in browsers that confuses non-technical users
- Custom domain: a clean, branded URL looks more professional than a random ngrok subdomain
- No bandwidth caps: ngrok's free tier has limitations that bite you mid-demo
- Built-in rate limiting: Cloudflare's rate limiting (30 req/min/IP) protects against accidental abuse
The tunnel exposes localhost:8000 to the internet. The backend proxies PTT (push-to-talk) requests to the facade at localhost:8001, so everything is accessible through a single URL. Total setup time: about 10 minutes including DNS.
Daily Reports: From Compliance to Empathy
Real-time alerts are useful during a shift. But what about after? Managers need a summary of the day. Staff need to know how they did. And here's where Nudge stops being a monitoring tool and starts being a coach.
The Design Philosophy
The raw data is violations, scores, and compliance percentages. But imagine being a waiter who just worked a tough shift, and your phone shows: "You had 7 violations and a 35% compliance score." You'd feel judged. Defensive. You might never open the app again.
So we made a deliberate choice: the LLM prompt explicitly bans compliance language. Words like violation, compliance, score, failure, non-compliant, penalty, infraction never appear in reports. Instead, everything gets reframed through empathy:
| Compliance Framing | Empathetic Reframe |
|---|---|
| "7 violations today" | "7 growth moments today" |
| "Compliance score: 35%" | Warmth level: "Warming Up" with hearts |
| "Failed to greet customer" | "Here's what you said" → "An idea for next time" |
| "Low sentiment detected" | "Room to add warmth" |
| "You need improvement" | "One thing to try tomorrow" |
The LLM is specifically prompted: "You are a kind, experienced mentor writing a personal note. Lead with a win. Be specific. Reference their actual words."
The Staff Report: A Personal Note, Not a Scorecard
Every staff member gets their own daily report, a personal reflection on their shift. The structure follows a deliberate emotional arc:
- Greeting: "Hey Priya, here's how your Wednesday went" (personal, warm)
- Your Win Today: always leads positive, even on a tough day. Arjun had 7 growth moments, but his report opens with: "You handled two tables during a tough shift when the kitchen was slammed. You stayed on the floor, kept things moving; that reliability matters."
- Day at a Glance: conversations count, moments captured, and a warmth label with hearts instead of percentages
- Highlights: specific moments from their actual transcripts, not generic praise
- Growth Moments: side-by-side speech bubbles, "What you said" → "An idea for next time," with context on why the alternative works
- Tomorrow's Focus: one specific, actionable thing. Not three. One.
- Closing: a quote and a personal sign-off using their name
The warmth labels replace numerical scores entirely:
- Shining (80-100): celebration-heavy. Multiple highlights, maybe one optional polish suggestion. "You're setting the standard for the whole team."
- Steady (50-79): balanced. A couple highlights, one or two growth moments. "Good day with room to grow."
- Warming Up (0-49): growth-focused but still genuinely positive. The report finds something real to celebrate: they showed up, they kept the floor moving, they got the orders right. Then growth moments, max two, each with specific transcript evidence.
The speech bubble comparison is the heart of the design. It's not abstract advice ("be warmer"). It's "You said 'Yeah the kitchen is backed up today. It's not my fault.' Try: 'I completely understand your concern. Our kitchen is experiencing higher than usual volume today, but I'm keeping a close eye on your order.'" Then it explains why: "This turns a tough moment into a chance to show guests you're on their team."
A staff member reading this doesn't feel caught. They feel coached.
The Manager Briefing: Patterns, Not Just Numbers
The manager gets a different report: analytical but still human. It opens with a one-line headline capturing the day's story, then provides:
- Team Win: what went well, with names
- Today's Stars: who excelled and specifically why
- Coaching Opportunities: who needs support, with the actual quote that triggered it, and a concrete coaching suggestion ("Pull him aside tomorrow and role-play handling wait time questions")
- Today's Pattern: the one theme across the team ("the warmth gap is our biggest opportunity")
- Coaching Priorities: top 3 actionable items with who, what, and how
- Individual Reports: cards linking to each person's private report
The manager report includes direct transcript quotes in context. When it says Arjun needs coaching, it shows exactly what he said (βYeah the kitchen is backed up today. Itβs not my fault.β) and suggests exactly how to coach it (βPractice the phrase: βI completely understand β let me check on that for you right now.ββ). This isnβt a dashboard metric β itβs a conversation starter for tomorrowβs pre-shift huddle.
The Hybrid Prompt: 80% Cost Reduction
The key insight for reports: pre-aggregated metrics alone just restate dashboard numbers. Full transcripts are expensive and wasteful (most interactions are clean). The hybrid approach sends the LLM:
- ALL interactions as compact score lines (number, compliance%, sentiment, violation count)
- FLAGGED interactions only — key transcript excerpts around violations and coaching moments
- Existing coaching tips so the LLM can spot patterns, not repeat advice
- Temporal ordering so the LLM can discover trends ("improved after lunch")
This keeps prompts at ~2-3K tokens while giving the LLM enough context to discover cross-interaction patterns. Result: 80% cost reduction compared to sending full transcripts, with better output quality because the LLM can focus on what matters.
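A rough sketch of how such a hybrid prompt could be assembled (the field names and `build_hybrid_prompt` helper are illustrative assumptions, not the actual Nudge schema):

```python
def build_hybrid_prompt(interactions, existing_tips):
    """Assemble the report prompt: one compact score line per interaction,
    transcript excerpts only for flagged ones, plus tips already given."""
    lines = ["DAY SUMMARY (chronological):"]
    for n, it in enumerate(interactions, 1):  # keep temporal order for trends
        lines.append(
            f"#{n} compliance={it['compliance']}% "
            f"sentiment={it['sentiment']} violations={len(it['violations'])}"
        )
    flagged = [it for it in interactions if it["violations"]]
    if flagged:
        lines.append("FLAGGED EXCERPTS:")
        lines.extend(f"- {it['excerpt']}" for it in flagged)
    if existing_tips:
        lines.append("COACHING TIPS ALREADY GIVEN:")
        lines.extend(f"- {tip}" for tip in existing_tips)
    return "\n".join(lines)
```

Clean interactions contribute one short line each, so prompt size grows slowly with shift length; only the handful of flagged moments carry full transcript weight.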
What We Learned
Technical Lessons
Parallel LLM calls, sequential DB writes. The async concurrency model matters. Getting this wrong causes subtle data corruption that's hard to debug.
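The shape of that concurrency model, sketched with stand-ins (`fake_llm_call` and `db_write` are placeholders for the real clients, not Nudge's actual functions):

```python
import asyncio

written = []  # stand-in for a database table

async def fake_llm_call(interaction):
    """Placeholder for a real LLM client call."""
    await asyncio.sleep(0)  # simulates network latency
    return {"id": interaction["id"], "score": 90}

async def db_write(row):
    """Placeholder for an INSERT on a single connection."""
    written.append(row)

async def analyze_all(interactions):
    # Fan out: LLM calls are independent, so run them concurrently.
    results = await asyncio.gather(*(fake_llm_call(it) for it in interactions))
    # Fan in: persist one at a time so writes never interleave.
    for row in results:
        await db_write(row)
    return results
```

`asyncio.gather` returns results in submission order regardless of completion order, which is what makes the sequential write loop deterministic.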
Skip work when possible. Not every interaction needs coaching tips. Clean interactions (no violations + high sentiment) skip the coaching stage entirely. In a real deployment, this saves ~40% of LLM compute.
Domain-as-JSON scales beautifully. Adding a new vertical takes 30 minutes of JSON editing. The entire system personality — prompts, rules, metrics, scoring — lives in data, not code.
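To make the idea concrete, here is what a domain config might look like; the field names below are our guesses at a plausible shape, not Nudge's real schema:

```python
import json

# Illustrative restaurant domain config (hypothetical field names).
RESTAURANT_DOMAIN = json.loads("""
{
  "domain": "restaurant",
  "persona": "a warm, experienced floor manager",
  "rules": [
    {"id": "allergy_check", "severity": "high",
     "description": "Ask about allergies before taking a food order"},
    {"id": "greeting", "severity": "medium",
     "description": "Greet guests within 30 seconds of seating"}
  ],
  "scoring": {"shining": 80, "steady": 50}
}
""")

def rules_by_id(cfg):
    """The pipeline reads everything (persona, rules, scoring) from data."""
    return {rule["id"]: rule for rule in cfg["rules"]}
```

Swapping `RESTAURANT_DOMAIN` for a family-dinner or call-center config changes every prompt and rule in the system without touching a line of pipeline code.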
SSE > WebSockets for dashboards. When data flows only server→client, SSE is simpler, auto-reconnects, and works through proxies that sometimes break WebSocket upgrades.
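Part of SSE's simplicity is its wire format: plain text frames that a browser `EventSource` parses natively. A minimal frame formatter (not Nudge's code, just the standard format):

```python
import json

def sse_frame(data, event=None):
    """Format one Server-Sent Events frame: an optional 'event:' line,
    a 'data:' line, and the blank line that terminates the frame."""
    frame = f"event: {event}\n" if event else ""
    frame += f"data: {json.dumps(data)}\n\n"
    return frame
```

Yielding such frames from any streaming HTTP response is the whole protocol; there is no handshake or upgrade for a proxy to mangle.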
Facade pattern for hardware independence. Building the facade first meant we could develop and test the entire pipeline without any hardware. When the Omi device arrived, integration took minutes.
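The pattern in miniature: the pipeline depends only on an interface, and the device is just one implementation of it. Class and function names here are illustrative, not from the actual codebase:

```python
class TranscriptSource:
    """Facade interface: the pipeline sees this, never the Omi SDK directly."""
    def segments(self):
        raise NotImplementedError

class ReplayTranscriptSource(TranscriptSource):
    """Dev/test implementation that replays canned transcript segments."""
    def __init__(self, lines):
        self._lines = list(lines)
    def segments(self):
        yield from self._lines

def run_pipeline(source):
    # Stand-in for the real analysis stages; note it only touches the facade.
    return [seg.strip() for seg in source.segments() if seg.strip()]
```

A device-backed `OmiTranscriptSource` implementing the same `segments()` interface can then be dropped in later, which is why hardware integration took minutes rather than days.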
Design Lessons
Skins are harder than they look. CSS custom properties make color theming easy, but every skin needs testing across every page, viewport, and domain combination. We built automated tooling for this.
Severity calibration is everything. The difference between a useful compliance tool and an annoying one is whether the severity levels match human intuition. Too many false positives and people ignore it. Too few and it misses real issues.
Coaching tips must be specific. "Be more friendly" is useless. "Instead of 'What do you want?', try 'What can I get for you today?'" is actionable. The LLM prompt engineering for coaching tips went through the most iterations of any component.
The Hackathon Mindset
Our decisions log has exactly one entry: the day we chose the name. That's it. No second-guessing, no bikeshedding about tech stacks, no endless architecture debates. The remaining decisions — FastAPI over Flask, Claude over GPT-4, real-time over batch — were made quickly and never revisited. When you're building for a hackathon with a hard deadline, decision paralysis is the enemy. Trust your instincts, ship, test, iterate.
The Bigger Picture
Nudge started as a restaurant SOP tool and became something broader. The domain-as-JSON architecture means it can be a family communication coach, a sales call analyzer, a customer support quality monitor, or a new employee onboarding assistant — all from the same codebase.
The wearable form factor is what makes it work. You can't ask people to talk into a phone or sit at a computer to get coached. But a small device clipped to their shirt that quietly listens and gently nudges? That fits into the flow of actual work and actual life.
The Omi device follows you from your shift at the restaurant to your drive home to your family dinner. Same device, different domains, same goal: helping you show up as the version of yourself you're proud of.
We're exploring the possibility of an ambient AI coach — a gentle nudge to make every human interaction a little better. At work. At home. Everywhere the wearable goes.
Built for the Omi Hackathon Bengaluru 2026. Try the interactive presentation.