MOMORY

Tech Insights

The engineering philosophy and technical challenges behind MOMORY.

LLM-Centric Context Engineering

Unlike traditional sentence-by-sentence translation, MOMORY leverages the in-context learning capabilities of LLMs. By supplying a sliding window of recent transcripts with each request, we let the model resolve features of live conversation that a single sentence cannot, such as omitted subjects and ongoing topics.

This approach allows the AI to generate more coherent and contextually relevant subtitles than translating each sentence in isolation.
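A minimal sketch of such a sliding window follows. The class name, the entry and character limits, and the prompt wording are all assumptions made for illustration, not MOMORY's actual code.

```typescript
// Hypothetical sketch: names, limits, and prompt wording are illustrative.
interface TranscriptEntry {
  text: string;
  timestampMs: number;
}

class TranscriptWindow {
  private entries: TranscriptEntry[] = [];
  private readonly maxEntries: number;
  private readonly maxChars: number;

  constructor(maxEntries: number, maxChars: number) {
    this.maxEntries = maxEntries;
    this.maxChars = maxChars;
  }

  // Add a finished utterance, then evict the oldest entries until both the
  // entry limit and the character budget are respected. The newest entry is
  // always kept, even if it alone exceeds the character budget.
  push(text: string, timestampMs: number): void {
    const trimmed = text.trim();
    if (trimmed.length === 0) return;
    this.entries.push({ text: trimmed, timestampMs });
    while (
      this.entries.length > this.maxEntries ||
      (this.entries.length > 1 && this.totalChars() > this.maxChars)
    ) {
      this.entries.shift();
    }
  }

  private totalChars(): number {
    return this.entries.reduce((sum, e) => sum + e.text.length, 0);
  }

  // Everything except the newest entry is context; the newest is the target.
  buildPrompt(targetLanguage: string): string {
    const latest = this.entries[this.entries.length - 1];
    if (latest === undefined) return "";
    const context = this.entries.slice(0, -1).map((e) => `- ${e.text}`);
    const lines = [
      `Translate the latest utterance into ${targetLanguage}.`,
      "Use the earlier utterances only as context; do not translate them.",
    ];
    if (context.length > 0) {
      lines.push("Earlier utterances:", ...context);
    }
    lines.push(`Latest utterance: ${latest.text}`);
    return lines.join("\n");
  }
}
```

Capping the window by characters as well as by entry count is one simple way to keep input tokens bounded when a speaker produces unusually long utterances.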

The Philosophy of Latency: Prioritizing 'Comfort' Over Raw Speed

While sub-second latency is technically possible, we believe the best experience lies in a 'sweet spot' of 1.5 to 2.5 seconds. This intentional delay ensures that a translation is triggered only after a complete thought or sentence has been spoken, which improves accuracy and gives the model better context.

Translating too quickly results in 'fragmented translations' (e.g., 'I think...' -> '...this is a pen'), which increases cognitive load for the audience and wastes API calls on incomplete thoughts. Our 'safe' low-latency mode for paid tiers is engineered to provide a comfortable, high-quality experience while respecting API costs and quotas.

Quota & Tier Strategy

MOMORY is optimized for the Gemini API's tiered quota system. Even with a paid key, new accounts (Tier 1) start with a limit of 1,500 RPD (Requests Per Day). We balance high-frequency updates against these strict limits.
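To make the constraint concrete, here is a rough budget calculation. It is illustrative arithmetic only: the function name is hypothetical, and it assumes the whole daily quota is spent on a single stream with requests spread evenly over the time the speaker is actually talking.

```typescript
// Illustrative arithmetic only: assumes the whole daily quota goes to one
// stream and that requests are spread evenly over speaking time.
function averageSecondsPerRequest(
  requestsPerDay: number,
  streamHours: number,
  speechRatio: number, // fraction of the stream that contains speech, 0 < r <= 1
): number {
  if (requestsPerDay <= 0 || streamHours <= 0) {
    throw new RangeError("requestsPerDay and streamHours must be positive");
  }
  if (speechRatio <= 0 || speechRatio > 1) {
    throw new RangeError("speechRatio must be in (0, 1]");
  }
  const speakingSeconds = streamHours * 3600 * speechRatio;
  return speakingSeconds / requestsPerDay;
}

console.log(averageSecondsPerRequest(1500, 4, 1));   // 9.6
console.log(averageSecondsPerRequest(1500, 4, 0.5)); // 4.8
console.log(averageSecondsPerRequest(1500, 1, 1));   // 2.4
```

Read the other way around, one request for every 2 seconds of speech would use up 1,500 requests after about 50 minutes of talking, which is why grouping continuous speech into larger chunks matters on longer streams.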

Our core optimization strategies include:

  • Adaptive Burst Buffering: Intelligently adjusts buffering based on silence. Responds instantly to the start of speech while grouping continuous talk into larger chunks to save RPD.
  • Silent No-Call Logic: Strictly prevents API calls when no speech is detected, preserving your quota for meaningful moments.
  • Tier-Aware Low Latency: Paid Tier Mode enables safe low-latency translation (~1.5s lag) by balancing responsiveness with RPD consumption.
  • Contextual Sliding Window: Manages history with an efficient buffer to provide context without bloating input tokens (TPM).
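The buffering and no-call strategies above can be sketched as a single flush decision. This is one possible interpretation, not MOMORY's implementation: the type names, the `shouldFlush` function, and every threshold value are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: all names and threshold values are assumptions.
interface BufferConfig {
  shortPauseMs: number; // pause required when speech resumes after idle time
  longPauseMs: number;  // pause required during continuous talk
  idleMs: number;       // time since the last request that counts as "idle"
  maxBufferMs: number;  // upper bound on how long text may wait
}

interface BufferSnapshot {
  pendingText: string;        // recognized speech not yet sent
  msSinceLastSpeech: number;  // length of the current silence
  msSinceLastRequest: number; // time since the previous API call
  msSinceBufferStart: number; // age of the oldest pending text
}

// Example presets; the numbers are placeholders, not measured values.
const FREE_TIER: BufferConfig = {
  shortPauseMs: 600, longPauseMs: 2000, idleMs: 5000, maxBufferMs: 8000,
};
const PAID_TIER: BufferConfig = {
  shortPauseMs: 300, longPauseMs: 1200, idleMs: 3000, maxBufferMs: 4000,
};

function shouldFlush(s: BufferSnapshot, c: BufferConfig): boolean {
  // Silent no-call: nothing was said, so never spend a request.
  if (s.pendingText.trim().length === 0) return false;
  // Never let pending text wait longer than the cap.
  if (s.msSinceBufferStart >= c.maxBufferMs) return true;
  // After an idle period, respond quickly to the first utterance;
  // during continuous talk, wait for a longer pause to form bigger chunks.
  const wasIdle = s.msSinceLastRequest >= c.idleMs;
  const requiredPause = wasIdle ? c.shortPauseMs : c.longPauseMs;
  return s.msSinceLastSpeech >= requiredPause;
}
```

A caller would evaluate `shouldFlush` on a timer or on each recognition event and send the pending text only when it returns true. Switching between the two presets is one way a tier-aware mode could trade responsiveness against RPD consumption.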

Data Privacy with Gemini API: Free vs. Paid Tiers

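MOMORY operates with a Zero-Server Privacy Architecture, meaning we do not store any of your conversational data or API keys on MOMORY's servers. Your Gemini API key is stored only in your browser's local storage. Speech is transcribed in your browser via the Web Speech API, and the resulting text is sent directly to the Gemini API for translation. Note that some browsers, such as Chrome, implement the Web Speech API with a server-based recognition engine, so the audio itself may be processed by the browser vendor's speech service.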

However, it is crucial to understand Google's data policy for the Generative AI API, which differs between Free and Paid Tiers:

Free Tier Data Policy: When using a Free Tier API key, your input (conversation) data MAY be used by Google to improve its AI models. This is a common practice for free services to enhance AI capabilities. Read Google's Generative AI Terms of Service for more details.
Paid Tier (Tier 1+) Data Policy: With a Paid Tier API key (Pay-as-you-go), your input data is NEVER used for model training and remains private. This offers enhanced privacy protection and significantly higher API quotas. We recommend upgrading if data privacy and higher performance are critical for your stream.

Real-time Stability Layer

Web Speech API results can be 'shaky' with frequent intermediate updates. MOMORY implements a stability layer that waits for a confidence threshold or a logical pause before triggering a translation, ensuring the overlay remains readable.

This reduces visual noise and keeps the audience focused on the content, not the flickering text.
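As a rough illustration of such a stability layer, the sketch below releases text only when a result is final, or when a sufficiently confident interim transcript has stopped changing for a set time. The `StabilityGate` class, its thresholds, and the `InterimResult` shape are hypothetical. The fields mirror the `transcript`, `confidence`, and `isFinal` values exposed by Web Speech API results, but the gate itself is plain logic with no browser dependency.

```typescript
// Hypothetical sketch: class name and thresholds are assumptions.
interface InterimResult {
  transcript: string;
  confidence: number; // 0..1
  isFinal: boolean;
}

class StabilityGate {
  private lastText = "";
  private unchangedSinceMs = 0;
  private lastEmitted = "";
  private readonly minConfidence: number;
  private readonly stableMs: number;

  constructor(minConfidence: number, stableMs: number) {
    this.minConfidence = minConfidence;
    this.stableMs = stableMs;
  }

  // Returns the text to translate, or null if the caller should keep waiting.
  offer(result: InterimResult, nowMs: number): string | null {
    const text = result.transcript.trim();
    if (text.length === 0) return null;

    // Restart the stability timer whenever the transcript changes.
    if (text !== this.lastText) {
      this.lastText = text;
      this.unchangedSinceMs = nowMs;
    }
    const alreadyEmitted = text === this.lastEmitted;

    if (result.isFinal) {
      // The utterance is over: reset state so the next one starts fresh.
      this.lastText = "";
      this.lastEmitted = "";
      return alreadyEmitted ? null : text;
    }

    const stableFor = nowMs - this.unchangedSinceMs;
    const ready =
      result.confidence >= this.minConfidence && stableFor >= this.stableMs;
    if (!ready || alreadyEmitted) return null;
    this.lastEmitted = text;
    return text;
  }
}
```

In a browser, each entry of a recognition event's results could be mapped to an `InterimResult` and passed to `offer` with the current timestamp. Suppressing a final result that matches text already emitted avoids translating the same sentence twice.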

Vibe-coding: UI/UX with Soul

Performance is a feature, but 'vibe' is an experience. We use modern frameworks like Tailwind CSS and Framer Motion to create a fluid, responsive UI that feels alive.

Key UI/UX considerations include high-performance feedback loops (like the 60fps volume meter), subtle 'glow' effects for active states, and standardized micro-interactions across all pages.
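As an example of what sits behind a smooth volume meter, the sketch below computes a level from one frame of audio samples and smooths it so the bar rises quickly and falls gently. Both function names and the attack and release factors are assumptions for illustration; in a browser the samples would typically come from a Web Audio `AnalyserNode` and be read once per animation frame.

```typescript
// Hypothetical sketch: function names and smoothing factors are assumptions.

// Root-mean-square level of one frame of samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.min(1, Math.sqrt(sumSquares / samples.length));
}

// Move the displayed level toward the measured one: quickly when the signal
// gets louder (attack), slowly when it gets quieter (release).
function smoothLevel(
  previous: number,
  current: number,
  attack: number,  // 0..1, fraction of the gap closed per frame when rising
  release: number, // 0..1, fraction of the gap closed per frame when falling
): number {
  const factor = current > previous ? attack : release;
  return previous + (current - previous) * factor;
}
```

Keeping the per-frame work to one pass over the sample buffer leaves the rest of the frame budget for rendering.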

Maximizing LLM Potential

We don't just use AI to translate; we use it to 'interpret'. Through custom system instructions, streamers can define their own persona and slang dictionaries, allowing the AI to act as a specialized bridge for their unique community.

This semantic understanding allows for the translation of cultural nuances that traditional rule-based systems simply cannot handle.
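A persona and slang dictionary could be folded into a system instruction along these lines. The `StreamerProfile` shape, the `buildSystemInstruction` function, and the instruction wording are assumptions made for this sketch, not MOMORY's actual prompt.

```typescript
// Hypothetical sketch: the profile shape and wording are assumptions.
interface StreamerProfile {
  persona: string;               // free-text description of the speaker
  targetLanguage: string;
  slang: Record<string, string>; // source term -> preferred rendering
}

function buildSystemInstruction(profile: StreamerProfile): string {
  const lines = [
    `You are a live interpreter translating a stream into ${profile.targetLanguage}.`,
    "Interpret meaning and tone rather than translating word for word.",
  ];
  const persona = profile.persona.trim();
  if (persona.length > 0) {
    lines.push(`Speaker persona: ${persona}`);
  }
  const entries = Object.entries(profile.slang);
  if (entries.length > 0) {
    lines.push("Community glossary (source term => preferred rendering):");
    for (const [term, rendering] of entries) {
      lines.push(`- ${term} => ${rendering}`);
    }
  }
  lines.push("Output only the translated subtitle text.");
  return lines.join("\n");
}

const instruction = buildSystemInstruction({
  persona: "Energetic retro-game streamer who jokes with chat",
  targetLanguage: "English",
  slang: { otsu: "thanks for the stream" },
});
console.log(instruction);
```

The resulting string would be supplied as the system instruction of each translation request, so the glossary applies consistently without being repeated in every user message.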