Mohd Zaid

Neurovid

Neuroscience-inspired engagement predictor for video content. Upload a video and get a predicted engagement timeline, neural signal heatmap, and a 3D brain that reacts to playback.

ai · nextjs · fastapi · visualization · python

TL;DR

Upload a video, see where viewers will lose interest before you publish. Engagement timeline, confusion detection, and a 3D brain that reacts to playback.


Overview

Neurovid is a decision-making tool for content creators. You upload a video and it returns a predicted engagement timeline with spike and drop detection, a neural signal heatmap across four dimensions (Attention, Emotion, Visual, Cognitive Load), and an interactive 3D brain visualization that reacts in real-time to video playback. The insights panel flags hook strength, confusion moments, and concrete suggestions for improvement.

Problem

Content creators make iteration decisions on vibes. You ship a video, wait for the retention graph, and if it tanks you guess why. By the time YouTube Analytics tells you people dropped off at 0:14, the edit is already live and the damage is done. There is no feedback loop before publish that tells you where the hook weakens, where attention fragments, or why a section confuses viewers.

Pre-publish prediction has historically required either massive test audiences or research lab infrastructure. Neither is accessible to a solo creator shipping weekly.

Approach

The architecture is a three-tier pipeline with a fixed signals contract so the implementation can evolve without breaking the UI.

- Frontend: Next.js with Zustand for state, React Three Fiber for the 3D brain viz, and a VideoPlayer that drives a TimelineGraph and InsightsPanel through the store
- API layer: Next.js API routes handle uploads and analysis fetches
- Analysis service: Python FastAPI service running video processing, audio processing, signal generation, and an insights engine
- Signal pipeline: Heuristic signal generators today (computer vision + audio analysis) designed to be drop-in replaced by real ML models (Meta's TRIBE v2 architecture) without changing any downstream interface

The key move was treating the signal API as a stable contract. The UI binds to signal shapes, not signal origins. Swap heuristics for a neural model later and nothing above changes.
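A minimal sketch of what such a contract might look like. The type and field names here are illustrative assumptions, not Neurovid's actual schema:

```typescript
// Hypothetical signals contract: the UI binds only to these shapes,
// never to how the numbers were produced.
// One sample per analyzed timestamp, each dimension normalized to [0, 1].
interface SignalSample {
  t: number;             // seconds into the video
  attention: number;
  emotion: number;
  visual: number;
  cognitiveLoad: number;
}

interface AnalysisResult {
  videoId: string;
  duration: number;      // seconds
  samples: SignalSample[];
  spikes: number[];      // timestamps of predicted engagement spikes
  drops: number[];       // timestamps of predicted engagement drops
}

// Runtime guard so any swapped-in backend can be validated against the contract.
function isSignalSample(x: unknown): x is SignalSample {
  if (typeof x !== "object" || x === null) return false;
  const s = x as Record<string, unknown>;
  return ["t", "attention", "emotion", "visual", "cognitiveLoad"].every(
    (k) => typeof s[k] === "number"
  );
}
```

With a guard like this at the service boundary, a heuristic backend and a future ML backend are held to the same output shape.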

Key Decisions

- Heuristic-first, ML-ready: Shipping with heuristics means the product works end-to-end today. The interface boundary means we can plug in TRIBE-style ML models the moment they are practical without rewriting the frontend.
- R3F for the brain viz over plain WebGL: The 3D brain is the product's distinctive surface. React Three Fiber gives declarative scene composition, and Zustand's subscription model plays well with video timestamp updates.
- FastAPI over a Node analysis service: The signal pipeline is CPU-heavy and will eventually be GPU-heavy. Python has the ecosystem for both (OpenCV, librosa, PyTorch later). Keeping it a separate service means the Next.js frontend stays deployable on Vercel.
- Four signal dimensions, not more: Attention / Emotion / Visual / Cognitive Load map cleanly to creator-intelligible concepts. Adding more channels would be noise for the user.

Challenges

Synchronizing three data streams (video playback, timeline graph, brain viz) to a single timestamp without frame drift required a central Zustand store that every component subscribes to - no prop drilling, no callback chains. Any component that wants to know "where are we in the video" reads from one source.
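The pattern can be sketched without the Zustand dependency as a plain subscribe/set store; the store shape and names below are illustrative, not the app's actual code:

```typescript
// Dependency-free sketch of the single-source-of-truth playback store.
// The real app uses Zustand, but the synchronization idea is the same:
// one writer (the VideoPlayer), many subscribers, no prop drilling.
type Listener = (t: number) => void;

function createPlaybackStore() {
  let timestamp = 0;
  const listeners = new Set<Listener>();
  return {
    getTimestamp: () => timestamp,
    // VideoPlayer calls this on each timeupdate; everyone else only reads.
    setTimestamp(t: number) {
      timestamp = t;
      listeners.forEach((fn) => fn(t));
    },
    // TimelineGraph, InsightsPanel, and the brain viz subscribe here,
    // so all three always render against the same timestamp.
    subscribe(fn: Listener) {
      listeners.add(fn);
      return () => listeners.delete(fn);
    },
  };
}
```

Because every consumer reads the same value on the same tick, the three surfaces cannot drift relative to one another; at worst they all lag the video by the same amount.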

Designing the signal generator interface so that heuristic and ML implementations are truly interchangeable meant overspecifying the contract early: same input shape, same output shape, same error modes. It felt like premature abstraction until the moment it saved a rewrite.
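In rough terms, the interchangeability looks like this; the interface, feature fields, and weights are hypothetical stand-ins for whatever the real pipeline specifies:

```typescript
// Hypothetical generator interface: heuristic and ML backends must agree on
// input shape and output shape so they are drop-in replaceable.
interface FrameFeatures {
  t: number;         // seconds into the video
  motion: number;    // e.g. normalized optical-flow magnitude
  loudness: number;  // e.g. normalized RMS audio level
}

interface SignalGenerator {
  generate(frames: FrameFeatures[]): { t: number; attention: number }[];
}

// Heuristic implementation: attention as a weighted blend of motion and
// loudness, clamped to [0, 1]. An ML backend would implement the same
// interface and the caller would never know the difference.
const heuristicGenerator: SignalGenerator = {
  generate(frames) {
    return frames.map((f) => ({
      t: f.t,
      attention: Math.min(1, 0.6 * f.motion + 0.4 * f.loudness),
    }));
  },
};
```

The payoff of overspecifying early is exactly this: the caller depends only on `SignalGenerator`, so swapping `heuristicGenerator` for a model-backed implementation is a one-line change at the wiring site.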

Outcome

End-to-end pipeline working with heuristic signals. The 3D brain reacts to video playback, the timeline graph shows engagement predictions with spike detection, and the insights panel produces actionable hook / confusion / improvement notes. Signal backend is ready for an ML model swap whenever Meta's TRIBE v2 or an equivalent becomes practical to self-host.