Jack Li

What It Is

LaunchKit is an AI workspace for e-commerce product research. A PM provides a category, and a multi-step workflow runs the research sequence end-to-end. At each step, an agent dispatches researcher, data-analyst, and reviewer subagents in parallel across 20+ tools (including SellerSprite, Sorftime, Google Trends, Amazon PDP, and web search) to produce a cited P&L, a slide deck, and supporting deliverables. PMs steer mid-run with plain-language follow-ups, and outputs land as xlsx, pptx, and pdf files in the project file tree as the workflow runs.

Architecture

The system is distributed across four long-running services and two burst tiers, so each layer scales on its own rhythm. The Next.js frontend owns task identity through the ?task= URL param, while agent output streams over Aegra's SSE. FastAPI on port 8080 handles task CRUD, file uploads, the HMAC credential broker, and the publish endpoints for both skills and workflow templates. Aegra on port 8000 runs the LangGraph workflow graph, which is dispatched by workflow_revision_id against an LRU compile cache.
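
The compile cache can be pictured with a minimal sketch. Only the idea of an LRU cache keyed by workflow_revision_id comes from the description above; the CompileCache class, the compile_fn callback, and the max_size default are hypothetical names and values chosen for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable


class CompileCache:
    """LRU cache of compiled graphs, keyed by workflow_revision_id (illustrative)."""

    def __init__(self, compile_fn: Callable[[str], Any], max_size: int = 32):
        self._compile_fn = compile_fn   # hypothetical: builds a graph for one revision
        self._max_size = max_size       # assumed capacity, not taken from the original
        self._entries = OrderedDict()   # least recently used first, most recent last
        self.compile_count = 0          # exposed only to make the behavior observable

    def get(self, workflow_revision_id: str) -> Any:
        if workflow_revision_id in self._entries:
            # Cache hit: mark the entry as most recently used and reuse it.
            self._entries.move_to_end(workflow_revision_id)
            return self._entries[workflow_revision_id]
        # Cache miss: compile once, store, then evict the oldest entry if over capacity.
        graph = self._compile_fn(workflow_revision_id)
        self.compile_count += 1
        self._entries[workflow_revision_id] = graph
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return graph
```

Keying on a revision ID rather than a workflow name means a cached entry never needs invalidation, provided a given revision ID always refers to the same definition; a newly published revision simply arrives under a new key.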

Each workflow is a multi-step StateGraph with N typed step nodes. State accumulates in a step_outputs typed dict: each complete_step tool call appends to it, and the subsequent step inherits the result. Every step invokes the same shared Deep Agent, which dispatches three named subagents (researcher, data-analyst, reviewer) via the task tool. Skills and workflows publish through an authoring layer as immutable, content-hashed revisions. Skill revisions sync into each sandbox and surface to the agent through SkillsMiddleware with progressive disclosure: only the name and description appear in the system prompt, and the agent reads the full skill from the sandbox when it is triggered. Tool calls flow through a 6-phase API response cache (Postgres + Redis sliding-window rate-limit + SET NX fetch-locks) before hitting external APIs; deferred tools (SellerSprite, Sorftime, Google Trends, Amazon) are resolved by a tool_search semantic index running embeddinggemma-300m in-process. Each task gets a lazily created E2B Firecracker microVM mounted to S3 via s3fs, plus a custom credlib that calls the broker for short-lived STS credentials. The classifier runs separately on Modal H100s with 192-way burst concurrency and a GPU memory snapshot.
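
As an illustration of what immutable, content-hashed revisions can look like, here is a minimal sketch. It assumes a revision ID is the SHA-256 of a canonical JSON encoding of the payload; the revision_id function, the RevisionStore class, and the payload fields are hypothetical, and the actual hashing scheme is not specified above.

```python
import copy
import hashlib
import json


def revision_id(payload: dict) -> str:
    """Derive an ID from content alone, so equal content always gets the same ID."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RevisionStore:
    """In-memory stand-in for the authoring layer's revision table (illustrative)."""

    def __init__(self):
        self._revisions = {}

    def publish(self, payload: dict) -> str:
        rid = revision_id(payload)
        # Store a private copy; an existing revision is never overwritten.
        self._revisions.setdefault(rid, copy.deepcopy(payload))
        return rid

    def get(self, rid: str) -> dict:
        # Return a copy so callers cannot mutate a published revision.
        return copy.deepcopy(self._revisions[rid])
```

Publishing an edited skill yields a new ID rather than mutating the old one, so a sandbox or a compile cache that pinned an earlier revision keeps seeing exactly the content it started with.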

Technical Deep Dive

1. Burst-GPU Amazon Classifier Pipeline

A PM researching blenders needs answers that don't show up in any Amazon filter. What fraction of the top thousand listings ship a glass jar versus plastic? How many bundle a smoothie cup or extra blade assembly? Which form factors and color palettes dominate the bestseller pages? These calls decide downstream procurement and packaging: if 60% of competitors use glass jars, glass is cheap at scale, suppliers carry it in volume, and consumers already expect the weight; if every listing bundles a cup, shipping single-piece becomes a positioning choice instead of a default. None of that signal is structured data. It lives in product copy, photo captions, and bullet points scattered across each page, and the agent has to read each one to extract it. Inside a single tool call, the agent fetches a thousand Amazon PDPs (product detail pages), classifies each one across a dozen semantic axes, and gets back a structured summary it uses to plan its next step. A thousand frontier-LLM classifications in seconds should not be possible. The pipeline does it in roughly the time of one slow PDP fetch.

The initial approach, per-record frontier-API calls, was the obvious one and the wrong one because of burst limits. A 1000-way fanout from a single tool call creates exactly the sudden-spike pattern that token-bucket rate limiters throttle hardest, and frontier providers layer separate acceleration limits on top that return 429s on sharp ramps before the per-minute ceiling even applies. Anthropic's top published tier caps Claude Haiku 4.5 at 4M input tokens per minute (ITPM) and 4000 requests per minute (RPM), the wrong shape for ~5M input tokens arriving in seconds. Managing anthropic-ratelimit-* headers and retry-after backoff under that load becomes its own subsystem. A small model on rented GPU sidesteps the whole problem and is cheaper as a side effect, since the model has 4B parameters rather than frontier scale.
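
The mismatch can be made concrete with a small token-bucket simulation. The 4000 RPM and 4M ITPM limits and the ~5M-token, 1000-record workload come from the paragraph above; the burst capacity of 100 is purely an assumption for illustration, since providers do not publish their bucket sizes, and TokenBucket and admitted are hypothetical names.

```python
RPM_LIMIT = 4000            # requests per minute, from the text
ITPM_LIMIT = 4_000_000      # input tokens per minute, from the text
RECORDS = 1000              # PDPs classified in one tool call
INPUT_TOKENS = 5_000_000    # approximate total input tokens, from the text


class TokenBucket:
    """Classic token bucket: refills continuously, each request spends one token."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity   # start full
        self.last = 0.0          # time of the previous request

    def allow(self, now: float) -> bool:
        # Credit the tokens earned since the last request, capped at capacity.
        earned = (now - self.last) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False             # a real API would answer 429 here


def admitted(arrival_times, burst_capacity: float = 100) -> int:
    """Count requests admitted; burst_capacity is an assumed value."""
    bucket = TokenBucket(burst_capacity, RPM_LIMIT / 60)
    return sum(bucket.allow(t) for t in arrival_times)


# All 1000 requests arrive at once, versus paced exactly at the RPM limit.
burst = admitted([0.0] * RECORDS)
paced = admitted([i * 60 / RPM_LIMIT for i in range(RECORDS)])

# Even with perfect pacing, the token ceiling sets a floor on wall-clock time.
min_seconds_for_tokens = INPUT_TOKENS / ITPM_LIMIT * 60
```

Under these assumptions the simultaneous burst gets only the bucket's capacity through and the rest are rejected, while the paced run admits everything but needs 15 seconds for the requests and at least 75 seconds for the tokens. Neither fits a tool call meant to return in seconds.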

The classifier runs as a self-hosted Qwen 3.5-4B (FP8) on a warm Modal H100, with vLLM's xgrammar backend enforcing the output schema. It is one container with continuous batching and 192-way concurrency; the pod controls its own queue depth, which eliminates 429s. Upstream PDP fetching fans out up to a thousand-way through a residential proxy network, and the two stages stream into one another, so each PDP triggers its per-record classification the moment it arrives, with no wait for the slowest fetch. A fire-and-forget warmup probe runs in parallel with the first PDP fetch to hide the Modal cold start.
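
The streaming shape of the two stages can be sketched with asyncio. The 1000-way fetch limit and 192-way classifier concurrency come from the paragraph above; fetch_pdp, classify, and warm_classifier are hypothetical stand-ins that sleep instead of calling a proxy network or a Modal endpoint, and the stats dict exists only to make the concurrency cap observable.

```python
import asyncio

FETCH_CONCURRENCY = 1000     # upstream fan-out limit, from the text
CLASSIFY_CONCURRENCY = 192   # in-engine concurrency, from the text
stats = {"in_flight": 0, "peak": 0}


async def fetch_pdp(asin: str) -> str:
    # Hypothetical stand-in for a proxied PDP fetch; latency varies per record.
    await asyncio.sleep((int(asin[1:]) % 5) * 0.002)
    return f"<html>{asin} glass jar</html>"


async def classify(html: str) -> dict:
    # Hypothetical stand-in for one schema-constrained classifier call.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        await asyncio.sleep(0.001)
        return {"jar_material": "glass" if "glass" in html else "plastic"}
    finally:
        stats["in_flight"] -= 1


async def warm_classifier() -> None:
    # Hypothetical warmup probe, started before any fetch has finished.
    await asyncio.sleep(0.001)


async def fetch_then_classify(asin, fetch_sem, classify_sem):
    async with fetch_sem:
        html = await fetch_pdp(asin)
    # The record moves to classification as soon as its own fetch completes.
    async with classify_sem:
        return asin, await classify(html)


async def run_pipeline(asins):
    fetch_sem = asyncio.Semaphore(FETCH_CONCURRENCY)
    classify_sem = asyncio.Semaphore(CLASSIFY_CONCURRENCY)
    warmup = asyncio.create_task(warm_classifier())
    tasks = [
        asyncio.create_task(fetch_then_classify(a, fetch_sem, classify_sem))
        for a in asins
    ]
    results = {}
    for finished in asyncio.as_completed(tasks):
        asin, labels = await finished
        results[asin] = labels
    await warmup  # awaited here only so the sketch exits without a pending task
    return results
```

A run over a thousand IDs would be `asyncio.run(run_pipeline([f"B{i:04d}" for i in range(1000)]))`. Each record holds its own fetch-then-classify coroutine, so a slow fetch delays only that record, and the semaphore keeps the number of in-flight classifications at or below 192.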

The cost per record collapses to GPU burst minutes amortized across in-engine concurrency, not per-call API rates.

2. Per-Task S3-Backed Firecracker microVMs

A LaunchKit task is a research run that produces a project of files (xlsx, pptx, pdf) the user sees in their browser as the workflow runs. The agent driving it needs a real filesystem behind those files: it reads inputs, writes outputs, and shells in to execute Python or render charts. That filesystem has to be isolated per task, scoped per project, and visible to three surfaces simultaneously: the agent in the sandbox, the FastAPI files endpoint serving the frontend's file tree, and Postgres holding metadata. The current design is per-task E2B Firecracker microVMs with /files/ mounted via s3fs FUSE.

The first iteration held authoritative file metadata in Postgres and reconciled from S3 at end-of-turn. This broke the moment the agent wrote a file mid-turn: the FUSE write hit S3 in seconds, but the agent's own read_file call (which queried Postgres) returned a 404 until the turn finished. The model would stall its own plan by retrying the read. The fix made S3 the source of truth for reads, so all file reads now query S3 directly. Writes still dual-write to both S3 and Postgres so the frontend file tree updates instantly, and three sync sites reconcile S3 back into Postgres: at end-of-turn, after an interrupt, and during a pre-turn catch-up gated by a per-project Redis flag.
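
The read and write paths can be sketched with in-memory stand-ins. Two plain dicts play the roles of the S3 bucket and the Postgres metadata table; the FileStore class and its method names are hypothetical, and the sketch assumes that writes made through the FUSE mount reach S3 only and depend on reconciliation to appear in Postgres.

```python
class FileStore:
    """In-memory model of the S3-first design (illustrative, not the real service)."""

    def __init__(self):
        self.s3 = {}        # stand-in for the bucket: key -> bytes
        self.pg_index = {}  # stand-in for Postgres metadata: key -> size in bytes

    def fuse_write(self, key: str, data: bytes) -> None:
        # Assumed: a write through the sandbox's s3fs mount touches S3 only.
        self.s3[key] = data

    def api_write(self, key: str, data: bytes) -> None:
        # Dual-write: S3 holds the content, Postgres feeds the file tree.
        self.s3[key] = data
        self.pg_index[key] = len(data)

    def read_file(self, key: str) -> bytes:
        # Reads consult S3 directly, never the metadata index.
        if key not in self.s3:
            raise FileNotFoundError(key)
        return self.s3[key]

    def list_tree(self) -> list:
        # The frontend file tree is served from the metadata index.
        return sorted(self.pg_index)

    def reconcile(self) -> None:
        # Sync site: make the index match S3 (end-of-turn, interrupt, pre-turn).
        self.pg_index = {key: len(data) for key, data in self.s3.items()}
```

With this split, a file the agent wrote mid-turn is readable immediately even though the index has not caught up; the only thing that lags until the next reconcile is the file tree listing.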

Credentials live in a separate broker. Because microVMs can't hold long-lived AWS keys, the in-VM credlib signs each refresh with HMAC-SHA256 over a pinned canonical string of method, path, instance ID, timestamp, nonce, and body hash; the broker verifies, cross-checks E2B's instance metadata, and mints a scoped STS session per request.
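
A minimal sketch of the signing and verification step follows, using only Python's standard hmac and hashlib modules. The field order follows the description above; the newline separator, the 60-second skew window, the in-memory nonce set, and the function names are assumptions for illustration. The instance-metadata cross-check and the STS call are omitted.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 60  # assumed freshness window, not taken from the original


def canonical_string(method, path, instance_id, timestamp, nonce, body: bytes) -> str:
    # Pinned field order: method, path, instance ID, timestamp, nonce, body hash.
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join(
        [method.upper(), path, instance_id, str(timestamp), nonce, body_hash]
    )


def sign(secret: bytes, method, path, instance_id, timestamp, nonce, body: bytes) -> str:
    message = canonical_string(method, path, instance_id, timestamp, nonce, body)
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(secret: bytes, signature: str, method, path, instance_id,
           timestamp, nonce, body: bytes, now: int, seen_nonces: set) -> bool:
    # Reject stale or future-dated requests before doing any crypto.
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    # Reject replays of a nonce that was already accepted.
    if nonce in seen_nonces:
        return False
    expected = sign(secret, method, path, instance_id, timestamp, nonce, body)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        return False
    seen_nonces.add(nonce)
    return True
```

Binding the body hash, timestamp, and nonce into the signed string means a captured request cannot be replayed later or reused with a different body, which matters when the response is a set of live credentials.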

Writes are now visible across all surfaces within ~100ms instead of ~30s. Cold sandbox boot takes ~2s and warm resume ~1s. No static AWS credentials exist inside any sandbox. The "Postgres knows everything" instinct cost a week of phantom-bug reports.