Technical Blueprint v7
Shopilot.ai
The AI copilot for eCommerce: built like Cursor, powered like Claude Code, designed for sellers.
A native browser app that lives where sellers work. Shopilot sees your data, understands your business, executes actions, and proactively tells you what to do next.
- 10 weeks to MVP
- ~65% of the backend reused from existing code
- 192 issues in Linear
- 19 architecture projects
Changelog: v1 → v7
7 major iterations, 4 CTO/PM/PO audits, 27/27 checks PASS | Final blueprint with Linear workspace
Final Blueprint: Linear Workspace
Mar 10, 2026
- Linear workspace fully configured: Workspace beautonomous, Team Shopilot (AUT). All project management migrated from markdown specs to Linear as the single source of truth for execution tracking.
- GitHub integration: organization-level webhook on pabesfu. Workflow automations: PR opened → In Progress, review requested → In Review, PR merged → Done. Branch naming: AUT-XX-description.
- Slack integration: connected to the #engineering channel for real-time notifications on issue updates, PR activity, and gate completions.
Fresh Blueprint Build
Mar 4, 2026
- Section 5 (Architecture): fully rewritten. From 8 layers to 7: Product, Intelligence, Knowledge, Action, Platform, Quality, Internal. Each layer now has a description of its role. Each project includes its repo name (core-*). The Intelligence layer details the single-repo deployment (Lambda Node.js 18 TS + API Gateway v2). Eliminated projects are shown inline in their respective layers.
- Section 6 (Project Implementation Map): carried over from v5 with corrected repository names aligned to the per-layer naming convention (11 repos). Project count updated from 19 to 17 active. Repository names now match the architecture layers: core-knowledge-*, core-action-*, core-quality-*, core-internal-*.
Stack Introduction + Project Map
Mar 3, 2026
Deep Spec Rewrites + Project Restructure
Mar 2, 2026
- Deep Spec Rewrites: 13 projects
- New Projects & Structure
- Projects Eliminated: 3
- Cross-Refs & Enhancements
- Quality Assurance: 17 fixes across 3 audit rounds
Deep Specs + CTO/PM/PO Audit
27/27 PASS, Feb 27-28, 2026
Freemium Pricing Model
Feb 27, 2026
CTO Technical Review
Mateo Quintero
Feb 27, 2026
Initial MVP Blueprint
Feb 26, 2026
Contents
Glossary: Key Terms (20 terms)
- ReAct: Reasoning + Acting. The agent thinks step by step, decides which tool to use, observes the result, and repeats until the task is done. Core AI architecture pattern.
- tool_use: Anthropic API feature that lets the LLM call functions (tools) with structured parameters. This is how the AI agent executes marketplace operations.
- System prompt: Hidden instructions given to the AI before every conversation. Defines identity, guardrails, user profile, and execution capabilities. Assembled from 3 layers (L1 base + L2 session + L3 execution) by ISystemPromptComposer (#4). Marketplace terminology lives in the KB, not here.
- Context window: The total amount of text (in tokens) the LLM can process in a single interaction. Larger windows = more context, but higher cost. Shopilot targets ~200K tokens.
- Prompt caching: Anthropic optimization that caches the static part of the system prompt across calls, reducing cost by ~90% for repeated prefixes. Enables affordable always-on AI.
- WebContentsView: Electron API component that renders web content (like a browser tab). Shopilot uses it to display real marketplace websites (MeLi, Amazon, Shopify) inside the native app.
- Guardrails: Safety rules appended to every AI interaction: never execute without user confirmation, never fabricate data, never give financial/legal advice. Non-negotiable constraints.
- OAuth2: Industry-standard protocol for secure API access. Sellers authorize Shopilot to access their marketplace accounts without sharing passwords. Tokens rotate automatically.
- DAG: Directed Acyclic Graph. A pipeline of data processing steps that run in dependency order. Shopilot uses DAGs (via Airflow) to sync marketplace data every 6 hours.
- Semantic search: Finding information by meaning, not keywords. Text is converted to numerical vectors; similar meanings cluster together. Powers the Cerebro knowledge base.
- RAG: Retrieval-Augmented Generation. Before the AI responds, it retrieves relevant knowledge chunks from the database to ground its answer in real data, not hallucinations.
- WebSocket: Real-time communication channel between the desktop app and the server. Enables streaming AI responses (word by word) and live confirmation dialogs.
- Tools: 36 primitive API operations (get_product, update_price, search_competitors) that the LLM composes dynamically at runtime via Anthropic tool_use. Managed by the Tool Registry (#3), with IToolExecutor as the single execution port. There is no separate "Skills" layer: the agent reasons directly over primitive tools + KB context.
- Credits: Internal currency for AI usage. Each interaction costs 1-25 credits depending on complexity. Free plan: 50cr/mo. Pro plan: 500cr/mo + purchasable Credit Packs.
- Medallion architecture: Data lake pattern: Bronze (raw API responses) → Silver (cleaned, normalized) → Gold (aggregated, query-ready). Data quality improves at each layer.
- Rate limiting: Controls how many API calls Shopilot makes per second to each marketplace. Prevents getting blocked by MeLi/Amazon/Shopify for excessive requests.
- IPC: Inter-Process Communication. How the Electron main process, renderer (UI), and preload scripts communicate securely via contextBridge.
- TTL: Time-To-Live. How long cached data stays valid before being refreshed. Product cache: 5min. Data sync: 6h. Trace retention: 90 days.
- Adapter pattern: Software design pattern where each marketplace (MeLi, Amazon, Shopify) is a pluggable module implementing the same interface. Adding a marketplace = adding one class.
- Cloud Run: Google Cloud service that runs containerized backend services. Auto-scales from 0 to N instances based on traffic. Shopilot's FastAPI backend runs here.
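The rate limiting entry in the glossary can be illustrated with a token bucket, one common way to cap calls per second. This is a minimal sketch, not Shopilot's implementation: the class name TokenBucket, its parameters, and the example limits are all assumptions, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
// Hypothetical sketch of per-marketplace rate limiting with a token bucket.
// Capacity and refill rate are illustrative, not real marketplace limits.
class TokenBucket {
  private capacity: number;        // maximum burst size
  private refillPerSecond: number; // sustained calls allowed per second
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes one token if a call may proceed at time nowMs.
  tryAcquire(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In this sketch, one bucket per marketplace would sit in front of the outgoing API client, and a call that gets `false` would be queued or retried later rather than sent.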
1. What is Shopilot
Shopilot is a native desktop app that functions as a specialized browser for eCommerce. The seller navigates their marketplaces (Amazon, MercadoLibre, Shopify) as usual on the left side. On the right, Shopilot deploys as an intelligent sidebar: a conversational copilot with massive multi-marketplace context that doesn't just answer questions, but proactively proposes what to do next.
Conversational
Natural language interface. Ask anything about your business. Shopilot responds with data, not opinions.
Executor
Doesn't just recommend: it acts. Edits products, adjusts prices, runs campaigns, with confirmation for risky actions.
Proactive
Doesn't wait for questions. Detects opportunities, flags problems, proposes actions before you ask.
2. How the Big Players Do It
Shopilot borrows architectural patterns from three products that redefined developer productivity. Each teaches a different lesson.
Cursor
VS Code fork + proprietary models + native shell
Architecture: Electron app wrapping a full VS Code fork. Custom IPC via gRPC + Protocol Buffers. Chromium renderer with Monaco editor. Shadow windows for LSP access. Rust modules via NAPI-RS for indexing.
Key tech: Sparse MoE models for Tab (custom, not GPT-4). Turbopuffer vector DB. Merkle tree sync every 10min. Speculative Edits (~1000 tok/s, 13x speedup).
Lessons for Shopilot
- Native shell: Electron as browser container + sidebar. Same pattern: main content left, AI right.
- Context engine: Cursor auto-collects files, tabs, errors. Shopilot auto-collects marketplace, page, product, metrics.
- Proactive suggestions: Tab completion is Cursor's killer feature. Shopilot's equivalent: proactive alerts and action proposals.
Claude Code
Terminal CLI + agent loop + primitive tools + security layers
Architecture: TypeScript/Node.js CLI. Ink (React for terminals). Async generator agent loop. 18 primitive tools. Exact string matching for edits. Subagent system (up to 10 parallel).
Key tech: 6-layer security with OS sandboxing. Prompt caching (92%+ hit rate). 3-layer memory. Tool result context guard (token budget per tool).
Lessons for Shopilot
- ReAct agent loop: plan → execute tools → observe → decide. The heart of Shopilot's orchestration.
- Primitive tools: small, composable tools. In Shopilot the LLM composes them into complex workflows at runtime rather than through a separate skills layer.
- Risk taxonomy: the 6-layer security model maps to our read-only / reversible / irreversible model. Real money = real security.
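The loop and risk taxonomy in the lessons above can be sketched together. This is a simplified, synchronous illustration under stated assumptions: the `decide` callback stands in for the LLM call, `confirm` stands in for the confirmation dialog, and every type and function name here is hypothetical. Real tool calls and model calls would be asynchronous.

```typescript
type Risk = "read-only" | "reversible" | "irreversible";

interface Tool {
  name: string;
  risk: Risk;
  run: (args: Record<string, unknown>) => string;
}

// What the model decides at each step: call a tool, or finish with an answer.
type Decision =
  | { kind: "tool"; tool: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

// Stand-in for the LLM: a real agent would send the history to the model.
type Decide = (history: string[]) => Decision;

function runAgent(
  task: string,
  tools: Record<string, Tool>,
  decide: Decide,
  confirm: (tool: Tool) => boolean,
  maxSteps: number = 8,
): string {
  const history: string[] = ["task: " + task];
  for (let step = 0; step < maxSteps; step++) {
    const decision = decide(history); // plan
    if (decision.kind === "final") return decision.answer;

    const tool = tools[decision.tool];
    if (!tool) {
      history.push("error: unknown tool " + decision.tool);
      continue;
    }
    // Assumption: every tool that is not read-only needs user confirmation.
    if (tool.risk !== "read-only" && !confirm(tool)) {
      history.push("declined: " + tool.name);
      continue;
    }
    // Execute the tool and record the observation for the next decision.
    history.push("observation: " + tool.run(decision.args));
  }
  return "stopped: step limit reached";
}
```

The step limit is a design choice in this sketch: it bounds cost if the model never produces a final answer.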
OpenClaw
Multi-agent framework + skills system + tool policies + plugin hooks
Architecture: WebSocket gateway + agent routing. Tool factory with policy filtering per agent/channel. Stream function wrapping (decorator chain). Session write locks. Plugin hook lifecycle (before/after tool calls).
Lessons for Shopilot
- Tool Registry: OpenClaw-inspired pattern: register, filter by context, execute via single port (IToolExecutor). 36 primitive tools composed by the LLM at runtime.
- Tool policy: filter tools by marketplace, plan, risk level. Not all users see all tools.
- Hook lifecycle: before_tool → execute → after_tool. Perfect for the feedback loop: capture what ran and what resulted.
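The tool policy lesson above (filter by marketplace, plan, and risk level) could look roughly like the following. It is a sketch under assumptions: the field names, the rank tables, and the plan and marketplace assigned to each tool in the sample catalog are illustrative, not taken from the blueprint.

```typescript
type Marketplace = "meli" | "amazon" | "shopify";
type Plan = "free" | "pro";
type RiskLevel = "read-only" | "reversible" | "irreversible";

interface ToolSpec {
  name: string;
  marketplaces: Marketplace[]; // where the tool is available
  minPlan: Plan;               // lowest plan that may see it
  risk: RiskLevel;
}

interface SessionContext {
  marketplace: Marketplace;
  plan: Plan;
  maxRisk: RiskLevel; // highest risk level allowed in this session
}

const PLAN_RANK: Record<Plan, number> = { free: 0, pro: 1 };
const RISK_RANK: Record<RiskLevel, number> = { "read-only": 0, reversible: 1, irreversible: 2 };

// Returns only the tools the current session is allowed to see.
function visibleTools(all: ToolSpec[], ctx: SessionContext): ToolSpec[] {
  return all.filter(
    (t) =>
      t.marketplaces.indexOf(ctx.marketplace) !== -1 &&
      PLAN_RANK[t.minPlan] <= PLAN_RANK[ctx.plan] &&
      RISK_RANK[t.risk] <= RISK_RANK[ctx.maxRisk],
  );
}

// Illustrative catalog: tool names come from the glossary, the policies are made up.
const sampleCatalog: ToolSpec[] = [
  { name: "get_product", marketplaces: ["meli", "amazon", "shopify"], minPlan: "free", risk: "read-only" },
  { name: "search_competitors", marketplaces: ["meli", "amazon"], minPlan: "pro", risk: "read-only" },
  { name: "update_price", marketplaces: ["meli", "amazon"], minPlan: "pro", risk: "reversible" },
];
```

Filtering before the tool list is sent to the model means a tool the user cannot use is never offered to the LLM in the first place.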
3. What We Already Have
Shopilot doesn't start from zero. Three production systems and one live product provide the foundation.
Core Intelligence Conversation API (The Coach)
Conversational AI specialized in eCommerce. Clean Architecture + DDD. 12-step RAG pipeline. Multi-LLM support (Claude, OpenAI). Brand Health intent detection without spending tokens. Full observability with ConversationTrace. Cost tracking per execution per client.
Multi-channel input
12-step orchestration
Multi-provider LLM
Conversations + state
Core Intelligence Knowledge Base (The Brain)
Structured knowledge base. Markdown + YAML front-matter in Git. 4-stage indexing pipeline (validate → chunk → embed → BigQuery). Semantic search over embeddings. Namespaces: pricing, ads, inventory, financial, organic, quality, compliance, reputation, returns, health, learning. Automated reporting of outdated documents.
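The four indexing stages named above can be sketched end to end. This is an illustration only: the existing pipeline is written in Go, and here the front-matter parser handles plain `key: value` lines rather than full YAML, `embed` is a placeholder for a real embedding model, and an in-memory array stands in for the BigQuery vector table. All names are hypothetical.

```typescript
interface Doc { namespace: string; title: string; body: string; }
interface IndexedChunk { namespace: string; title: string; text: string; vector: number[]; }

// Stage 1: validate. Parses a minimal "key: value" front-matter block.
function validate(raw: string): Doc {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(raw);
  if (!match) throw new Error("missing front-matter");
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!meta["namespace"] || !meta["title"]) {
    throw new Error("namespace and title are required");
  }
  return { namespace: meta["namespace"], title: meta["title"], body: match[2].trim() };
}

// Stage 2: chunk. Splits on blank lines so each paragraph becomes one chunk.
function chunk(doc: Doc): string[] {
  return doc.body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Stage 3: embed. Placeholder only; a real pipeline calls an embedding model.
function embed(text: string): number[] {
  const v = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) v[i % 4] += text.charCodeAt(i);
  return v;
}

// Stage 4: store. Appends one row per chunk and returns how many were written.
function indexDocument(raw: string, store: IndexedChunk[]): number {
  const doc = validate(raw);
  const chunks = chunk(doc);
  for (const text of chunks) {
    store.push({ namespace: doc.namespace, title: doc.title, text: text, vector: embed(text) });
  }
  return chunks.length;
}
```

Validating before chunking keeps malformed documents out of the index entirely, which matters when the namespace drives retrieval.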
Multi-Marketplace Data Orchestrator
Apache Airflow orchestrating data ingestion from Shopify (GraphQL) and MercadoLibre (REST). Modular DAGs (Auth → Extractor → Transformer → Dispatcher). Medallion architecture on GCS (Parquet + Snappy). FastAPI Data API on Cloud Run. OAuth2 automatic token rotation. OpenMetadata for data governance. Terraform IaC.
GraphQL connector
REST connector
GCS + Parquet + Hive
FastAPI + Cloud Run
Sellerfy v1
Bootstrapped SaaS with 200 paying users. Python (FastAPI) + React/Next.js + PostgreSQL/DynamoDB + AWS. Stripe billing integration. These 200 users validate market demand and become day-1 beta testers for Shopilot.
Coach Evolution Plans
Iterative reasoning + tool execution. 5-layer design with confirmation flows.
Risk taxonomy: read-only, reversible, irreversible. Pre-built for marketplace safety.
Competitive intelligence (Rainforest), product audit, text generation, image generation. Code-ready specs.
4. What We Reuse & Why
#8 Observability
ConversationTrace + AgentTracking operational. Traces in PostgreSQL, cost per execution calculated by trigger, credits deducted automatically. Extend, don't rebuild.
#9 Cerebro KB
2,875 documents indexed, Go + Vertex AI 004 + BigQuery vectors pipeline. Semantic RAG active in production. Add updated Amazon + MeLi namespaces and marketplace trends. The indexing pipeline transfers as-is.
ILLMClient + Factory
Already supports Claude, OpenAI, OpenRouter. Factory selects provider per agent/config at runtime. Works as-is; it lives inside #2 and is not a separate project.
#10 Data Sync
The batch Airflow + GCS pipeline stays in place for daily seller data ingestion. Add TypeScript adapters for real-time sync via EventBridge. Brand Health Index operational. Existing DAGs unchanged.
#2 Orchestrator
From one-shot 12-step pipeline to autonomous ReAct agent. Same Lambda, same repo: the existing flow is extended, not replaced. Absorbs system prompt composition via ISystemPromptComposer.
#5 Context Aggregator
Context assembly logic already exists in RagOrchestrator. Refactor into IContextAssembler (KB + Brand Health RAG) + IContextWindowManager (dynamic token budget over the 200K context window). Extract and formalize, don't rebuild.
#14 DevOps
CDK TypeScript (AWS) exists and deploys conversation-api today. Extend with unified multi-repo stack + Terraform for GCP (BigQuery, Vertex AI). Base infrastructure unchanged.
#12 Marketplace Provider
core-action-marketplace-provider. Brand new service. TypeScript with IMarketplaceAdapter (MeLi REST + Amazon SP-API) + ITokenManager OAuth2 (absorbs the eliminated Auth Vault project, formerly #10). Executes real marketplace actions: product mutations, price, stock, buyer communication, campaigns. The WRITE tools backend for the Coach.
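A minimal sketch of the adapter arrangement described above, assuming a single `updatePrice` operation. The blueprint names IMarketplaceAdapter but does not give its methods, so the method, the PriceUpdate type, the MarketplaceProvider class, and the returned strings are all hypothetical; real adapters would call the MercadoLibre REST API and Amazon SP-API asynchronously.

```typescript
interface PriceUpdate { productId: string; price: number; }

// Hypothetical shape: the source names the interface, not its members.
interface IMarketplaceAdapter {
  readonly marketplace: string;
  updatePrice(update: PriceUpdate): string;
}

class MeliAdapter implements IMarketplaceAdapter {
  readonly marketplace = "meli";
  updatePrice(update: PriceUpdate): string {
    // A real adapter would call the MercadoLibre REST API here.
    return "meli:" + update.productId + ":" + update.price.toFixed(2);
  }
}

class AmazonAdapter implements IMarketplaceAdapter {
  readonly marketplace = "amazon";
  updatePrice(update: PriceUpdate): string {
    // A real adapter would call Amazon SP-API here.
    return "amazon:" + update.productId + ":" + update.price.toFixed(2);
  }
}

// Routes each action to the adapter registered for that marketplace.
class MarketplaceProvider {
  private adapters: Record<string, IMarketplaceAdapter> = {};

  register(adapter: IMarketplaceAdapter): void {
    this.adapters[adapter.marketplace] = adapter;
  }

  updatePrice(marketplace: string, update: PriceUpdate): string {
    const adapter = this.adapters[marketplace];
    if (!adapter) throw new Error("no adapter registered for " + marketplace);
    return adapter.updatePrice(update);
  }
}
```

With this shape, supporting another marketplace means writing one more class and registering it; callers of `MarketplaceProvider` do not change.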
#13 Billing (absorbs Token Economy, formerly #14)
core-platform-billing. PostgreSQL credit triggers and the clients schema are the only starting point (schema, not code). Everything else (ICreditsGate, HttpCreditGate, POST /internal/gate, Stripe Checkout, webhooks, Free/Pro state machine, Credit Packs) is built from scratch. Absorbs the eliminated Token Economy project.
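The credit gate can be sketched with the plan allowances from the glossary (Free 50cr/mo, Pro 500cr/mo, 1-25cr per interaction). Everything else is an assumption made for illustration: the `charge` method signature, the Account fields, the in-memory implementation, and the rule that the monthly allowance is spent before purchased pack credits.

```typescript
type PlanName = "free" | "pro";

// Monthly allowances as stated in the glossary.
const MONTHLY_CREDITS: Record<PlanName, number> = { free: 50, pro: 500 };

interface Account {
  plan: PlanName;
  usedThisMonth: number; // credits already spent from the monthly allowance
  packCredits: number;   // credits bought via Credit Packs
}

interface GateResult { allowed: boolean; remaining: number; }

// Hypothetical signature: the blueprint names ICreditsGate but not its methods.
interface ICreditsGate {
  charge(account: Account, cost: number): GateResult;
}

class InMemoryCreditsGate implements ICreditsGate {
  charge(account: Account, cost: number): GateResult {
    if (cost < 1 || cost > 25) throw new Error("cost must be between 1 and 25 credits");
    const monthlyLeft = Math.max(0, MONTHLY_CREDITS[account.plan] - account.usedThisMonth);
    const remaining = monthlyLeft + account.packCredits;
    if (cost > remaining) return { allowed: false, remaining: remaining };

    // Assumption: spend the monthly allowance first, then pack credits.
    const fromMonthly = Math.min(cost, monthlyLeft);
    account.usedThisMonth += fromMonthly;
    account.packCredits -= cost - fromMonthly;
    return { allowed: true, remaining: remaining - cost };
  }
}
```

A denied charge leaves the account untouched, so the caller can show the remaining balance and offer an upgrade or a Credit Pack.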
#15 Feedback Loop
core-quality-feedback. Completely new repo and service. IFeedbackService + before/after impact measurement for WRITE actions (visits, sales, conversion) + 3 feedback sources. No reusable code from previous design.
#3 Tool Registry
36 primitive tools (READ + ANALYSIS + WRITE + SYSTEM) via Anthropic tool_use. ToolPolicyFilter (plan + marketplace + risk level) + HookLifecycle (before_tool → execute → after_tool). Not a fixed catalog: the autonomous agent reasons about which tool to use.
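The before_tool → execute → after_tool lifecycle can be shown as a wrapper around the execution port. This is a synchronous sketch under assumptions: the blueprint names IToolExecutor and HookLifecycle but not their signatures, so `execute`, the Hook shape, and HookedExecutor are hypothetical, and production tool calls would be asynchronous.

```typescript
interface ToolCall { tool: string; args: Record<string, unknown>; }
interface ToolResult { tool: string; output: string; }

// Optional callbacks that run around every tool execution.
interface Hook {
  beforeTool?: (call: ToolCall) => void;
  afterTool?: (call: ToolCall, result: ToolResult) => void;
}

// Hypothetical signature for the single execution port.
interface IToolExecutor {
  execute(call: ToolCall): ToolResult;
}

// Decorates any executor so every call passes through the hook chain.
class HookedExecutor implements IToolExecutor {
  private inner: IToolExecutor;
  private hooks: Hook[];

  constructor(inner: IToolExecutor, hooks: Hook[]) {
    this.inner = inner;
    this.hooks = hooks;
  }

  execute(call: ToolCall): ToolResult {
    for (const h of this.hooks) {
      if (h.beforeTool) h.beforeTool(call);
    }
    const result = this.inner.execute(call);
    for (const h of this.hooks) {
      if (h.afterTool) h.afterTool(call, result);
    }
    return result;
  }
}
```

Because every tool runs through one port, a feedback or tracing hook registered here sees all executions without each tool knowing about it.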
#4 Personality Engine
ISystemPromptComposer. 3 layers: L1 base identity (~1,200 tokens, cached between sessions), L2 session (UserProfile + critical alerts, ~400 tokens), L3 WRITE guardrails (~200 tokens, active only when WRITE tools are in play). Hard cap 1,200 tokens total. ~750-950 tokens in typical use.
#6 Proactive Suggestions
IProactiveSuggestionService. Lightweight LLM inference in the after_tool hook of HookLifecycle. Structured output: { hasSuggestion, message, suggestionType, priority, productId }. Max 2 suggestions per turn, phrased as questions. Pro only. Cross-session deduplication via UserProfile (7-day window per suggestion type + product).
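The per-turn cap and the 7-day deduplication window described above can be sketched as a pure selection function. The field names follow the structured output in the text; the numeric `priority` (higher means more important), the `lastShown` map, and the function name are assumptions. In the blueprint the dedup history lives in UserProfile.

```typescript
interface Suggestion {
  hasSuggestion: boolean;
  message: string;
  suggestionType: string;
  priority: number; // assumption: higher number = more important
  productId: string;
}

const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7-day window
const MAX_PER_TURN = 2;

// lastShown maps "suggestionType:productId" to the time (ms) it was last shown.
function selectSuggestions(
  candidates: Suggestion[],
  lastShown: Record<string, number>,
  nowMs: number,
): Suggestion[] {
  const fresh = candidates.filter((s) => {
    if (!s.hasSuggestion) return false;
    const seenAt = lastShown[s.suggestionType + ":" + s.productId];
    return seenAt === undefined || nowMs - seenAt >= DEDUP_WINDOW_MS;
  });
  // Keep the highest-priority suggestions, at most MAX_PER_TURN of them.
  const picked = fresh
    .slice()
    .sort((a, b) => b.priority - a.priority)
    .slice(0, MAX_PER_TURN);
  // Record what was shown so later turns and sessions skip it.
  for (const s of picked) {
    lastShown[s.suggestionType + ":" + s.productId] = nowMs;
  }
  return picked;
}
```

Keying the window on type plus product lets the same kind of suggestion appear for a different product while still suppressing repeats.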
#1 Native Shell
No existing code. Electron + WebContentsView for marketplace navigation + React sidebar with 5 views (Chat, Profile, Billing, Enrollment, Onboarding). Bidirectional WebSocket protocol with the Orchestrator. The biggest new piece of the stack.
#16 Eval Framework
core-quality-stack-evaluation. 7 pipelines: LLM Judge (response quality), inter-project contracts, KB quality, E2E, desktop_build, figma_quality, api_monitor. Quality gate in CI/CD: blocks deploys that degrade quality. Does not run in product runtime.
#7 Guardrails
InputGuard (pre-LLM: prompt injection detection + off-scope requests) + OutputGuard (post-LLM: data leak prevention + dangerous content). A layer independent of the Orchestrator, so it can evolve without touching the ReAct loop.
#11 Enrichment
core-knowledge-enrichment. Market Intelligence (competitors, pricing, keywords, fee estimation) + Content Analysis (technical image and video analysis for marketplace). New service with no base code.
#17 Beautonomous
core-internal-team-workflow. Internal operational agent. OpenClaw UI (main interface) + Slack (notifications & approvals) + quality base structure per repo. Includes: bootstrap templates (CLAUDE.md, .claudeignore, settings.json, MEMORY.md, specs/, skills/), structure verification shell script (step 0 of the quality gate), quality-gate.yml (GitHub Action + Claude Code API script), OpenClaw system prompt with 3 governance roles, and quality agent prompt. Not just config: it has quality infrastructure code written once and replicated across 11 repos.
Eliminated Projects (absorbed)
- Auth Vault (absorbed into #12 Marketplace Provider): OAuth token management is the responsibility of the same service that executes actions.
- Token Economy (absorbed into #13 Billing): Billing and Token Economy had circular dependencies, so they were unified into a single source of truth.
Summary
- REUSE (2): #8, #9
- ADAPT (4): #10, #2, #5, #14
- NEW (11): #17, #12, #13, #3, #15, #4, #6, #1, #16, #7, #11
19 active projects total. The three groups above account for 17; the remainder, such as #18 Design System, are not classified in this summary.
5. Architecture Arquitectura
7-layer stack — 19 active projects mapped to a production-ready architecture. Stack de 7 capas — 19 proyectos activos mapeados a una arquitectura lista para producción.
LAYER 1 — PRODUCT CAPA 1 — PRODUCTO
What the seller installs and sees.Lo que el vendedor instala y ve.
#1 Native Shell (core-product-desktop-client): Electron + WebContentsView for marketplace navigation + React sidebar with 5 views (Chat, Profile, Billing, Enrollment, Onboarding). Bidirectional WebSocket with the Orchestrator. The biggest new piece of the stack.
#18 Design System (core-product-design-system): specs and context repository (no executable code) bridging Figma design with React implementation. 44 components (Atomic Design), 9 brand decisions (D1-D9), design tokens (3 Figma variable collections: Primitives → Semantic → Component), UX Writing guide, 8 data viz patterns, AI-native interaction patterns. Claude reads Figma via MCP to implement components in #1. UX/UI team (executes Figma T0.BB–T4.BB) + Pablo (approves) + Sergio (consumes → React Mockups).
LAYER 2: INTELLIGENCE
The Coach lives here. Single repo (core-intelligence-conversation-api) deployed as a Node.js 18 TypeScript Lambda behind AWS API Gateway v2. Middleware: Memberstack JWT auth + Zod validation + rate limiting.
LAYER 3: KNOWLEDGE
What the Coach knows.
#9 Cerebro KB (core-knowledge-semantic-base): 2,875 docs (Markdown + Git) + Go pipeline + Vertex AI 004 + BigQuery vectors. Semantic RAG is active. To do: add updated Amazon + MeLi namespaces and marketplace trends. The pipeline transfers as-is.
#10 Data Sync (core-knowledge-data-synchronizator): the batch Airflow + GCS pipeline persists for daily ingestion. To do: add TypeScript adapters for real-time sync via EventBridge. Brand Health Index is operational. Existing DAGs are unchanged.
#11 Enrichment (core-knowledge-enrichment): Market Intelligence (competitors, pricing, keywords, category fee estimation) + Content Analysis (technical image and video analysis for marketplace). New service with no base code.
LAYER 4: ACTION
What the Coach can do in the marketplace.
#12 Marketplace Provider (core-action-marketplace-provider): brand-new service. TypeScript + IMarketplaceAdapter (MeLi REST + Amazon SP-API) + ITokenManager OAuth2 (absorbs #10). Executes real marketplace actions: product mutations, price, stock, buyer communication, campaigns. Backend for the Coach's WRITE tools.
LAYER 5: PLATFORM
What sustains the business and infrastructure.
#13 Billing (core-platform-billing, absorbs #14): new service from scratch. PostgreSQL schema + credit triggers extended to separate plan vs packs. ICreditsGate + HttpCreditGate + POST /internal/gate + Stripe Checkout + idempotent webhooks + Free/Pro state machine + Credit Packs + LLM prompt caching.
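Idempotent webhooks mean that a payment event delivered more than once must credit the account only once. The following is a minimal sketch of that idea, not the actual core-platform-billing code: an in-memory set of processed event IDs stands in for what would presumably be a PostgreSQL table with a unique constraint, and the `WebhookEvent` shape, the `CreditLedger` class and all field names are hypothetical. It also keeps plan credits and pack credits in separate balances, as the schema description suggests.

```typescript
// Hypothetical event shape; real Stripe events carry many more fields.
interface WebhookEvent {
  id: string;             // unique event ID, used as the idempotency key
  userId: string;
  kind: "plan" | "pack";  // which balance the credits belong to
  credits: number;
}

interface Balance {
  plan: number;
  pack: number;
}

class CreditLedger {
  private processed = new Set<string>();
  private balances = new Map<string, Balance>();

  // Returns true if the event was applied, false if it was a duplicate delivery.
  apply(event: WebhookEvent): boolean {
    if (this.processed.has(event.id)) return false;
    const current = this.balances.get(event.userId) ?? { plan: 0, pack: 0 };
    const updated: Balance = { ...current };
    updated[event.kind] += event.credits;
    this.balances.set(event.userId, updated);
    this.processed.add(event.id);
    return true;
  }

  // Returns a copy so callers cannot mutate the ledger's internal state.
  balanceOf(userId: string): Balance {
    return { ...(this.balances.get(userId) ?? { plan: 0, pack: 0 }) };
  }
}
```

In a real service the duplicate check and the balance update would need to happen in one database transaction, so that a crash between the two cannot double-credit or lose an event.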
#14 DevOps (core-platform-infrastructure): CDK TypeScript (AWS) exists and deploys conversation-api today. Extend with a unified multi-repo stack + Terraform (GCP) for BigQuery and Vertex AI. 3 environments: dev / staging / prod.
#19 Go to Market & Analytics (core-platform-gtm-analytics): go-to-market strategy (positioning, channels, early adopter acquisition, onboarding funnels) + analytics infrastructure for product usage, retention, conversion, and growth metrics. Owned by Pablo + external GTM team; no internal engineering tasks in MVP sprints.
LAYER 6: QUALITY
What measures whether the Coach works well and learns from its actions.
#15 Feedback Loop (core-quality-feedback): new service from scratch. IFeedbackService + WRITE action impact measurement (before/after metrics at 7 days: visits, sales, conversion) + 3 feedback sources. Separate repo.
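The before/after impact measurement can be illustrated with a small sketch. It assumes two 7-day windows of raw counts per listing and reports the relative change for visits, sales and conversion. The `MetricWindow` and `ImpactReport` types and the function names are hypothetical and are not the IFeedbackService interface.

```typescript
// Hypothetical 7-day metric snapshot for one listing.
interface MetricWindow {
  visits: number;
  sales: number;
}

interface ImpactReport {
  visitsChange: number | null;      // relative change, e.g. 0.25 means +25%
  salesChange: number | null;
  conversionChange: number | null;  // conversion is defined here as sales / visits
}

// Relative change from `before` to `after`; null when the baseline is zero.
function relativeChange(before: number, after: number): number | null {
  return before === 0 ? null : (after - before) / before;
}

function conversion(w: MetricWindow): number {
  return w.visits === 0 ? 0 : w.sales / w.visits;
}

// Compares the window before a WRITE action with the window after it.
function measureImpact(before: MetricWindow, after: MetricWindow): ImpactReport {
  return {
    visitsChange: relativeChange(before.visits, after.visits),
    salesChange: relativeChange(before.sales, after.sales),
    conversionChange: relativeChange(conversion(before), conversion(after)),
  };
}
```

Returning `null` for a zero baseline avoids reporting an infinite or misleading percentage for listings that had no traffic before the action.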
#16 Eval Suite (core-quality-stack-evaluation): 7 pipelines: LLM Judge (response quality), inter-project contracts, KB quality, E2E, desktop builds (macOS + Windows, 11 checks), Figma quality (15 checks via REST API), API monitor. Acts as the quality gate in CI/CD: it blocks deploys that degrade quality, validates that builds are distributable, and audits Figma for MCP compatibility. Does not run in product runtime.
LAYER 7: INTERNAL
How the team works.
#17 Beautonomous (core-internal-team-workflow): internal operational agent. OpenClaw UI + Slack + quality base structure per repo. Bootstrap templates, quality gate shell script (step 0), quality-gate.yml (GitHub Action + Claude Code API), system prompt with 3 governance roles, quality agent prompt. Not just config: it contains quality infrastructure code.
6. Project Implementation Map
Cross-cutting view of the 19 active projects: how they connect, what depends on what, and which repositories group them.
Project Families
| # | Family | Projects |
|---|---|---|
| 1 | What the user installs | #1 Native Shell, #18 Design System |
| 2 | The Coach itself | #2, #3, #4, #5, #6, #7, #8 (conversation-api) |
| 3 | What the Coach knows | #9 KB, #10 Data Sync, #11 Enrichment |
| 4 | What the Coach can do | #12 Marketplace Provider |
| 5 | How it gets paid | #13 Billing & Credit Economy, #19 GTM & Analytics |
| 6 | Where it runs | #14 DevOps |
| 7 | Learning & quality | #15 Feedback Loop, #16 Eval Suite |
| 8 | How we work | #17 Beautonomous |
Repository Organization
| Repository | Projects |
|---|---|
| core-intelligence-conversation-api | #8, #2, #3, #4, #5, #6, #7 |
| core-knowledge-semantic-base | #9 |
| core-knowledge-data-synchronizator | #10 |
| core-action-marketplace-provider | #12 |
| core-knowledge-enrichment | #11 |
| core-quality-feedback | #15 |
| core-product-desktop-client | #1 |
| core-platform-billing | #13 |
| core-platform-infrastructure | #14 |
| core-quality-stack-evaluation | #16 |
| core-internal-team-workflow | #17 |
| shopilot-design-system | #18 |
| core-platform-gtm-analytics | #19 |
Connection Diagram
USER
|
v
[#1 Native Shell] (Electron app) ← [#18 Design System] (tokens + components)
|
v
[#7 Guardrails] InputGuard (pre-LLM)
|
v
[#2 Orchestrator] + [#4 Personality] + [#5 Context Agg]
|
+---> [#3 Tool Registry]
| |
| +---> [#12 Marketplace Provider] (includes TokenManager, absorbs #10)
| +---> [#11 Enrichment Layer]
| +---> [#10 Data Sync]
| +---> [#6 Proactive Suggestions]
|
+---> [#9 Cerebro KB]
|
v
[#7 Guardrails] OutputGuard (post-LLM)
|
v
RESPONSE TO USER
--- Platform ---
[#13 Billing & Credit Economy] (metering + payments)
[#14 DevOps] (infrastructure)
[#8 Observability] (tracing + logging)
[#15 Feedback Loop] (impact measurement)
[#16 Eval Suite] (CI/CD quality gate)
[#17 Beautonomous] (team operations)
[#19 GTM & Analytics] (launch strategy + usage tracking)
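The request path in the connection diagram (InputGuard before the LLM, OutputGuard after it) can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions: the `Guard` and `Llm` function types, the `GuardResult` shape and `handleTurn` are hypothetical names, and the model call is synchronous here only to keep the example short.

```typescript
// Hypothetical result of a guard check.
interface GuardResult {
  ok: boolean;
  text: string;           // sanitized text when ok, empty otherwise
  reason: string | null;  // why the guard rejected the text, if it did
}

type Guard = (text: string) => GuardResult;
type Llm = (prompt: string) => string;

// Runs one conversational turn: InputGuard -> LLM -> OutputGuard.
// If the input guard rejects the message, the LLM is never called.
function handleTurn(input: string, inputGuard: Guard, llm: Llm, outputGuard: Guard): GuardResult {
  const checkedInput = inputGuard(input);
  if (!checkedInput.ok) return checkedInput;
  const rawAnswer = llm(checkedInput.text);
  return outputGuard(rawAnswer);
}
```

Placing the input check before the model call means a rejected message costs no LLM tokens, which matters when usage is metered in credits.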
Who Is Responsible for What
| Capability | Project(s) |
|---|---|
| Render marketplace page | #1 Native Shell |
| Visual design system & brand tokens | #18 Design System |
| Reason & decide next action | #2 Orchestrator |
| Execute marketplace writes | #12 Marketplace Provider |
| Fetch external market data | #11 Enrichment Layer |
| Analyze images & video | #11 Enrichment Layer |
| Sync seller data | #10 Data Sync |
| Inject relevant context | #5 Context Aggregator + #9 Cerebro KB |
| Compose system prompt | #4 Personality Engine |
| Route & enforce tool policies | #3 Tool Registry |
| Validate input & output safety | #7 Guardrails |
| Track usage & billing | #13 Billing & Credit Economy |
| Measure quality (CI/CD) | #16 Eval Suite |
| Collect feedback & learn | #15 Feedback Loop |
| Go-to-market strategy & analytics | #19 Go to Market & Analytics |
Critical Dependencies
7. Beautonomous: Internal Operations Agent
core-internal-team-workflow (#17). OpenClaw UI as primary interface + Slack for notifications and approvals + quality base structure in every repository. 4 engineers operating like 10–15.
7.1 What It Solves
The problem isn't technical capacity, it's operational fragmentation: to know what's happening you need to check Linear, GitHub and Slack separately; simple changes require interrupting someone; there's no centralized place to approve changes or trigger reviews.
Beautonomous lives in OpenClaw UI: the team opens the core-internal-team-workflow project and works from there. Slack receives proactive notifications and deploy pipeline approvals, which can be answered directly from a Slack message without opening OpenClaw. The terminal with Claude Code is a third path for direct technical operations on repositories.
7.2 Architecture
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ OPENCLAW UI │ │ SLACK │
│ Primary interface │ │ Second native channel │
│ │ │ │
│ Full conversation + context │ │ Direct conversation │
│ All tools + auth by role │ │ Proactive notifications │
│ History + audit log │ │ Pipeline approvals │
└──────────────┬───────────────┘ └───────────────┬──────────────┘
│ │
└──────────────┬────────────────────┘
│
Terminal / Claude Code
(direct technical operations)
│
┌─────────────────────────────▼───────────────────────────────────┐
│ OPENCLAW — Agent Engine │
│ ReAct Loop · Governance Guard · Audit Log │
│ Auth: identifies role automatically by logged-in user │
│ Connectors: GitHub · Linear · Code · Slack │
└─────────────────────────────┬───────────────────────────────────┘
│ invokes via API / GitHub Actions
┌─────────────────────────────▼───────────────────────────────────┐
│ QUALITY BASE STRUCTURE — Per stack repository │
│ ├── CLAUDE.md repo instructions + conventions │
│ ├── .claude/memory/ persistent context │
│ └── quality-gate.yml GitHub Action: lint + tests + review │
└─────────────────────────────────────────────────────────────────┘
| Slack Channel | Purpose |
|---|---|
| #engineering | Technical decisions, unreviewed PRs, architecture |
| #deploys | Quality gate results, workflow status, CI/CD failures |
| #general | Team communication |
| #team | Daily sprint summary (9:00 AM) |
7.3 Four Capabilities (v1)
1. View status from Slack
Any member asks in Slack and gets a synthesized response from GitHub, Linear and Slack. Daily auto-summary in #team at 9:00 AM: pending PRs, failing CI, tasks in progress per person, active blockers.
2. Create & manage tasks from Slack
Create tasks, assign them, change status and add comments in Linear, all from Slack, without opening Linear.
3. Approve PRs
When a PR passes the quality gate, Beautonomous notifies Mateo with the summary, diff and review result. Mateo responds from OpenClaw UI or Slack DM. If the PR targets production, the same flow reaches Pablo after Mateo approves. No need to open GitHub.
4. Activate quality agent
The quality gate runs automatically on every PR. It can also be triggered manually from OpenClaw UI, terminal or Slack. It includes inter-repo contract validation: if a PR breaks an interface another project consumes, the gate fails with the specific reason.
7.4 Governance
The most critical component. Without it, an agent with access to all repositories and pipelines is an operational risk.
Three Roles
| Role | Who | Can do | Cannot |
|---|---|---|---|
| El Capitán | Pablo Estrada | Full read · Linear tasks · UI changes (generates PR) · final prod approval | Workflows · backend/infra |
| El Mago | Mateo Quintero | Everything · approve PRs · any workflow · manage roles · technical sign-off | None |
| El Artesano | Andres · Sergio | Full read · propose changes via PR · staging workflows · own tasks | Approve PRs · infra · prod |
Risk Taxonomy
| Level | Flow | Examples |
|---|---|---|
| Read | No confirmation | View PRs, query tasks, read logs |
| Reversible | Requester confirms | Create task, comment PR, UI text, staging workflow |
| Needs approval | Requester + El Mago via Slack | Backend logic, infra config, staging env vars |
| Irreversible | Requester + El Mago + permanent record | Prod env vars, billing changes, delete data |
Authorization Flow
1. Agent identifies required tools and risk level of the set.
2. Verifies user role has permission. If not: explains and offers to escalate to El Mago.
3. Shows exactly what it will do: diff for code, preview for Slack.
4. User confirms. For "needs approval": El Mago receives Slack notification.
5. Executes and records in Audit Log: timestamp, user, role, tool, params, result.
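The risk taxonomy and the authorization flow above can be sketched as two small functions. This is a minimal sketch under one explicit assumption: that the risk level of a set of tools is the highest level among its members, which the blueprint implies but does not state. The type and function names (`RiskLevel`, `riskOfSet`, `flowFor`, `AuditEntry`) are hypothetical.

```typescript
type RiskLevel = "read" | "reversible" | "needs_approval" | "irreversible";

// Ordered from lowest to highest risk, mirroring the risk taxonomy table.
const RISK_ORDER: RiskLevel[] = ["read", "reversible", "needs_approval", "irreversible"];

interface ApprovalFlow {
  requesterConfirms: boolean;
  magoApproves: boolean;
  permanentRecord: boolean;
}

// Fields listed in step 5 of the authorization flow.
interface AuditEntry {
  timestamp: string;
  user: string;
  role: string;
  tool: string;
  params: Record<string, unknown>;
  result: string;
}

// Assumption: a set of tools takes the highest risk level among its members.
// An empty set is treated as "read".
function riskOfSet(levels: RiskLevel[]): RiskLevel {
  let highest = 0;
  for (const level of levels) {
    highest = Math.max(highest, RISK_ORDER.indexOf(level));
  }
  return RISK_ORDER[highest];
}

// Maps a risk level to the confirmations the taxonomy table requires.
function flowFor(level: RiskLevel): ApprovalFlow {
  const rank = RISK_ORDER.indexOf(level);
  return {
    requesterConfirms: rank >= 1,
    magoApproves: rank >= 2,
    permanentRecord: rank >= 3,
  };
}
```

For example, a request that reads logs and changes a staging environment variable would be classified as `needs_approval`, so both the requester and El Mago must confirm before it runs.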
Principles
1. Least privilege: each role accesses only what it needs.
2. Explicit confirmation: no state-modifying action runs without confirmation.
3. Full traceability: every action (who, when, what args, what result) is in the Audit Log.
4. Read/write separation: querying never requires permission; modifying always does.
5. Declared reversibility: each action declares whether it has a rollback or not.
7.5 Quality Gate
Runs automatically on every PR to develop or main, and manually from Slack. 5 sequential steps: if any fails, the PR doesn't advance.
| Step | Tool | Detects |
|---|---|---|
| 0. Base structure | Shell script | Required files present (CLAUDE.md, .claude/*, quality-gate.yml) |
| 1. Lint + types | ESLint + tsc / ruff | Syntax errors, wrong types |
| 2. Tests | Jest / pytest | Broken tests, coverage below minimum |
| 3. Architecture review | Claude Code API | Clean Architecture violations, broken inter-repo contracts |
| 4. Convention check | Claude Code API | Naming, folder structure, repo patterns |
Steps 3–4 receive full repo context: CLAUDE.md + MEMORY.md + .claude/specs/* + PR diff. Skills (e.g. clean-ddd-hexagonal, solid) are available as additional context.
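The "sequential steps, stop at the first failure" behaviour can be sketched as a small runner. This is an illustration only: the real gate is a GitHub Actions workflow, and its steps (shell script, linters, Claude Code API calls) would be asynchronous. The `GateStep`, `StepResult` and `GateReport` types are hypothetical, and the steps are synchronous here to keep the sketch short.

```typescript
interface StepResult {
  ok: boolean;
  detail: string;
}

interface GateStep {
  name: string;
  run: () => StepResult;
}

interface GateReport {
  passed: boolean;
  failedStep: string | null;  // name of the first failing step, if any
  executed: string[];         // steps that actually ran, in order
}

// Runs steps in order and stops at the first failure,
// so later (more expensive) steps never run on a broken PR.
function runGate(steps: GateStep[]): GateReport {
  const executed: string[] = [];
  for (const step of steps) {
    executed.push(step.name);
    const result = step.run();
    if (!result.ok) {
      return { passed: false, failedStep: step.name, executed };
    }
  }
  return { passed: true, failedStep: null, executed };
}
```

Ordering the cheap deterministic checks (structure, lint, tests) before the LLM-based reviews means a PR with a failing test never spends API calls on architecture review.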
7.6 Approval Pipeline & Environments
PR opened
│
▼
Quality Gate (auto — Claude Code)
├── FAILS → #deploys + DM to Artesano → back to Artesano. End.
│
└── PASSES → DM to Mateo in Slack
│
├── REJECTS → comment on PR + DM to Artesano → End.
│
└── APPROVES
├── target staging → auto merge
└── target prod → DM to Pablo in Slack
├── REJECTS → End.
└── APPROVES → merge → deploy prod
| Environment | Branch | Approvals |
|---|---|---|
| dev | feature/* | Quality gate |
| staging | develop | Quality gate + Mateo |
| prod | main | Quality gate + Mateo + Pablo |
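The environment table can be read as a policy function from branch to required approvals. The sketch below encodes exactly the three rows of the table; the `Approval` type and the function names are hypothetical, and rejecting unknown branches with an error is an assumption made for the example.

```typescript
type Approval = "quality-gate" | "Mateo" | "Pablo";

// Returns the approvals required for a branch, following the environment table.
function requiredApprovals(branch: string): Approval[] {
  if (branch === "main") return ["quality-gate", "Mateo", "Pablo"];
  if (branch === "develop") return ["quality-gate", "Mateo"];
  if (branch.startsWith("feature/")) return ["quality-gate"];
  // Assumption: branches outside the table have no policy and are rejected.
  throw new Error(`No approval policy for branch: ${branch}`);
}

// A change can proceed only when every required approval has been granted.
function canMerge(branch: string, granted: Approval[]): boolean {
  return requiredApprovals(branch).every((a) => granted.indexOf(a) !== -1);
}
```

With this shape, `canMerge("main", ["quality-gate", "Mateo"])` is false until Pablo also approves, which matches the prod branch of the pipeline diagram.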
7.7 Quality Base Structure Per Repo
All 11 active repositories in the stack have a minimum structure. This is what turns Beautonomous from a generic agent into one that knows each project specifically. Bootstrap from core-internal-team-workflow/templates/.
repo/
├── CLAUDE.md # Agent contract with this repo
├── .claudeignore # Files Claude should not read
└── .claude/
├── settings.json # Team permissions + hooks (committed)
├── memory/
│ └── MEMORY.md # Persistent context per repo
├── specs/
│ ├── architecture.md # Architecture decisions + boundaries
│ ├── contracts.md # Inter-repo contracts (detail)
│ └── testing.md # What to test and how
└── skills/ (symlinks) # Relevant skills for this repo
├── clean-ddd-hexagonal
├── solid
└── clean-architecture
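Step 0 of the quality gate checks that this structure is present. The blueprint specifies a shell script for that step; the sketch below expresses the same check in TypeScript purely for illustration. The exact list of required paths is an assumption derived from the tree above and from the quality-gate.yml location under .github/workflows/, and `missingPaths` is a hypothetical name.

```typescript
// Assumed list of required files, taken from the repo tree above.
const REQUIRED_PATHS: string[] = [
  "CLAUDE.md",
  ".claudeignore",
  ".claude/settings.json",
  ".claude/memory/MEMORY.md",
  ".claude/specs/architecture.md",
  ".claude/specs/contracts.md",
  ".claude/specs/testing.md",
  ".github/workflows/quality-gate.yml",
];

// Given the paths that exist in a repo, returns the required ones that are missing.
// An empty result means the base structure check passes.
function missingPaths(existingPaths: string[]): string[] {
  const present = new Set(existingPaths);
  return REQUIRED_PATHS.filter((path) => !present.has(path));
}
```

Reporting the list of missing files, rather than a bare pass/fail, lets the gate tell the author exactly what to bootstrap from the templates.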
Inter-Repo Contracts
A contract is any interface between two projects that, if changed in one, breaks the other. Contracts live in each repo's CLAUDE.md under ## Contratos con otros repos ("Contracts with other repos"). The quality gate reads them as context when reviewing a PR.
## Contratos con otros repos
### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
Consumidor: core-intelligence-conversation-api
Rompe si: cambia la firma, cambia el significado de `allowed`
### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
Rompe si: cambia el path, cambia el body schema
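The exposed contract above, `ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }`, can be written down as a TypeScript interface with an in-memory stand-in for tests. This is a sketch under stated assumptions: the `ToolCategory` values and per-category costs are invented for illustration, `InMemoryCreditGate` is a hypothetical test double, and the method is synchronous here although the real HttpCreditGate, which calls POST /internal/gate, would presumably return a Promise.

```typescript
// Hypothetical categories; the blueprint does not enumerate them.
type ToolCategory = "read" | "write";

interface GateRequest {
  userId: string;
  toolCategory: ToolCategory;
}

interface GateDecision {
  allowed: boolean;
  reason: string;
}

// Shape of the contract as described in the CLAUDE.md example.
interface ICreditsGate {
  canProceed(request: GateRequest): GateDecision;
}

// Assumed credit cost per category, for illustration only.
const COST: Record<ToolCategory, number> = { read: 1, write: 5 };

// In-memory stand-in, useful for testing consumers of the contract.
class InMemoryCreditGate implements ICreditsGate {
  constructor(private credits: Map<string, number>) {}

  canProceed({ userId, toolCategory }: GateRequest): GateDecision {
    const available = this.credits.get(userId) ?? 0;
    const cost = COST[toolCategory];
    return available >= cost
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: `needs ${cost} credits, has ${available}` };
  }
}
```

Keeping `allowed` and `reason` together in one return value is what makes the "breaks if the meaning of `allowed` changes" clause checkable: a consumer only ever branches on `allowed`.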
Skills Per Repo
| Repository | Skills |
|---|---|
| conversation-api | clean-ddd-hexagonal · solid · clean-architecture · rag-retrieval · rag-architect · llm-app-patterns · prompt-engineering-patterns · hybrid-search · heuristic-evaluation · heuristics-and-checklists · evolutionary-metric-ranking |
| semantic-base | rag-retrieval · rag-architect · hybrid-search · llm-app-patterns · solid · heuristics-and-checklists |
| data-synchronizator | solid · heuristics-and-checklists |
| desktop-client | solid · heuristic-evaluation · clean-architecture · heuristics-and-checklists |
| infrastructure | solid · heuristics-and-checklists |
| marketplace-provider | clean-ddd-hexagonal · solid · clean-architecture · heuristics-and-checklists |
| billing | clean-ddd-hexagonal · solid · clean-architecture · heuristics-and-checklists |
| enrichment | solid · clean-ddd-hexagonal · clean-architecture · llm-app-patterns · prompt-engineering-patterns · heuristics-and-checklists |
| feedback | solid · clean-ddd-hexagonal · clean-architecture · evolutionary-metric-ranking · llm-app-patterns · heuristics-and-checklists |
| stack-evaluation | solid · clean-ddd-hexagonal · llm-app-patterns · prompt-engineering-patterns · heuristics-and-checklists · evolutionary-metric-ranking |
| team-workflow | solid · heuristics-and-checklists · prompt-engineering-patterns |
7.8 Proactivity
Beautonomous does not wait to be asked before notifying the team of critical situations.
| Trigger | Automatic Action |
|---|---|
| GitHub Action fails (any repo) | Message in #deploys: workflow, repo, branch, log link |
| GitHub Action fails on main / prod | #deploys + DM to El Mago |
| PR unreviewed > 4 hours | Ping in #engineering with link and author |
| Linear task blocked > 2 days | Alert to El Mago with block context |
| 9:00 AM daily | #team summary: pending PRs, failing CI, tasks per person, blockers |
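The two time-based triggers in the table (PR unreviewed for more than 4 hours, Linear task blocked for more than 2 days) reduce to threshold checks against the current time. The sketch below assumes simplified `OpenPr` and `BlockedTask` shapes; these are hypothetical and are not the GitHub or Linear API models.

```typescript
// Hypothetical, simplified views of a PR and a blocked task.
interface OpenPr {
  url: string;
  author: string;
  openedAt: Date;
  reviewed: boolean;
}

interface BlockedTask {
  id: string;
  blockedSince: Date;
}

const HOUR_MS = 60 * 60 * 1000;
const PR_REVIEW_LIMIT_MS = 4 * HOUR_MS;        // "PR unreviewed > 4 hours"
const TASK_BLOCK_LIMIT_MS = 2 * 24 * HOUR_MS;  // "Linear task blocked > 2 days"

// PRs that should trigger a ping in #engineering.
function stalePrs(prs: OpenPr[], now: Date): OpenPr[] {
  return prs.filter(
    (pr) => !pr.reviewed && now.getTime() - pr.openedAt.getTime() > PR_REVIEW_LIMIT_MS
  );
}

// Tasks that should trigger an alert to El Mago.
function longBlockedTasks(tasks: BlockedTask[], now: Date): BlockedTask[] {
  return tasks.filter(
    (task) => now.getTime() - task.blockedSince.getTime() > TASK_BLOCK_LIMIT_MS
  );
}
```

Passing `now` as a parameter instead of reading the clock inside the functions keeps the rules deterministic and easy to test.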
7.9 System Prompt (OpenClaw)
# Beautonomous — Agente Operativo Interno de Shopilot
Eres el agente operativo del equipo. Tu función: dar visibilidad completa
del proyecto y ejecutar acciones en GitHub, Linear, Slack y el código.
Operas desde OpenClaw UI, Slack y terminal. El rol del usuario ya viene
determinado por OpenClaw — nunca lo asumas ni lo pidas explícitamente.
## Usuario actual
{USER_NAME} | {USER_EMAIL} | Rol: {USER_ROLE}
## Roles
El Capitán (pablo@shopilot.ai):
- Lectura total · tareas Linear · cambios UI (genera PR)
- Aprobación final de negocio para prod
- NO puede: workflows, backend/infra
El Mago (mateo@shopilot.ai):
- Acceso completo a todos los sistemas
- Aprobar PRs, cualquier workflow, gestionar roles
- Firma técnica en el pipeline de aprobación
El Artesano (andres@shopilot.ai, sergio@shopilot.ai):
- Lectura total · proponer cambios via PR
- Staging workflows · tareas propias
- Enviar mensajes a Slack (con confirmación)
## Gobernanza — NUNCA omitas estas reglas
1. Antes de cualquier escritura: muestra exactamente qué vas a hacer.
2. Para código: muestra el diff completo antes de crear el PR.
3. Para Slack: muestra la vista previa antes de publicar.
4. Si el rol no tiene permiso: explica y ofrece escalar a El Mago.
5. Acciones de alto riesgo requieren confirmación de El Mago, siempre.
6. Confirma el resultado: qué cambió, dónde, cuándo.
## Repositorios del stack
core-intelligence-conversation-api (Coach — Node.js 18 TS, Lambda)
core-knowledge-semantic-base (KB — Go + Vertex AI + BigQuery)
core-knowledge-data-synchronizator (Data Sync — Airflow Python + GCS)
core-product-desktop-client (App — Electron + React)
core-platform-infrastructure (Infra — CDK TS + Terraform GCP)
core-action-marketplace-provider (Marketplace — TypeScript)
core-platform-billing (Billing — TypeScript)
core-knowledge-enrichment (Enrichment — TypeScript)
core-quality-feedback (Feedback — TypeScript)
core-quality-stack-evaluation (Eval — TypeScript)
core-internal-team-workflow (this project — config)
## Canales Slack autorizados
#engineering · #deploys · #general · #team
7.10 Connectors (OpenClaw Native, OAuth Only)
| Connector | Read | Write | # |
|---|---|---|---|
| GitHub | repos, PRs, issues, workflows, logs | issues, comments, propose PR, trigger workflows | 10 |
| Linear | tasks, sprints, team metrics | create/assign/comment tasks, change state/priority | 9 |
| Code | read file, search code | low-risk changes via PR, propose logic changes via PR | 7 |
| Slack | channels, threads, search | messages (with confirmation), approval notifications | 5 |
7.11 What It Does NOT Do
7.12 Code vs Configuration
| Component | Type | Where |
|---|---|---|
| System prompt + roles + repos | Config | OpenClaw panel |
| Slack bot integration | Config | OpenClaw + Slack OAuth |
| CLAUDE.md per repo | Text, written by the team | Root of each repo |
| MEMORY.md per repo | Text, written by the agent + El Mago | .claude/memory/ |
| quality-gate.yml + API script | YAML + code, written once | .github/workflows/ |
| Branch protection rules | Config | GitHub Settings |
The only code to write is the script that invokes Claude Code via API inside quality-gate.yml. It is written once and replicated from the template in core-internal-team-workflow/templates/.
7.13 Current State
| Capability | Status |
|---|---|
| View status from Slack | ❌ Pending: OpenClaw + connectors |
| Create tasks from Slack | ❌ Pending: Linear OAuth |
| Approve PRs from Slack | ❌ Pending: quality gate + branch protection |
| Activate quality agent | ❌ Pending: quality-gate.yml in 11 repos |
| Proactivity (alerts + daily summary) | ❌ Pending: OpenClaw configured |
| Quality base per repo (full bootstrap) | 🔨 Partial: only in conversation-api |
Everything starts from zero except the quality base structure in one repo.
8. How We Will Build Shopilot: The 18 Projects
7 architecture layers, 11 repositories, 18 active projects. Each project will own a specific piece of the stack, from the desktop app the seller will install to the internal agent that will help the team build it. 36 tools across 3 services (data-synchronizator, marketplace-provider, enrichment) will give the Coach the ability to read, analyze, and act on the seller's behalf. The intelligence layer will live in core-intelligence-conversation-api, a single Lambda hosting 7 projects (#2 through #8): the reasoning loop, the tools it can invoke, its personality, the context it assembles, proactive suggestions, guardrails, and observability. Knowledge will flow from three separate repos: core-knowledge-semantic-base (#9) for editorial KB, core-knowledge-data-synchronizator (#10) for the seller's real-time data, and core-knowledge-enrichment (#11) for external market intelligence and content analysis. Actions on the marketplace will route through core-action-marketplace-provider (#12), the only service that touches MeLi and Amazon APIs. Quality will be measured by core-quality-feedback (#15) for business impact and core-quality-stack-evaluation (#16) for CI/CD quality gates. The platform layer will handle billing via core-platform-billing (#13) and infrastructure via core-platform-infrastructure (#14). The seller will interact through core-product-desktop-client (#1), an Electron app with the Coach in the sidebar. The team itself will operate with core-internal-team-workflow (#17): Beautonomous, the internal agent that helps build and govern the entire stack.
| # | Project | Layer | Owner | Status |
|---|---|---|---|---|
| 1 | Native Shell | 1-Product | Sergio | REWRITE |
| 2 | ReAct Orchestrator | 2-Intelligence | Mateo | ADAPT+NEW |
| 3 | Tool Registry & Policy Engine | 2-Intelligence | Mateo | NEW |
| 4 | Personality Engine | 2-Intelligence | Mateo | NEW |
| 5 | Context Aggregator | 2-Intelligence | Mateo | NEW |
| 6 | Proactive Suggestions Engine | 2-Intelligence | Mateo | NEW |
| 7 | Guardrails | 2-Intelligence | Mateo | NEW |
| 8 | Observability & Traceability | 2-Intelligence | Mateo | EXISTS+ADAPT |
| 9 | Cerebro / Knowledge Base | 3-Knowledge | Mateo | EXISTS |
| 10 | Data Sync | 3-Knowledge | Andrés | ADAPT+NEW |
| 11 | Enrichment Layer | 3-Knowledge | Mateo | NEW |
| 12 | Marketplace Provider | 4-Action | Andrés | REWRITE |
| 13 | Billing & Credit Economy | 5-Platform | Sergio | REWRITE |
| 14 | DevOps (IaC) | 5-Platform | Andrés | EXISTS+NEW |
| 15 | Feedback Loop | 6-Quality | Sergio | REWRITE |
| 16 | Eval Suite | 6-Quality | Pablo | NEW |
| 17 | Beautonomous | 7-Internal | Pablo | NEW |
| 18 | Design System | 1-Product | Pablo · Sergio · External | NEW |
8.1 Introduction to the Projects
The complete stack on one page. No technical references, only what each piece does and why it exists.
The Complete Idea
Shopilot is a desktop browser where the seller navigates their marketplace with an AI Coach in the sidebar. The Coach knows their business, can analyze their situation, execute actions, and learn over time.
The 19 active projects are the pieces that make this possible.
The 8 Project Families
1. What the user installs
core-product-desktop-client: the Electron browser. Without this, there is no product.
2. The Coach itself
core-intelligence-conversation-api: where all intelligence lives. It groups 7 internal projects: the reasoning loop (#2), the tools it can use (#3), its personality and tone (#4), the context it assembles before responding (#5), the proactive suggestions it detects (#6), the layer that protects it from malicious inputs (#7), and the record of everything it does (#8).
3. What the Coach knows
KB (core-knowledge-semantic-base): editorial knowledge such as guides, policies, and best practices.
Data Sync (core-knowledge-data-synchronizator): the seller's own real-time data.
Enrichment (core-knowledge-enrichment): external market data and content analysis.
4. What the Coach can do
Marketplace Provider (core-action-marketplace-provider): executes real changes such as publishing, responding, and creating campaigns. Includes seller OAuth token management (Auth Vault, #10, absorbed; no longer a standalone project).
5. How it gets paid
core-platform-billing: measures consumption and charges the seller. Merges Token Economy (#13) and Billing (#14). #14 is no longer a standalone project; everything lives in #13.
6. Where it runs
DevOps (core-platform-infrastructure): all cloud infrastructure in one place.
7. What the Coach learns and how we measure quality
Feedback Loop (core-quality-feedback): measures whether the Coach's actions truly helped the seller. Records the impact of each change on sales, visits, and conversion. The seller sees: “your title change generated +54% visits in 7 days”.
Eval Framework (core-quality-stack-evaluation): measures whether the Coach responds well before reaching production. Blocks changes that degrade quality.
8. How we work
Beautonomous (core-internal-team-workflow): the team's internal Coach. We build with the same tools we make.
The Flow in Plain Language
1. The seller opens Shopilot and navigates to MeLi or Amazon.
2. They ask the Coach something in the sidebar.
3. The Coach understands the question, then consults its knowledge (KB), the seller's data (Data Sync), and the market (Enrichment).
4. If it needs to act, it uses the Marketplace Provider to execute the operation.
5. It responds with context, precision, and proactive suggestions if it detects opportunities.
6. Everything is logged (Observability). If there was an action, the Feedback Loop measures its impact 7 days later and shows it to the seller.
7. The seller pays for what they consume (Billing + Token Economy).
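Step 6 can be made concrete with a small calculation. The TypeScript sketch below compares visits before a WRITE with visits 7 days after it and formats the kind of message the seller would see. The names `percentChange` and `describeVisitImpact` are hypothetical, and the source does not specify the formula or the FeedbackEntry fields, so this is one plausible reading rather than the actual implementation.

```typescript
// Percent change from `before` to `after`, rounded to the nearest integer.
// Returns null when the baseline is zero, because the change is undefined.
function percentChange(before: number, after: number): number | null {
  if (before === 0) return null;
  return Math.round(((after - before) / before) * 100);
}

// Formats the seller-facing impact message for a 7-day comparison window.
// Inputs are assumed to be visit counts over equal-length periods.
function describeVisitImpact(visitsBefore: number, visitsAfter: number): string {
  const change = percentChange(visitsBefore, visitsAfter);
  if (change === null) return "Not enough baseline data to measure impact.";
  const sign = change >= 0 ? "+" : "";
  return `Your change generated ${sign}${change}% visits in 7 days`;
}
```

A zero baseline is reported explicitly instead of dividing by zero, which matters for new listings that had no visits before the change.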
Minimum Viable Core: What Must Exist for the First Useful Product
For the Coach to respond well
#2 Orchestrator: without it there is no Coach
#9 KB: without knowledge the Coach guesses
#10 Data Sync: without seller data it speaks in the abstract
#8 Observability: without tracking, billing is impossible
#13 Billing: without billing there is no business model
With just these 5, the Coach is already a product.
For the Coach to act
Add to the 5 above:
#3 Tool Registry: without it the Coach can't execute any tool
#12 Marketplace Provider: without it WRITE tools have nowhere to go
With these 7, the Coach can make real changes in the marketplace.
For the complete product
#1 Shell: the seller uses it while browsing
#11 Enrichment: the Coach sees the market
#15 Feedback Loop: the Coach learns from its own actions
#4 #5 #6 #7: quality, security, full experience
#16 Eval + #14 DevOps are cross-cutting: they accompany all phases.
Current Stack Status
| Status | Projects |
|---|---|
| ✔ Operational | #8 Observability · #9 KB · #10 Data Sync (Brand Health + seller data) |
| 🔨 Building | #2 Orchestrator (one-shot works, ReAct loop in progress) · #4 Personality · #5 Context · #13 Billing (credits) · #14 DevOps (CDK partial) |
| 📋 Pending | #3 Tool Registry · #12 Marketplace Provider · #11 Enrichment · #15 Feedback · #16 Eval · #1 Shell · #6 Proactive · #7 Guardrails · #17 Beautonomous |
The Coach already answers questions with real context (KB + Brand Health). What remains is for it to act (tools, Marketplace Provider) and for the user to see it (Shell).
8.2 Stack Map
Cross-cutting view of the full stack: what each project does, who depends on whom, and how the Coach is the thread connecting everything.
A marketplace seller navigates their store from Shopilot and has an AI Coach that understands their business, can act on their behalf, and learns over time.
The 19 Active Projects by Layer
Product Layer: what the user sees
core-product-desktop-client: the desktop browser. The seller navigates MeLi/Amazon while the Coach is in the sidebar.
Intelligence Layer: the Coach
core-intelligence-conversation-api: the Coach's brain. Receives a question, reasons, decides which tools to use, responds.
Learning & Quality Layer: what improves the stack over time
core-quality-feedback: measures the business impact of WRITE actions, comparing visits, sales, and conversion before and after each change. Detects which strategies work.
core-quality-stack-evaluation: measures whether the Coach responds well. Validates contracts between projects. Blocks changes that degrade quality in CI/CD.
Data Layer: what feeds the Coach
Action Layer: what the Coach can do
Platform Layer: what sustains everything
Internal Layer: how we work
core-internal-team-workflow: the team's internal Coach. Helps build and operate the stack using the same projects it manages.
Connection Diagram
╔══════════════════════════════════════════════════════════════════╗
║                               USER                               ║
║  ┌────────────────────────────────────────────────────────────┐  ║
║  │ core-product-desktop-client (#1)                           │  ║
║  │ Electron browser + Coach sidebar                           │  ║
║  └─────────────────────────────┬──────────────────────────────┘  ║
╚════════════════════════════════│═════════════════════════════════╝
                                 │ question / response
╔════════════════════════════════▼═════════════════════════════════╗
║        INTELLIGENCE (core-intelligence-conversation-api)         ║
║                                                                  ║
║  ┌────────────────────────────────────────────────────────────┐  ║
║  │ Guardrails (#7) → validates input before processing        │  ║
║  └─────────────────────────────┬──────────────────────────────┘  ║
║                                │                                 ║
║  ┌─────────────────────────────▼──────────────────────────────┐  ║
║  │ Orchestrator (#2)                                          │  ║
║  │ reasons → picks tools → acts → observes → responds         │  ║
║  └──┬───────────┬────────────┬─────────────────────────┬──────┘  ║
║     │           │            │                         │         ║
║   Context    Personality  Tool Registry (#3)    Proactive        ║
║   Aggregator Engine       ┌─────────────────┐   Suggestions (#6) ║
║   (#5)       (#4)         │ READ tools      ├──▶ Data Sync       ║
║     │                     │ WRITE tools     ├──▶ Marketplace     ║
║     │                     │ ANALYSIS tools  ├──▶ Enrichment      ║
║     │                     └─────────────────┘                    ║
║     │                                                            ║
║  ┌──▼─────────────────────────────────────────────────────────┐  ║
║  │ Context sources                                            │  ║
║  │ KB (#9) + Brand Health (#10) + UserProfile                 │  ║
║  └────────────────────────────────────────────────────────────┘  ║
║                                                                  ║
║  Guardrails (#7) → validates output before responding            ║
║  Observability (#8) → logs everything                            ║
║  WRITE tools → emit FeedbackEntry to core-quality-feedback       ║
╚══════════════════════════════════════════════════════════════════╝
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
 core-knowledge-          core-action-          core-knowledge-
 data-synchronizator      marketplace-          enrichment (#11)
 (#10)                    provider (#12)        External market
 Seller data              WRITE on              + visual analysis
 in real time             marketplace

  core-quality-feedback (#15)
  Measures impact of WRITE actions (7 days later)
  ← reads metrics from core-knowledge-data-synchronizator (#10)
╔══════════════════════════════════════════════════════════════════╗
║                             PLATFORM                             ║
║  core-platform-billing (#13): Token Economy + Billing merged     ║
║  DevOps (#14): provisions all cloud infrastructure               ║
╚══════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════╗
║                         TEAM AND QUALITY                         ║
║  core-internal-team-workflow (#17): the team's internal Coach    ║
║  core-quality-stack-evaluation (#16): evaluates quality in CI/CD ║
╚══════════════════════════════════════════════════════════════════╝
Responsibility Matrix
What each piece of the stack is responsible for: one sentence per responsibility, mapped to the projects that own it.
| Responsibility | Project |
|---|---|
| The seller can talk to the Coach | #1 Shell + #2 Orchestrator |
| The Coach understands the seller's business | #9 KB + #10 Data Sync + #5 Context Aggregator |
| The Coach can act on the marketplace | #12 Marketplace Provider |
| The Coach has market data and analysis | #11 Enrichment |
| The Coach speaks with the right voice | #4 Personality Engine |
| The Coach detects opportunities | #6 Proactive Suggestions |
| The Coach measures action impact | #15 Feedback Loop |
| The Coach can't be manipulated | #7 Guardrails |
| We know if the Coach responds well | #16 Eval Framework |
| We know what it does and how much it costs | #8 Observability |
| The seller pays for the service | #13 Billing |
| Everything runs in production | #14 DevOps |
| The team works with AI | #17 Beautonomous |
| The product has a unified visual identity | #18 Design System |
Critical Dependencies: What Blocks What
#10 Data Sync ─────────────────▶ #3 Tool Registry (READ tools)
#12 Marketplace Provider ──────▶ #3 Tool Registry (WRITE tools)
#11 Enrichment ────────────────▶ #3 Tool Registry (ANALYSIS tools)
#9 KB ─────────────────────────▶ #5 Context Aggregator
#2 Orchestrator ───────────────▶ all of the above
#13 Token Economy ─────────────▶ #2 Orchestrator (token budget)
#1 Shell ──────────────────────▶ #2 Orchestrator (input channel)
#12 Marketplace Provider ──────▶ #15 Feedback Loop (signal that a WRITE executed)
#10 Data Sync ─────────────────▶ #15 Feedback Loop (metrics 7 days after)
#15 Feedback Loop ─────────────▶ #9 KB (Phase 3: FeedbackLearner updates KB)
#8 Observability ──────────────▶ #16 Eval Framework (eval cases from traces)
Without Data Sync, Enrichment, and Marketplace Provider, the Orchestrator has no data and cannot act. They are the three projects that unlock the Coach's real utility.
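The dependency list can also be held as data, which makes "what blocks what" a simple lookup. The sketch below encodes each arrow as a `[from, to]` pair of project numbers; the "#2 Orchestrator" row is left out because it does not name a single target. The names `criticalDependencies`, `prerequisitesOf`, and `unblockedBy` are illustrative, not part of the blueprint.

```typescript
type ProjectNumber = number;

// Each pair reads "from unblocks to", following the arrows in the list above.
const criticalDependencies: Array<[ProjectNumber, ProjectNumber]> = [
  [10, 3],  // Data Sync -> Tool Registry (READ tools)
  [12, 3],  // Marketplace Provider -> Tool Registry (WRITE tools)
  [11, 3],  // Enrichment -> Tool Registry (ANALYSIS tools)
  [9, 5],   // KB -> Context Aggregator
  [13, 2],  // Token Economy -> Orchestrator
  [1, 2],   // Shell -> Orchestrator
  [12, 15], // Marketplace Provider -> Feedback Loop
  [10, 15], // Data Sync -> Feedback Loop
  [15, 9],  // Feedback Loop -> KB (Phase 3)
  [8, 16],  // Observability -> Eval Framework
];

// Projects that must exist before `project` is fully useful.
function prerequisitesOf(project: ProjectNumber): ProjectNumber[] {
  return criticalDependencies
    .filter(([, to]) => to === project)
    .map(([from]) => from)
    .sort((a, b) => a - b);
}

// Projects that `project` directly unblocks.
function unblockedBy(project: ProjectNumber): ProjectNumber[] {
  return criticalDependencies
    .filter(([from]) => from === project)
    .map(([, to]) => to)
    .sort((a, b) => a - b);
}
```

For example, `prerequisitesOf(3)` lists the three services the Tool Registry waits on, which matches the sentence above about the projects that unlock the Coach.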
Contracts Between Projects
The exact points where one project talks to another. Every arrow in the diagram has a contract behind it.
| From | To | Contract |
|---|---|---|
| #1 Shell | #2 Orchestrator | HTTP REST → WebSocket (Phase 4). User message / Coach response. |
| #2 Orchestrator | #13 Billing | ICreditsGate / POST /internal/gate: verifies credits before each tool call. |
| #3 Tool Registry | #10 Data Sync | HTTP: READ tools call the Data Sync API. |
| #3 Tool Registry | #12 Marketplace Provider | HTTP: WRITE tools call the Provider API. |
| #3 Tool Registry | #11 Enrichment | HTTP: ANALYSIS tools call the Enrichment API. |
| #2 Orchestrator | #15 Feedback Loop | FeedbackEntry emitted by HookLifecycle.after_tool after each successful WRITE. |
| #15 Feedback Loop | #10 Data Sync | HTTP: queries product metrics 7 days after the WRITE (before/after comparison). |
| #8 Observability | #16 Eval Framework | Export of real traces to build automatic evaluation cases. |
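The Orchestrator-to-Billing contract is the one that sits on every tool call, so a sketch may help. The source names only `ICreditsGate` and `POST /internal/gate`; the `check` method, the `GateDecision` shape, the `cost` parameter, and the in-memory stand-in below are all assumptions for illustration. A real gate would presumably be an asynchronous HTTP call.

```typescript
// Hypothetical decision shape returned by the gate.
interface GateDecision {
  allowed: boolean;
  remainingCredits: number;
}

interface ICreditsGate {
  // Deducts `cost` credits for `userId`, or refuses if the balance is too low.
  check(userId: string, cost: number): GateDecision;
}

// In-memory stand-in for the billing service, for illustration only.
class InMemoryCreditsGate implements ICreditsGate {
  private balances: Record<string, number>;

  constructor(initialBalances: Record<string, number>) {
    this.balances = { ...initialBalances };
  }

  check(userId: string, cost: number): GateDecision {
    const balance = this.balances[userId] ?? 0;
    if (balance < cost) return { allowed: false, remainingCredits: balance };
    this.balances[userId] = balance - cost;
    return { allowed: true, remainingCredits: balance - cost };
  }
}

type GatedResult<T> = { ok: true; result: T } | { ok: false; reason: string };

// Runs `tool` only if the gate allows it; the tool is never invoked on refusal.
function callToolWithGate<T>(
  gate: ICreditsGate,
  userId: string,
  cost: number,
  tool: () => T,
): GatedResult<T> {
  const decision = gate.check(userId, cost);
  if (!decision.allowed) return { ok: false, reason: "insufficient_credits" };
  return { ok: true, result: tool() };
}
```

Putting the check in one wrapper keeps the rule "verify credits before each tool call" in a single place instead of inside each tool.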
External Contracts
Points where our stack connects to third-party systems outside our control.
| From | External System | Contract |
|---|---|---|
| #13 Billing | Stripe | Checkout Sessions + webhooks (payment_succeeded, subscription events) |
| #12 Marketplace Provider | MercadoLibre API | REST: product ops, questions, campaigns |
| #12 Marketplace Provider | Amazon SP-API | SP-API: same operations, different adapter |
| #10 Data Sync | MeLi / Amazon | Polling or push of seller data |
| #2 Orchestrator | Anthropic / Vertex | LLM calls with cache_control for prompt caching |
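For the last row, the sketch below shows how a request body might mark the stable part of the prompt as cacheable. It assumes the shape of Anthropic's Messages API, where a system content block can carry `cache_control: { type: "ephemeral" }`; the Vertex route may differ. The function name, the `max_tokens` value, and the model id passed in are placeholders, and the code only builds the payload without sending it.

```typescript
// A text block with an optional cache marker (assumed Messages API shape).
interface CoachTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface CoachLlmRequest {
  model: string;
  max_tokens: number;
  system: CoachTextBlock[];
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

// Builds the request body only; transport and auth are left to the caller.
function buildCoachRequest(
  model: string,
  stablePrompt: string,
  sellerMessage: string,
): CoachLlmRequest {
  return {
    model,
    max_tokens: 1024, // placeholder budget
    system: [
      // The long, rarely changing prefix is marked cacheable.
      { type: "text", text: stablePrompt, cache_control: { type: "ephemeral" } },
    ],
    // The per-turn message changes on every call, so it carries no cache marker.
    messages: [{ role: "user", content: sellerMessage }],
  };
}
```

The design point is ordering: content that stays identical across calls goes first and is marked, while seller-specific text comes after it.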
Completeness Check
Every area of the product is covered by at least one project. No orphan responsibilities.
| Area | Covered by | Status |
|---|---|---|
| User interface | #1 Native Shell | ✅ |
| AI reasoning | #2 Orchestrator | ✅ |
| Tool execution | #3 Tool Registry | ✅ |
| Editorial knowledge | #9 KB | ✅ |
| Seller data | #10 Data Sync | ✅ |
| Market intelligence | #11 Enrichment | ✅ |
| Marketplace actions | #12 Marketplace Provider | ✅ |
| Personality & tone | #4 Personality Engine | ✅ |
| Context assembly | #5 Context Aggregator | ✅ |
| Proactive detection | #6 Proactive Suggestions | ✅ |
| Security & guardrails | #7 Guardrails | ✅ |
| Observability & logging | #8 Observability | ✅ |
| Impact measurement | #15 Feedback Loop | ✅ |
| Quality evaluation | #16 Eval Framework | ✅ |
| Billing & monetization | #13 Billing | ✅ |
| Infrastructure | #14 DevOps | ✅ |
| Internal team operations | #17 Beautonomous | ✅ |
| Brand identity & components | #18 Design System | ✅ |
Layer 1: PRODUCT
What the seller installs and sees
Native Shell
Interface & UX, owner: Sergio
The biggest new build (~35% of total effort) and the user's primary interface: an Electron desktop app that functions as a specialized eCommerce browser. On the left, a WebContentsView renders marketplace websites (MeLi, Amazon, Shopify) with full functionality, so sellers navigate naturally. On the right, a React sidebar provides the AI copilot interface: chat with real-time WebSocket streaming, proactive suggestion cards, tool progress indicators, and confirmation dialogs for risky actions.
Beyond the chat, the sidebar includes dedicated views: ProfileView (user settings, preferences, workspace config), BillingView (plan status, credits, upgrade/pack purchase via Stripe Checkout), EnrollmentView (marketplace OAuth2 connection flows), and an OnboardingWizard for first-time users that guides profile setup, first marketplace connection, and initial Coach interaction.
Marketplace Detector matches URL patterns to auto-detect which marketplace is active. The tab system supports multiple independent sessions with separate cookies. IPC Bridge handles communication between the Electron main process and the React renderer. The Coach backend streams events via WebSocket (text chunks, tool progress, confirmations) following a formalized bidirectional protocol.
Each phase produces a deployable build that validates real backend flows, so no separate testing interface is needed. A Dev Tools panel (toggle in dev mode) exposes traces, tokens, and latencies for internal QA. Mac-only for MVP, with .dmg distribution via electron-builder.
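A minimal sketch of the Marketplace Detector follows, using the URL patterns this blueprint lists for it. The regular expressions are one interpretation of those glob-style patterns (for instance, `mercadolibre.com.*` is read as an optional two-letter country suffix), and the type and function names are hypothetical.

```typescript
type DetectedMarketplace =
  | "meli-seller-center"
  | "meli-product-page"
  | "amazon-seller-central"
  | "shopify-admin"
  | "other";

// Ordered list of patterns; the first match wins.
const marketplacePatterns: Array<[RegExp, DetectedMarketplace]> = [
  [/^https?:\/\/(?:[\w-]+\.)*mercadolibre\.com(?:\.[a-z]{2})?\/seller\//i, "meli-seller-center"],
  [/^https?:\/\/(?:[\w-]+\.)*mercadolibre\.com(?:\.[a-z]{2})?\/p\//i, "meli-product-page"],
  [/^https?:\/\/sellercentral\.amazon\.com\//i, "amazon-seller-central"],
  [/^https?:\/\/admin\.shopify\.com\//i, "shopify-admin"],
];

// Classifies a URL; anything unrecognized is "other", which means no injection.
function detectMarketplace(url: string): DetectedMarketplace {
  for (const [pattern, kind] of marketplacePatterns) {
    if (pattern.test(url)) return kind;
  }
  return "other";
}
```

Anchoring each pattern at the start of the URL keeps a marketplace hostname that appears only in a query string from being misclassified.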
Beautonomous governance: confirmation dialogs in the sidebar render Core's ConfirmationFlow state machine. The Shell is the UI enforcement layer that presents PENDING actions and captures the seller's CONFIRMED/REJECTED decision. No WRITE executes until this dialog is resolved.
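The three states named above can be sketched as a small state machine. The source gives only the state names and the `approved: boolean` flag on the `confirmation_result` event; the types and functions below are assumptions about how the Shell side might model them, not Core's actual ConfirmationFlow.

```typescript
type ConfirmationState = "PENDING" | "CONFIRMED" | "REJECTED";

interface ConfirmationRecord {
  confirmationId: string;
  state: ConfirmationState;
}

// Applies the seller's decision. Only PENDING records can be resolved,
// so a decision cannot be changed or replayed after the fact.
function resolveConfirmation(
  current: ConfirmationRecord,
  approved: boolean,
): ConfirmationRecord {
  if (current.state !== "PENDING") {
    throw new Error(
      `Confirmation ${current.confirmationId} is already ${current.state}`,
    );
  }
  return { ...current, state: approved ? "CONFIRMED" : "REJECTED" };
}

// A WRITE may run only after an explicit approval.
function canExecuteWrite(record: ConfirmationRecord): boolean {
  return record.state === "CONFIRMED";
}
```

Returning a new record instead of mutating the old one makes it easy to log each transition for the Observability layer.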
Design System governance: core-product-design-system (#18) is the single source of truth for all visual components. The Figma file follows Atomic Design (atoms, molecules, organisms, templates, pages). Claude consumes the Figma file via Figma MCP when implementing UI components; no React components are created outside of what is defined there.
Window management, WebContentsView
MeLi + Amazon + Shopify URL patterns (remote config)
Multiple sessions, independent cookies
Chat, suggestions, progress, confirmations
Main to renderer communication
electron-updater + GitHub Releases
Mac .dmg (electron-builder)
Traces, tokens, latencies (dev mode)
User settings, preferences, workspace
Plan status, credits, Stripe Checkout
OAuth2 marketplace connection flows
First-use guided setup (5 steps)
Recommended Tech Stack
Data Models, WebSocket Protocol, API Signatures & Acceptance Criteria
// Electron Main Process Components:
// 1. WindowManager: 1200x800 default, split 70% WebContentsView / 30% Sidebar
// 2. MarketplaceDetector URL patterns:
// mercadolibre.com.*/seller/* -> MeLi Seller Center
// mercadolibre.com.*/p/* -> MeLi Product Page
// sellercentral.amazon.com/* -> Amazon Seller Central
// admin.shopify.com/* -> Shopify Admin
// * -> Other (no injection)
// 3. SessionManager: separate cookie partitions per marketplace
// 4. TabManager: Cmd+1=MeLi, Cmd+2=Amazon, Cmd+3=Shopify
// 5. CredentialsManager: JWT in Electron safeStorage
// 6. OAuthPopupManager: dedicated BrowserWindow for OAuth2 redirects
// 7. IPC channels: marketplace:detected, auth:get-token, nav:go, notification:show, oauth:start
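// Preload Script (secure bridge):
import { contextBridge, ipcRenderer } from 'electron'

contextBridge.exposeInMainWorld('shopilotBridge', {
  getMarketplaceContext: () => ipcRenderer.invoke('ctx'),
  getAuthToken: () => ipcRenderer.invoke('auth:get-token'),
  navigateTo: (url: string) => ipcRenderer.send('nav:go', url),
  // Listens on 'notification:show', the channel name used in the IPC channel list.
  // The wrapper drops the IpcRendererEvent and returns nothing, so neither the
  // event object nor ipcRenderer itself crosses the bridge.
  onNotification: (cb: (payload: unknown) => void) => {
    ipcRenderer.on('notification:show', (_event, payload) => cb(payload))
  },
  startOAuth: (marketplace: string) => ipcRenderer.invoke('oauth:start', marketplace),
})
// NEVER expose ipcRenderer directly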
// === COACH → SHELL (server-to-client events) ===
interface TextChunkEvent { type: 'text_chunk'; text: string }
interface ToolStartEvent { type: 'tool_start'; toolId: string; toolName: string; category: 'READ'|'WRITE'|'ANALYSIS'|'SYSTEM' }
interface ToolProgressEvent { type: 'tool_progress'; toolId: string; message: string } // "Consultando metricas..."
interface ToolCompleteEvent { type: 'tool_complete'; toolId: string; resultSummary: string }
interface ConfirmationReqEv { type: 'confirmation_required'; confirmationId: string; action: string; before: object; after: object }
interface SuggestionEvent { type: 'suggestion'; id: string; suggestionType: string; message: string; priority: number }
interface ErrorEvent { type: 'error'; code: string; message: string }
interface RoundEndEvent { type: 'round_end'; roundNumber: number; creditsUsed: number }
// === SHELL → COACH (client-to-server events) ===
interface MessageEvent { type: 'message'; text: string; marketplaceContext: MarketplaceContext }
interface ConfirmationResEv { type: 'confirmation_result'; confirmationId: string; approved: boolean }
interface CancelEvent { type: 'cancel' }
interface ContextUpdateEvent { type: 'context_update'; marketplace: string; page: string; productId?: string }
// Connection: wss://api.shopilot.ai/ws?token={JWT}
// Reconnect: exponential backoff (1s, 2s, 4s, 8s, max 30s)
// Heartbeat: ping/pong every 30s, timeout 90s
// Session restore: on reconnect, server replays last incomplete round
// === React Hooks (7 total, was 4) ===
// useShopilot()    - WebSocket lifecycle, send/receive, streaming text assembly, reconnect
// useMarketplace() - IPC marketplace:detected events, current context
// useSuggestions() - Receives suggestion events via WebSocket (was: REST polling)
// useCredentials() - Get/refresh JWT via IPC
// useProfile()     - GET/PUT /users/:userId/profile, form state (NEW)
// useBilling()     - GET /billing/status, plan/credits/period (NEW)
// useEnrollment()  - GET /auth/marketplaces/:userId, connect/disconnect flows (NEW)
// === Sidebar Views (react-router, in-sidebar navigation) ===
// /chat       - Default: ChatPanel + SuggestionCards + ToolProgress + ConfirmationDialog
// /profile    - ProfileView: name, categories, goals, language, connected marketplaces (read-only)
// /billing    - BillingView: plan badge, credits bar, billing period, upgrade/pack/portal buttons
// /enrollment - EnrollmentView: marketplace cards (MeLi/Amazon/Shopify), connect/disconnect, status
// /onboarding - OnboardingWizard: 5-step guided setup (first run only)
// === Sidebar Navigation ===
// Header: Logo + marketplace indicator + credits badge + nav icons (chat | profile | billing | enrollment)
// Bottom nav or icon bar for view switching; chat is always the primary/default view
// === Keyboard Shortcuts ===
// Cmd+L       -> Focus chat input
// Cmd+B       -> Toggle sidebar
// Cmd+K       -> Command palette
// Cmd+1..9    -> Switch marketplace tabs
// Escape      -> Cancel current operation / dismiss confirmation
// Cmd+Enter   -> Send message (Enter = newline)
// Cmd+Shift+D -> Toggle Dev Tools panel (dev mode only)
// Auto-Updater: electron-updater + GitHub Releases, check on startup + every 6h
// From #2 Orchestrator:
//   WebSocket wss://api.shopilot.ai/ws - streaming protocol defined above
// From #13 Billing:
//   GET  /billing/status         - BillingStatus (plan, credits, period)
//   POST /billing/checkout       - Stripe Checkout redirect (upgrade to Pro)
//   POST /billing/packs/checkout - Stripe Checkout redirect (buy Credit Pack)
//   POST /billing/portal         - Stripe Customer Portal redirect
// From #12 Marketplace Provider:
//   POST   /auth/connect/:marketplace             - Start OAuth2, returns redirect URL
//   GET    /auth/callback/:marketplace            - OAuth2 callback (handled in popup)
//   GET    /auth/marketplaces/:userId             - List connected marketplaces + status
//   DELETE /auth/disconnect/:userId/:marketplace  - Revoke tokens + cleanup
// From UserProfile API (DynamoDB):
//   GET /users/:userId/profile - UserProfile (name, categories, goals, prefs)
//   PUT /users/:userId/profile - Update UserProfile
- [Shell] App opens, loads MeLi, sidebar works in <5s
- [Shell] MarketplaceDetector identifies: seller center, product detail, search results
- [Shell] App does not exceed 500MB RAM with 3 tabs open (target 400MB)
- [Shell] .dmg installs on Mac without unsigned app errors
- [Chat] WebSocket streaming: text_chunk events render <50ms after receipt (no visible lag)
- [Chat] tool_start/tool_progress/tool_complete events show real-time progress indicator
- [Chat] Confirmations show clear before/after with working Accept/Reject buttons
- [Chat] WebSocket reconnects automatically with exponential backoff, session restores last incomplete round
- [Profile] ProfileView loads user data, edits save to backend, changes reflect in Coach context within next message
- [Billing] BillingView shows plan, credits remaining (bar), billing period. Upgrade button opens Stripe Checkout. Pack button opens pack Checkout
- [Enrollment] "Connect MeLi" opens OAuth2 popup, completes flow, marketplace appears as connected. Same for Amazon and Shopify
- [Enrollment] "Disconnect" revokes tokens and updates status immediately
- [Onboarding] First-time user sees OnboardingWizard: profile setup → marketplace connect → first chat → tour. Completion persisted
- [Onboarding] Returning user skips onboarding, goes directly to chat
- [Shortcuts] Cmd+L focus chat, Cmd+B toggle sidebar, Cmd+K command palette
- [Update] Auto-updater downloads and installs updates without crash
Window: 1200x800 · Split: 70/30 · RAM: <500MB (target 400MB) · WS heartbeat: 30s · WS reconnect: exp backoff max 30s · Auto-update: startup + 6h · Mac only MVP
How It Works
+---------------------------------------------------------------+
| ELECTRON MAIN PROCESS |
| +------------------+ +------------------+ +--------------+ |
| | WindowManager | | MarketplaceDetect| | SessionMgr | |
| | 1200x800 default | | URL Patterns: | | Per-mktplace | |
| | 70/30 split | | meli.com/seller | | cookie | |
| | WebContentsView | | sellercentral.* | | partitions | |
| | minimize to tray | | admin.shopify.* | | Persist | |
| +------------------+ +--------+---------+ +--------------+ |
| +------------------+ | +--------------+ |
| | TabManager | v | OAuthPopup | |
| | Cmd+1..9 switch | IPC: marketplace:det | Dedicated | |
| | Independent | IPC: oauth:start | BrowserWindow| |
| | cookies per tab | IPC: auth:get-token | for OAuth2 | |
| +------------------+ +------------------+ | redirects | |
| +------------------+ | CredentialsMgr | +--------------+ |
| | IPC Handlers | | JWT safeStorage | +--------------+ |
| | 7 channels | | Auto-refresh | | AutoUpdater | |
| +------------------+ +------------------+ +--------------+ |
+---------------------------------------------------------------+
|
v (contextBridge - secure preload)
+---------------------------------------------------------------+
| RENDERER (React Sidebar) |
| |
| +-----------------------------------------------------------+|
| | Header: Logo + Mktplace indicator + Credits + Nav icons ||
| | [Chat] [Profile] [Billing] [Enrollment] ||
| +-----------------------------------------------------------+|
| | ||
| | /chat (default) /profile /billing ||
| | +---------------------+ +--------------+ +--------------+ ||
| | | SuggestionCards | | Name, email | | Plan: Pro | ||
| | | ChatPanel | | Categories | | Credits: 342 | ||
| | | - Streaming text | | Goals | | [==== ] 68% | ||
| | | - ToolProgress | | Language | | Period: ends | ||
| | | - Confirmations | | Connected | | [Upgrade] | ||
| | | - Input + palette | | marketplaces | | [Buy Pack] | ||
| | +---------------------+ +--------------+ +--------------+ ||
| | ||
| | /enrollment /onboarding (first run) ||
| | +---------------------+ +---------------------------------+|
| | | [MeLi] Connected | | Step 1: Profile setup ||
| | | [Amazon] Connect | | Step 2: Connect marketplace ||
| | | [Shopify] Connect | | Step 3: First chat with Coach ||
| | | [Disconnect] | | Step 4: Quick sidebar tour ||
| | +---------------------+ +---------------------------------+|
| | ||
| | StatusBar: WS dot + view name + shortcut hint ||
| +-----------------------------------------------------------+|
| |
| Hooks: useShopilot | useMarketplace | useSuggestions |
| useCredentials | useProfile | useBilling | useEnrollment|
+---------------------------------------------------------------+
=== WEBSOCKET STREAMING (Shell <-> Coach) ===
Shell Coach (#2 Orchestrator)
| |
|-- message {text, context} --> |
| |-- ReAct loop starts
| <-- text_chunk {text} ----- | (LLM streaming)
| <-- text_chunk {text} ----- |
| <-- tool_start {name} ----- | (before_tool hook)
| <-- tool_progress {msg} --- | ("Consultando metricas...")
| <-- tool_complete {res} --- | (after_tool hook)
| <-- confirmation_required - | (WRITE action)
|-- confirmation_result ------> | (user approves/rejects)
| <-- text_chunk {text} ----- | (Coach explains result)
| <-- round_end {credits} --- | (round complete)
| |
Reconnect: exponential backoff (1s, 2s, 4s... max 30s)
Heartbeat: ping/pong 30s, timeout 90s
Session restore: server replays last incomplete round on reconnect
The Native Shell is an Electron desktop app (~35% of total project effort) that combines a WebContentsView for marketplace browsing (70% of the window) with a React sidebar for the AI copilot (30%). The Main Process runs 7 managers: WindowManager, MarketplaceDetector, SessionManager, TabManager, CredentialsManager, OAuthPopupManager (a dedicated BrowserWindow for OAuth2 redirect flows that never uses the main WebContentsView), and AutoUpdater.
The Renderer is a full React app with 7 custom hooks: useShopilot (WebSocket streaming protocol), useMarketplace (IPC context), useSuggestions (WebSocket events), useCredentials (JWT via IPC), useProfile (user settings CRUD), useBilling (plan/credits status from #13), and useEnrollment (marketplace connect/disconnect via #12). The sidebar uses react-router for in-sidebar view navigation: /chat (default), /profile, /billing, /enrollment, and /onboarding (first-run wizard). Communication with the Coach backend uses a bidirectional WebSocket protocol with 8 server-to-client event types and 4 client-to-server event types, replacing the previous REST polling approach. The OnboardingWizard guides first-time users through profile setup, marketplace connection, and a first Coach interaction before showing the main chat view. Mac-only for MVP (.dmg with code signing).
Implementation Plan
Phase 1: Electron Skeleton + WebContentsView (Week 1-3)
Set up an Electron 30+ project (WebContentsView is not available in earlier Electron releases) with TypeScript, React 18, TailwindCSS, Zustand, and react-router. Implement WindowManager with a 1200x800 default and a 70/30 split between WebContentsView and React sidebar. Build the WebContentsView that loads MeLi Seller Center. Implement the secure preload script with contextBridge (never expose ipcRenderer directly). Set up electron-builder for .dmg packaging. Set up sidebar router scaffolding (/chat, /profile, /billing, /enrollment, /onboarding as placeholder routes). Validation: a deployable .dmg that loads MeLi with a sidebar placeholder and working route navigation, which confirms the Electron shell, WebContentsView, and build pipeline end-to-end.
Phase 2: MarketplaceDetector + Tabs + Enrollment (Week 3-5)
Implement MarketplaceDetector with URL pattern matching for MeLi, Amazon, and Shopify. Build the IPC bridge: marketplace:detected events sent to the sidebar. Implement TabManager with independent cookie partitions per tab via SessionManager, a tab bar UI, and Cmd+1-9 quick switching. Build OAuthPopupManager: a dedicated BrowserWindow that opens for OAuth2 redirects (POST /auth/connect/:marketplace), intercepts the callback URL, and closes the popup on success. Build EnrollmentView: marketplace cards with Connect/Disconnect buttons, status indicators, and a useEnrollment() hook consuming #12 APIs. Validation: navigate across marketplaces, connect MeLi via the OAuth2 popup, see it as "Connected" in EnrollmentView, and confirm that tabs maintain independent sessions.
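The URL matching that MarketplaceDetector performs could look roughly like the sketch below. It mirrors the four patterns listed in the main-process notes; the type names, page labels, and the hard-coded `PATTERNS` table are illustrative assumptions, since the blueprint loads its real patterns from remote config.

```typescript
// Sketch only: names and page labels are assumptions, not the project's API.
type MarketplaceId = 'mercadolibre' | 'amazon' | 'shopify';

interface DetectedContext {
  marketplace: MarketplaceId | 'other';
  page: string;
}

interface UrlPattern {
  marketplace: MarketplaceId;
  page: string;
  host: RegExp; // tested against URL.hostname
  path: RegExp; // tested against URL.pathname
}

// mercadolibre.com plus an optional two-letter country suffix (.ar, .mx, ...).
const MELI_HOST = /(^|\.)mercadolibre\.com(\.[a-z]{2})?$/;

const PATTERNS: UrlPattern[] = [
  { marketplace: 'mercadolibre', page: 'seller_center', host: MELI_HOST, path: /^\/seller\// },
  { marketplace: 'mercadolibre', page: 'product_page', host: MELI_HOST, path: /^\/p\// },
  { marketplace: 'amazon', page: 'seller_central', host: /^sellercentral\.amazon\.com$/, path: /^\// },
  { marketplace: 'shopify', page: 'admin', host: /^admin\.shopify\.com$/, path: /^\// },
];

const OTHER: DetectedContext = { marketplace: 'other', page: 'unknown' };

function detectMarketplace(rawUrl: string): DetectedContext {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return OTHER; // not a parseable URL
  }
  for (const p of PATTERNS) {
    if (p.host.test(url.hostname) && p.path.test(url.pathname)) {
      return { marketplace: p.marketplace, page: p.page };
    }
  }
  return OTHER; // the "*" fallback: no marketplace context
}
```

The main process would call something like `detectMarketplace` on each navigation and forward the result on the marketplace:detected channel. Matching on hostname and pathname separately avoids false positives from marketplace names that appear in query strings.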
Phase 3: Chat + WebSocket Streaming + Profile + Billing (Week 5-8)
Build the ChatPanel: a message list with react-markdown rendering, plus the WebSocket streaming protocol (useShopilot hook: connect, text_chunk assembly, tool_start/progress/complete indicators, confirmation_required/result flow, reconnect with exponential backoff, session restore). Build ConfirmationDialog with a before/after preview and approve/reject buttons. Build the input area with an auto-resize textarea and a "/" command palette. Build ProfileView: useProfile() hook for GET/PUT UserProfile, and a form with name, categories, goals, and language preference (react-hook-form). Build BillingView: useBilling() hook for GET /billing/status, plan badge, credits progress bar, billing period display, and Upgrade/Buy Pack/Manage buttons that redirect to Stripe Checkout/Portal. Build the Dev Tools panel (toggle via Cmd+Shift+D). Validation: full end-to-end flow covering login, chat with streaming, confirming write actions, editing the profile, viewing billing status, upgrading via Stripe, and viewing traces in Dev Tools.
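As a rough illustration of the text_chunk assembly and tool indicators that useShopilot is responsible for, the sketch below folds server events into chat state with a pure reducer. It covers only five of the eight server-to-client events, and the `ChatState` shape and `reduceServerEvent` name are assumptions made for this example, not the project's actual store.

```typescript
// Subset of the server-to-client events from the protocol section.
type ServerEvent =
  | { type: 'text_chunk'; text: string }
  | { type: 'tool_start'; toolId: string; toolName: string }
  | { type: 'tool_progress'; toolId: string; message: string }
  | { type: 'tool_complete'; toolId: string; resultSummary: string }
  | { type: 'round_end'; roundNumber: number; creditsUsed: number };

// Hypothetical state shape for the chat view.
interface ChatState {
  draft: string;                       // text streamed so far in the current round
  messages: string[];                  // completed assistant messages
  activeTools: Record<string, string>; // toolId -> latest status line
  creditsUsed: number;
}

const initialChatState: ChatState = { draft: '', messages: [], activeTools: {}, creditsUsed: 0 };

function reduceServerEvent(state: ChatState, ev: ServerEvent): ChatState {
  switch (ev.type) {
    case 'text_chunk':
      // Append each chunk to the in-progress draft.
      return { ...state, draft: state.draft + ev.text };
    case 'tool_start':
      return { ...state, activeTools: { ...state.activeTools, [ev.toolId]: `${ev.toolName}: started` } };
    case 'tool_progress':
      return { ...state, activeTools: { ...state.activeTools, [ev.toolId]: ev.message } };
    case 'tool_complete': {
      // Remove the finished tool from the progress indicators.
      const activeTools = { ...state.activeTools };
      delete activeTools[ev.toolId];
      return { ...state, activeTools };
    }
    case 'round_end':
      // Commit the draft as a finished message and reset for the next round.
      return {
        ...state,
        draft: '',
        messages: state.draft ? [...state.messages, state.draft] : state.messages,
        creditsUsed: state.creditsUsed + ev.creditsUsed,
      };
  }
}
```

Keeping the reducer free of WebSocket and React dependencies means the same function can replay the "last incomplete round" events after a reconnect and can be unit tested without a socket.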
Phase 4: Onboarding + Polish + Signing + Auto-Update (Week 8-11)
Build OnboardingWizard: a 5-step guided flow for first-time users. Step 1: profile setup (name, categories, goals). Step 2: connect first marketplace (reuses the EnrollmentView OAuth2 flow). Step 3: first guided Coach interaction ("Ask Shopilot about your metrics"). Step 4: quick sidebar tour (highlight chat, suggestions, shortcuts). Detect first run via the UserProfile.onboardingCompleted flag. Implement all keyboard shortcuts. Add minimize-to-tray for proactive notifications. Set up AutoUpdater with electron-updater + GitHub Releases. Obtain an Apple Developer Certificate for code signing + notarization. Performance optimization: <5s startup, <300MB RAM with 3 tabs. Add Sentry for crash reporting. Final QA and .dmg distribution.
Risk Analysis
Marketplace Session/Cookie Breakage
Impact: High
Mitigation: Each marketplace gets an isolated session partition (Electron partition: persist:mercadolibre, persist:amazon). Sessions persist across app restarts. If a session expires, detect the login redirect and prompt the user to re-authenticate. Never inject JavaScript into marketplace pages, to avoid triggering anti-bot protections.
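A small sketch of the two pieces this mitigation relies on: deriving the persistent partition name per marketplace, and recognising a login redirect. The `persist:` prefix follows the partition names given above; the `LOGIN_PATH_HINTS` patterns are placeholder assumptions and would need to be replaced with each marketplace's real login URLs.

```typescript
type Marketplace = 'mercadolibre' | 'amazon' | 'shopify';

// Electron keeps a partition on disk across restarts when its name starts with "persist:".
function partitionFor(marketplace: Marketplace): string {
  return `persist:${marketplace}`;
}

// Placeholder hints only; not verified against any marketplace's actual login flow.
const LOGIN_PATH_HINTS: RegExp[] = [/\/login\b/i, /\/signin\b/i];

// Returns true when a navigation target looks like a login page, which the
// shell can treat as "session expired" and prompt the user to re-authenticate.
function looksLikeLoginRedirect(targetUrl: string, hints: RegExp[] = LOGIN_PATH_HINTS): boolean {
  let url: URL;
  try {
    url = new URL(targetUrl);
  } catch {
    return false;
  }
  return hints.some((hint) => hint.test(url.pathname));
}
```

In the main process, the partition string would be passed to the tab's session (for example through `session.fromPartition`), and the redirect check would run on navigation events rather than through any script injected into the page.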
Electron Memory Bloat
Impact: High
Mitigation: Hard limit of 3 tabs at MVP. Lazy-load tab content (background tabs are suspended after 5 minutes of inactivity). Monitor memory via Electron process.getProcessMemoryInfo(). Alert and force-GC if the total exceeds 400MB. Profile the Chromium renderer per tab to identify leaks. Target: under 300MB with 3 active tabs.
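The alert threshold above can be expressed as a pure check over per-process memory samples. This is a sketch under assumptions: the `ProcessSample` shape is hypothetical, and the samples are presumed to come from something like Electron's `app.getAppMetrics()`, which reports a working set size per process in kilobytes.

```typescript
// Hypothetical sample shape; one entry per Electron process (main, renderer, GPU, ...).
interface ProcessSample {
  type: string;
  workingSetKb: number;
}

type MemoryStatus = 'ok' | 'alert';

const ALERT_THRESHOLD_MB = 400; // alert level named in the mitigation

function totalMemoryMb(samples: ProcessSample[]): number {
  const totalKb = samples.reduce((sum, s) => sum + s.workingSetKb, 0);
  return totalKb / 1024;
}

function memoryStatus(samples: ProcessSample[], thresholdMb: number = ALERT_THRESHOLD_MB): MemoryStatus {
  return totalMemoryMb(samples) > thresholdMb ? 'alert' : 'ok';
}
```

Summing across all processes matters because each tab's renderer is a separate process, so a single-process reading would understate the app's footprint.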
macOS Code Signing + Notarization Delays
Impact: Medium
Mitigation: Apply for the Apple Developer Program in Week 1 (approval takes 24-48h). Test code signing and notarization in the CI pipeline early (Week 3). Without proper signing, macOS Gatekeeper blocks the app entirely. Keep the electron-builder config version-controlled for reproducible builds.
OAuth2 Popup Flow Blocked by OS/Marketplace
Impact: High
Mitigation: Use a dedicated Electron BrowserWindow (not a system browser) for OAuth2 redirects. This avoids popup blockers and gives us control over the redirect interception. Intercept the callback URL via webContents.on('will-redirect') before it reaches the marketplace's callback endpoint. If MeLi/Amazon/Shopify change their OAuth2 flow or block Electron user-agents, fall back to the system browser with a localhost callback server (loopback redirect). Test OAuth2 flows against all 3 marketplaces in Phase 2.
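The interception step needs a way to decide whether a redirect target is the OAuth2 callback and to pull out the authorization code. The sketch below shows one way to do that as a pure function. The callback path follows the GET /auth/callback/:marketplace route listed earlier; the expected host and the `parseOAuthCallback` name are assumptions, and `code`/`state` are the standard OAuth2 authorization-code query parameters.

```typescript
interface OAuthCallback {
  marketplace: string;
  code: string;
  state: string | null;
}

const CALLBACK_PATH = /^\/auth\/callback\/([a-z]+)$/;

// Returns the parsed callback, or null when the URL is not a completed OAuth2 callback.
function parseOAuthCallback(rawUrl: string, expectedHost: string): OAuthCallback | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null;
  }
  // Only accept callbacks on the host we registered as redirect URI.
  if (url.host !== expectedHost) return null;

  const marketplace = CALLBACK_PATH.exec(url.pathname)?.[1];
  const code = url.searchParams.get('code');
  if (!marketplace || !code) return null;

  return { marketplace, code, state: url.searchParams.get('state') };
}
```

A 'will-redirect' handler in OAuthPopupManager could call this on every redirect URL and close the popup once it returns a non-null result. Checking `state` against the value issued at oauth:start remains the caller's job.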
WebSocket Connection Instability
Impact: High
Mitigation: Exponential backoff reconnection (1s, 2s, 4s, 8s, max 30s). Ping/pong heartbeat every 30s with a 90s timeout to detect dead connections. On reconnect, the server replays the last incomplete round to prevent lost context. The StatusBar shows a real-time connection indicator (green/yellow/red). If the WebSocket fails persistently (>3 retries), fall back to REST long-polling as a degraded mode: chat still works, just without streaming progress indicators.
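The timing rules in this mitigation reduce to three small pure functions, sketched below. The constants come from the figures above; the function names are illustrative, and the sketch leaves out jitter, which the blueprint does not specify.

```typescript
const BASE_DELAY_MS = 1_000;         // first retry after 1s
const MAX_DELAY_MS = 30_000;         // cap at 30s
const HEARTBEAT_TIMEOUT_MS = 90_000; // no pong for 90s means the connection is dead
const MAX_WS_RETRIES = 3;            // beyond this, degrade to REST long-polling

// attempt is zero-based: 0 -> 1s, 1 -> 2s, 2 -> 4s, 3 -> 8s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

function isConnectionDead(lastPongAtMs: number, nowMs: number): boolean {
  return nowMs - lastPongAtMs > HEARTBEAT_TIMEOUT_MS;
}

function shouldFallBackToPolling(failedRetries: number): boolean {
  return failedRetries > MAX_WS_RETRIES;
}
```

For example, `[0, 1, 2, 3, 4, 5].map(reconnectDelayMs)` yields 1000, 2000, 4000, 8000, 16000, and 30000 milliseconds, with the sixth attempt hitting the cap.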
Key Decisions
Electron over browser extension or web app: A browser extension cannot control tab sessions, isolate cookie partitions, or provide a persistent sidebar across navigations. A web app cannot detect which marketplace page the user is viewing. Electron gives us full control: an embedded browser with session isolation, native OS integrations (tray, keychain, notifications), and auto-updates. The ~35% effort premium is justified by the vastly superior UX.
Zustand over Redux for state management: The sidebar state is relatively simple: current marketplace context, chat messages, suggestions list, auth token. Redux's boilerplate (actions, reducers, selectors) is overkill. Zustand provides the same reactivity with 80% less code. The entire store fits in a single file. If state complexity grows in Scope Full, migration to Redux Toolkit is straightforward.
Mac-only MVP, defer Windows and Linux: 70%+ of LatAm e-commerce sellers use macOS or can install a .dmg. Building for Windows (code signing with an EV certificate, Windows Defender SmartScreen) and Linux (AppImage, Snap, deb) adds 2-3 weeks of CI/CD complexity. Ship Mac first, validate product-market fit, then expand platform coverage based on user demand.
WebSocket over SSE for Coach communication: SSE is server-to-client only. The Shell needs to send confirmation_result (approve/reject) and context_update (marketplace changed) to the Coach, which requires bidirectional communication. WebSocket provides full-duplex on a single connection. The IEventEmitter abstraction in #2 already supports swapping in WebSocketEventEmitter in Phase 4. API Gateway WebSocket (AWS) in production, the ws library for local dev.
All views inside the sidebar, not separate windows: Profile, Billing, and Enrollment could each be separate Electron windows, but that fragments the UX. Keeping everything in the sidebar with react-router maintains the "copilot panel" mental model, in which the sidebar is always visible alongside the marketplace. Navigation between views is instant (no window creation overhead). The chat remains the primary view; the others are secondary settings screens.
Onboarding as skippable wizard, not mandatory gate: The wizard guides first-time users through profile + enrollment + first chat, but can be dismissed and resumed later from ProfileView. This avoids blocking power users who already know what they're doing, while still reducing friction for newcomers. Completion state is persisted in UserProfile.onboardingCompleted (DynamoDB).
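The skippable-wizard decision implies a small routing rule at sidebar startup. The sketch below is one possible reading: `onboardingCompleted` is the flag named above, while the `dismissedThisSession` parameter and the `initialRoute` function are assumptions introduced for illustration.

```typescript
// Only the flag named in the blueprint; the real UserProfile has more fields.
interface OnboardingProfile {
  onboardingCompleted: boolean;
}

type SidebarRoute = '/chat' | '/onboarding';

// profile is null when no UserProfile exists yet (true first run).
function initialRoute(profile: OnboardingProfile | null, dismissedThisSession: boolean): SidebarRoute {
  if (profile?.onboardingCompleted) return '/chat'; // returning user skips onboarding
  // Not completed: show the wizard unless the user dismissed it, in which case
  // it stays resumable from ProfileView.
  return dismissedThisSession ? '/chat' : '/onboarding';
}
```

Because dismissal does not set `onboardingCompleted`, the wizard would reappear on the next launch under this reading. Whether that is the intended behaviour is a product decision the blueprint leaves open.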
MVP Scope
[v4] MeLi + Amazon + Shopify detection, dark mode, Mac .dmg. RAM: 500MB max (target 400MB). Breadcrumb nav (not URL bar). Remote config for URL patterns. Credit balance in sidebar + upgrade/buy-pack CTAs. WebSocket streaming protocol (bidirectional). ProfileView + BillingView + EnrollmentView (OAuth2 popup). OnboardingWizard (5-step first-run). 7 React hooks, 5 sidebar views via react-router.
Inspired by
New (~35% effort). Cursor Electron pattern.
📝 Project Changelog
Design System
Brand & Components — UX/UI (executes Figma) · Pablo (approves) · Sergio (consumes → React Mockups)
Repository of specifications and context (no executable code) that bridges design in Figma with the React implementation in core-product-desktop-client (#1). It catalogues 44 components following Atomic Design (13 atoms, 6 AI-native atoms, 10 molecules, 16 organisms, 5 screens), tracks 9 pending brand decisions (D1-D9), and defines a design-to-code pipeline via Figma MCP. The external UX/UI team delivers the brand book and Figma files (T0.BB–T4.BB each sprint); Pablo approves each delivery; Sergio consumes the components and creates integration Mockups in React. Claude reads the Figma via MCP (get_file, get_node, get_variable_defs, get_metadata) to generate matching React components; no components are created outside of what is defined in the Figma. The repository also includes a UX Writing guide (copy, tone, number formatting), 8 data visualization patterns for sellers, AI-native interaction patterns (streaming, thinking, tool stages, confirmation), and desktop Electron patterns (title bar, split pane, tab bar, status bar, keyboard shortcuts). It is backed by a competitive brand analysis of 16 reference brands (Cursor, Linear, Shopify/Polaris, Vercel/Geist, Anthropic, HubSpot/Canvas, Brex, Mercury, and 8 more).
4 mandatory agent rules: (1) No inventing components outside Figma — if one is missing, stop and report. (2) No hardcoded values — all colors, sizes, radii, spacing come from Figma variables with codeSyntax. (3) Figma is source of truth — no prior knowledge, no other projects, no “what looks good”. (4) Verify before implementing — must execute get_variable_defs + get_metadata via MCP before writing any React code.
9 brand decisions (D1-D9), logo, palette, typography
3 collections: Primitives → Semantic → Component
13 atoms + 6 AI-native + 10 molecules + 16 organisms + 5 screens
get_file, get_node, get_variable_defs, get_metadata
Copy patterns, tone, number formatting, terminology
8 patterns: KPI cards, tables, gauges, sparklines
Streaming, thinking, tool stages, confirmation, errors
16 brand analyses: Cursor, Linear, Polaris, Vercel…
Tech Stack
Component Catalogue, Figma Architecture, Token Schema, Brand Decisions, Pipeline & Acceptance Criteria
// === ATOMS (13): base components, single responsibility ===
Button, Input, Badge, Icon, StatusDot, Spinner, Toggle, Divider, AvatarInitials, CreditBadge, ProgressBar, Tooltip, KbdShortcut
// === AI-NATIVE ATOMS (6): specific to conversational copilot ===
StreamingCursor (█), ThinkingPulse (•••), ToolBadge, AgentStatusBar, RiskBadge, TTLCountdown
// === MOLECULES (10): atom combinations ===
SearchBar, TabBar, Select, Toggle (labeled), Tooltip (rich), ProgressBar (labeled), Dropdown, KbdShortcut (combo), InputField, CreditDisplay
// === ORGANISMS (16): complex UI blocks ===
ChatInputBar, MessageBubble, ContextBar, ToolAccordion, ConfirmDialog, ReActStream, ProactiveCard, DataTable, AuditLog, RollbackPanel, FraudAlert, MarketplaceKPI, CreditEconomy, OnboardingStep, EnrollmentCard, ErrorRecovery
// === SCREENS (5): full views at 360px sidebar width ===
ChatView, Dashboard, Settings, Billing, Enrollment
D1: Brand emotion     PENDING   A: "Warm Precision" (rec) / B: "Data Intelligence" / C: "Growth Engine"
D2: Primary color     PENDING   Orange #F97316 / Indigo #6366F1 / Sky #0EA5E9 / Emerald #10B981
D3: Background mode   PENDING   Dark-first (rec) / Light-first / Both
D4: Typography        PENDING   Inter + JetBrains Mono / Geist + Geist Mono / IBM Plex
D5: Logo              PENDING   Wordmark / Icon + Wordmark (rec) / Abstract mark
D6: Border radius     PENDING   Sharp (2-4px) / Standard (6-8px) / Rounded (12-20px)
D7: Shadow policy     PENDING   None / Minimal / Soft
D8: Semantic palette  PENDING   Green/Amber/Red/Blue standard (blocked by D3)
D9: UI voice          PENDING   Direct & human / Technical & precise / Empowering & confident
// Until D1-D9 are decided, design-tokens.json cannot be generated and components cannot be implemented
// 00 Primitives (raw values, NEVER applied directly to designs)
color/blue/50..900, gray/50..900, red/50..900, green/50..900
spacing/0, 1(4px), 2(8px), 3(12px), 4(16px), 6(24px), 8(32px)
radius/none(0), sm(4), md(8), lg(12), xl(16), full(9999)
font-size/xs(12), sm(14), base(16), lg(18), xl(20), 2xl(24)
// 01 Semantic (with Light/Dark modes; this is what components use)
color/bg/primary → white (light) | gray/900 (dark)
color/bg/secondary → gray/50 (light) | gray/800 (dark)
color/text/primary → gray/900 (light) | white (dark)
color/border/default → gray/200 (light) | gray/700 (dark)
color/interactive/primary → blue/600
spacing/component/padding-sm → spacing/2
radius/component/button → radius/md
// 02 Component (optional, unique overrides only)
button/bg/primary → color/interactive/primary
input/border → color/border/default
input/border-focus → color/border/focus
// Code Syntax mapping: variable → var(--color-bg-primary)
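The alias chain in the token schema (Component → Semantic → Primitives) and the Code Syntax mapping can be illustrated with a small TypeScript sketch. This is only an illustration under assumptions: the `tokens` table, the hex values, and the helper names `resolveToken` and `toCssVar` are hypothetical and not part of the spec; the optional `sp` prefix mirrors the `--sp-*` variable names shown in the pipeline diagram.

```typescript
// An entry is either a raw value (Primitives) or an alias to another token path.
type TokenValue = { value: string } | { alias: string };

// Hypothetical excerpt of the three collections; hex values are assumed for illustration.
const tokens: Record<string, TokenValue> = {
  "blue/600": { value: "#2563EB" },
  "color/interactive/primary": { alias: "blue/600" },
  "button/bg/primary": { alias: "color/interactive/primary" },
};

// Follow aliases until a raw value is reached; guard against cycles and unknown paths.
function resolveToken(path: string, table: Record<string, TokenValue>): string {
  const seen = new Set<string>();
  let current = path;
  while (true) {
    if (seen.has(current)) throw new Error(`Alias cycle at ${current}`);
    seen.add(current);
    const entry = table[current];
    if (entry === undefined) throw new Error(`Unknown token ${current}`);
    if ("value" in entry) return entry.value;
    current = entry.alias;
  }
}

// Map a Figma variable path to a CSS custom property reference.
function toCssVar(path: string, prefix = ""): string {
  const name = path.split("/").join("-");
  return `var(--${prefix ? prefix + "-" : ""}${name})`;
}
```

Resolving at build time keeps components free of raw values: a component only ever references the variable name, which is the intent behind the "no hardcoded values" rule.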
Figma Team: Shopilot Design
├── [LIB] Foundations & Tokens ← variables, colors, typography, spacing
├── [LIB] Iconography ← icons (changes frequently, many assets)
├── [LIB] Core Components ← atoms + molecules
├── [LIB] Pattern Components ← organisms + templates
├── Documentation & Playground ← usage guides, do/don't (not published)
└── Changelog & Governance ← change history (not published)
// Publish order: Tokens → Icons → Core Components → Patterns
// Each component file has: Cover, Getting Started, Changelog, Atoms, Molecules, Organisms, ._Base, Archive
// .claude/settings.json: MCP permissions
mcp__figma__get_file // read full Figma file
mcp__figma__get_node // read specific component node
mcp__figma__get_design_context // extract design context
mcp__figma__get_variable_defs // read token variables (MUST run before implementing)
mcp__figma__get_metadata // read component metadata (MUST run before implementing)
mcp__pencil__design // complementary: rapid prototyping without designer
// Workflow:
// 1. Agent runs get_variable_defs + get_metadata
// 2. Reads component structure from Figma
// 3. Implements matching React component in core-product-desktop-client
// 4. Code review verifies fidelity to Figma spec
// 5 mandatory deliverables from external design team:
1. Logo system: icon+wordmark, reduced icon, monochrome (white+black), SVG+PNG @1x-3x
2. Color palette: primary, secondary, semantic (success/warning/error/info), neutrals, dark mode mapping, WCAG AA ratios
3. Typography: display + body + monospace fonts, scale (10-24px), weights, files (.woff2 + .ttf)
4. Voice & tone: 3-5 brand personality attributes, context variations, "sounds like" vs "doesn't sound like"
5. 44 components with brand identity applied: all states (default, hover, active, focus, disabled), dark mode, WCAG AA
// NOT in scope: user flows, social media guidelines, corporate stationery, motion specs, code implementation
Brand decisions (Pablo D1-D9)
↓
External design team creates Figma with variables + codeSyntax
↓
Agent runs get_variable_defs + get_metadata via Figma MCP
↓
Generates design-tokens.json (W3C DTCG format)
↓
Style Dictionary transforms → CSS :root + Tailwind config
↓
Agent implements React component in core-product-desktop-client (#1)
↓
Code review verifies fidelity: tokens + a11y + responsive + states
Status: specs → ✓ | Figma → ✗ | design-tokens.json → ✗ | React components → ✗
Blockers:
1. 9 brand decisions (D1-D9) not yet taken: blocks entire pipeline
2. No brand assets: no logo, no licensed fonts, no brand book in repo
3. No Figma file: no source of truth for agent (requires multi-file libraries, 3 variable collections, Code Syntax, Auto Layout, Light/Dark modes)
4. No brand book: external design team has not delivered yet
core-product-design-system/
├── README
├── CLAUDE.md ← agent instructions
├── brand/ ← brand assets (PENDING)
│ ├── logo/ ← logo variants (SVG, PNG)
│ ├── fonts/ ← licensed fonts
│ └── brand-book.md ← brand guidelines
├── tokens/ ← pipeline output (PENDING)
│ └── design-tokens.json ← generated from Figma variables
└── .claude/
    ├── settings.json ← MCP permissions (figma + pencil)
    ├── memory/MEMORY.md ← state, decisions, catalogue
    └── specs/ (8 docs, 1930 lines) ← architecture, contracts, dev plan, testing, figma specs
AC-18.1: All 9 brand decisions (D1-D9) resolved and documented
AC-18.2: Brand book delivered: logo system + color palette + typography + voice/tone + 44 components with brand applied
AC-18.3: Figma multi-file structure: Foundations, Icons, Core Components, Patterns (not monolithic)
AC-18.4: 3 variable collections (Primitives → Semantic → Component) with Code Syntax + Light/Dark modes
AC-18.5: All 44 components in Figma: Auto Layout, all states, Component Properties, slash naming
AC-18.6: design-tokens.json generated → Style Dictionary → CSS :root + Tailwind config in #1
AC-18.7: Claude reads and implements Figma components in #1 via Figma MCP (get_variable_defs + get_metadata)
AC-18.8: WCAG 2.2 AA compliance: contrast ≥4.5:1 text, ≥3:1 UI, :focus-visible, touch target ≥44px
AC-18.9: No React components in #1 exist outside of what is defined in the Figma
AC-18.10: Zero Figma anti-patterns: no detached instances, no hardcoded hex, no generic names, no absolute positioning
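The contrast thresholds in AC-18.8 can be checked mechanically during code review. The sketch below computes the WCAG contrast ratio from the standard relative-luminance formula; the function names (`hexToRgb`, `contrastRatio`, `meetsAA`) are hypothetical, and wiring this into the actual review step is an assumption, not something the spec defines.

```typescript
// Parse "#RRGGBB" into 0..255 channels.
function hexToRgb(hex: string): [number, number, number] {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) throw new Error(`Expected #RRGGBB, got ${hex}`);
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Convert an 8-bit sRGB channel to linear light.
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of a color.
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), so argument order does not matter.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AA thresholds from AC-18.8: 4.5:1 for text, 3:1 for UI elements.
function meetsAA(foreground: string, background: string, kind: "text" | "ui"): boolean {
  return contrastRatio(foreground, background) >= (kind === "text" ? 4.5 : 3);
}
```

A check like this could run against each semantic color pair in both Light and Dark modes once D2, D3, and D8 are decided.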
How It Works
+---------------------------------------------------------------+
| BRAND DECISIONS (Pablo D1-D9) |
| D1 Emotion · D2 Color · D3 Mode · D4 Type · D5 Logo |
| D6 Radius · D7 Shadows · D8 Semantic · D9 Voice |
+-------------------------------+-------------------------------+
|
v
+---------------------------------------------------------------+
| EXTERNAL DESIGN TEAM |
| Brand book: logo + palette + typography + voice + 44 comps |
| Figma: multi-file libraries, 3 variable collections |
| Variables: Primitives → Semantic (L/D modes) → Component |
| All components: Auto Layout, states, Code Syntax |
+-------------------------------+-------------------------------+
|
v
+---------------------------------------------------------------+
| FIGMA MCP BRIDGE |
| mcp__figma__get_variable_defs → token extraction |
| mcp__figma__get_metadata → component metadata |
| mcp__figma__get_node → component structure |
| mcp__pencil__design → rapid prototyping |
+-------------------------------+-------------------------------+
|
v
+---------------------------------------------------------------+
| DESIGN TOKEN PIPELINE |
| Figma Variables → design-tokens.json (W3C DTCG) |
| Style Dictionary 4 → CSS :root + tailwind.config.ts |
| --sp-color-brand-primary, --sp-spacing-4, --sp-radius-md |
+-------------------------------+-------------------------------+
|
v
+---------------------------------------------------------------+
| NATIVE SHELL (#1) — IMPLEMENTATION |
| Claude reads Figma → generates React + Tailwind components |
| 44 components implemented following Figma spec exactly |
| Code review verifies: tokens + a11y + responsive + states |
+---------------------------------------------------------------+
📋 Project Changelog
Layer 2: INTELLIGENCE
The Coach lives here
ReAct Orchestrator
Intelligence — Mateo
THE HEART OF SHOPILOT. Replaces the existing one-shot RAG pipeline (ConversationFlowOrchestrator) with a stateful ReAct loop (reason, decide, act, observe) that iterates until the LLM responds with text or hits MAX_ROUNDS=10. Today the Coach is one-shot: one question, one LLM call, one answer, with no conversation history sent, no tools, and no chaining. The orchestrator fixes all three: multi-round reasoning, tool execution with ConfirmationFlow for writes, and full conversation history in every turn. Built on TypeScript/AWS Lambda (NOT Python).
The orchestrator is a port-based architecture: it doesn't know the LLM model, the transport protocol, or the billing system. IEventEmitter decouples the loop from REST (Phase 0.3) and WebSocket (Phase 4). Proactive mode (Phase 4) is a separate entry point that shares the loop but has its own context construction. Four additional capabilities strengthen the loop: (1) resilient error handling: tool errors become observations for the LLM to reason about and retry, up to 3 consecutive failures; (2) post-generation verification: the Coach checks cited data against retrieved data before responding; (3) extended thinking: a selective reasoning budget for complex multi-variable analysis; (4) SubtaskRunner: parallel execution of independent sub-objectives for multi-entity queries.
Beautonomous governance: the Orchestrator IS Core's execution engine. Every ReAct loop iteration enforces Core's governance rules (Section 7.5). Every WRITE action passes through ConfirmationFlow (PENDING → CONFIRMED/REJECTED/EXPIRED). Every tool call is gated by Core's permission matrix before execution.
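The ConfirmationFlow transitions (PENDING → CONFIRMED/REJECTED/EXPIRED) can be sketched as a pure function. This is a minimal illustration, not the planned implementation: the `Confirmation` shape and the function names are hypothetical, and the 30-minute window is taken from the `expiresAt` comment on the session model.

```typescript
type ConfirmationStatus = "PENDING" | "CONFIRMED" | "REJECTED" | "EXPIRED";

interface Confirmation {
  status: ConfirmationStatus;
  expiresAt: number; // epoch milliseconds
}

// Assumed from the 30-minute expiry noted on ConfirmationRequest.
const CONFIRMATION_TTL_MS = 30 * 60 * 1000;

// A WRITE tool call starts in PENDING with a fixed expiry.
function createConfirmation(now: number): Confirmation {
  return { status: "PENDING", expiresAt: now + CONFIRMATION_TTL_MS };
}

// Apply the seller's decision; terminal states never change, and late answers expire.
function resolveConfirmation(c: Confirmation, confirmed: boolean, now: number): Confirmation {
  if (c.status !== "PENDING") return c;
  if (now >= c.expiresAt) return { ...c, status: "EXPIRED" };
  return { ...c, status: confirmed ? "CONFIRMED" : "REJECTED" };
}
```

Passing `now` in explicitly keeps the function deterministic, which makes the timeout path straightforward to test.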
Current State (What Exists Today)
Reusable As-Is
- ILLMClient + LLMClientFactory
- ConversationTrackingOrchestrator
- RagOrchestrator (KB provider)
- BrandHealthContextService
- ConversationManagementService
- MessageRepository
Needs Improvement
- ConversationFlowOrchestrator: no history sent to LLM
- ILLMClient.generateAnswer(): RAG-specific signature; the new generate(messages, tools) method is a shared deliverable with #3
- COACH_SYSTEM_PROMPT: hardcoded constant
Missing (To Build)
- ReActOrchestrator
- OrchestrationSession
- IContextWindowManager
- IOrchestrationRepository
- IEventEmitter + adapters
- ConfirmationFlow (Phase 1)
- ISystemPromptComposer (Phase 1)
- ToolErrorRecovery (Phase 1)
- ResponseVerifier (Phase 2)
- ExtendedThinking (Phase 2)
- SubtaskRunner (Phase 4)
Reason/decide/act/observe (MAX_ROUNDS=10)
ULID, stateful domain model
REST (0.3) / WebSocket (Ph4)
Compaction at 92% + truncation
DynamoDB persist, 30min TTL (Ph1)
3-layer + cache_control (Ph1)
Error → observation, 3 retry limit (Ph1)
Hallucination detection post-LLM (Ph2)
Selective reasoning budget (Ph2)
Parallel sub-objectives (Ph4)
Tech Stack (TypeScript / AWS Lambda)
Data Models, Interfaces & Acceptance Criteria
// OrchestrationSession: stateful domain model
interface OrchestrationSession {
  sessionId: string // ULID
  conversationId: string // DynamoDB conversation ID
  userId: string // Memberstack ID
  marketplace: Marketplace
  status: 'idle' | 'running' | 'awaiting_confirmation' | 'done' | 'error'
  currentRound: number // 0..MAX_ROUNDS
  messages: ConversationMessage[]
  // { role: 'user', content: string }
  // { role: 'assistant', content: ContentBlock[] } // may include tool_use
  // { role: 'user', content: ToolResultBlock[] }
  pendingConfirmation: ConfirmationRequest | null
  // { toolName, toolArgs, riskLevel: 'reversible'|'irreversible',
  //   preview: { before, after }, expiresAt: Date (30min) }
  costAccumulator: { totalTokens: number, toolCallCount: number }
}
// RoundDecision: output of evaluating each LLM response
type RoundDecision =
  | 'respond' // text only -> emit response, end loop
  | 'execute_tool' // tool_use blocks -> execute and observe
  | 'await_confirm' // write tool -> pause loop, persist session
  | 'max_rounds' // round 10 -> force final response
  | 'cost_guard' // >50K tokens -> force final response
// IReActOrchestrator: conversational entry point
interface IReActOrchestrator {
  handleMessage(params: {
    userId: string; conversationId: string;
    content: string; marketplace: Marketplace;
  }): Promise<OrchestrationResult>
  resumeAfterConfirmation(params: {
    sessionId: string; confirmed: boolean;
  }): Promise<OrchestrationResult>
}
// IEventEmitter: decouples loop from transport
type OrchestrationEvent =
  | { type: 'text_chunk'; content: string }
  | { type: 'tool_start'; toolName: string; args: unknown }
  | { type: 'tool_result'; toolName: string; result: unknown; latencyMs: number }
  | { type: 'confirmation_required'; preview: ConfirmationPreview }
  | { type: 'end_turn'; summary: OrchestrationSummary }
  | { type: 'error'; message: string }
// Phase 0.3: RestResponseEventEmitter (accumulate, return at end)
// Phase 4: WebSocketEventEmitter (real-time streaming)
// OrchestrationSummary: included in end_turn event
interface OrchestrationSummary {
  rounds: number
  tokensUsed: number
  actionsExecuted: ActionSummary[] // WRITE tools executed this turn
  confirmationsRequested: number
  latencyMs: number
}
// ActionSummary: one WRITE action performed by the Coach
interface ActionSummary {
  toolName: string // e.g. 'update_product_content'
  sku: string
  fieldChanged: string // e.g. 'title'
  previousValue: unknown // from snapshot_product
  newValue: unknown
  success: boolean
}
// Enables: Shell shows seller what changed, Feedback Loop (#15) measures impact
// IContextWindowManager: compaction + truncation
interface IContextWindowManager {
  guard(messages: ConversationMessage[], modelTokenLimit: number): Promise<ConversationMessage[]>
  truncateToolResult(result: unknown, maxTokens: number): unknown
}
// Triggers at 92% of model limit. Preserves last 10 messages + active round tool_results.
// IOrchestrationRepository: session persistence for ConfirmationFlow
// DynamoDB with 35min TTL. Only used during PAUSE/RESUME.
// IOrchestrationTracer: fire-and-forget observability
// Reuses ConversationTrackingOrchestrator. If PostgreSQL fails, loop continues.
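The Phase 0.3 adapter described above (accumulate, return at end) could look roughly like the following. This is a sketch under assumptions: the `IEventEmitter` interface is given a single `emit` method, the event union is reduced to four variants for brevity, and `toResponse` is a hypothetical helper that does not appear in the spec.

```typescript
// Reduced event union; the full definition also carries tool_result and confirmation_required.
type OrchestrationEvent =
  | { type: "text_chunk"; content: string }
  | { type: "tool_start"; toolName: string; args: unknown }
  | { type: "end_turn"; summary: { rounds: number; tokensUsed: number } }
  | { type: "error"; message: string };

// Assumed port shape: the loop only ever calls emit().
interface IEventEmitter {
  emit(event: OrchestrationEvent): void;
}

// Buffers events in memory; the REST handler reads the result once the loop has ended.
class RestResponseEventEmitter implements IEventEmitter {
  private readonly events: OrchestrationEvent[] = [];

  emit(event: OrchestrationEvent): void {
    this.events.push(event);
  }

  // Hypothetical helper: collapse the buffered events into one REST payload.
  toResponse(): { text: string; toolsUsed: string[]; error: string | null } {
    let text = "";
    const toolsUsed: string[] = [];
    let error: string | null = null;
    for (const e of this.events) {
      if (e.type === "text_chunk") text += e.content;
      else if (e.type === "tool_start") toolsUsed.push(e.toolName);
      else if (e.type === "error") error = e.message;
    }
    return { text, toolsUsed, error };
  }
}
```

Because the loop depends only on `IEventEmitter`, a WebSocket adapter in Phase 4 could forward each event immediately instead of buffering, without changes to the loop itself.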
- [Ph 0.3] Conversation history sent to LLM in every turn (fix: data exists in DynamoDB, just load and pass)
- [Ph 0.3] ReAct loop active with MAX_ROUNDS=10 guard + cost guard (50K tokens)
- [Ph 0.3] Context compaction triggers at 92% of model limit, preserves last 10 messages
- [Ph 0.3] OrchestrationSession persisted in DynamoDB with 35min TTL
- [Ph 0.3] REST response via RestResponseEventEmitter (accumulate + return)
- [Ph 1] ConfirmationFlow: pause/serialize/confirm/reject/timeout works end-to-end
- [Ph 1] Tool loop detection: same tool+args in 2 consecutive rounds is blocked
- [Ph 1] Prompt caching with cache_control reduces input token cost 60-80%
- [Ph 4] WebSocket streaming with text_chunk events in real-time
- [Ph 1] Tool error recovery: errors become observations, LLM retries with corrected args. Max 3 consecutive failures on same tool → respond with available info
- [Ph 2] ResponseVerifier: post-generation check contrasts cited data (fees, metrics, prices) against retrieved data. Discrepancy → clarification or regeneration
- [Ph 2] Extended thinking: selective activation for complex multi-variable analysis (e.g., sales diagnosis across price, positioning, competitors, seasonality). Not active for simple queries
- [Ph 4] SubtaskRunner: independent sub-objectives executed in parallel (e.g., "analyze my top 3 products" → 3 parallel analyses → merged response)
- [Ph 1] end_turn event includes actionsExecuted[] enumerating every WRITE tool result from the turn (toolName, SKU, fieldChanged, previousValue, newValue, success)
MAX_ROUNDS: 10 · Cost guard: 50K tokens · Compaction: 92% · Truncation: 4K tokens · Confirmation TTL: 35min · Cache hit L1: >95%
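The guards listed above feed the per-round decision. A minimal sketch of that evaluation, returning the `RoundDecision` values from the data model, might look like this. The `isWrite` flag, the `RoundInput` shape, and the exact ordering of the checks are assumptions; the spec only states that a text-only reply ends the loop and that both guards force a final response.

```typescript
const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50_000;

type RoundDecision = "respond" | "execute_tool" | "await_confirm" | "max_rounds" | "cost_guard";

// isWrite is an assumed flag; in the blueprint the tool policy decides read vs write.
interface ToolUse {
  name: string;
  isWrite: boolean;
}

interface RoundInput {
  currentRound: number;
  totalTokens: number;
  toolUses: ToolUse[];
}

// Evaluate one LLM reply: text-only ends the turn, guards cap the loop, writes pause it.
function decide(input: RoundInput): RoundDecision {
  if (input.toolUses.length === 0) return "respond";
  if (input.currentRound >= MAX_ROUNDS) return "max_rounds";
  if (input.totalTokens > COST_GUARD_TOKENS) return "cost_guard";
  if (input.toolUses.some((t) => t.isWrite)) return "await_confirm";
  return "execute_tool";
}
```

Checking the guards before tool execution means a capped turn never starts a new tool call it cannot observe.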
How It Works: State Machine
(user message arrives via REST / WebSocket)
|
v
PREPARE
+-- SystemPromptComposer.compose(marketplace, userProfile, tools, health)
+-- ContextWindowManager.guard(messages) <- compaction if >92%
+-- messages.push({ role: 'user', content })
|
v (currentRound++)
REASON <- ILLMClient.generate(messages, toolDefinitions)
|
|-- text only? ----------------------------------------> RESPOND -> DONE
|
|-- tool_use blocks? ----------------------------------------
| |
| ToolPolicyFilter.check(toolName, userId, plan) |
| +-- denied -> append tool_result(error) -> next round|
| +-- read_only -> EXECUTE -> append result -> next round |
| +-- write -> AWAIT_CONFIRMATION |
| | |
| serialize OrchestrationSession to DynamoDB |
| IEventEmitter.emit(confirmation_required) |
| (Lambda terminates, awaits reconnection) |
| | |
| confirmed | rejected | timeout(30min) |
| | | | |
| EXECUTE append msg DONE |
| | | |
| append result next round |
| | |
+-------------------+---------------------------------------
|
|-- currentRound == MAX_ROUNDS? ----------------> force RESPOND
|-- costAccumulator.totalTokens > 50K? ---------> force RESPOND
|
v
RESPOND
+-- IEventEmitter.emit({ type: 'end_turn', ... })
+-- IOrchestrationTracer.completeExecution()
+-- MessageRepository.save(assistantMessage)
+-- ConversationRepository.update()
|
v
DONE
The ReAct Orchestrator replaces the one-shot ConversationFlowOrchestrator. Today: HandleConversationQueryUseCase -> ConversationFlowOrchestrator runs a fixed pipeline (embedding -> vector search -> brand health -> single LLM call -> save). No loop, no tools, no history sent. The new orchestrator is a state machine: PREPARE builds context, REASON calls the LLM, DECIDE evaluates the output (text-only, tool_use, max_rounds, cost_guard), ACT executes tools or pauses for confirmation, OBSERVE appends results and loops. Each dependency is a port: ILLMClient (exists), IToolExecutor (Phase 1), ISystemPromptComposer (Phase 1), IContextWindowManager (new), IOrchestrationRepository (new), IEventEmitter (new), ICreditsGate (Phase 1), IOrchestrationTracer (reused). The orchestrator doesn't know which LLM model it uses, how events reach the client, or how credits are calculated.
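The PREPARE → REASON → DECIDE → ACT → OBSERVE cycle can be illustrated with a stripped-down loop. This is a sketch, not the planned implementation: the `Ports` interface stands in for ILLMClient and IToolExecutor, the stubs are synchronous for brevity (the real ports would be async), tool results are flattened to strings, and the static fallback at the round limit replaces the forced final LLM response described in the state machine.

```typescript
interface ToolCall { name: string; args: unknown }
interface LlmReply { text: string; toolCalls: ToolCall[] }
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical ports: generate stands in for ILLMClient.generate, execute for IToolExecutor.execute.
interface Ports {
  generate(messages: Message[]): LlmReply;
  execute(call: ToolCall): string;
}

const MAX_ROUNDS = 10;

function runTurn(history: Message[], userText: string, ports: Ports): { answer: string; rounds: number } {
  // PREPARE: prior history plus the new user message.
  const messages: Message[] = [...history, { role: "user", content: userText }];
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const reply = ports.generate(messages); // REASON
    if (reply.toolCalls.length === 0) return { answer: reply.text, rounds: round }; // DECIDE: text only
    messages.push({ role: "assistant", content: reply.text });
    for (const call of reply.toolCalls) { // ACT
      let observation: string;
      try {
        observation = ports.execute(call);
      } catch (err) {
        // Tool errors become observations the LLM can reason about.
        observation = `error: ${err instanceof Error ? err.message : String(err)}`;
      }
      messages.push({ role: "user", content: `tool_result(${call.name}): ${observation}` }); // OBSERVE
    }
  }
  // Simplification: a fixed message instead of a forced final LLM call.
  return { answer: "Round limit reached; answering with the information gathered so far.", rounds: MAX_ROUNDS };
}
```

The loop never references a concrete model or transport, which is the point of the port-based design: swapping either only changes what is passed in as `ports`.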
Scope Boundaries: What This Layer Does NOT Own
ToolRegistry / ToolDispatcher
Owned by Tool Registry (#3). Orchestrator calls IToolExecutor.execute() only.
Credit Verification
Owned by Billing & Credit Economy (#13). Orchestrator calls ICreditsGate.canProceed().
Traceability
Owned by Observability (#8). Orchestrator notifies IOrchestrationTracer (fire-and-forget).
WebSocket / SSE
Transport adapters via IEventEmitter. Changing transport doesn't touch the loop.
Context Aggregator
Owned by #5. Orchestrator calls IContextAssembler for user prompt context (KB + Brand Health RAG).
Prompt Content
Layer 1 content (personality, tone) is config, not orchestrator logic.
Implementation Plan (Phased)
Phase 0.3: ReAct Loop WITHOUT Tools (Immediate Priority)
ReActOrchestrator replaces ConversationFlowOrchestrator as main orchestrator. OrchestrationSession domain model with ULID. Conversation history loaded from DynamoDB and sent to LLM on every turn (critical fix: the data exists, it only needs to be loaded and passed). IContextWindowManager: compaction at 92% of model limit, preserves last 10 messages + active round tool_results. IOrchestrationRepository + DynamoDB adapter (35min TTL). RestResponseEventEmitter: accumulates events, returns complete response at end via REST. MAX_ROUNDS guard + cost guard (50K tokens). Even without tools, the LLM can reason in multiple rounds over available context.
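A rough sketch of the compaction rule (trigger at 92% of the model limit, keep the last 10 messages) follows. Several parts are assumptions made for illustration: the characters-per-token estimate, the injected `summarize` callback, the summary message format, and the synchronous signature (the interface above returns a Promise). Preservation of active-round tool_results is omitted.

```typescript
interface ConversationMessage { role: "user" | "assistant"; content: string }

const COMPACTION_THRESHOLD = 0.92;
const PRESERVED_TAIL = 10;

// Rough estimate (about 4 characters per token); a real implementation would use the model tokenizer.
function estimateTokens(messages: ConversationMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Replace everything older than the last 10 messages with one summary message.
function guard(
  messages: ConversationMessage[],
  modelTokenLimit: number,
  summarize: (older: ConversationMessage[]) => string,
): ConversationMessage[] {
  if (estimateTokens(messages) <= modelTokenLimit * COMPACTION_THRESHOLD) return messages;
  if (messages.length <= PRESERVED_TAIL) return messages; // nothing older to compact
  const older = messages.slice(0, -PRESERVED_TAIL);
  const tail = messages.slice(-PRESERVED_TAIL);
  return [{ role: "user", content: `Summary of earlier conversation: ${summarize(older)}` }, ...tail];
}

// Cap a tool result at maxTokens using the same character-based estimate.
function truncateToolResult(result: unknown, maxTokens: number): string {
  const text = typeof result === "string" ? result : String(JSON.stringify(result));
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars) + " [truncated]";
}
```

Injecting `summarize` keeps the guard independent of how the summary is produced, for example by a cheaper LLM call or a rule-based digest.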
Phase 1: ConfirmationFlow + Tool Integration
ConfirmationFlow state machine: PENDING -> CONFIRMED / REJECTED / EXPIRED. Session serialized to DynamoDB before pause, restored on confirm. ISystemPromptComposer with 3 layers: L1 BaseCoachBlock (~500 tok, >95% cache hit), L2 SessionBlock (~150-300 tok, ~70% cache hit), L3 ExecutionBlock (~100-150 tok, ~90% cache hit). Integration with IToolExecutor from ToolRegistry (#3). Prompt caching with Anthropic cache_control headers. ICreditsGate: verify quota before each tool call. Tool loop detection: same tool+args in 2 consecutive rounds -> block and notify. ToolErrorRecovery: when a tool returns an error, it becomes an observation in the ReAct loop: the LLM sees the error, reasons about what went wrong, and can retry with corrected arguments or call a different tool. Max 3 consecutive failures on the same tool → Coach responds with available information. InputGuard integration: IGuardService.validateInput() runs before the loop; IGuardService.validateOutput() runs after (#7).
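The two Phase 1 safeguards (tool loop detection and the 3-consecutive-failure limit) can be sketched as small pure functions over a per-turn attempt history. The `ToolAttempt` shape and all function names are hypothetical, and comparing arguments via `JSON.stringify` assumes they serialize deterministically.

```typescript
interface ToolAttempt {
  toolName: string;
  args: unknown;
  failed: boolean;
}

const MAX_CONSECUTIVE_FAILURES = 3;

// Assumes arguments serialize deterministically (same key order on each call).
function sameCall(a: { toolName: string; args: unknown }, b: { toolName: string; args: unknown }): boolean {
  return a.toolName === b.toolName && JSON.stringify(a.args) === JSON.stringify(b.args);
}

// Loop detection: block a call that repeats the previous round's tool and arguments.
function isToolLoop(history: ToolAttempt[], next: { toolName: string; args: unknown }): boolean {
  const previous = history[history.length - 1];
  return previous !== undefined && sameCall(previous, next);
}

// Count failures of the given tool at the tail of the history.
function consecutiveFailures(history: ToolAttempt[], toolName: string): number {
  let count = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const attempt = history[i];
    if (attempt === undefined || attempt.toolName !== toolName || !attempt.failed) break;
    count++;
  }
  return count;
}

// After 3 consecutive failures the Coach stops retrying and answers with what it has.
function shouldStopRetrying(history: ToolAttempt[], toolName: string): boolean {
  return consecutiveFailures(history, toolName) >= MAX_CONSECUTIVE_FAILURES;
}
```

The two checks are complementary: a retry with corrected arguments passes loop detection but still counts toward the failure limit if it fails again.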
Phase 2: Verification + Extended Thinking
ResponseVerifier: post-generation step that checks cited data (fees, metrics, prices) against the data retrieved from tools and the KB. If it detects a discrepancy, it adds a clarification or regenerates with an instruction to use the correct data. The user only sees the final verified response. Extended thinking: for complex multi-variable analysis, the orchestrator assigns an extended reasoning budget to the LLM. Activation is selective: simple queries ("how much stock?") don't trigger it, complex diagnostics (sales analysis across price, positioning, competitors, seasonality) do. The thinking process is not visible to the user.
Phase 4: Streaming + Proactive Mode + SubtaskRunner
WebSocketEventEmitter replaces RestResponseEventEmitter. Real-time streaming: text_chunk events while the LLM generates. Session restoration on WebSocket reconnection. ProactiveMode: separate entry point via EventBridge Scheduler, different context construction (no conversation history), output via push notification. It shares the ReAct loop and ILLMClient but has its own context builder and completion condition. SubtaskRunner: when the user asks a query with multiple independent parts ("analyze my top 3 products"), the orchestrator decomposes it into parallel sub-tasks. Each sub-task has its own reasoning loop and tools. Results are merged into a unified response. It only activates when sub-objectives are truly independent (product A analysis doesn't affect product B analysis). The consolidated response is the same, only faster.
Risk Analysis
LLM Infinite Loop
Impact: High
Mitigation: MAX_ROUNDS=10 + system message forcing a text-only response without toolDefinitions. Cost guard at 50K tokens. Tool loop detection: same tool+args in consecutive rounds -> block with "no new information available" message. D4: if tasks regularly need 10 rounds, the problem is tool design, not the limit.
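The three guards in this mitigation could be combined into one check that runs before each round. The sketch below is illustrative only: `LoopState`, `GuardVerdict`, and `checkLoopGuards` are hypothetical names, and fingerprinting arguments with `JSON.stringify` is a simplifying assumption (it is sensitive to key order).

```typescript
const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50_000;

interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

interface LoopState {
  round: number; // rounds completed so far
  tokensUsed: number;
  previousCalls: ToolCall[]; // tool calls issued in the previous round
}

type GuardVerdict =
  | { action: "continue" }
  | { action: "force_text_only"; reason: "max_rounds" | "cost_guard" }
  | { action: "block_tool"; toolName: string; message: string };

// Naive fingerprint of a call: tool name plus serialized arguments.
const fingerprint = (c: ToolCall): string => `${c.toolName}:${JSON.stringify(c.args)}`;

function checkLoopGuards(state: LoopState, nextCalls: ToolCall[]): GuardVerdict {
  if (state.round >= MAX_ROUNDS) return { action: "force_text_only", reason: "max_rounds" };
  if (state.tokensUsed >= COST_GUARD_TOKENS) return { action: "force_text_only", reason: "cost_guard" };
  // Same tool+args as in the previous round: block instead of calling again.
  const seen = new Set(state.previousCalls.map(fingerprint));
  const repeated = nextCalls.find((c) => seen.has(fingerprint(c)));
  if (repeated) {
    return { action: "block_tool", toolName: repeated.toolName, message: "no new information available" };
  }
  return { action: "continue" };
}
```

On `force_text_only`, the orchestrator would issue the final LLM call without tool definitions, as described above.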
ConfirmationFlow State Corruption
Impact: High
Mitigation: OrchestrationSession is fully serialized to DynamoDB before pausing. A 35-minute TTL auto-expires stale sessions. On resume, consistency is verified before executing the tool. Resilient to hot deploys during the confirmation wait.
Context Window Overflow
Impact: Medium
Mitigation: ContextWindowManager.guard() triggers compaction at 92% of the context window before the LLM call. Tool results >4000 tokens are truncated to a summary. Prompt caching on L1-L2 reduces the size of static layers. Silent LLM truncation is the real danger; the guard prevents it.
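A rough sketch of the two thresholds follows. Only the 92% and 4000-token figures come from the text. The function names, the `Message` shape, and the 4-characters-per-token estimate are assumptions, and plain slicing stands in for the summary step.

```typescript
const COMPACTION_THRESHOLD = 0.92;
const MAX_TOOL_RESULT_TOKENS = 4000;
const CHARS_PER_TOKEN = 4; // crude heuristic; a real tokenizer would replace this

interface Message {
  role: "user" | "assistant" | "tool";
  text: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

// Caps an oversized tool result. The blueprint calls for a summary;
// this sketch simply cuts the text at the token budget.
function truncateToolResult(text: string): { text: string; truncated: boolean } {
  if (estimateTokens(text) <= MAX_TOOL_RESULT_TOKENS) return { text, truncated: false };
  return { text: text.slice(0, MAX_TOOL_RESULT_TOKENS * CHARS_PER_TOKEN), truncated: true };
}

// True when the history should be compacted before the next LLM call.
function needsCompaction(messages: Message[], contextLimitTokens: number): boolean {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m.text), 0);
  return used / contextLimitTokens >= COMPACTION_THRESHOLD;
}
```

Running the check before the call, rather than reacting to an API error, is what guards against silent truncation.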
Lambda Timeout in Long Loops
Impact: Medium
Mitigation: Lambda is configured with a 60s timeout and each tool has an internal 10s timeout. A 5-round loop with marketplace reads can still exceed 60s. In Phase 4, WebSocket eliminates the Lambda timeout as a UX limiter. Orphan sessions (status='running', startedAt more than 65s ago) are detectable; the client retries using the saved history.
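The arithmetic behind the 60s concern and the orphan check can be written out as below. The helper names and the `SessionRecord` shape are hypothetical, as are the status values other than 'running'. The worst case assumes each round waits for one full tool timeout.

```typescript
const LAMBDA_TIMEOUT_S = 60;
const TOOL_TIMEOUT_S = 10;
const ORPHAN_AFTER_MS = 65_000; // Lambda timeout plus a 5s margin

// Worst-case seconds spent waiting on tools if every round hits the tool timeout.
function worstCaseToolSeconds(rounds: number): number {
  return rounds * TOOL_TIMEOUT_S;
}

// Seconds left for all LLM calls combined in that worst case.
function llmBudgetSeconds(rounds: number): number {
  return LAMBDA_TIMEOUT_S - worstCaseToolSeconds(rounds);
}

interface SessionRecord {
  sessionId: string;
  status: "running" | "paused" | "completed"; // only 'running' appears in the blueprint
  startedAt: number; // epoch milliseconds
}

// A session still marked as running after the Lambda must have died is an orphan.
function isOrphan(session: SessionRecord, nowMs: number): boolean {
  return session.status === "running" && nowMs - session.startedAt > ORPHAN_AFTER_MS;
}
```

With 5 rounds the tools alone can consume 50s, leaving 10s for five LLM calls, which is why such a loop can overrun the Lambda limit.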
Key Decisions
Stateful orchestrator, not a procedural pipeline: The current pipeline is a function: fixed input, fixed steps, fixed output. The ReAct orchestrator is a state machine: the LLM decides what to do next based on history and available tools. This cannot be patched on top of ConversationFlowOrchestrator; it requires redesigning the orchestration entry point.
IEventEmitter separates orchestration from transport: The orchestrator emits events without knowing how they reach the client. Phase 0.3: RestResponseEventEmitter accumulates events and responds at the end. Phase 4: WebSocketEventEmitter streams in real time. Changing transport doesn't touch the loop.
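A small sketch of this separation follows, under stated assumptions. Only `IEventEmitter`, `RestResponseEventEmitter`, and the `text_chunk` event are named in the blueprint. The other event types, `toResponseBody`, `runLoop`, and `CallbackStreamEmitter` (a stand-in for a WebSocket-backed emitter) are hypothetical.

```typescript
type OrchestratorEvent =
  | { type: "text_chunk"; text: string }
  | { type: "tool_started"; toolName: string }
  | { type: "done" };

interface IEventEmitter {
  emit(event: OrchestratorEvent): void;
}

// Phase 0.3 style: buffer everything and answer once at the end of the request.
class RestResponseEventEmitter implements IEventEmitter {
  private readonly events: OrchestratorEvent[] = [];

  emit(event: OrchestratorEvent): void {
    this.events.push(event);
  }

  toResponseBody(): { text: string; eventCount: number } {
    let text = "";
    for (const e of this.events) {
      if (e.type === "text_chunk") text += e.text;
    }
    return { text, eventCount: this.events.length };
  }
}

// Phase 4 style: forward each event immediately through a send callback.
class CallbackStreamEmitter implements IEventEmitter {
  private readonly send: (payload: string) => void;

  constructor(send: (payload: string) => void) {
    this.send = send;
  }

  emit(event: OrchestratorEvent): void {
    this.send(JSON.stringify(event));
  }
}

// The loop depends only on the port, never on a concrete transport.
function runLoop(emitter: IEventEmitter): void {
  emitter.emit({ type: "text_chunk", text: "Your stock " });
  emitter.emit({ type: "text_chunk", text: "is 12 units." });
  emitter.emit({ type: "done" });
}
```

Swapping the emitter passed to `runLoop` changes delivery from one buffered response to per-event streaming without editing the loop.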
ConfirmationFlow persists in DynamoDB, not in memory: A user confirmation can take minutes. Lambda cannot stay alive waiting. The session is serialized before pausing and restored on confirm. This is also resilient to hot deploys during the wait.
MAX_ROUNDS=10 is a fail-safe, not a design limit: System prompt and tools should be designed so the Coach resolves tasks in 3-5 rounds at most. If a task regularly needs 10 rounds, the problem is tool design, not the limit.
RagOrchestrator survives as KB context provider: The existing RAG pipeline doesn't disappear; it becomes a service invoked by the orchestrator to enrich context before the first LLM call. BrandHealthContextService remains as the seller health provider. KB and brand health go from fixed pipeline steps to inputs for context composition.
Dependency Map (Ports)
ReActOrchestrator
+-- ILLMClient <- LLMClientFactory (exists)
+-- IToolExecutor <- ToolRegistry (Phase 1, #3)
+-- ISystemPromptComposer <- new (Phase 1, requires UserProfile)
+-- IContextWindowManager <- new (Phase 0.3)
+-- IOrchestrationRepository <- new + DynamoDB adapter (Phase 0.3)
+-- IEventEmitter <- RestResponseEventEmitter (Phase 0.3) | WebSocketEventEmitter (Phase 4)
+-- ICreditsGate <- new interface over billing (Phase 1, #13)
+-- IOrchestrationTracer <- ConversationTrackingOrchestrator (exists)
+-- IContextAssembler <- RagOrchestrator (Phase 0.2, #5)
+-- IGuardService <- GuardService (Phase 1, #7)
MVP Scope
Phase 0.3: ReAct loop with conversation history, context compaction, MAX_ROUNDS + cost guard, REST response (no streaming). Phase 1: ConfirmationFlow + tools + prompt caching. Proactive mode deferred to Phase 4.
Inspired by
ConversationFlowOrchestrator (existing pipeline). HandleConversationQueryUseCase (entry point). Claude Code port-based architecture pattern.
📝 Project Changelog
Tool Registry & Policy Engine
Tools — Mateo
The agent's hands. 36 primitive tools that the autonomous agent composes into open-ended workflows at runtime via the ReAct loop. Each tool is a ToolDefinition (name, inputSchema, riskLevel, marketplace, category, phase, estimatedTokens) registered in the ToolRegistry at Lambda startup. The orchestrator only sees IToolExecutor; it never touches the registry, policies, or handlers directly. ToolPolicyFilter gates access by marketplace and risk level. HookLifecycle (before_tool → execute → after_tool) captures observability, triggers proactive suggestions, and logs WRITE actions for impact measurement (#15). SubtaskRunner executes multiple read_only tools in parallel via Promise.all. SessionResultCache prevents redundant API calls within a session: if the Coach calls the same tool with the same arguments twice in the same loop, the second call returns the cached result instead of hitting the external API. This applies only to READ and ANALYSIS tools; WRITE tools are never cached because their effects may change system state. There is no separate "Skills" layer; the LLM composes tools directly.
Beautonomous governance: ToolPolicyFilter implements Core's risk taxonomy and permission matrix. WRITE tools require role confirmation via ConfirmationFlow before execution. HookLifecycle is the enforcement point where Core's permission gates are invoked on every tool call.
Current State
Reusable
ILLMClient + LLMClientFactory (extensible for tool_use), AnthropicClient (operational, needs tool_use support), ConversationTrackingOrchestrator (reusable as IOrchestrationTracer for ObservabilityHook), BrandHealthContextService (automatic context, unaffected by tools layer)
Needs Improvement
ILLMClient.generateAnswer(query, chunks): RAG-specific signature incompatible with tool_use. Needs a new generate(messages, tools) → ContentBlock[] method. AnthropicClient.chat() doesn't send tools[] or parse tool_use blocks.
To Build
ToolDefinition domain types, IToolExecutor, IToolRegistry + ToolRegistry, IToolPolicyFilter + ToolPolicyFilter, IToolHook + ObservabilityHook, ToolExecutor with HookLifecycle + SubtaskRunner, SessionResultCache (per-session dedup for READ/ANALYSIS), all 36 tool handlers across 5 phases.
| Tool | Description | Risk |
|---|---|---|
| search_market_products | Search similar products in the marketplace. The fields param controls result tokens | read_only |
| get_competitor_product | Full listing of a competitor product by ID or URL | read_only |
| get_market_pricing | Market price range for a product or category | read_only |
| get_keyword_data | Search volume and competition for keywords | read_only |
| analyze_product_image | Evaluates a product image against marketplace standards | read_only |
| enhance_product_image | Enhances an existing product photo; does not generate from scratch | read_only |
| analyze_product_video | Technical checklist: duration, quality, guideline compliance | read_only |
| get_product_fee_estimate | Fee and cost estimation for a given product and price | read_only |
update_product_content, publish_product, update_product_images, update_product_video, update_price, update_stock, pause_product, activate_product, close_product, answer_question, hide_question, send_buyer_message, request_review
update_user_profile | create_campaign, update_campaign, pause_campaign, activate_campaign
- IToolExecutor: single port for the orchestrator
- ToolPolicyFilter: marketplace + risk gates
- HookLifecycle: before → execute → after
- SubtaskRunner: Promise.all for read_only
- SessionResultCache: same tool+args → cached (READ/ANALYSIS only)
- FeedbackCaptureHook: WRITE only, snapshot before + log after → #15
Scope Boundaries: What this layer does NOT do
Tech Stack (TypeScript, AWS Lambda)
Data Models, API Signatures & Acceptance Criteria
interface ToolDefinition {
  name: string                 // Name the LLM uses to invoke
  description: string          // description_for_llm: optimized for correct selection
  inputSchema: JSONSchema      // Parameters (JSON Schema format)
  riskLevel: ToolRiskLevel     // 'read_only' | 'reversible' | 'irreversible'
  marketplace: Marketplace[]   // Compatible marketplaces. [] = all
  category: ToolCategory       // 'READ' | 'ANALYSIS' | 'SYSTEM' | 'WRITE'
  phase: Phase                 // Implementation phase (1-5)
  estimatedTokens?: number     // Result cost estimate (for cost guard)
}

interface ToolContext {
  userId: string
  marketplace: Marketplace
  plan: 'free' | 'pro'
  conversationId: string
  executionId?: string         // From IOrchestrationTracer
}

interface ToolResult {
  toolName: string
  content: unknown             // Raw result
  truncated: boolean           // true if truncated by ContextWindowManager
  latencyMs: number
  tokensEstimate: number
}

type PolicyDecision =
  | { allowed: true; requiresConfirmation: false }
  | { allowed: true; requiresConfirmation: true; preview: ConfirmationPreview }
  | { allowed: false; reason: string }

// The ONLY port the orchestrator sees
interface IToolExecutor {
  getToolDefinitions(context: ToolContext): ToolDefinition[]
  execute(toolName: string, toolArgs: Record<string, unknown>,
          context: ToolContext): Promise<ToolResult>
}

interface IToolRegistry {
  register(def: ToolDefinition, handler: IToolHandler): void
  registerRemote(def: ToolDefinition, dispatcher: IRemoteDispatcher): void
  getDefinitions(context: ToolContext): ToolDefinition[]
  getHandler(toolName: string): IToolHandler | IRemoteDispatcher
}

interface IToolPolicyFilter {
  check(toolName: string, context: ToolContext): PolicyDecision
  // Gates: 1. marketplace gate 2. risk gate
}

interface IToolHook {
  beforeTool?(toolName: string, args: unknown, ctx: ToolContext): Promise<void>
  afterTool?(toolName: string, result: ToolResult, ctx: ToolContext): Promise<void>
}
// Hooks: ObservabilityHook (after), ProactiveSuggestionHook (after),
// FeedbackCaptureHook (before+after, WRITE only → #15)
// Hooks never block execution (fire-and-forget)
- ToolRegistry loads all tool definitions at Lambda startup with inputSchema validation
- IToolExecutor.getToolDefinitions() returns only tools available for user's marketplace and plan
- LLM selects correct tool >90% of the time via description_for_llm
- ToolPolicyFilter blocks all WRITE tools without confirmation — no bypass path
- Multiple read_only tools in same round execute via Promise.all (parallel)
- ObservabilityHook fires after every tool execution (fire-and-forget)
- Tool handler timeout of 10s prevents blocking the ReAct loop
- [Ph 1] SessionResultCache: same tool+args within a session returns cached result for READ/ANALYSIS tools. WRITE tools never cached. Cache scoped to session lifetime
- [Ph 2] FeedbackCaptureHook: beforeTool captures product metrics snapshot via Data Sync (#10) for WRITE tools; afterTool logs executed action to Feedback Loop (#15). Fire-and-forget — never blocks execution
36 tools · 4 categories (READ/ANALYSIS/SYSTEM/WRITE) · 5 phases · TypeScript · Internal use only (no REST)
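The SessionResultCache criterion above could look roughly like this. The `getOrRun` method and the key scheme (tool name plus arguments with sorted top-level keys) are assumptions made for this sketch. The blueprint only specifies the behavior: same tool+args is served from cache for READ/ANALYSIS, never for WRITE.

```typescript
type ToolCategory = "READ" | "ANALYSIS" | "SYSTEM" | "WRITE";

// One instance per session; discarding the instance discards the cache.
class SessionResultCache {
  private readonly entries = new Map<string, unknown>();

  // Sort top-level keys so {a, b} and {b, a} map to the same entry.
  private static keyFor(toolName: string, args: Record<string, unknown>): string {
    const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
    return `${toolName}:${JSON.stringify(sorted)}`;
  }

  async getOrRun(
    toolName: string,
    category: ToolCategory,
    args: Record<string, unknown>,
    run: () => Promise<unknown>,
  ): Promise<unknown> {
    // Only READ and ANALYSIS results are reusable; everything else always executes.
    const cacheable = category === "READ" || category === "ANALYSIS";
    if (!cacheable) return run();

    const key = SessionResultCache.keyFor(toolName, args);
    if (this.entries.has(key)) return this.entries.get(key);

    const result = await run();
    this.entries.set(key, result);
    return result;
  }
}
```

In this version a handler that throws leaves nothing in the cache, so a retry with the same arguments reaches the external API again.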
How It Works
ReActOrchestrator receives tool_use block from LLM
|
v
IToolExecutor.execute(toolName, toolArgs, context)
|
v
+-------------------------------------------------------+
| ToolPolicyFilter.check(toolName, context) |
| ├── marketplace gate: compatible? |
| └── risk gate: read_only or requires confirmation? |
+-------------------------------------------------------+
|
|── denied ──────────────> error to orchestrator
|
|── requiresConfirmation ─> PolicyDecision to orchestrator
| (ConfirmationFlow takes control)
|
|── allowed, read_only ─────────────────────────
| |
v |
+-----------------------------+ |
| IToolHook.beforeTool() | |
+-----------------------------+ |
| |
v |
+-----------------------------+ |
| IToolHandler.execute() | multiple read_only |
| (tool handler) | → Promise.all |
+-----------------------------+ |
| |
v |
+-----------------------------+ |
| IToolHook.afterTool() | ← ObservabilityHook |
| | ← ProactiveSuggHook |
+-----------------------------+ |
| |
└───────────────────────────────────────────────
|
v
ToolResult → append as tool_result → next round
The orchestrator calls IToolExecutor.execute(); it never touches ToolRegistry, ToolPolicyFilter, or handlers directly. ToolPolicyFilter checks marketplace compatibility and risk level. Denied tools return an error. Tools requiring confirmation return a PolicyDecision and the ConfirmationFlow (orchestrator #2) takes control. Allowed read_only tools pass through the HookLifecycle: beforeTool → handler execution → afterTool. When the LLM requests multiple read_only tools in the same round, SubtaskRunner executes them in parallel via Promise.all. Write tools are always sequential with mandatory confirmation. ObservabilityHook logs every execution to IOrchestrationTracer (fire-and-forget). ProactiveSuggestionHook evaluates Next Best Action after tool results.
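The flow described above can be sketched end to end in a few dozen lines. This is a simplified illustration, not the planned ToolExecutor. The types are trimmed local stand-ins, `SketchToolExecutor` and `ExecuteOutcome` are hypothetical, and returning the three outcomes as a tagged union is an assumption (the blueprint's `execute` signature returns `Promise<ToolResult>`).

```typescript
interface ToolContext { userId: string; marketplace: string }
interface ToolResult { toolName: string; content: unknown; latencyMs: number }
type Args = Record<string, unknown>;
type Handler = (args: Args, ctx: ToolContext) => Promise<unknown>;
type Policy = "allow" | "confirm" | "deny";

interface ToolHook {
  beforeTool?(toolName: string, args: Args, ctx: ToolContext): Promise<void>;
  afterTool?(toolName: string, result: ToolResult, ctx: ToolContext): Promise<void>;
}

type ExecuteOutcome =
  | { kind: "result"; result: ToolResult }
  | { kind: "needs_confirmation"; toolName: string }
  | { kind: "denied"; reason: string };

class SketchToolExecutor {
  private readonly handlers: Map<string, Handler>;
  private readonly policy: (toolName: string, ctx: ToolContext) => Policy;
  private readonly hooks: ToolHook[];

  constructor(
    handlers: Map<string, Handler>,
    policy: (toolName: string, ctx: ToolContext) => Policy,
    hooks: ToolHook[],
  ) {
    this.handlers = handlers;
    this.policy = policy;
    this.hooks = hooks;
  }

  async execute(toolName: string, args: Args, ctx: ToolContext): Promise<ExecuteOutcome> {
    const handler = this.handlers.get(toolName);
    if (!handler) return { kind: "denied", reason: `unknown tool: ${toolName}` };

    // The policy check runs first; the handler is unreachable without "allow".
    const decision = this.policy(toolName, ctx);
    if (decision === "deny") return { kind: "denied", reason: "blocked by policy" };
    if (decision === "confirm") return { kind: "needs_confirmation", toolName };

    // Hooks are fire-and-forget: never awaited, failures swallowed.
    for (const h of this.hooks) void h.beforeTool?.(toolName, args, ctx)?.catch(() => undefined);
    const started = Date.now();
    const content = await handler(args, ctx);
    const result: ToolResult = { toolName, content, latencyMs: Date.now() - started };
    for (const h of this.hooks) void h.afterTool?.(toolName, result, ctx)?.catch(() => undefined);
    return { kind: "result", result };
  }

  // Several read_only calls requested in the same round run concurrently.
  executeMany(calls: { toolName: string; args: Args }[], ctx: ToolContext): Promise<ExecuteOutcome[]> {
    return Promise.all(calls.map((c) => this.execute(c.toolName, c.args, ctx)));
  }
}
```

Handler errors propagate to the caller in this sketch; in the design above they would be converted into observations for the ReAct loop.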
Implementation Plan (5 Phases)
Phase 1: Infrastructure (prerequisite for all tools)
ToolDefinition, ToolRiskLevel, ToolCategory domain types. IToolExecutor interface. IToolRegistry + ToolRegistry (register + lookup). IToolPolicyFilter + ToolPolicyFilter (marketplace + risk gates). IToolHook + ObservabilityHook. ToolExecutor implementation with HookLifecycle and SubtaskRunner (Promise.all for read_only). SessionResultCache: per-session deduplication for READ/ANALYSIS tools; same tool+args returns the cached result, WRITE tools are never cached, and the cache is scoped to the session lifetime. Integrate IToolExecutor into ReActOrchestrator. New ILLMClient method: generate(messages, tools) → ContentBlock[], a shared deliverable with #2, consumed by ReActOrchestrator. Update AnthropicClient to send tools[] and parse tool_use blocks.
Phase 2: 10 READ tools + 1 SYSTEM
get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics: definitions + handlers connected to marketplace adapters (#12). update_user_profile (SYSTEM): DynamoDB handler. ProactiveSuggestionHook with LLM evaluation via IProactiveSuggestionService (#6) (max 2 suggestions per turn).
Phase 3: 8 ANALYSIS tools + first WRITE
search_market_products, get_competitor_product, get_market_pricing, get_keyword_data, analyze_product_image, enhance_product_image, analyze_product_video, get_product_fee_estimate. 7 of 8 ANALYSIS tools are routed via the Enrichment Layer (#11). update_product_content: first reversible tool with a real ConfirmationFlow: the agent proposes a title change, the loop pauses, the seller accepts or rejects, and the loop resumes.
Phase 4: 12 WRITE tools + streaming
publish_product, update_product_images, update_product_video, update_price, update_stock, pause_product, activate_product, close_product (irreversible), answer_question (irreversible), hide_question, send_buyer_message (irreversible), request_review. ProactiveSuggestionHook Phase 4: parallel LLM inference, deduplication via UserProfile.
Phase 5: 4 Advertising tools
create_campaign, update_campaign, pause_campaign, activate_campaign. Advertising has its own lifecycle and complexity, so it is implemented in a separate phase.
Risk Analysis
LLM Selects Wrong Tool
Impact: Medium
Mitigation: description_for_llm with trigger phrases, explicit exclusions ("does not include..."), and differentiation from similar tools. Evaluation suite of test prompts before production for each new tool.
WRITE Tool Executed Without Confirmation
Impact: High
Mitigation: ToolPolicyFilter forces requiresConfirmation: true for all riskLevel != read_only. There is no bypass path: the handler is never invoked until the orchestrator receives explicit user confirmation. This is an architectural constraint.
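The two gates can be expressed as a small pure function. The sketch assumes a trimmed `ToolPolicyInfo` shape, a `preview` containing only a summary string, and placeholder marketplace identifiers, none of which are defined in the blueprint.

```typescript
type ToolRiskLevel = "read_only" | "reversible" | "irreversible";

interface ToolPolicyInfo {
  name: string;
  riskLevel: ToolRiskLevel;
  marketplace: string[]; // [] = compatible with all marketplaces
}

type PolicyDecision =
  | { allowed: true; requiresConfirmation: false }
  | { allowed: true; requiresConfirmation: true; preview: { summary: string } }
  | { allowed: false; reason: string };

function checkPolicy(tool: ToolPolicyInfo, userMarketplace: string): PolicyDecision {
  // Gate 1: marketplace compatibility.
  const compatible = tool.marketplace.length === 0 || tool.marketplace.includes(userMarketplace);
  if (!compatible) {
    return { allowed: false, reason: `${tool.name} is not available on ${userMarketplace}` };
  }
  // Gate 2: anything other than read_only always requires confirmation.
  if (tool.riskLevel === "read_only") {
    return { allowed: true, requiresConfirmation: false };
  }
  return {
    allowed: true,
    requiresConfirmation: true,
    preview: { summary: `${tool.name} (${tool.riskLevel}) needs seller confirmation` },
  };
}
```

Because the risk gate has no parameter that could switch confirmation off, the "no bypass" property holds by construction in this sketch.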
External Tool Blocks ReAct Loop
Impact: Medium
Mitigation: Each handler has a 10s internal timeout. Read_only tools execute in parallel (Promise.all). In Phase 4, WebSocket streaming eliminates the Lambda timeout as a UX bottleneck.
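One common way to enforce a per-handler timeout is to wrap the handler's promise, as sketched below. `withTimeout` is a hypothetical helper, not part of the blueprint. It only stops waiting and does not cancel the underlying request.

```typescript
const TOOL_TIMEOUT_MS = 10_000;

// Rejects if `work` has not settled within `timeoutMs`; otherwise mirrors its outcome.
function withTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Parallel read_only calls each get their own timeout, so the round
// waits at most one timeout period rather than the sum of all calls.
function runReadOnlyRound<T>(calls: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(calls.map((call, i) => withTimeout(call(), TOOL_TIMEOUT_MS, `tool #${i}`)));
}
```

Note that `Promise.all` rejects as soon as one call fails or times out. A variant based on `Promise.allSettled` would let the loop keep the results that did arrive.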
Tool Proliferation Degrades SelectionProliferacion de Tools Degrada Seleccion
Impact: Medium
Mitigation: ToolPolicyFilter reduces the visible catalog per user and phase, so the LLM only sees the tools available for the seller's plan and marketplace. Avoid redundant tools: unify them with optional parameters instead of adding separate entries.
Key Decisions
Primitive tools, not composed skills: 36 primitive tools, each doing one concrete thing. The LLM composes them at runtime in the ReAct loop. There is no separate "Skills" layer (YAML business capabilities) on top; composition is the LLM's job.
IToolExecutor as the only port for the orchestrator: ReActOrchestrator never calls ToolRegistry or ToolPolicyFilter directly. Policy check, hook lifecycle, and handler dispatch are encapsulated in ToolExecutor, so adding a new tool never touches the orchestrator.
HookLifecycle as cross-cutting extension point: Observability and post-tool logic (Next Best Action) live in hooks, not in handlers or the orchestrator. Adding logging or proactive suggestions means adding a hook, not changing a handler.
READ/ANALYSIS always local; WRITE candidates for a separate Lambda: In Phases 1-3, all handlers run in this Lambda. From Phase 4 onward, WRITE handlers may move to a separate project if read/write SLAs diverge. IRemoteDispatcher in ToolRegistry prepares this transition.
Automatic contexts are NOT tools: KB and Brand Health are user prompt inputs via RAG (#5); UserProfile and critical alerts are system prompt L2 inputs (#4). None of them are tool results. Keeping them separate means the LLM does not have to decide "when to query the KB", because KB context is always available. This also simplifies the visible tool catalog.
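The IToolExecutor and HookLifecycle decisions above can be illustrated with a minimal sketch. Only the component names (IToolExecutor, ToolExecutor, ToolRegistry, ToolPolicyFilter, HookLifecycle) come from this blueprint. The method signatures, the ToolCall and ToolResult shapes, and the hook names beforeTool and afterTool are assumptions made for illustration.

```typescript
// Hypothetical shapes: the blueprint names these components but not their signatures.
type ToolCall = { name: string; input: Record<string, unknown> };
type ToolResult = { ok: boolean; output?: unknown; error?: string };
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

interface ToolHook {
  beforeTool?(call: ToolCall): void;
  afterTool?(call: ToolCall, result: ToolResult): void;
}

// The only port the orchestrator depends on.
interface IToolExecutor {
  execute(call: ToolCall): Promise<ToolResult>;
}

class ToolExecutor implements IToolExecutor {
  private readonly registry: Map<string, ToolHandler>;       // stands in for ToolRegistry
  private readonly isAllowed: (toolName: string) => boolean; // stands in for ToolPolicyFilter
  private readonly hooks: ToolHook[];                        // stands in for HookLifecycle

  constructor(
    registry: Map<string, ToolHandler>,
    isAllowed: (toolName: string) => boolean,
    hooks: ToolHook[] = [],
  ) {
    this.registry = registry;
    this.isAllowed = isAllowed;
    this.hooks = hooks;
  }

  async execute(call: ToolCall): Promise<ToolResult> {
    const result = await this.run(call);
    // afterTool hooks see every outcome, including denied or unknown tools.
    for (const hook of this.hooks) hook.afterTool?.(call, result);
    return result;
  }

  private async run(call: ToolCall): Promise<ToolResult> {
    if (!this.isAllowed(call.name)) {
      return { ok: false, error: `tool not allowed: ${call.name}` };
    }
    const handler = this.registry.get(call.name);
    if (!handler) {
      return { ok: false, error: `unknown tool: ${call.name}` };
    }
    // beforeTool hooks run only for calls that are actually dispatched.
    for (const hook of this.hooks) hook.beforeTool?.(call);
    try {
      return { ok: true, output: await handler(call.input) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

In this sketch, registering a new tool is a change to the map passed into the constructor, and adding logging is a new ToolHook entry. The orchestrator only ever sees IToolExecutor.execute().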
MVP Scope
Phase 1 infrastructure + Phase 2 (10 READ + 1 SYSTEM). 36 tools total across 5 phases. Internal use only: no REST endpoints are exposed. ToolPolicyFilter gates by marketplace and plan.
Inspired by
Claude Code primitive tools pattern: small, composable, open-ended. Anthropic tool_use API.
📝 Project Changelog
Personality Engine
Intelligence — Mateo
The Coach's identity in every invocation. ISystemPromptComposer assembles the system prompt from 3 layers: L1, base identity (~500 tokens, static, always cached); L2, session context (UserProfile + critical Critique alerts, ~150-300 tokens, cacheable); and L3, execution mode (write confirmation guardrails, ~100-150 tokens, conditional on write_capable). It returns SystemPromptBlock[], not a string, so AnthropicClient can apply cache_control per block. Marketplace terminology lives in the KB semantic search, NOT in the system prompt. Today, COACH_SYSTEM_PROMPT is a static string literal (~500 tokens). The composer does not exist yet; it is introduced when UserProfile needs dynamic injection (Phase 0.2).
Beautonomous governance: the L3 execution mode block embeds Core's role-based governance rules at the LLM reasoning level. When write_capable=true, the system prompt includes the confirmation guardrails that enforce Core's ConfirmationFlow behavior before the LLM can propose any WRITE action.
Current State
Existing (L1 content)
COACH_SYSTEM_PROMPT in CoachPrompts.ts: static string literal (~500 tokens). Covers: identity/role, general rules (neutral Spanish, direct tone, no fabricated data), KB context usage, Brand Health vocabulary (Critique/Delicate/Good/Optimal), prompt injection guard, response format. FlashCoachPrompts.ts and FlashAnalysisPrompts.ts are separate flows with their own prompts and are not managed by this system.
Needs Refactor
COACH_SYSTEM_PROMPT as string → BASE_COACH_BLOCK with cache_control (Phase 1.9). ChatOptions.systemPrompt?: string → accept string | SystemPromptBlock[] for backward compatibility (Phase 1.9).
To Build
ISystemPromptComposer interface + types (Phase 0.2). SystemPromptComposer with L1 + L2 UserProfile (Phase 0.2). L2 critical Critique alerts from getHealthSummary() (Phase 1.8). Prompt caching in AnthropicClient (Phase 1.9). L3 write confirmation guardrails (Phase 3).
~500 tok · Static · Always cached
~150-300 tok · UserProfile + Critique alerts
~100-150 tok · Write confirmation guardrails
Domain port
Array for Anthropic API
L1 always, L2 if stable
1200 hard cap
Scope Boundaries: What this layer does NOT do
Tech Stack (TypeScript, AWS Lambda)
Data Models, Interfaces & Acceptance Criteria
interface SystemPromptContext {
  userProfile?: UserProfileSummary   // undefined until Phase 0.2
  criticalAlerts?: HealthAlert[]     // undefined until Phase 1.8
  writeCapable?: boolean             // false/undefined until Phase 3
}

interface UserProfileSummary {
  marketplaces: Marketplace[]
  categories: string[]
  declaredGoals: string[]
}

interface HealthAlert {
  domain: 'inventory' | 'advertising' | 'organic' | 'financial'
  level: 'Critique' | 'Delicate'
  product?: string
  metric: string
}

interface ComposedSystemPrompt {
  blocks: SystemPromptBlock[]
  estimatedTokens: number
  cachedBlockCount: number
}

interface SystemPromptBlock {
  text: string
  cache_control?: { type: 'ephemeral' }
}

// Token Budget (system prompt only):
//   L1 Base:      ~500 tokens (always)
//   L2 Session:   ~150-300 tokens (from Phase 0.2/1.8)
//   L3 Execution: ~100-150 tokens (from Phase 3, write_capable only)
//   Typical total: ~750-950 tokens | Hard cap: 1200 tokens
//   Truncation priority: L3 first, then L2.alerts

// domain/common/services/ISystemPromptComposer.ts
interface ISystemPromptComposer {
  compose(context: SystemPromptContext): ComposedSystemPrompt
}
// Returns blocks[] (not string) so AnthropicClient can apply
// cache_control per block. Other clients (Vertex, OpenRouter)
// concatenate blocks as plain string; no functional change.

// ChatOptions extension (Phase 1.9, backward compatible):
interface ChatOptions {
  systemPrompt?: string | SystemPromptBlock[]
}

// Dependencies:
//   SystemPromptComposer
//   ├── IUserProfileRepository (reads UserProfile, Phase 0.2)
//   └── IBrandHealthContextService (reads Critique alerts, Phase 1.8)
// The composer does NOT call the LLM. Does NOT know conversation
// history. Only composes text blocks from already-resolved data.
- System prompt assembled from blocks[] with per-block cache_control support
- L1 (base identity) always present at position 0, always with cache_control: ephemeral
- L2 injects UserProfile (marketplaces, categories, goals) when available
- L2 injects critical Critique alerts when active — Delicate-only sessions omit alert block
- L3 activates write confirmation guardrails only when writeCapable === true
- Total system prompt stays under 1200 tokens hard cap (typical ~750-950)
- AnthropicClient serializes blocks with cache_control; other clients concatenate to string
3 layers · L1 ~500t cached · L2 ~150-300t · L3 ~100-150t · 1200 hard cap · blocks[] for Anthropic prompt caching
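The acceptance criteria above can be read as a small composition function. What follows is a minimal sketch, not the project's implementation. The interface shapes mirror the blueprint, while the 4-characters-per-token estimate, the placeholder block texts, the choice to drop whole blocks when truncating, and marking L2 as cacheable unconditionally (assuming the caller recomposes when UserProfile changes) are all assumptions.

```typescript
type Marketplace = string; // assumption: the real type is defined elsewhere

interface UserProfileSummary { marketplaces: Marketplace[]; categories: string[]; declaredGoals: string[] }
interface HealthAlert {
  domain: 'inventory' | 'advertising' | 'organic' | 'financial';
  level: 'Critique' | 'Delicate';
  product?: string;
  metric: string;
}
interface SystemPromptContext { userProfile?: UserProfileSummary; criticalAlerts?: HealthAlert[]; writeCapable?: boolean }
interface SystemPromptBlock { text: string; cache_control?: { type: 'ephemeral' } }
interface ComposedSystemPrompt { blocks: SystemPromptBlock[]; estimatedTokens: number; cachedBlockCount: number }

const HARD_CAP_TOKENS = 1200;
// Assumption: roughly 4 characters per token; a real tokenizer would replace this.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const totalTokens = (blocks: SystemPromptBlock[]): number =>
  blocks.reduce((sum, block) => sum + estimateTokens(block.text), 0);

// Placeholder texts standing in for the real L1 and L3 content.
const BASE_COACH_TEXT = 'You are Coach, expert in digital commerce...';
const EXECUTION_TEXT = 'EXECUTION CAPABILITIES: never execute without explicit seller confirmation.';

function profileText(profile: UserProfileSummary): string {
  return [
    'SELLER PROFILE:',
    `Active marketplaces: ${profile.marketplaces.join(', ')}`,
    `Categories: ${profile.categories.join(', ')}`,
    `Declared goals: ${profile.declaredGoals.join(', ')}`,
  ].join('\n');
}

function alertsText(alerts: HealthAlert[]): string {
  const lines = alerts.map((a) => `Critique, ${a.domain}: ${a.product ? `'${a.product}' ` : ''}${a.metric}`);
  return ['CRITICAL ALERTS:', ...lines].join('\n');
}

function compose(context: SystemPromptContext): ComposedSystemPrompt {
  // Only Critique alerts are injected; Delicate-only sessions get no alert sub-block.
  const critique = (context.criticalAlerts ?? []).filter((a) => a.level === 'Critique');

  const build = (withAlerts: boolean, withL3: boolean): SystemPromptBlock[] => {
    // L1 is always at position 0 and always marked for caching.
    const blocks: SystemPromptBlock[] = [{ text: BASE_COACH_TEXT, cache_control: { type: 'ephemeral' } }];
    const l2Parts: string[] = [];
    if (context.userProfile) l2Parts.push(profileText(context.userProfile));
    if (withAlerts && critique.length > 0) l2Parts.push(alertsText(critique));
    if (l2Parts.length > 0) blocks.push({ text: l2Parts.join('\n'), cache_control: { type: 'ephemeral' } });
    // L3 carries no cache_control because it varies per session.
    if (withL3 && context.writeCapable === true) blocks.push({ text: EXECUTION_TEXT });
    return blocks;
  };

  // Truncation priority from the blueprint: L3 first, then L2 alerts. L1 is never dropped.
  let blocks = build(true, true);
  if (totalTokens(blocks) > HARD_CAP_TOKENS) blocks = build(true, false);
  if (totalTokens(blocks) > HARD_CAP_TOKENS) blocks = build(false, false);

  return {
    blocks,
    estimatedTokens: totalTokens(blocks),
    cachedBlockCount: blocks.filter((b) => b.cache_control !== undefined).length,
  };
}
```

Rebuilding the block list at each truncation step keeps the ordering guarantee (L1 at index 0) trivially true, at the cost of recomputing a few strings.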
How It Works: 3-Layer Composition
AgentLoopOrchestrator
|
v
SystemPromptComposer.compose(context)
|
+-- [0] L1 BaseCoachBlock (always, cache_control: ephemeral)
| "You are Coach, expert in digital commerce..."
| ~500 tokens — current COACH_SYSTEM_PROMPT content
|
+-- [1] L2 SessionBlock (if userProfile or criticalAlerts)
| "SELLER PROFILE:
| Active marketplaces: MercadoLibre Argentina
| Categories: Electronics, Accessories
| Declared goals: scale to 500 sales/mo
| CRITICAL ALERTS:
| Critique — Inventory: 'BT Pro Headphones' out of stock 3d"
| cache_control: ephemeral if UserProfile unchanged
|
+-- [2] L3 ExecutionBlock (if writeCapable === true)
"EXECUTION CAPABILITIES:
You can propose and execute marketplace changes.
CONFIRMATION RULES (non-negotiable):
- Never execute without explicit seller confirmation
- Show current ('before') and proposed ('after')
- Irreversible actions require full action text"
No cache_control — varies per session
|
v
ComposedSystemPrompt { blocks[], estimatedTokens, cachedBlockCount }
|
v
AnthropicClient → system: [
{ type: "text", text: "...", cache_control: { type: "ephemeral" } }, // L1
{ type: "text", text: "...", cache_control: { type: "ephemeral" } }, // L2
{ type: "text", text: "..." } // L3
]
The AgentLoopOrchestrator calls SystemPromptComposer.compose() before each LLM invocation. The composer returns SystemPromptBlock[], an array of typed objects rather than a concatenated string, because the Anthropic API accepts the system field as an array in which each block can carry its own cache_control. L1 (position 0) is always marked for caching: it is the largest block (~500 tokens) and never varies between users. L2 is cacheable while UserProfile is unchanged; changes are rare because the profile only changes when update_user_profile executes. L3 is never cached, since it depends on whether write tools are available for this session. The other LLM clients (Vertex, OpenRouter) do not use prompt caching in this design and concatenate the blocks into a plain string internally. Marketplace terminology (MeLi "publicacion" vs Amazon "listing") lives in the KB semantic search, not in the system prompt; RAG retrieves it when relevant.
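The two serialization paths described above might look like the following sketch. The function names toAnthropicSystem and toPlainSystem are hypothetical, and the blank-line separator used for concatenation is an assumption. The output shape of the first function follows the system array shown in the diagram above (type, text, optional cache_control).

```typescript
interface SystemPromptBlock { text: string; cache_control?: { type: 'ephemeral' } }

// One entry of the `system` array sent to the Anthropic Messages API.
type AnthropicSystemBlock = { type: 'text'; text: string; cache_control?: { type: 'ephemeral' } };

// AnthropicClient path: keep blocks separate so each can carry its own cache_control.
function toAnthropicSystem(systemPrompt: string | SystemPromptBlock[]): string | AnthropicSystemBlock[] {
  if (typeof systemPrompt === 'string') return systemPrompt; // legacy callers keep working
  return systemPrompt.map((block): AnthropicSystemBlock =>
    block.cache_control
      ? { type: 'text', text: block.text, cache_control: block.cache_control }
      : { type: 'text', text: block.text },
  );
}

// Vertex / OpenRouter path: no caching, so blocks collapse into one string.
function toPlainSystem(systemPrompt: string | SystemPromptBlock[]): string {
  return typeof systemPrompt === 'string'
    ? systemPrompt
    : systemPrompt.map((block) => block.text).join('\n\n');
}
```

Accepting string | SystemPromptBlock[] in both functions is what keeps ChatOptions.systemPrompt backward compatible: existing callers that pass a string are unaffected.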
Implementation Plan (phased with orchestrator)
Phase 0.2: Introduce Composer + L2 UserProfile
Create the ISystemPromptComposer interface and the SystemPromptContext, ComposedSystemPrompt, and SystemPromptBlock types in the domain layer. Implement SystemPromptComposer with L1 (COACH_SYSTEM_PROMPT refactored as BASE_COACH_BLOCK) + the L2 UserProfile sub-block (marketplaces, categories, goals). No L3 yet, because the tools do not exist at this point. Wire the composer into the orchestrator.
Phase 1.8: L2 Critical Alerts
Add criticalAlerts?: HealthAlert[] to SystemPromptContext. L2 includes the Critique alerts sub-block when alerts are active. Delicate-only states are omitted; the Coach finds them via RAG. The alert block exists only for urgent issues that need visibility regardless of the user's query.
Phase 1.9: Prompt Caching
Extend ChatOptions.systemPrompt to accept string | SystemPromptBlock[] (backward compatible). AnthropicClient serializes blocks with cache_control: L1 is always cached, and L2 is cached if UserProfile is unchanged. VertexLLMClient and OpenRouterLLMClient concatenate blocks to a string (no functional change, just no caching). On a cache hit, the ~500 L1 tokens are billed at ~10% of the normal input price, a saving equivalent to ~450 tokens per turn.
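The caching saving quoted above can be checked with simple arithmetic. This sketch assumes a cache-read multiplier of 0.1 (the ~10% figure from this blueprint) and a cache-write premium of 1.25 on the first turn. The write premium is an assumption that should be verified against current Anthropic pricing, and the function name is hypothetical.

```typescript
const CACHE_READ_MULTIPLIER = 0.1;   // cached tokens billed at ~10% of the base input price
const CACHE_WRITE_MULTIPLIER = 1.25; // assumption: premium paid on the turn that writes the cache

// Returns the cost of one prompt block over a session, expressed in
// "uncached input token" equivalents.
function billedTokenEquivalents(blockTokens: number, turns: number, cached: boolean): number {
  if (turns <= 0) return 0;
  if (!cached) return blockTokens * turns;
  // The first turn writes the cache; later turns read it
  // (assuming each turn arrives before the cache entry expires).
  return blockTokens * CACHE_WRITE_MULTIPLIER + blockTokens * CACHE_READ_MULTIPLIER * (turns - 1);
}

// Saving on a single cache-hit turn for the ~500-token L1 block.
const savingPerHit = 500 - 500 * CACHE_READ_MULTIPLIER;
```

Under these assumptions, savingPerHit evaluates to 450 token equivalents, and a 10-turn session costs 1075 token equivalents for L1 with caching versus 5000 without it.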
Phase 3: L3 Execution Mode
Add the writeCapable flag to SystemPromptContext. AgentLoopOrchestrator determines it from ToolPolicyFilter results: if all visible tools are read_only, writeCapable is not set and L3 is omitted. L3 adds non-negotiable confirmation rules: never execute without explicit confirmation, show before/after, and require the full action text in the confirmation for irreversible actions.
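Deriving writeCapable from the filtered tool list, rather than inferring it, could be as simple as the sketch below. The VisibleTool shape, the access field, and the tool names in the usage note are hypothetical; the blueprint only states that tools can be read_only and that the flag comes from ToolPolicyFilter results.

```typescript
// Hypothetical descriptor for a tool that survived ToolPolicyFilter.
type ToolAccess = 'read_only' | 'write';
interface VisibleTool { name: string; access: ToolAccess }

// True as soon as at least one visible tool can change marketplace state.
function isWriteCapable(visibleTools: VisibleTool[]): boolean {
  return visibleTools.some((tool) => tool.access === 'write');
}

// Builds the fragment of SystemPromptContext that controls L3.
// The flag is left unset (not false) when no write tool is visible,
// matching "writeCapable is not set and L3 is omitted".
function executionContext(visibleTools: VisibleTool[]): { writeCapable?: boolean } {
  return isWriteCapable(visibleTools) ? { writeCapable: true } : {};
}
```

For example, a list containing only a read_only tool such as a hypothetical get_inventory yields {}, while adding a hypothetical write tool update_price yields { writeCapable: true }, which is the only input that makes the composer emit L3.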
Risk Analysis
L3 Not Activated When Write Tools Available
Impact: High. The LLM may execute without asking for confirmation.
Mitigation: writeCapable is determined by AgentLoopOrchestrator from ToolPolicyFilter results, not by inference. Responsibility is well separated: the composer adds L3 if and only if it explicitly receives writeCapable: true.
Stale Critique Alerts in L2
Impact: Medium. The Coach unnecessarily prioritizes an issue that is already resolved.
Mitigation: L2 regenerates when UserProfile changes. Future: BrandHealthContextService.getHealthSummary() with a short TTL per session. For now, alerts are queried at session start; if drift occurs during the session, RAG corrects the context in the next turn.
System Prompt Exceeds Budget
Impact: Low. L2 and L3 are conditional, and the maximum case (~950 tokens) leaves margin under the cap.
Mitigation: Hard cap of 1200 tokens in the composer. If exceeded (rare), the truncation priority is L3 first, then L2.alerts. L1 is never truncated.
Key Decisions
Marketplace terminology in the KB, not in the system prompt: The KB semantic search holds the terminology dictionary, and RAG retrieves it when the query makes it relevant. Duplicating it in the system prompt would consume fixed tokens on every invocation for information the LLM can obtain contextually.
ISystemPromptComposer in domain, implementation in application: The orchestrator depends on the interface. Changing the layers, their order, or the caching logic does not touch the orchestrator.
blocks[] as return type, not string: AnthropicClient needs separate blocks to apply cache_control individually. A plain string would lose caching granularity. Clients that do not support caching concatenate the blocks internally.
L3 is a capability declaration, not a tone change: The Coach is always direct and technical. What changes when write tools are available are the capabilities and their guardrails. L3 declares that; it does not adjust the tone.
MVP Scope
Phase 0.2: ISystemPromptComposer + L1 (refactored COACH_SYSTEM_PROMPT) + L2 UserProfile. Phase 1.8: L2 Critique alerts. Phase 1.9: Prompt caching (blocks[] + cache_control). Phase 3: L3 write guardrails. System prompt ~750-950 tokens typical, 1200 hard cap.
Inspired by
Anthropic prompt caching API, Claude system prompt patterns. Existing COACH_SYSTEM_PROMPT as L1 base content.
📝 Project Changelog
Context Aggregator
Intelligence — Mateo
The Context Aggregator assembles RAG context from 2 automatic sources before every LLM call: KBContextProvider (top-K semantic chunks from BigQuery kb_embeddings via Cerebro #9) and BrandHealthContextService (intent-aware chunks from brand_health_embeddings via Data Sync #10). A single Vertex AI embedding call is shared across both searches, which run in parallel via Promise.allSettled. The IContextAssembler interface formalizes the existing RagOrchestrator pattern. IContextWindowManager manages a dynamic token budget over the 200K context window, calculating the space available after the system prompt, history, tool definitions, and expected response. HyDE (Hypothetical Document Embeddings): when the user's query is short or ambiguous, the system can first generate a hypothetical answer internally, then embed that answer and use it as the search vector instead of the question itself. This finds better results because the KB contains answers and explanations, not questions, so the vector of a hypothetical answer is semantically closer to the stored documents. The hypothetical answer is never shown to the user; it is only used to improve search. HyDE is activated selectively, not on every query. Live seller data (metrics, inventory, products) is NOT pre-fetched: the LLM requests it on demand via READ tools when needed.
Beautonomous governance: IContextWindowManager is ConfirmationFlow-aware. When a WRITE action is pending confirmation, context assembly includes the pending action details so the LLM presents them accurately to the seller. The seller's consent is informed, not blind.
IContextAssembler: assembles KB + Brand Health
BigQuery kb_embeddings, semantic similarity
Intent-aware, brand_health_embeddings
IContextWindowManager: dynamic token budget
Vertex AI text-embedding-004, 1 call per query
Hypothetical answer for search (selective)
Current State
Operational
KB RAG pipeline (embedding + BigQuery vector search + prompt injection). Brand Health intent-aware context (advertising, inventory, organic, financial intents). Parallel execution with shared embedding. Graceful degradation: if KB or Brand Health fails, the Coach responds with the remaining context.
Needs Formalization
IContextAssembler interface: RagOrchestrator already implements the pattern but lacks a formal interface contract (Phase 0.2). Explicit ContextSource type for error tracking and sourcesFailed reporting.
To Build
IContextWindowManager interface + ContextWindowManager implementation (Phase 0.3). Token budget calculation: available = 200K - system - history - tools - 4000. toolDefinitionsTokens accounting (~1450 tokens for 36 tools, Phase 1). Dynamic toolResultsTokens in the ReAct loop (Phase 2+). HyDE (Hypothetical Document Embeddings): selective generation of hypothetical answers for search when the query is short or ambiguous, which requires an additional LLM call before the search (Phase 2+).
Scope Boundaries: What this layer does NOT do
Tech Stack (TypeScript, BigQuery + Vertex AI)
Data Models, API Signatures & Acceptance Criteria
interface IContextAssembler {
  assemble(request: ContextAssemblyRequest): Promise<AssembledContext>
}

interface ContextAssemblyRequest {
  userId: string
  marketplace: Marketplace
  query: string
  conversationId: string
}

interface AssembledContext {
  kbChunks: KBChunkModel[]
  brandHealthChunks: BrandHealthChunkType[]
  estimatedTokens: number
  sourcesFailed: ContextSource[]
}

type ContextSource = 'kb' | 'brand_health'

interface IContextWindowManager {
  allocate(budget: TokenBudget): ContextAllocation
  trim(context: AssembledContext, allocation: ContextAllocation): TrimmedContext
}

interface TokenBudget {
  modelContextWindow: number       // 200K
  systemPromptTokens: number
  historyTokens: number
  toolDefinitionsTokens: number
  expectedResponseTokens: number   // 4000
}

interface ContextAllocation {
  kbChunksTokens: number
  brandHealthTokens: number
  toolResultsTokens: number
}

interface TrimmedContext {
  kbChunks: KBChunkModel[]
  brandHealthChunks: BrandHealthChunkType[]
  truncated: boolean
}
// Internal invocation by AgentLoopOrchestrator: NO REST endpoint

// IContextAssembler (RagOrchestrator implements)
assemble(request: ContextAssemblyRequest): Promise<AssembledContext>

// IContextWindowManager (ContextWindowManager implements)
allocate(budget: TokenBudget): ContextAllocation
trim(context: AssembledContext, allocation: ContextAllocation): TrimmedContext

// Flow:
// 1. AgentLoopOrchestrator calls assemble() at start of each turn
// 2. assemble() embeds query once, runs KB + BrandHealth searches in parallel
// 3. allocate() calculates: available = 200K - system - history - tools - 4000
// 4. trim() fits assembled context within allocation, truncating if needed
- assemble() retrieves top-K KB chunks by semantic similarity from BigQuery kb_embeddings
- Brand health chunks filtered by detected intent (advertising, inventory, organic, financial)
- Single embedding call, two parallel searches via Promise.allSettled
- If KB or brand health fails, Coach responds with remaining context (graceful degradation invariant)
- ContextWindowManager calculates: available = 200K - system - history - tools - 4000
- Trimming priority: brand health first, then KB, then old tool results
- [Ph 2+] HyDE: when query is short/ambiguous, system generates hypothetical answer and uses it as search vector instead of the raw question. Hypothetical never shown to user. Activated selectively based on query characteristics
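The selective HyDE activation in the last criterion could be sketched as follows. The blueprint does not define what "short or ambiguous" means, so the word-count threshold is purely an assumption, as are the function names and the injected generateHypothetical callback that stands in for the extra LLM call.

```typescript
// Assumption: queries under this many words count as "short"; the blueprint sets no threshold.
const MIN_WORDS_FOR_DIRECT_SEARCH = 5;

function shouldUseHyde(query: string): boolean {
  const words = query.trim().split(/\s+/).filter((word) => word.length > 0);
  return words.length < MIN_WORDS_FOR_DIRECT_SEARCH;
}

// Returns the text that will be embedded for the vector search.
// The hypothetical answer is used only here and is never returned to the user.
async function searchTextFor(
  query: string,
  generateHypothetical: (query: string) => Promise<string>, // hypothetical LLM call
): Promise<string> {
  if (!shouldUseHyde(query)) return query;
  try {
    return await generateHypothetical(query);
  } catch {
    // If the extra LLM call fails, fall back to embedding the raw query.
    return query;
  }
}
```

Falling back to the raw query on failure keeps HyDE an optimization rather than a new point of failure, which is in line with the graceful degradation invariant listed above.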
How It Works
AgentLoopOrchestrator — start of turn
|
+-- SystemPromptComposer.compose() → system prompt blocks
|
v
ContextAssembler.assemble(query, userId, marketplace)
|
+-- EmbeddingClient.embed(query) ← 1 call, shared result
| |
| +-----+-----+
| | |
| KB search BrandHealth search ← parallel (Promise.allSettled)
| | |
| +-----------+
|
v
ContextWindowManager.allocate(budget)
|
v
ContextWindowManager.trim(assembledContext, allocation)
|
v
TrimmedContext { kbChunks[], brandHealthChunks[], truncated }
|
v
buildUserPrompt(query, kbChunks, brandHealthChunks) → string for LLM
At the start of each turn, AgentLoopOrchestrator calls ContextAssembler.assemble(). The assembler embeds the user's query once via EmbeddingClient (Vertex AI text-embedding-004), then runs two parallel searches: KBContextProvider against BigQuery kb_embeddings and BrandHealthContextService against brand_health_embeddings (filtered by detected intent). Both searches go through Promise.allSettled so that a failure in one does not block the other. ContextWindowManager then calculates the available token budget (200K minus system prompt, history, tool definitions, and 4000 reserved for the response) and trims the assembled context to fit. The final kbChunks and brandHealthChunks are injected into the user prompt by buildUserPrompt().
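The shared-embedding, parallel-search step can be sketched as below. This is a simplification under stated assumptions: the function takes the query and three injected callbacks instead of a ContextAssemblyRequest, Chunk stands in for KBChunkModel and BrandHealthChunkType, and the 4-characters-per-token estimate is illustrative. What happens when the embedding call itself fails is not specified in this section, so the sketch simply lets that error propagate.

```typescript
type ContextSource = 'kb' | 'brand_health';
interface Chunk { text: string; similarity: number } // simplified stand-in for the real chunk types
interface AssembledContext {
  kbChunks: Chunk[];
  brandHealthChunks: Chunk[];
  estimatedTokens: number;
  sourcesFailed: ContextSource[];
}

type Embed = (query: string) => Promise<number[]>;
type Search = (vector: number[]) => Promise<Chunk[]>;

async function assemble(
  query: string,
  embed: Embed,
  searchKb: Search,
  searchBrandHealth: Search,
): Promise<AssembledContext> {
  // One embedding call; the resulting vector is shared by both searches.
  const vector = await embed(query);

  // allSettled waits for both searches and never rejects,
  // so one failing source cannot block the other.
  const [kbResult, brandHealthResult] = await Promise.allSettled([
    searchKb(vector),
    searchBrandHealth(vector),
  ]);

  const sourcesFailed: ContextSource[] = [];
  let kbChunks: Chunk[] = [];
  let brandHealthChunks: Chunk[] = [];

  if (kbResult.status === 'fulfilled') kbChunks = kbResult.value;
  else sourcesFailed.push('kb');

  if (brandHealthResult.status === 'fulfilled') brandHealthChunks = brandHealthResult.value;
  else sourcesFailed.push('brand_health');

  // Assumption: ~4 characters per token.
  const estimatedTokens = [...kbChunks, ...brandHealthChunks]
    .reduce((sum, chunk) => sum + Math.ceil(chunk.text.length / 4), 0);

  return { kbChunks, brandHealthChunks, estimatedTokens, sourcesFailed };
}
```

Recording failed sources in sourcesFailed, instead of throwing, is what lets the Coach answer with the remaining context while still making the degradation observable.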
Implementation Plan
Phase 0.2: Formalize IContextAssembler (Immediate)
Extract the IContextAssembler interface from the existing RagOrchestrator. Define the ContextAssemblyRequest and AssembledContext types. RagOrchestrator implements IContextAssembler: no functional change, only a formalization of the existing pattern. Add a ContextSource type and sourcesFailed tracking.
Phase 0.3: ContextWindowManager (Basic Budget)
Define the IContextWindowManager interface. Implement ContextWindowManager with allocate() and trim(). Budget calculation: available = 200K - systemPromptTokens - historyTokens - expectedResponseTokens (4000). Trimming priority: brand health chunks first, then KB chunks (by lowest similarity score). Integrate with AgentLoopOrchestrator so it passes token counts.
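The Phase 0.3 budget formula and trimming priority can be sketched as below. The function and type names (allocateBudget, trimToBudget, BudgetChunk) are illustrative, not the real IContextWindowManager contract. The sketch also assumes each chunk carries a precomputed token count, and that brand health chunks are dropped lowest-similarity first, which the plan only states explicitly for KB chunks.

```typescript
// Hypothetical chunk shape; `tokens` is assumed to be precomputed.
interface BudgetChunk {
  text: string;
  similarity: number; // retrieval score, higher is more relevant
  tokens: number;
}

interface TokenCounts {
  systemPromptTokens: number;
  historyTokens: number;
}

interface TrimmedContext {
  kbChunks: BudgetChunk[];
  brandHealthChunks: BudgetChunk[];
  truncated: boolean;
}

const CONTEXT_WINDOW_TOKENS = 200000;
const EXPECTED_RESPONSE_TOKENS = 4000;

// available = 200K - systemPromptTokens - historyTokens - expectedResponseTokens
function allocateBudget(counts: TokenCounts): number {
  const available =
    CONTEXT_WINDOW_TOKENS -
    counts.systemPromptTokens -
    counts.historyTokens -
    EXPECTED_RESPONSE_TOKENS;
  return Math.max(0, available);
}

function trimToBudget(
  kb: BudgetChunk[],
  brandHealth: BudgetChunk[],
  available: number,
): TrimmedContext {
  // Sort descending so pop() always removes the lowest-similarity chunk.
  const bySimilarityDesc = (a: BudgetChunk, b: BudgetChunk) => b.similarity - a.similarity;
  const kbKept = [...kb].sort(bySimilarityDesc);
  const bhKept = [...brandHealth].sort(bySimilarityDesc);

  let used = [...kbKept, ...bhKept].reduce((sum, c) => sum + c.tokens, 0);
  let truncated = false;

  while (used > available) {
    // Brand health chunks go first; KB chunks only once none are left.
    const source = bhKept.length > 0 ? bhKept : kbKept;
    const dropped = source.pop();
    if (dropped === undefined) break;
    used -= dropped.tokens;
    truncated = true;
  }
  return { kbChunks: kbKept, brandHealthChunks: bhKept, truncated };
}
```

For example, with a 3000-token system prompt and 10000 tokens of history, allocateBudget returns 200000 - 3000 - 10000 - 4000 = 183000.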
Phase 1: Tool Definitions in Budget (~1450 tokens for 36 tools)
Add toolDefinitionsTokens to TokenBudget. Calculate the token cost of tool schemas from the Tool Registry (#3): ~40 tokens per tool on average, ~1450 total for 36 tools. Include it in the allocate() budget subtraction. Validate that RAG context still fits comfortably after tool definitions are accounted for.
Phase 2+: Dynamic Tool Results in ReAct Loop
Add toolResultsTokens to ContextAllocation to track the tool_result tokens accumulated across ReAct rounds. ContextWindowManager recalculates the available budget each round as tool results accumulate. Trim strategy: oldest tool results first (preserving the most recent), then brand health, then KB. The MAX_ROUNDS guard (10 steps) prevents unbounded context growth.
HyDE (Hypothetical Document Embedding): when the user's query is short or ambiguous, the system generates a hypothetical answer via a lightweight LLM call and uses that answer as the search vector instead of the raw question. The KB contains answers and explanations, not questions, so a vector from a hypothetical answer is semantically closer to the stored documents and finds better results. The hypothetical is never shown to the user. HyDE is activated selectively, not on every query, because it adds one LLM call of latency when used.
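A minimal sketch of selective HyDE activation follows. Everything here is an assumption for illustration: the word-count threshold (HYDE_MAX_QUERY_WORDS), the function names, and the use of query length as the only trigger. The blueprint also mentions ambiguous queries, but how ambiguity is detected is not specified, so it is left out.

```typescript
type EmbedText = (text: string) => Promise<number[]>;
type GenerateHypothetical = (query: string) => Promise<string>;

// Hypothetical threshold: queries at or below this word count count as "short".
const HYDE_MAX_QUERY_WORDS = 4;

function shouldUseHyde(query: string): boolean {
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
  return words.length <= HYDE_MAX_QUERY_WORDS;
}

async function embedForSearch(
  query: string,
  embed: EmbedText,
  generateHypothetical: GenerateHypothetical,
): Promise<{ vector: number[]; usedHyde: boolean }> {
  if (!shouldUseHyde(query)) {
    return { vector: await embed(query), usedHyde: false };
  }
  try {
    // Embed the hypothetical answer instead of the raw question.
    // The hypothetical text is never returned to the caller or the user.
    const hypothetical = await generateHypothetical(query);
    return { vector: await embed(hypothetical), usedHyde: true };
  } catch {
    // If the HyDE path fails, fall back to embedding the raw query.
    return { vector: await embed(query), usedHyde: false };
  }
}
```

The fallback keeps HyDE strictly optional: a failure in the extra LLM call costs latency but never prevents the search from running.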
Risk Analysis
Tool results fill context window in long sessions
Impact: Medium
Mitigation: ContextWindowManager recalculates the budget each round and trims oldest tool results first, then brand health, then KB (the Phase 2+ trim strategy). The MAX_ROUNDS guard (10 steps) caps ReAct loop length. expectedResponseTokens (4000) is always reserved.
KB / Brand Health return irrelevant results
Impact: Medium
Mitigation: Namespace filter (Phase 0.4) scopes the BigQuery search when intent is clear. Intent detection selects the relevant brand health subset. Trim removes lowest-similarity chunks first. Irrelevant context wastes tokens but does not cause errors.
Embedding call takes longer than expected
Impact: Low-Medium
Mitigation: Configurable timeout on EmbeddingClient. If embedding fails, Coach responds without RAG context. The graceful degradation invariant ensures the user always gets a response, even without KB or brand health chunks.
Key Decisions
Live data via tools, not pre-fetch: Seller metrics, inventory, and products are requested on-demand by the LLM via READ tools. Pre-fetching wastes tokens on data the user didn't ask about. The LLM decides what data it needs based on the user's query.
Single embedding, two parallel searches: One Vertex AI embedding call per query, shared across KB and Brand Health searches. Promise.allSettled ensures both searches run in parallel and one failure doesn't block the other. This reduces latency and cost vs. separate embedding calls.
ContextWindowManager as separate service: Token budget management is decoupled from the orchestrator. The orchestrator passes token counts; ContextWindowManager calculates the allocation and trims. This separation lets budget strategies evolve independently without touching the orchestration logic.
Graceful degradation as invariant: Never return an error to the user because of a context assembly failure. If embedding fails, respond without RAG. If KB fails, respond with Brand Health only (and vice versa). If both fail, respond with no RAG context. The user always gets a response.
File Structure
src/domain/common/services/
IContextAssembler.ts
IContextWindowManager.ts
src/application/common/services/
RagOrchestrator.ts (implements IContextAssembler)
ContextWindowManager.ts (implements IContextWindowManager)
MVP Scope
KB RAG + Brand Health intent-aware + parallel execution + graceful degradation. IContextAssembler formalized (Phase 0.2), ContextWindowManager basic (Phase 0.3).
Source
RagOrchestrator + BrandHealthContextService
📝 Project Changelog
Proactive Suggestions Engine
Intelligence — Mateo
Next Best Action suggestions via LLM evaluation, not hardcoded rules. After every tool call, the after_tool hook in HookLifecycle (#3) triggers IProactiveSuggestionService.afterTool(). The LLM receives the tool result, brand health summary, and conversation context, then evaluates whether there is something actionable the seller should consider. There are no thresholds in code: the LLM reasons contextually about what matters. Suggestions are ephemeral text appended to the Coach response as conversational questions (not sidebar cards, not a separate UI). Deduplication is cross-session via UserProfile.recentSuggestions[] with a 7-day window per (suggestionType, productId). The service is generic per tool: adding a new tool to the catalog requires zero changes. [v2.1] Pro plan only: Free users don't receive proactive suggestions (gated at API Gateway).
Beautonomous governance: proactive suggestions respect Core's permission matrix. Suggestions implying WRITE actions are only surfaced to roles with write authorization. Pro-plan gating at API Gateway aligns with Core's resource allocation tiers. No suggestion can bypass Core's ConfirmationFlow if the seller acts on it.
Domain port: afterTool() + afterToolWithContext()
LLM evaluation + deduplication
Tool result + brand health + context
7-day window per (type, productId)
Cross-session persistence (DynamoDB)
Current State
Nothing Exists
ProactiveSuggestionService does not exist in the codebase. The Coach responds to user questions and terminates: no post-tool evaluation, no suggestions appended to responses.
Blocked By Prerequisites
ReAct Loop (Phase 0.3): without the loop there are no tool calls. HookLifecycle (Phase 1): the after_tool hook is the only activation point for this service. Both must be operational before #6 can start.
To Build
IProactiveSuggestionService interface + types (Phase 2). ProactiveSuggestionService with LLM evaluation + dedup (Phase 2). UserProfile.recentSuggestions[] extension (Phase 2). afterToolWithContext() with full session context + parallel streaming (Phase 4).
Scope Boundaries: What this layer does NOT do
Tech Stack (TypeScript, LLM Structured Output)
Data Models, API Signatures & Acceptance Criteria
interface IProactiveSuggestionService {
afterTool(input: SuggestionInput): Promise<Suggestion[]> // Phase 2
afterToolWithContext(input: SuggestionInput): Promise<Suggestion[]> // Phase 4
}
interface SuggestionInput {
toolName: string
result: ToolResult
userId: string
marketplace: Marketplace
brandHealthSummary?: HealthSummary
conversationContext: Message[] // last N messages
recentSuggestions: RecentSuggestion[] // for deduplication
}
interface Suggestion {
message: string // conversational question, inviting tone
priority: 'high' | 'normal'
suggestionType: string // dedup key: "stock", "pricing", "health", "reviews", etc.
productId?: string // if applies to a specific product
}
interface RecentSuggestion {
suggestionType: string
productId?: string
suggestedAt: string // ISO8601 — for 7-day window
}
// Stored in UserProfile.recentSuggestions[] (DynamoDB, last 30 entries)
// Internal invocation via HookLifecycle after_tool — NO REST endpoint
// Phase 2: lightweight LLM evaluation post-tool
afterTool(input: SuggestionInput): Promise<Suggestion[]>
// Phase 4: full session context + parallel to streaming
afterToolWithContext(input: SuggestionInput): Promise<Suggestion[]>
// Flow:
// 1. ToolExecutor completes tool → HookLifecycle.afterTool fires
// 2. ProactiveSuggestionService.afterTool() sends tool result to LLM
// 3. LLM returns structured JSON: { hasSuggestion, message, suggestionType, priority, productId? }
// 4. isDuplicate(suggestionType, productId, recentSuggestions) → skip if duplicate
// 5. Max 2 suggestions per turn, high priority first
// 6. Appended to Coach response as conversational questions
- afterTool() evaluates every tool result via LLM — no hardcoded rules or thresholds
- LLM returns structured JSON with hasSuggestion, message, suggestionType, priority, and optional productId
- Deduplication: same (suggestionType, productId) within 7 days = skip
- Max 2 suggestions per turn — high priority first
- If LLM output parse fails → silence (no error, Coach responds normally)
- Suggestions are conversational questions, not prescriptive alerts
- Adding a new tool to the catalog requires zero changes to this service
How It Works
LLM executes tool (any: get_product, get_orders, get_market_pricing, ...)
|
v
HookLifecycle.after_tool
|
v
ProactiveSuggestionService.afterTool(toolName, result, brandHealth, context)
|
+-- LLM evaluates: "Is there something actionable in this result?"
| → { hasSuggestion, message, suggestionType, priority, productId }
|
+-- isDuplicate(suggestionType, productId, recentSuggestions)?
| Yes → skip
| No → keep
|
+-- Sort by priority (high first)
|
+-- Take max 2
|
v
Suggestions[] → context.pendingSuggestions
|
v
AgentLoopOrchestrator builds final response:
[main LLM response]
[suggestions as conversational questions, if any]
|
v
User receives response + 0-2 questions at the end
After every tool execution, the after_tool hook in HookLifecycle (#3) fires ProactiveSuggestionService.afterTool(). The service sends the tool result, brand health summary, and recent conversation context to the LLM with a structured output prompt: "Is there something actionable the seller should consider?" The LLM returns JSON with hasSuggestion, message, suggestionType, priority, and optional productId. The service then checks isDuplicate() against UserProfile.recentSuggestions[]: if the same (suggestionType, productId) was suggested within 7 days, it is silently skipped. Remaining suggestions are sorted by priority (high first), capped at 2 per turn, and pushed to context.pendingSuggestions. The AgentLoopOrchestrator appends them as conversational questions at the end of the Coach response. If the LLM returns hasSuggestion: false or the parse fails, the service stays silent and forces no suggestion. The service never blocks the main response.
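The parse, filter, sort, and cap steps above might be sketched as follows. The names (ParsedSuggestion, parseEvaluation, selectSuggestions) are hypothetical, and the sketch assumes the service collects one raw LLM output per tool call in the turn. The deduplication check is passed in as a predicate so the block stays self-contained.

```typescript
interface ParsedSuggestion {
  message: string;
  priority: "high" | "normal";
  suggestionType: string;
  productId?: string;
}

const MAX_SUGGESTIONS_PER_TURN = 2;

// Returns null for hasSuggestion: false, malformed JSON, or missing fields.
// A null result means silence, never an error.
function parseEvaluation(raw: string): ParsedSuggestion | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const record = data as Record<string, unknown>;
    if (record["hasSuggestion"] !== true) return null;
    const message = record["message"];
    const suggestionType = record["suggestionType"];
    if (typeof message !== "string" || typeof suggestionType !== "string") return null;
    const priority = record["priority"] === "high" ? "high" : "normal";
    const parsed: ParsedSuggestion = { message, priority, suggestionType };
    const productId = record["productId"];
    if (typeof productId === "string") parsed.productId = productId;
    return parsed;
  } catch {
    return null;
  }
}

function selectSuggestions(
  rawOutputs: string[],
  isDuplicate: (s: ParsedSuggestion) => boolean,
): ParsedSuggestion[] {
  const rank = (s: ParsedSuggestion) => (s.priority === "high" ? 0 : 1);
  return rawOutputs
    .map((raw) => parseEvaluation(raw))
    .filter((s): s is ParsedSuggestion => s !== null)
    .filter((s) => !isDuplicate(s))
    .sort((a, b) => rank(a) - rank(b)) // high priority first
    .slice(0, MAX_SUGGESTIONS_PER_TURN);
}
```

Treating any unknown priority value as "normal" is a choice made for this sketch; the blueprint only defines the two valid values.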
Implementation Plan
Prerequisites: ReAct Loop (Phase 0.3) + HookLifecycle (Phase 1)
The after_tool hook does not exist without the ReAct loop and HookLifecycle. Both must be operational before this service can be activated. Without tool calls, there's nothing to evaluate.
Phase 2: Lightweight LLM Evaluation Post-Tool
Define the IProactiveSuggestionService interface in domain. Implement ProactiveSuggestionService with an LLM evaluation prompt (structured JSON output: hasSuggestion, message, suggestionType, priority, productId). Implement isDuplicate() with a 7-day window per (suggestionType, productId). Extend UserProfile to include recentSuggestions: RecentSuggestion[] (last 30 entries, DynamoDB). Wire into HookLifecycle.afterTool() as fire-and-forget. Max 2 suggestions per turn, high priority first.
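A possible shape for isDuplicate() with the 7-day window is sketched below. The entry type mirrors RecentSuggestion from the data model. The injectable `now` parameter and the choice to treat an entry exactly 7 days old as expired are assumptions made for this sketch.

```typescript
// Mirrors RecentSuggestion from the data model; renamed to keep the sketch standalone.
interface RecentSuggestionEntry {
  suggestionType: string;
  productId?: string;
  suggestedAt: string; // ISO8601
}

const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

function isDuplicate(
  suggestionType: string,
  productId: string | undefined,
  recentSuggestions: RecentSuggestionEntry[],
  now: Date = new Date(), // injectable for tests (assumption)
): boolean {
  return recentSuggestions.some((entry) => {
    if (entry.suggestionType !== suggestionType) return false;
    // Suggestions without a product only match entries without a product.
    if (entry.productId !== productId) return false;
    const ageMs = now.getTime() - Date.parse(entry.suggestedAt);
    // An unparseable date yields NaN, which fails both comparisons.
    return ageMs >= 0 && ageMs < DEDUP_WINDOW_MS;
  });
}
```

Because the key is (suggestionType, productId), a "pricing" suggestion for a product is still allowed a day after a "stock" suggestion for the same product.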
Phase 4: Full Context + Parallel Streaming
Implement afterToolWithContext(): the LLM sees all tool results from the session, the full conversation history, and suggestion states (acted/ignored). It runs in parallel to response streaming, so the user already sees the response while the LLM evaluates whether there is more to flag. Cross-tool pattern detection connects results from multiple tools in the same session. The relevance gate is calibrated so that <40% of turns emit a suggestion. Signal, not noise.
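One way to overlap the evaluation with response streaming is sketched below. The function name respondThenSuggest and both callback signatures are hypothetical; the actual streaming API of the orchestrator is not described in this section.

```typescript
async function respondThenSuggest(
  streamResponse: () => Promise<void>,
  evaluateSuggestions: () => Promise<string[]>,
): Promise<string[]> {
  // Start the evaluation without awaiting it, so it runs while the
  // response streams. A failed evaluation resolves to no suggestions.
  const pending = evaluateSuggestions().catch((): string[] => []);

  // The user sees the main response regardless of the evaluation outcome.
  await streamResponse();

  // By now the evaluation has had the whole streaming duration to finish.
  return pending;
}
```

Attaching the catch handler before awaiting the stream means an evaluation failure can never surface as an unhandled rejection or interrupt the response.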
Risk Analysis
LLM over-generates suggestions
Impact: High
Mitigation: 7-day deduplication prevents repetition. Max 2 per turn limits volume. The Phase 4 relevance gate calibrates emission rate. Acceptance metric: >30% of suggestions acted upon in Phase 2; if below, the model is generating low-quality suggestions.
LLM fails to produce structured output
Impact: Medium
Mitigation: If JSON parse fails → silence (not error). The Coach still responds to the main question normally. ProactiveSuggestionService never blocks the response; it is an optional addition. Parse failures are logged for monitoring.
Stale deduplication in long sessions
Impact: Low
Mitigation: Dedup is by (suggestionType, productId). Different product → evaluated normally. Same product + problem resolved + recurrence within 7 days is an acceptable edge case. Last 30 entries kept; older discarded on write.
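The "last 30 entries kept, older discarded on write" rule could be implemented along these lines. This is a sketch that assumes recentSuggestions is stored oldest-first; the function name recordSuggestion is hypothetical, and the DynamoDB write itself is out of scope.

```typescript
// Mirrors RecentSuggestion from the data model; renamed to keep the sketch standalone.
interface StoredSuggestion {
  suggestionType: string;
  productId?: string;
  suggestedAt: string; // ISO8601
}

const MAX_RECENT_SUGGESTIONS = 30;

// Returns a new list with the entry appended and only the newest 30 kept.
// The input array is not mutated.
function recordSuggestion(
  recent: StoredSuggestion[],
  entry: StoredSuggestion,
): StoredSuggestion[] {
  return [...recent, entry].slice(-MAX_RECENT_SUGGESTIONS);
}
```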
Key Decisions
Actionability criteria live in the LLM, not in code. Business thresholds vary by context (marketplace, category, seller volume, session context). The LLM reasons over the full picture; a constant < 70 in code cannot. This makes the service domain-agnostic: it works the same for MeLi, Amazon, or any future marketplace.
Generic per tool, extensible by design: afterTool() receives the result of any tool. There is no tool-specific logic. Adding get_shipping_costs or get_advertising_metrics to the catalog requires zero changes to this service. Extensibility is in the generic contract, not in a list of cases.
suggestionType as dedup key, not toolName: (toolName, productId) would block any suggestion about that product for 7 days, even if the LLM detects a different problem next session. (suggestionType, productId) deduplicates by problem type, which is more precise.
Questions that invite, not alerts that prescribe: "Your stock is low and you should restock" is an alert. "This product has 3 units and sold 8 this week. Want to restock before running out?" is an invitation. The Coach proposes, never prescribes. This distinction determines whether users perceive the service as useful or as noise.
Silence if nothing actionable: The LLM can and should return hasSuggestion: false. No forced suggestions. The goal is keeping the signal-to-noise ratio high: one suggestion the user acts on is worth more than three they ignore.
File Structure
src/domain/coach/services/
IProactiveSuggestionService.ts (interface + types)
src/application/coach/services/
ProactiveSuggestionService.ts (LLM evaluation + deduplication)
MVP Scope
IProactiveSuggestionService + LLM evaluation post-tool + dedup via UserProfile (Phase 2). afterToolWithContext() with full session context deferred to Phase 4. Pro plan only.
Source
New: no ProactiveSuggestionService exists in the codebase
📝 Project Changelog
Guardrails
Security — Mateo
Independent content validation layer for the Coach with two validation points: one before the LLM (InputGuard) and one after (OutputGuard). InputGuard detects prompt injection attempts and off-scope queries before they reach the LLM. OutputGuard detects data leaks (another user's data in the response) and dangerous content (from unsanitized tool outputs) before the response reaches the user. This is security validation, not business logic: the guardrails don't know what a good Coach answer looks like, only what a dangerous one looks like. Graceful degradation is an invariant: if a guard fails internally, it lets the content through, so the guardrails never take the service down. Rejection messages are always friendly, never expose the technical reason, and always redirect to the Coach's domain.
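The fail-open invariant (a guard that crashes internally lets the content through) can be expressed as a small wrapper. This is a sketch only: GuardVerdict is a simplified stand-in for GuardResult, and runGuardFailOpen and its onInternalError callback are hypothetical names.

```typescript
// Simplified stand-in for GuardResult.
interface GuardVerdict {
  passed: boolean;
  category?: string;
  rejectionMessage?: string;
}

async function runGuardFailOpen(
  check: () => Promise<GuardVerdict>,
  onInternalError: (error: unknown) => void = () => {},
): Promise<GuardVerdict> {
  try {
    // A verdict returned by the guard, pass or block, is respected as-is.
    return await check();
  } catch (error) {
    // An internal guard failure is reported but never blocks the request.
    onInternalError(error);
    return { passed: true };
  }
}
```

Only thrown errors trigger the fail-open path. A guard that deliberately returns passed: false still blocks the request.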
Beautonomous governance: Guardrails is Core's first enforcement point. InputGuard validates incoming requests against Core's security rules before the LLM processes them. A blocked input means no ConfirmationFlow is ever triggered and no WRITE is ever attempted. Security validation precedes all governance logic.
Domain port: validateInput() + validateOutput()
Pre-LLM: prompt injection + off-scope
Post-LLM: data leak + dangerous content
Coordinator: IGuardService → InputGuard + OutputGuard
Lightweight LLM classifier (Phase 2)
prompt_injection | off_scope | data_leak | dangerous_content
Current State
Reusable
AgentLoopOrchestrator already has the integration point (pre-loop and post-loop). ILLMClient for LLMGuardChecker. CloudWatch logging infrastructure for violation tracking.
To Build
IGuardService interface + types (GuardInput, GuardOutput, GuardResult, ViolationCategory). InputGuard with pattern matching (injection + off-scope). OutputGuard with data leak detection (userId comparison) + dangerous content (pattern matching). GuardService coordinator. LLMGuardChecker for ambiguous cases (Phase 2). Integration in AgentLoopOrchestrator pre/post loop.
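A minimal sketch of the InputGuard pattern-matching pass for prompt injection follows. The regular expressions are one possible encoding of the known phrases listed in this blueprint, not a vetted rule set. The names INJECTION_PATTERNS, InjectionCheck, and checkForInjection are hypothetical.

```typescript
// Simplified stand-in for GuardResult, limited to the injection category.
interface InjectionCheck {
  passed: boolean;
  category?: "prompt_injection";
  rejectionMessage?: string;
}

// One regex per known phrase or delimiter, case-insensitive.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore previous instructions/i,
  /ignora las instrucciones anteriores/i,
  /you are now/i,
  /ahora eres/i,
  /pretend you are/i,
  /forget everything/i,
  /olv[ií]date de todo/i,
  /---\s*SYSTEM\s*---/i,
  /\[INST\]/i,
  /<\|system\|>/i,
  /<\|im_start\|>/i,
  /\bact as\b/i,
  /act[uú]a como si fueras/i,
  /your real instructions are/i,
  /tus instrucciones reales son/i,
];

// Friendly message that redirects to the Coach's domain and
// never reveals why the query was rejected.
const INJECTION_REJECTION =
  "Estoy aquí para ayudarte con tu actividad como vendedor. " +
  "¿Hay algo sobre tus productos, ventas o métricas?";

function checkForInjection(query: string): InjectionCheck {
  const matched = INJECTION_PATTERNS.some((pattern) => pattern.test(query));
  if (!matched) return { passed: true };
  return {
    passed: false,
    category: "prompt_injection",
    rejectionMessage: INJECTION_REJECTION,
  };
}
```

Broad phrases such as "act as" can match harmless queries, which is the kind of ambiguous case the Phase 2 LLMGuardChecker is meant to handle.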
Not This Project
Authentication/authorization (API Gateway + Memberstack). Request format validation (Zod schemas). Rate limiting (API Gateway / Billing #13). Response quality evaluation (Eval Suite #16). Hallucination detection (Orchestrator #2). Business logic.
Tech Stack (TypeScript, Security Layer)
Data Models, Interfaces & Acceptance Criteria
interface IGuardService {
  validateInput(input: GuardInput): Promise<GuardResult>
  validateOutput(output: GuardOutput): Promise<GuardResult>
}
interface GuardInput {
  query: string
  userId: string
  sessionHistory: Message[] // last N messages for context
  marketplace: Marketplace
}
interface GuardOutput {
  response: string
  userId: string
  toolResults?: ToolResult[] // tool results from loop for data leak verification
}
interface GuardResult {
  passed: boolean
  reason?: string // if failed: which violation detected
  category?: ViolationCategory
  rejectionMessage?: string // friendly message for the user
}
type ViolationCategory =
  | 'prompt_injection'
  | 'off_scope'
  | 'data_leak'
  | 'dangerous_content'
// --- InputGuard: known injection patterns (first line — regex) ---
// "ignore previous instructions" / "ignora las instrucciones anteriores"
// "you are now" / "ahora eres" / "pretend you are"
// "forget everything" / "olvidate de todo"
// Nested delimiters: ---SYSTEM--- , [INST], <|system|>, <|im_start|>
// Role switching: "act as" / "actúa como si fueras"
// "your real instructions are" / "tus instrucciones reales son"
// --- Rejection messages — never expose the technical reason ---
// Prompt injection:
// ✅ "Estoy aquí para ayudarte con tu actividad como vendedor.
// ¿Hay algo sobre tus productos, ventas o métricas?"
// ❌ "Tu query contiene instrucciones que intentan modificar el sistema."
// Off-scope:
// ✅ "Para eso no tengo la información que necesitás.
// Si tenés consultas sobre tu actividad como vendedor, puedo ayudarte."
// ❌ "Tu pregunta está fuera del scope del Coach."
// Data leak / OutputGuard:
// ✅ "No pude generar una respuesta útil. Intentá reformularla
// o preguntame sobre tu actividad en el marketplace."
// ❌ "Tu respuesta fue bloqueada por política de seguridad."
// In AgentLoopOrchestrator
async handle(query, context): Promise<CoachResponse> {
  // 1. InputGuard: pre-LLM
  const inputResult = await guardService.validateInput({
    query, userId: context.userId,
    sessionHistory: context.recentMessages,
    marketplace: context.marketplace
  })
  if (!inputResult.passed) {
    // Log the category from the guard result; it is never shown to the user
    logger.warn('guard.input.rejected', { category: inputResult.category, userId: context.userId })
    return { message: inputResult.rejectionMessage, guardRejected: true }
  }
  // 2. Normal ReAct loop
  const response = await this.runReActLoop(query, context)
  // 3. OutputGuard: post-LLM
  const outputResult = await guardService.validateOutput({
    response: response.message, userId: context.userId,
    toolResults: response.toolResults
  })
  if (!outputResult.passed) {
    logger.error('guard.output.rejected', { category: outputResult.category, userId: context.userId })
    // A data leak escalates to a critical alert, not just a log entry
    if (outputResult.category === 'data_leak')
      await alertService.critical('data_leak_detected', { userId: context.userId })
    return { message: outputResult.rejectionMessage, guardRejected: true }
  }
  return response
}
- [Ph 1] InputGuard detects known prompt injection patterns via regex (~5-10ms, no LLM cost)
- [Ph 1] InputGuard detects off-scope queries and returns friendly redirection message
- [Ph 1] Rejection messages never expose the technical reason — attacker cannot determine why they were rejected
- [Ph 1] If InputGuard fails internally, query passes through (graceful degradation invariant)
- [Ph 2] LLMGuardChecker activates only when pattern matching has low confidence. If classifier confidence < 0.7, query passes (doubt favors the user)
- [Ph 3] OutputGuard detects data leak: response mentions userId that is not the current user's
- [Ph 3] Data leak triggers critical CloudWatch alert (not just log). OutputGuard detects dangerous content via pattern matching
- [Ph 3] If OutputGuard fails internally, response passes through (graceful degradation invariant)
Security layer · Not auth · 2 validation points · Graceful degradation invariant · 4 violation categories · 3 phases
How It Works
User input (query)
|
v
InputGuard.validateInput(input)
|
+-- passed → continue to AgentLoopOrchestrator
|
+-- prompt injection detected
| → log CloudWatch (category: prompt_injection, userId)
| → return friendly redirection message
|
+-- off-scope detected
| → log CloudWatch (category: off_scope, userId)
| → return scope redirection suggestion
|
+-- guard fails internally → continue (degrade gracefully)
|
v
AgentLoopOrchestrator runs ReAct loop
(tools, RAG, generation)
|
v
LLM generates final response
|
v
OutputGuard.validateOutput(output)
|
+-- passed → return response to user
|
+-- data leak detected
| → log CloudWatch (category: data_leak, userId) — critical alert
| → return neutral message to user
|
+-- dangerous content detected
| → log CloudWatch (category: dangerous_content, userId)
| → return neutral message to user
|
+-- guard fails internally → return response unchanged (degrade gracefully)
|
v
User receives response (or friendly rejection message)
Two independent validation points wrap the ReAct loop. InputGuard runs before the LLM. Its first line is pattern matching (~5ms), which detects known injection patterns and off-scope signals; its second line (Phase 2) is a lightweight LLM classifier for ambiguous cases. If pattern matching has high confidence, no LLM call is needed. OutputGuard runs after the LLM: it checks whether the response mentions a userId different from the current user's (data leak) and scans for dangerous patterns coming from unsanitized tool outputs. Both guards degrade gracefully: if one fails internally, the request continues as if the guard had passed. Rejection messages are always friendly and never reveal the technical reason, which makes bypass harder for attackers.
Implementation Plan (3 Phases)
Phase 1: InputGuard with Pattern Matching
IGuardService interface in domain + types (GuardInput, GuardOutput, GuardResult, ViolationCategory). InputGuard with regex pattern matching for known prompt injection patterns ("ignore previous instructions", "you are now", nested delimiters, role switching) and off-scope detection. GuardService coordinator. Integration in AgentLoopOrchestrator pre-loop. No perceptible latency (~5-10ms). No LLM cost.
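The Phase 1 regex check described above could look roughly like the sketch below. It is illustrative only: `INJECTION_PATTERNS`, `FRIENDLY_REDIRECT`, and `checkInjection` are hypothetical names, the pattern list is a subset of the known patterns listed earlier, and the real InputGuard would also cover off-scope detection and return a Promise as declared in IGuardService.

```typescript
// Hypothetical Phase 1 sketch: regex-only injection detection, no LLM call.
type ViolationCategory = 'prompt_injection' | 'off_scope' | 'data_leak' | 'dangerous_content';

interface GuardResult {
  passed: boolean;
  reason?: string; // internal only, for logs
  category?: ViolationCategory;
  rejectionMessage?: string; // friendly message for the user
}

// Subset of the known patterns (English + Spanish). No "g" flag, so test() is stateless.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /ignora\s+las\s+instrucciones\s+anteriores/i,
  /\byou\s+are\s+now\b/i,
  /\bahora\s+eres\b/i,
  /\bpretend\s+you\s+are\b/i,
  /\bforget\s+everything\b/i,
  /your\s+real\s+instructions\s+are/i,
  /---\s*SYSTEM\s*---/i,
  /\[INST\]/i,
  /<\|(system|im_start)\|>/i,
];

// Same friendly redirection used in the rejection message examples.
const FRIENDLY_REDIRECT =
  'Estoy aquí para ayudarte con tu actividad como vendedor. ' +
  '¿Hay algo sobre tus productos, ventas o métricas?';

function checkInjection(query: string): GuardResult {
  try {
    for (const pattern of INJECTION_PATTERNS) {
      if (pattern.test(query)) {
        return {
          passed: false,
          reason: 'matched ' + pattern.source,
          category: 'prompt_injection',
          rejectionMessage: FRIENDLY_REDIRECT,
        };
      }
    }
    return { passed: true };
  } catch (_error) {
    // Graceful degradation invariant: an internal failure never blocks the query.
    return { passed: true };
  }
}
```

The broad "act as" pattern is left out of this sketch on purpose: a phrase that general is the kind of heuristic that risks the false positives discussed in the risk analysis, so it would need a narrower form before use.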
Phase 2: LLM-as-Checker for Ambiguous Cases
LLMGuardChecker: a lightweight LLM classifier for queries that pattern matching cannot classify with confidence. It only activates when pattern matching has low confidence, so most legitimate seller queries pass without touching the LLM. Confidence threshold: if the classifier's confidence is below 0.7, the query passes (doubt favors the user). Rejection rate metrics per category in CloudWatch. Latency: ~5ms (high-confidence pass) to ~150-200ms (LLM check).
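A minimal sketch of the Phase 2 decision rule, assuming the pattern matcher reports whether it is confident and the classifier returns a flag plus a confidence score. `PatternOutcome`, `ClassifierOutput`, and `resolveVerdict` are hypothetical names, and the classifier is modeled as a synchronous callback for brevity; the real LLMGuardChecker would call ILLMClient asynchronously.

```typescript
// Hypothetical Phase 2 sketch: the LLM classifier is consulted only on low confidence.
interface PatternOutcome {
  flagged: boolean; // pattern matching found a violation
  confident: boolean; // pattern matching is sure of its answer
}

interface ClassifierOutput {
  flagged: boolean; // classifier believes the query is a violation
  confidence: number; // 0..1
}

interface Verdict {
  passed: boolean;
  usedLlm: boolean;
}

const CONFIDENCE_THRESHOLD = 0.7;

function classifySafely(classify: () => ClassifierOutput): ClassifierOutput | null {
  try {
    return classify();
  } catch (_error) {
    return null; // classifier failure is treated as "no opinion"
  }
}

function resolveVerdict(pattern: PatternOutcome, classify: () => ClassifierOutput): Verdict {
  // High-confidence pattern result: decide without any LLM call.
  if (pattern.confident) {
    return { passed: !pattern.flagged, usedLlm: false };
  }
  const output = classifySafely(classify);
  if (output === null) {
    return { passed: true, usedLlm: true }; // graceful degradation
  }
  // Doubt favors the user: reject only when flagged with confidence >= 0.7.
  const reject = output.flagged && output.confidence >= CONFIDENCE_THRESHOLD;
  return { passed: !reject, usedLlm: true };
}
```

With this rule, a query the classifier flags at 0.65 confidence still passes, while the same flag at 0.9 is rejected.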
Phase 3: OutputGuard Activated
Data leak detection: extract userIds from tool results and compare them against the userIds in the response; if the response mentions a userId that is not the current user's, it is a leak. Dangerous content detection via pattern matching (<script>, eval(, exec(, system delimiter injection). Critical CloudWatch alert on data_leak (real-time escalation, not just a log). Integration in AgentLoopOrchestrator post-loop.
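The Phase 3 checks can be sketched as plain string comparison plus a short pattern list. The `ToolResult` shape below (a list of userIds seen in each result) is an assumption made for illustration, as are the `mem_...` identifiers and the function names; the blueprint does not specify how userIds are extracted from tool results.

```typescript
// Hypothetical Phase 3 sketch: data leak detection by userId comparison.
interface ToolResult {
  userIds: string[]; // assumed: userIds already extracted from the tool output
}

function findLeakedUserIds(
  response: string,
  currentUserId: string,
  toolResults: ToolResult[],
): string[] {
  const leaked: string[] = [];
  for (const result of toolResults) {
    for (const id of result.userIds) {
      const isForeign = id !== currentUserId;
      const mentioned = response.indexOf(id) !== -1;
      if (isForeign && mentioned && leaked.indexOf(id) === -1) {
        leaked.push(id);
      }
    }
  }
  return leaked;
}

// Dangerous content patterns named in the Phase 3 description.
const DANGEROUS_PATTERNS: RegExp[] = [/<script\b/i, /\beval\s*\(/i, /\bexec\s*\(/i];

function hasDangerousContent(response: string): boolean {
  for (const pattern of DANGEROUS_PATTERNS) {
    if (pattern.test(response)) return true;
  }
  return false;
}
```

Plain substring matching can over-report when one id is a prefix of another (for example `mem_1` inside `mem_11`), so a production version would probably match on token boundaries.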
Risk Analysis
False positives reject legitimate queries
Impact: High. Rejecting legitimate seller queries makes the Coach unusable for those cases.
Mitigation: pattern matching uses very specific known injection patterns, not general "looks dangerous" heuristics. The LLM classifier only acts when pattern matching has low confidence. If classifier confidence is below 0.7, the query passes. Doubt favors the user.
Sophisticated injection evades pattern matching
Impact: Low-Medium. Advanced techniques (encoding, multi-language, unusual delimiters) may bypass the first line.
Mitigation: the second line (LLM-as-checker, Phase 2) covers cases that pattern matching misses. The Coach also has its own system prompt anchoring it to its role: even if an injection passes both lines, the LLM is instructed to ignore instructions that contradict its Coach identity.
OutputGuard adds latency
Impact: Low. The Coach response has already taken several seconds; adding 10ms of pattern matching is imperceptible.
Mitigation: OutputGuard Phase 3 is pattern matching + string comparison, not an LLM call. Expected latency <10ms. If a future phase adds LLM-as-checker for output, its impact is evaluated before activation.
Key Decisions
Graceful degradation is invariant: if a guard fails internally, the system does not block. The user never sees an error caused by a guardrail failure. Failures are logged to CloudWatch and monitored, but not propagated to the user. A guardrail that can cut the service is worse than no guardrail.
First line without LLM, second line only on low confidence: pattern matching is fast and has no LLM cost. The LLM classifier only acts when pattern matching cannot decide with confidence. Most legitimate seller queries pass pattern matching without touching the secondary LLM.
Rejection messages without technical exposure: the user never knows whether they were rejected for "prompt injection" or "off-scope". The message is always a friendly redirection. Exposing the technical reason makes bypass easier for an attacker.
Security validation, not quality validation: guardrails don't evaluate whether the response is correct, useful, or aligned with the business. They only evaluate whether it is safe. Quality is the responsibility of the Eval Suite (#16) and Hallucination Detection (#2).
Data leak is a critical alert, not just a log: a detected data leak (another user's data in the response) is not just a CloudWatch log entry; it is an alert that must reach the team in real time. The distinction matters for incident response time.
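The graceful degradation invariant from the first decision above can be enforced in one place with a wrapper around any guard check. This is a sketch under assumptions: `withGracefulDegradation` is a hypothetical helper, the check is synchronous for brevity (the real guards return Promises), and `onFailure` stands in for the CloudWatch logging call.

```typescript
// Hypothetical sketch: a guard that throws is treated as "passed" and the failure is logged.
interface GuardResult {
  passed: boolean;
  category?: string;
  rejectionMessage?: string;
}

function withGracefulDegradation<T>(
  check: (input: T) => GuardResult,
  onFailure: (error: unknown) => void,
): (input: T) => GuardResult {
  return (input: T): GuardResult => {
    try {
      return check(input);
    } catch (error) {
      // Logged and monitored, never propagated to the user.
      onFailure(error);
      return { passed: true };
    }
  };
}
```

A deliberate rejection still flows through unchanged; only internal failures are converted into a pass.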
File Structure
src/
  domain/
    coach/
      services/
        IGuardService.ts      ← interface + types (GuardInput, GuardOutput, GuardResult)
  application/
    coach/
      services/
        InputGuard.ts         ← pre-LLM validation (injection + off-scope)
        OutputGuard.ts        ← post-LLM validation (data leak + dangerous content)
        GuardService.ts       ← coordinator: IGuardService → InputGuard + OutputGuard
        LLMGuardChecker.ts    ← lightweight LLM classifier (Phase 2)
MVP Scope
Phase 1: InputGuard with pattern matching (injection + off-scope) + graceful degradation. Phase 2: LLMGuardChecker for ambiguous cases. Phase 3: OutputGuard (data leak + dangerous content).
Source
New project, no existing source. Integrates into AgentLoopOrchestrator (#2).
📝 Project Changelog
Observability & Traceability
Observability — Mateo
Every agent interaction generates structured trace records across two independent persistence systems: a ConversationTrace in DynamoDB (lightweight, stored alongside the conversation, 90-day TTL) and a full AgentExecution in PostgreSQL (deep traceability with per-step logs, itemized costs, and context snapshots). Credits are calculated automatically by database triggers, never in the application. All PostgreSQL tracking writes are fire-and-forget: if PostgreSQL is unavailable, the Coach responds normally. Without this, we operate blind: it is the difference between "having AI" and "operating AI well".
Beautonomous governance: the dual-persistence audit trail implements Core Principle 3 (complete traceability). Every action taken by any role is recorded with actor identity, timestamp, and outcome. The audit log is the source of truth for governance accountability.
DynamoDB: per-message metrics
PostgreSQL: full execution trace
Ordered event timeline
Per-charge credits (trigger-calculated)
Exact context the LLM saw
Fire-and-forget write coordinator
Tech Stack (in production)
Activation: ENABLE_AGENT_TRACKING=true. Credentials from SSM Parameter Store (/AGENTS_ACTIVITY/{STAGE}/DB/*).
Data Models (7 PostgreSQL tables), Hierarchy & Acceptance Criteria
Client (seller, Memberstack ID)
  ↓ 1:N
AgentClient (conversation session, persists between messages)
  ↓ 1:N
AgentExecution (one execution = one user message)
  ↓ 1:N            ↓ 1:N             ↓ 1:N
AgentLog[]         AgentCost[]       ExecutionContextLink[]
(ordered event     (one per charge     ↓ N:1
 timeline)          type)            ContextSnapshot[]
                                     (kb_chunks, brand_health)
# CoachTrace — what a single execution contains
CoachTrace
│── Identity
│ │── execution_id UUID
│ │── user_id Memberstack ID
│ │── conversation_id DynamoDB conversation
│ └── marketplace MercadoLibre / Amazon / etc.
│
│── Lifecycle
│ │── status pending → running → done | error
│ │── started_at / ended_at
│ └── duration_ms End-to-end latency
│
│── Pipeline Steps (AgentLog[])
│ │── embedding latency, text size
│ │── vector_search latency, chunks found, top score
│ │── brand_health intent detected, metrics queried
│ │── llm_call model, tokens in/out, latency
│ └── agent_error error type, message, stack
│
│── Costs (AgentCost[])
│ │── EMBEDDING 1 credit per Vertex AI call
│ │── VECTOR_SEARCH 1 credit per BigQuery search
│ │── BRAND_HEALTH 1 credit per brand health query
│ └── TOKENS CEIL((input+output)/1000) credits
│
└── Context (ContextSnapshot[])
│── kb_chunks Exact chunks the LLM saw
└── brand_health Brand health metrics injected
# Credits are NEVER calculated in the application —
# trigger trg_calculate_agent_credits computes on INSERT
│                │ DynamoDB              │ PostgreSQL                      │
│ What it stores │ ConversationTrace     │ AgentExecution+Logs+Costs+Snaps │
│ Granularity    │ Aggregated pipeline   │ Per-step with latency+params    │
│ Access         │ Same table as chat    │ Separate DB, optional           │
│ TTL            │ 90 days (auto-delete) │ Configurable retention          │
│ Read latency   │ <10ms (hot store)     │ SQL ad-hoc queries              │
│ Activation     │ Always on             │ ENABLE_AGENT_TRACKING=true      │
- Each user message generates a ConversationTrace in DynamoDB and (if enabled) a full AgentExecution in PostgreSQL
- AgentExecution contains at least one AgentLog of type rag_metrics with pipeline latencies
- AgentCost reflects exactly the charge types used in that execution (EMBEDDING, VECTOR_SEARCH, BRAND_HEALTH, TOKENS)
- ContextSnapshots linked to the execution contain the exact chunks sent to the LLM
- If PostgreSQL is unavailable, the Coach responds normally — tracking never blocks the flow
- Failed executions have status = 'error' and error_message populated
- Credits in clients and agents_clients stay automatically synchronized by database triggers
- ConversationTraces in DynamoDB auto-expire at 90 days (TTL)
DynamoDB TTL: 90 days · PostgreSQL: optional (ENABLE_AGENT_TRACKING) · Credits: DB triggers, never app code · Charge types: EMBEDDING(1) + VECTOR_SEARCH(1) + BRAND_HEALTH(1) + TOKENS(CEIL((in+out)/1000))
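To make the credit arithmetic concrete, the helper below mirrors the charge rules listed above. It is meant only for working out expected values (for example in test fixtures): in the running system the database trigger remains the only place credits are computed. `expectedCredits` and `expectedExecutionCredits` are hypothetical names.

```typescript
// Illustration only: mirrors the charge rules so expected values can be checked by hand.
// The application itself never calculates credits; the PostgreSQL trigger does.
type ChargeType = 'EMBEDDING' | 'VECTOR_SEARCH' | 'BRAND_HEALTH' | 'TOKENS';

interface Charge {
  type: ChargeType;
  inputTokens?: number;
  outputTokens?: number;
}

function expectedCredits(charge: Charge): number {
  if (charge.type === 'TOKENS') {
    const total = (charge.inputTokens || 0) + (charge.outputTokens || 0);
    return Math.ceil(total / 1000); // CEIL((in + out) / 1000)
  }
  return 1; // EMBEDDING, VECTOR_SEARCH, and BRAND_HEALTH cost 1 credit each
}

function expectedExecutionCredits(charges: Charge[]): number {
  let sum = 0;
  for (const charge of charges) {
    sum += expectedCredits(charge);
  }
  return sum;
}
```

For an execution with one embedding, one vector search, and an LLM call of 1,850 input and 420 output tokens, the token charge is CEIL(2270 / 1000) = 3 credits and the execution total is 1 + 1 + 3 = 5. The parentheses matter: the division applies to the sum of input and output tokens, not to the output tokens alone.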
How It Works
User sends message
|
v
+---------------------------+
| ConversationLambda |
| 1. Resolve user |
| 2. Manage conversation |
+---------------------------+
|
v
+----------------------------------+ +----------------------------+
| ConversationTrackingOrchestrator | | PostgreSQL (AgentTracking)|
| | | |
| setupTracking() -----------> +---> | clients.getOrCreate() |
| startExecution() -----------> +---> | agent_executions(running) |
+----------------------------------+ +----------------------------+
|
v
+---------------------------+
| RAG Pipeline |
| |
| Embedding (Vertex AI) | -----> agent_costs(EMBEDDING, 1cr)
| Vector search (BigQuery) | -----> agent_costs(VECTOR_SEARCH, 1cr)
| Brand Health (optional) | -----> agent_costs(BRAND_HEALTH, 1cr)
| LLM call (Anthropic) | -----> agent_costs(TOKENS, CEIL((in+out)/1000)cr)
| |
| context_snapshot.save() | -----> kb_chunks + brand_health snapshots
+---------------------------+
|
v
+----------------------------------+ +----------------------------+
| TrackingOrchestrator | | PostgreSQL Triggers |
| completeExecution(done) ------> +---> | trg_calculate_credits |
| | | trg_apply_credits_client |
+----------------------------------+ +----------------------------+
|
v (parallel — never blocks response)
+---------------------------+
| DynamoDB |
| ConversationTrace.save() |
| TTL: 90 days |
+---------------------------+
All PostgreSQL tracking writes are fire-and-forget: the user never waits for them to complete. If PostgreSQL is unavailable, the exception is caught, logged to CloudWatch, and the Coach responds normally. The only synchronous tracking is the ConversationTrace in DynamoDB, which shares the main flow's connection. Credits are calculated by the trg_calculate_agent_credits trigger on INSERT; the application only provides raw data (tokens, charge type). The trigger also syncs totals to agents_clients and clients automatically.
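The fire-and-forget pattern described above can be sketched as follows. The names (`fireAndForget`, `respond`, `TrackingWrite`) are hypothetical, the `trackingEnabled` argument stands in for the ENABLE_AGENT_TRACKING flag, and `logError` stands in for the CloudWatch logger.

```typescript
// Hypothetical sketch: tracking writes are started but never awaited by the response path.
type TrackingWrite = () => Promise<void>;

function fireAndForget(write: TrackingWrite, logError: (error: unknown) => void): void {
  try {
    // Deliberately not awaited: the caller continues immediately.
    // Rejections are routed to the logger so they never become unhandled.
    write().catch(logError);
  } catch (error) {
    // Covers writers that throw synchronously before returning a promise.
    logError(error);
  }
}

function respond(
  message: string,
  trackingEnabled: boolean,
  write: TrackingWrite,
  logError: (error: unknown) => void,
): string {
  if (trackingEnabled) {
    fireAndForget(write, logError);
  }
  // The reply is produced regardless of the tracking outcome.
  return 'Coach: ' + message;
}
```

One caveat that follows from running on Lambda: work that is not awaited may be frozen when the handler returns, which is consistent with the incomplete-trace risk listed in the risk analysis.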
Chronological Write Flow (14 steps per execution)
1. clients.getOrCreate() → ensure client exists
2. agents_clients.getOrCreate() → recover/create conversation session
3. agent_executions.save(pending) → register execution before starting
4. agent_executions.update(running) → mark pipeline start
5. agent_costs.save(EMBEDDING) → trigger: credits=1, deduct from client
6. agent_costs.save(VECTOR_SEARCH) → trigger: credits=1, deduct from client
7. context_snapshot.save(kb_chunks) → store chunks that will be used
8. execution_context_links.save() → link execution → kb_chunks snapshot
9. agent_costs.save(BRAND_HEALTH) → trigger: credits=1 (if applicable)
10. context_snapshot.save(brand_health) → store brand health metrics
11. execution_context_links.save() → link execution → brand_health snapshot
12. agent_costs.save(TOKENS) → trigger: credits=CEIL((in+out)/1000)
13. agent_logs.saveBatch([rag_metrics]) → pipeline timing summary
14. agent_executions.update(done) → mark end, duration_ms
On error: agent_executions.update(error) + agent_logs.save(agent_error)
Implementation Status
DONE: RAG One-Shot Pipeline (current)
ConversationTrace in DynamoDB + AgentTracking in PostgreSQL are implemented and in production. They capture embedding, vector search, brand health, LLM call, costs, and the context used. The RAG pipeline is traced end-to-end. 7 PostgreSQL tables operational. Credit triggers working. Graceful degradation verified.
Phase 0.3: ReAct Loop Extension
Each round of the ReAct loop generates an AgentLog of type llm_call. Each tool call generates its own AgentLog of type tool_call with name, params, result, and latency. The AgentExecution accumulates all rounds until the LLM emits end_turn. execution_duration_ms reflects total loop duration, not just the first LLM call.
Phase 1: HookLifecycle Auto-Observability
The HookLifecycle (before_tool → execute → after_tool) emits trace events automatically, without each tool implementing its own logging. before_tool → AgentLog(tool_call, starting). after_tool → AgentLog(tool_call, success/failure, latencyMs, result). All tools are observable by design from the moment they are registered in the ToolRegistry.
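A rough sketch of how the before_tool and after_tool hooks could emit AgentLog entries around any tool. `withToolHooks` and the `AgentLog` shape are hypothetical, the tool is synchronous for brevity (real tools would be async), and the injectable `now` clock exists only to make latency deterministic in tests.

```typescript
// Hypothetical sketch: wrap a tool once so it emits trace events automatically.
interface AgentLog {
  type: 'tool_call';
  tool: string;
  status: 'starting' | 'success' | 'failure';
  latencyMs?: number;
  result?: unknown;
}

function withToolHooks<P, R>(
  name: string,
  execute: (params: P) => R,
  emit: (log: AgentLog) => void,
  now: () => number = Date.now,
): (params: P) => R {
  return (params: P): R => {
    // before_tool
    emit({ type: 'tool_call', tool: name, status: 'starting' });
    const start = now();
    try {
      const result = execute(params);
      // after_tool (success)
      emit({ type: 'tool_call', tool: name, status: 'success', latencyMs: now() - start, result });
      return result;
    } catch (error) {
      // after_tool (failure): record it, then let the loop handle the error
      emit({ type: 'tool_call', tool: name, status: 'failure', latencyMs: now() - start, result: String(error) });
      throw error;
    }
  };
}
```

If the ToolRegistry applied such a wrapper at registration time, individual tools would not need any logging code of their own, which matches the "observable by design" goal.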
Phase 4+: Cold Storage in BigQuery (Post-MVP)
Migrate executions older than 90 days to BigQuery for long-term SQL analytics: cost per user per month, average latency per model, most-used tools, error rates by operation type.
Risk Analysis
Trace Volume Growth
Impact: Medium
Mitigation: DynamoDB auto-deletes with TTL. PostgreSQL has configurable retention by date. AgentLogs are saved in batch to reduce round-trips.
Tracking Latency Impact
Impact: Low
Mitigation: All PostgreSQL writes are fire-and-forget, so the user never perceives them. Only the DynamoDB ConversationTrace is synchronous (it shares the main flow's connection).
Incomplete Trace on Lambda Timeout
Impact: Low
Mitigation: Detect agent_executions with status='running' whose start_time is more than 5 minutes in the past. These represent unfinished traces and can be marked as error. Partial costs may have been recorded before the failure.
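The stale-execution rule in the mitigation above amounts to a simple filter. The sketch below expresses it over in-memory rows; in practice it would more likely be a SQL query against agent_executions. `ExecutionRow` and `findStaleExecutions` are hypothetical names, and the field name `startTime` is an assumption based on the start_time column mentioned above.

```typescript
// Hypothetical sketch: find executions stuck in 'running' for more than 5 minutes.
interface ExecutionRow {
  id: string;
  status: 'pending' | 'running' | 'done' | 'error';
  startTime: Date;
}

const STALE_AFTER_MS = 5 * 60 * 1000;

function findStaleExecutions(rows: ExecutionRow[], now: Date): ExecutionRow[] {
  return rows.filter(
    (row) => row.status === 'running' && now.getTime() - row.startTime.getTime() > STALE_AFTER_MS,
  );
}
```

Rows returned by this check are candidates to be marked as error; any costs already recorded for them stay as they are.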
PostgreSQL Cost in Low Traffic
Impact: Low
Mitigation: Tracking is optional (ENABLE_AGENT_TRACKING=true). In low-traffic or dev environments, it can be disabled without affecting functionality.
Key DecisionsDecisiones Clave
DynamoDB for conversational record, PostgreSQL for deep traceability — DynamoDB is in the same flow as the conversation: same table, same latency. PostgreSQL has the relational model needed to cross-query executions, costs, and context in a single query. Each system does what it does best.DynamoDB para registro conversacional, PostgreSQL para trazabilidad profunda — DynamoDB esta en el mismo flujo que la conversacion: misma tabla, misma latencia. PostgreSQL tiene el modelo relacional necesario para cruzar ejecuciones, costos y contexto en una sola consulta. Cada sistema hace lo que hace bien.
Credits calculated in database, not in application — The trigger guarantees consistency without risk of drift between application logic and stored totals. Changing rates is a table update in charge_types, not a code deploy.Creditos calculados en base de datos, no en la aplicacion — El trigger garantiza consistencia sin riesgo de derivacion entre la logica de la aplicacion y los totales almacenados. Cambiar las tarifas es una actualizacion en la tabla charge_types, no un deploy de codigo.
Tracking optional by design, not as workaround — The Coach does not depend on tracking to function. This allows enabling/disabling per environment, changing the schema without affecting the main flow, and absorbing PostgreSQL failures without degrading service.Tracking opcional por diseno, no como workaround — El Coach no depende del tracking para funcionar. Esto permite activar/desactivar por ambiente, cambiar el schema sin afectar el flujo principal, y absorber fallas de PostgreSQL sin degradar el servicio.
ContextSnapshot as source of truth for LLM context — Storing the exact chunks the LLM saw (not just IDs) allows reproducing any response and diagnosing why the LLM had or didn't have certain information. It's the difference between "the system searched 5 chunks" and "these were the 5 chunks".ContextSnapshot como fuente de verdad del contexto LLM — Guardar los chunks exactos que vio el LLM (no solo los IDs) permite reproducir cualquier respuesta y diagnosticar por que el LLM tuvo o no tuvo cierta informacion. Es la diferencia entre "el sistema busco 5 chunks" y "estos fueron los 5 chunks".
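The "tracking optional by design" decision can be illustrated with a minimal fire-and-forget sketch. Only the ENABLE_AGENT_TRACKING flag name comes from this document; the function names, the `writer` callable standing in for the PostgreSQL insert, and the rule that anything other than "true" disables tracking are assumptions made for illustration.

```python
import logging
import os
import threading
from typing import Callable, Optional

logger = logging.getLogger("agent_tracking")


def tracking_enabled() -> bool:
    # Flag name from the document; treating every value except "true"
    # as disabled is an assumption of this sketch.
    return os.environ.get("ENABLE_AGENT_TRACKING", "false").lower() == "true"


def track_execution(record: dict, writer: Callable[[dict], None]) -> Optional[threading.Thread]:
    """Hand `record` to `writer` on a background thread and return immediately.

    `writer` is a hypothetical stand-in for the PostgreSQL insert. Any
    exception it raises is logged and swallowed, so a tracking failure
    never reaches the conversation flow.
    """
    if not tracking_enabled():
        return None

    def _run() -> None:
        try:
            writer(record)
        except Exception:
            logger.exception("tracking write failed; main flow unaffected")

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

The caller never waits on the returned thread in normal operation; it is returned only so that tests can join it.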
MVP Scope
[v3] ~90% operational. ConversationTrace (DynamoDB, TTL 90d) + AgentExecution (PostgreSQL, 7 tables) in production. Full RAG pipeline traced end-to-end. Credit triggers working. Remaining: ReAct loop extension (Phase 0.3), HookLifecycle auto-observability (Phase 1).
Built on
ConversationTrace + AgentTracking, both proven in production. PostgreSQL trigger-based credit system. Fire-and-forget tracking pattern.
Layer 3: KNOWLEDGE
What the Coach knows
Cerebro / Knowledge Base
Knowledge — Mateo
The Coach's long-term memory of eCommerce expertise. 2,875 Markdown documents organized in 11 namespaces, indexed by a 4-stage Go pipeline (validate → chunk → embed → store) into BigQuery via Vertex AI text-embedding-004 (1024 dims). The agent finds relevant knowledge by meaning, not keywords. Two repos, two responsibilities: core-knowledge-semantic-base/ owns the corpus and the indexing pipeline; core-intelligence-conversation-api/ owns search and context injection. The KB is automatic context: it is always available in the system prompt via RAG semantic search. The LLM never decides "should I query the KB?"; it is simply there, like conversation history or the seller's profile. The KB is operational in production today.
Contextual Retrieval (planned for Phase 1): during indexing, each chunk is enriched with a generated summary of its role within the full document before embedding. Anthropic reports that this technique reduces failed retrievals by 49%, and by 67% when combined with reranking, because the search engine sees what each chunk means in context, not just what it literally says.
Current State
Operational in Production
2,875 docs indexed in BigQuery (11 namespaces). Go pipeline complete: validate-kb.go → indexer.go → kb-embedder.go → report-outdated-kb.go. Vertex AI text-embedding-004 (1024 dims). 23-metric catalog with strict front-matter validation. Freshness report via GitHub Actions (weekly). RagOrchestrator in coach-api consuming kb_embeddings. BrandHealthContextService with separate brand_health_embeddings table.
Needs Improvement
No namespace filter in RAGVectorSearchService: search is global across all 11 namespaces, so chunks from learning compete with rules-as-cards regardless of intent. No re-indexing strategy for edited docs: the embedder inserts new chunks but doesn't invalidate previous chunks of the same document (there is no is_current flag).
To Build
Namespace skills/ (tool usage documentation). Namespace trends/ (marketplace trends). Namespace filter by intent in RAGVectorSearchService. is_current flag + re-indexing logic. Contextual Retrieval: LLM-generated context summary per chunk at indexing time (Anthropic technique; Anthropic reports 49-67% fewer failed retrievals). Voyage AI evaluation benchmark. Hybrid search (BM25 + vector) for exact technical terms.
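11 Namespaces, 2,875 Documents
[Chart: per-namespace values. The labels pairing each value with a namespace were not recovered. Values shown: 743, 652, 622, 570, 433, 376, 361, 324, 172, 106. Namespaces named: organic, quality, rules-as-playbooks, pricing, rules-as-metrics, health.]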
validate → chunk → embed → store
1024 dims, ~$0.02/1M tokens
NP, ROAS, ACOS, CTR, P-QI...
doc, card, metric, action, log, playbook, glossary, health
COSINE_DISTANCE search
Not a tool: always in prompt
LLM-generated summary per chunk (indexing time)
Scope Boundaries: What this layer does NOT do
Tech Stack (Go + GCP)
Data Models, Interfaces & Acceptance Criteria
-- Dataset: knowledge_base | Table: kb_embeddings
id           STRING NOT NULL     -- "KB-financial-NP__chunk_0"
document_id  STRING NOT NULL     -- "KB-financial-NP"
namespace    STRING NOT NULL     -- "financial"
doc_type     STRING NOT NULL     -- "metric"
title        STRING              -- "Net Profit (NP)"
chunk_index  INT64               -- 0
source       STRING              -- Relative path to .md
tags         ARRAY<STRING>       -- ["financial", "metric"]
text         STRING              -- Chunk content (≤ 1000 chars)
embedding    ARRAY<FLOAT64>      -- Vertex AI text-embedding-004 (1024 dims)
created_at   TIMESTAMP NOT NULL
updated_at   TIMESTAMP NOT NULL
language     STRING NOT NULL     -- "es"
-- YAML Front-Matter (required fields):
--   id, namespace, type, title, version (SemVer), last_reviewed (ISO date),
--   language, tags, metric_refs
-- Conditional: condition+severity+action_hint (card), formula+unit+thresholds (metric),
--   source (learning), metric_refs required (health)
// Consumed by RagOrchestrator in core-intelligence-conversation-api
RAGEmbeddingService
└── VertexEmbeddingClient.embed(userQuery)
→ ARRAY<FLOAT64> [1024 dims, text-embedding-004]
RAGVectorSearchService
└── BigQuery COSINE_DISTANCE against kb_embeddings
→ top-K chunks (KBChunkModel[])
RAGChunkRankingService.rank(chunks, query) → top-5 reranked
RAGLLMService.generateAnswer(query, chunks, options)
→ injection as <knowledge_base> context in the LLM prompt
// Embedding coherence: coach-api MUST use the same model
// (text-embedding-004) as the KB indexer. Changing the model
// requires re-indexing all 2,875 documents.
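The search and injection steps above can be sketched in memory as follows. This is not the RagOrchestrator code: it assumes rows shaped like kb_embeddings records (dicts with `id`, `text`, `embedding`), computes distance as 1 minus cosine similarity, and leaves out the re-ranking step, whose heuristics are not described here. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, defined as 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k_chunks(query_embedding: Sequence[float], rows: List[Dict], k: int = 5) -> List[Dict]:
    """Return the k rows whose embeddings are closest to the query."""
    return sorted(rows, key=lambda r: cosine_distance(query_embedding, r["embedding"]))[:k]


def build_knowledge_block(chunks: List[Dict]) -> str:
    """Wrap the selected chunks in the <knowledge_base> tag used for prompt injection.

    The "[id] text" layout inside the tag is an assumption of this sketch.
    """
    body = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return f"<knowledge_base>\n{body}\n</knowledge_base>"
```

The query embedding must come from the same model as the stored embeddings, which is the coherence contract stated above; vectors from different models are not comparable.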
- 2,875 documents indexed and searchable in BigQuery kb_embeddings
- Go pipeline validates front-matter: required fields, unique IDs, metric_refs against 23-metric catalog
- Semantic search returns relevant chunks in <500ms via COSINE_DISTANCE
- Freshness report alerts on docs exceeding namespace thresholds (60/90/180 days)
- Namespace filter in RAGVectorSearchService scopes search by intent
- Re-indexing marks old chunks as stale when document is updated (is_current flag)
- LLM correctly cites KB information in responses (human-verifiable)
- [Ph 1] Contextual Retrieval: each chunk includes LLM-generated summary of its role in the full document before embedding. Runs once at indexing/re-indexing time, not per query. Search recall improves measurably vs baseline
2,875 docs · 11 namespaces · 23 metrics · Go 1.24.0 · Vertex AI 004 (1024 dims) · BigQuery · Automatic context (not a tool)
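The front-matter checks listed in the acceptance criteria can be sketched as below. The production validator is validate-kb.go; this Python version only illustrates the rules. The required and conditional field names come from the schema comment above, while the function name, the error messages, and keying the conditional fields on `type` are assumptions. The cross-document unique-ID check is omitted.

```python
import re
from datetime import date
from typing import List, Set

REQUIRED_FIELDS = (
    "id", "namespace", "type", "title", "version",
    "last_reviewed", "language", "tags", "metric_refs",
)
# Extra fields required for specific document types (from the schema comment).
CONDITIONAL_FIELDS = {
    "card": ("condition", "severity", "action_hint"),
    "metric": ("formula", "unit", "thresholds"),
}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # plain MAJOR.MINOR.PATCH only


def validate_front_matter(fm: dict, metric_catalog: Set[str]) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in fm]
    for field in CONDITIONAL_FIELDS.get(fm.get("type"), ()):
        if field not in fm:
            errors.append(f"missing field for type {fm['type']}: {field}")
    if "version" in fm and not SEMVER.match(str(fm["version"])):
        errors.append(f"version is not SemVer: {fm['version']}")
    if "last_reviewed" in fm:
        try:
            date.fromisoformat(str(fm["last_reviewed"]))
        except ValueError:
            errors.append(f"last_reviewed is not an ISO date: {fm['last_reviewed']}")
    for ref in fm.get("metric_refs", []):
        if ref not in metric_catalog:
            errors.append(f"unknown metric_ref: {ref}")
    return errors
```

In use, `metric_catalog` would hold the 23 catalog slugs; only a few of them (NP, ROAS, ACOS, CTR) are named in this document.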
How It Works: Two Flows, One Contract
INDEXING FLOW (Go pipeline, core-knowledge-semantic-base/)
==========================================================
New/edited .md document
  |
  v
1. VALIDATE (validate-kb.go)
   - Front-matter complete
   - ID unique (KB-ns-slug)
   - metric_refs valid
   - Word count ≤ 1500
  |
  v
2. CHUNK + JSONL (indexer.go)
   - Split by paragraphs
   - Limit: 1000 chars
   - Output: JSONL
  |
  v
2b. CONTEXTUAL ENRICH (contextual-enricher.go)
   - Contextual Retrieval (Anthropic technique)
   - LLM summarizes chunk role in full document
   - Prepends context to chunk before embedding
   - Anthropic reports 49-67% fewer failed retrievals
  |
  v
3. EMBED (kb-embedder.go)
   - Vertex AI 004
   - Batch 100 chunks
   - Insert BigQuery
  |
  v
4. FRESHNESS REPORT (report-outdated-kb.go)
   - GitHub Actions weekly
   - Alert stale docs
   - Thresholds: health 60d | learning 180d | all others 90d

QUERY FLOW (coach-api, core-intelligence-conversation-api/)
===========================================================
User query
  |
  v
VertexEmbeddingClient.embed(query)
   → ARRAY<FLOAT64> [1024 dims]
  |
  v
BigQuery COSINE_DISTANCE
   → kb_embeddings (top-K)
  |
  v
RAGChunkRankingService
   → top-5 reranked chunks
  |
  v
Inject as <knowledge_base> context in LLM prompt
   (automatic, not a tool)

Contract: both sides use text-embedding-004 (1024 dims). Changing model = re-index all.
The KB operates as two completely separate flows connected by a BigQuery contract.
The Indexing Flow (Go pipeline in core-knowledge-semantic-base/) takes Markdown documents with YAML front-matter, validates structure (required fields, unique IDs, metric_refs against the 23-metric catalog, word count ≤1500), chunks by paragraphs (1000-char limit, no overlap), generates embeddings via Vertex AI text-embedding-004 in batches of 100, and inserts into BigQuery kb_embeddings. With Contextual Retrieval (planned for Phase 1), an enrichment step sits between chunking and embedding: an LLM generates a summary of the chunk's role within the full document and prepends it to the chunk before embedding. For example, a chunk saying "processing time is 1-3 business days" gets context that it belongs to the logistics section and describes the period between sale confirmation and dispatch. This enrichment runs once at indexing time, not per query; Anthropic reports 49% fewer failed retrievals with the technique, and 67% fewer when it is combined with reranking. A weekly GitHub Actions job reports stale documents with differentiated thresholds (health 60d, learning 180d, others 90d).
The Query Flow (in core-intelligence-conversation-api/) embeds the user's query with the same text-embedding-004 model, runs cosine similarity search in BigQuery, re-ranks the top-K chunks, and injects them into the LLM prompt as automatic context. The LLM never invokes the KB as a tool; it is always available in the system prompt.
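The chunking and enrichment steps of the Indexing Flow can be sketched as follows. The production code is Go (indexer.go); this Python version is illustrative only. Greedy packing of consecutive paragraphs up to the limit and hard-splitting of oversized paragraphs are assumptions, since the source states only "split by paragraphs, 1000-char limit, no overlap". The `summarize` callable is a hypothetical stand-in for the LLM call.

```python
from typing import Callable, List


def chunk_by_paragraphs(text: str, limit: int = 1000) -> List[str]:
    """Split on blank lines and pack paragraphs into chunks of at most `limit` chars.

    No overlap between chunks. A single paragraph longer than `limit`
    is hard-split (an assumption of this sketch).
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def enrich_chunk(chunk: str, document: str, summarize: Callable[[str, str], str]) -> str:
    """Prepend an LLM-generated description of the chunk's role in the document.

    The returned string is what would be sent to the embedding model.
    """
    context = summarize(document, chunk)
    return f"{context}\n\n{chunk}"
```

Because enrichment runs once per chunk at indexing time, its cost grows with corpus size and edit frequency rather than with query volume.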
Implementation Plan (improvements over existing system)
Phase 1: Namespace Filtering + Re-indexing (Week 3-4)
Add a namespace filter to RAGVectorSearchService: scope the BigQuery search by namespace when the user's intent is clear (e.g., ads question → filter ads + learning). Add an is_current flag to the kb_embeddings schema. Implement re-indexing logic: when a document is edited, mark old chunks is_current=false and index the new chunks. Update the query to filter is_current=true. Contextual Retrieval: add LLM-generated context enrichment to the indexing pipeline. Before embedding, each chunk receives a summary of its purpose within the full document (e.g., "this chunk belongs to the logistics section and explains the processing time between sale confirmation and dispatch"). The enriched chunk+context is what gets embedded. It runs once per chunk at index time and uses a lightweight LLM call. Anthropic reports 49% fewer failed retrievals with this technique, and 67% fewer when it is combined with reranking.
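The is_current re-indexing logic and the namespace filter from Phase 1 can be sketched against an in-memory table. The `is_current` flag and the chunk id format come from this document; the `ChunkRow` type and both function names are hypothetical, and a list stands in for the BigQuery table.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ChunkRow:
    id: str
    document_id: str
    namespace: str
    text: str
    is_current: bool = True


def reindex_document(table: List[ChunkRow], document_id: str,
                     namespace: str, new_texts: List[str]) -> None:
    """Mark every existing chunk of the document stale, then append the new chunks.

    Old and new rows share the same id in this sketch; how the real table
    tells versions apart is not specified in the source.
    """
    for row in table:
        if row.document_id == document_id:
            row.is_current = False
    for i, text in enumerate(new_texts):
        table.append(ChunkRow(f"{document_id}__chunk_{i}", document_id, namespace, text))


def searchable_rows(table: List[ChunkRow],
                    namespaces: Optional[Set[str]] = None) -> List[ChunkRow]:
    """Rows eligible for vector search: current only, optionally scoped by namespace."""
    return [
        r for r in table
        if r.is_current and (namespaces is None or r.namespace in namespaces)
    ]
```

Passing `namespaces=None` reproduces today's global search, so the filter can be applied only when the intent is clear.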
Phase 2: New Namespaces + Content (Week 4-5)
Create the skills/ namespace: documentation of how the agent should use and interpret each tool from Tool Registry (#3). Create the trends/ namespace: marketplace trends, seasonal patterns, policy changes. Expose KB search as an automatic context provider for Context Aggregator (#5), so the KB is always available in the system prompt via RAG top-K semantic search. Validate that the LLM correctly cites KB information in responses.
Phase 3: Search Quality (post-MVP)
Evaluate Voyage AI vs text-embedding-004 with a real benchmark suite of eCommerce domain queries. Add a doc_type filter to search, allowing filtering by card, metric, or playbook when the intent justifies it. Evaluate hybrid search (BM25 + vector) for improved recall on exact technical terms (e.g., "ACOS", metric slugs). Consider Cohere Rerank as a dedicated re-ranker if heuristic ranking is insufficient.
Risk Analysis
Stale Knowledge
Impact: High. Outdated marketplace policies can harm the seller.
Mitigation: report-outdated-kb.go runs weekly via GitHub Actions. Differentiated thresholds: health 60d, learning 180d, others 90d. SemVer versioning + last_reviewed field in every document.
Irrelevant Chunks Contaminate Context
Impact: Medium. Semantic false positives degrade LLM reasoning quality.
Mitigation: RAGChunkRankingService re-ranks before injection. Namespace filtering reduces the search space. Future: dedicated re-ranker (Cohere Rerank) if the heuristic is insufficient.
Embedding Model Desynchronization
Impact: High. Changing the embedding model without re-indexing returns degraded results.
Mitigation: Both repos MUST use text-embedding-004. Any model change requires re-indexing all 2,875 documents before deployment. The pipeline already supports a full re-index.
BigQuery Scalability at High Volume
Impact: Low at MVP. ~30K chunks estimated, with search latency well under 500ms.
Mitigation: If the corpus grows to >100K chunks, evaluate BigQuery VECTOR_SEARCH with an ANN index or migrate to a dedicated vector store (Pinecone, Weaviate). The migration is transparent if the IVectorSearchRepository interface is maintained.
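The differentiated freshness thresholds cited under Stale Knowledge reduce to a small lookup. The real report is produced by report-outdated-kb.go; the sketch below only shows the rule, and it assumes a document counts as stale once its age strictly exceeds the threshold. The names are hypothetical.

```python
from datetime import date

# Thresholds in days, as stated in the document.
THRESHOLD_DAYS = {"health": 60, "learning": 180}
DEFAULT_THRESHOLD_DAYS = 90  # all other namespaces


def is_stale(namespace: str, last_reviewed: date, today: date) -> bool:
    """True when the document's age in days exceeds its namespace threshold."""
    limit = THRESHOLD_DAYS.get(namespace, DEFAULT_THRESHOLD_DAYS)
    return (today - last_reviewed).days > limit
```

Taking `today` as a parameter instead of reading the clock keeps the check deterministic in tests and in the weekly job.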
Key Decisions
Markdown + YAML in Git, not a CMS: Documents are .md files with YAML front-matter in a Git repo. Versioning is natural (diff per PR), team contribution has zero CMS overhead, and the indexing pipeline triggers on push. Adding or editing knowledge is a PR, not a database operation.
Vertex AI text-embedding-004, not OpenAI: The project runs on Google Cloud (BigQuery, Vertex AI, GCP auth). Keeping the stack on a single provider reduces authentication complexity and egress costs. The product guide mentioned OpenAI text-embedding-3-small, but the real implementation uses text-embedding-004 (1024 dims). Evaluate Voyage AI post-MVP if relevance evidence warrants it.
BigQuery as vector store, not Pinecone/Weaviate: BigQuery already handles business analytics data. Adding vector search via COSINE_DISTANCE avoids a new service dependency. At MVP scale (~30K chunks), performance is more than sufficient. Migrate only if latency degrades as the corpus grows beyond 100K chunks.
KB is automatic context, not a tool: Making it a tool would require the LLM to decide when to query the KB, add a round to the loop, and complicate the system prompt. As automatic context, the KB enriches every turn at zero round cost. The LLM doesn't need to ask for it; it's always there.
11 namespaces as semantic segmentation: Namespace granularity allows filtering search by intent. An ads question searches ads + learning, not compliance or returns_claims. The index is better leveraged with namespace filters than with global search.
MVP Scope
Existing 2,875 docs + Go pipeline operational. MVP adds: namespace filter in search, is_current re-indexing, skills/ namespace (tool documentation), trends/ namespace. Post-MVP: Voyage AI evaluation, hybrid search, doc_type filtering.
Inspired by
Direct reuse from core-knowledge-semantic-base. Production-proven Go pipeline + BigQuery vector search.
Data Sync
Data — Andres
TWO-PIPELINE DATA SYSTEM. The existing batch pipeline ("Complete Data") is preserved and extended: it remains the full historical data source per user, feeding the Silver and Gold layers. Its extraction periodicity is configurable (default: 1h for active products, 6h for all). A NEW "Fast Data" layer exposes on-demand reads via FastAPI directly from Parquet files in GCS, with no Redis and no intermediate cache. Fast Data serves the 11 tools defined in the #3 Tool Registry contract (10 READ + 1 ANALYSIS). Every write-tool execution requires a pre-read snapshot captured to GCS before changes. Both pipelines integrate with Open Metadata for lineage and data dictionaries. Both feed Cerebro KB (#9) via embedding sub-pipelines. The Gold layer produces the "Brand Health" report; its calculation rules migrate from a legacy project (TBD). Auth token management delegates to Marketplace Provider (#12). Infrastructure stays on GCP, managed as IaC in "#14 DevOps (IaC)".
Beautonomous governance: Fast Data pre-reads support the preview step of Core's ConfirmationFlow. Before any WRITE executes, the current marketplace state is captured so the seller sees exactly what will change. This makes every confirmation dialog accurate and trustworthy.
Two Pipelines Architecture
Fast Data (NEW): read layer only
- FastAPI reads directly from Parquet files in GCS: no Redis, no intermediate cache
- Contract: 10 READ + 1 ANALYSIS tools defined in #3 Tool Registry
- Pre-write snapshot: captures current Parquet state to GCS before any write-tool executes
- Reads Bronze+Silver layers; snapshots stored under bronze/snapshots/
- Bronze embeddings → Cerebro KB (#9) → Context Aggregator (#5) → Orchestrator (#2)
Complete Data (EXISTS, preserved)
- Batch pipeline per marketplace (Airflow DAGs, configurable; default 1h active, 6h all)
- Full historical data, required for Silver and Gold layers
- Gold layer: Brand Health report (rules from legacy, TBD)
- Persistent data (no TTL)
- Gold embeddings → Cerebro KB (#9) → Context Aggregator (#5) → Orchestrator (#2)
EXISTS (MeLi partial, Shopify). Extend for Amazon + Fast Data
Both pipelines → Cerebro KB (#9) → #5 → #2
Fast Data aligned with Tool Registry (#3), not #12 directly
Complete Data pipeline (batch)
FastAPI reads Parquet from GCS (no Redis)
Medallion: Bronze / Silver / Gold
Legacy rules migration (TBD)
Bronze → KB (#9) + Gold → KB (#9)
FastAPI: serves all 3 layers
Tech Stack (GCP, IaC via #14 DevOps (IaC))
Data Models, API Signatures & Acceptance Criteria
# ═══ COMPLETE DATA PIPELINE (Medallion — GCS) ═══
# BRONZE — Raw marketplace responses (per user, per marketplace)
# gs://shopilot-data/bronze/{marketplace}/{user_id}/{date}.parquet
# Retention: persistent (Coldline after 90d)
# SILVER — Normalized + validated (unified schemas)
products.parquet:
├── user_id, product_id, marketplace, title, price, stock
├── status, visits_30d, sales_30d, conversion_30d
├── health_score: float (0-100), last_updated, synced_at
# GOLD — Pre-computed aggregates + Brand Health
daily_summary.parquet:
├── user_id, date, marketplace
├── total/active/paused_products, total_orders, total_revenue
├── total_visits, avg_conversion_rate, out_of_stock_count
├── new_questions_count, competitor_price_changes
brand_health.parquet: # Rules from legacy (TBD)
├── user_id, marketplace, calculated_at
├── overall_score: float (0-100)
├── dimension_scores: { products, pricing, stock, reputation, ... }
├── alerts: [{ type, severity, message }]
# ═══ FAST DATA LAYER (FastAPI reads from Parquet — no Redis) ═══
# FastAPI reads directly from GCS Parquet files via pyarrow.
# No intermediate cache. Data freshness = last Complete Data sync cycle.
# Pre-write snapshot: gs://shopilot-data/bronze/snapshots/{tool}/{user_id}/{ts}.parquet
# Tool contract (#3 Tool Registry):
# READ : get_product, get_product_metrics, get_orders,
# get_buyer_questions, get_product_reviews,
# get_category_requirements, get_campaigns,
# get_campaign_metrics, get_store, get_store_metrics
# ANALYSIS: get_product_fee_estimate
# Data Lake Structure (GCS):
# gs://shopilot-data/
# ├── bronze/ Raw (batch) + pre-write snapshots
# │ └── snapshots/ Pre-write state captures (one Parquet per tool+ts)
# ├── silver/ Normalized unified
# ├── gold/ Brand Health + aggregations
# └── embeddings/ Generated for Cerebro KB (#9)
# Data API (FastAPI on Cloud Run) — serves ALL layers via pyarrow Parquet reads:
# Fast Data (reads from Bronze/Silver Parquet — contract per #3 Tool Registry):
# GET /data/{user_id}/fast/get_product?sku=X&marketplace=Y
# GET /data/{user_id}/fast/get_product_metrics?sku=X&marketplace=Y
# GET /data/{user_id}/fast/get_orders?from=DATE&to=DATE
# GET /data/{user_id}/fast/get_buyer_questions?sku=X
# GET /data/{user_id}/fast/get_product_reviews?sku=X
# GET /data/{user_id}/fast/get_category_requirements?category_id=X
# GET /data/{user_id}/fast/get_campaigns?marketplace=Y
# GET /data/{user_id}/fast/get_campaign_metrics?campaign_id=X
# GET /data/{user_id}/fast/get_store?marketplace=Y
# GET /data/{user_id}/fast/get_store_metrics?marketplace=Y
# GET /data/{user_id}/fast/get_product_fee_estimate?sku=X&price=N (ANALYSIS)
# GET /data/{user_id}/snapshot/{tool}/{ts} -> pre-write snapshot from GCS
# Silver layer (normalized):
# GET /data/{user_id}/products?status=active&marketplace=X
# GET /data/{user_id}/orders?from=DATE&to=DATE
# Gold layer (aggregated):
# GET /data/{user_id}/summary?days=30
# GET /data/{user_id}/brand-health -> Brand Health report
# GET /data/{user_id}/metrics/daily
# GET /data/{user_id}/anomalies
# Embedding triggers:
# POST /data/{user_id}/embed/fast -> Bronze fast data -> Cerebro KB
# POST /data/{user_id}/embed/health -> Gold Brand Health -> Cerebro KB
# Airflow DAGs (Complete Data — configurable period):
# meli_sync: configurable (default: 1h active, 6h all)
# amazon_sync: configurable (default: 1h active, 6h all)
# shopify_sync: configurable (default: 1h active, 6h all)
# brand_health: configurable (default: 1h post-sync)
# embedding_sync: configurable (default: 1h post brand_health)
# snapshot_cleanup: daily (removes pre-write snapshots older than 24h)
# openmetadata_sync: configurable (default: 6h, lineage + dictionaries)
- [Complete] MeLi + Amazon + Shopify DAGs run every hour without errors for 50 users
- [Complete] Brand Health Gold report computes correctly (legacy rules migrated)
- [Fast] On-demand read for any Marketplace Provider Tool responds in <500ms
- [Fast] Pre-write snapshot cached before every write Tool execution
- [Fast] TTL cleanup removes expired fast data without manual intervention
- [Both] Open Metadata lineage and data dictionaries generated for both pipelines
- [Both] Embedding sub-pipelines feed Cerebro KB (#9): Bronze fast → real-time, Gold → Brand Health
- [API] Data API serves Bronze, Silver, and Gold layers. <200ms p95 for Gold queries
- [Auth] Token management via Marketplace Provider (#12) — local Auth Vault scheme replaced
Complete: 1h sync · Fast: <500ms on-demand, TTL cleanup · API: all 3 layers, <200ms Gold · Embeddings: Bronze+Gold → KB
How It Works: Two Pipelines
╔══════════════════════════════════════════════════════════════════╗
║ FAST DATA LAYER (read-only, on-demand via FastAPI) ║
╠══════════════════════════════════════════════════════════════════╣
║ No Redis. No intermediate cache. FastAPI reads Parquet (GCS). ║
║ Tools: #3 Tool Registry contract (10 READ + 1 ANALYSIS) ║
╚══════════════════════════════════════════════════════════════════╝
Tool Registry (#3) READ/ANALYSIS tool call
|
v
FastAPI Data API ──> pyarrow reads GCS Parquet (Bronze/Silver)
|
+──> Returns data directly to caller (<500ms)
|
| [write-tool path: pre-read before execution]
+──> Snapshot to GCS bronze/snapshots/{tool}/{user}/{ts}.parquet
| +── snapshot_cleanup DAG: daily (24h retention)
|
+──> Open Metadata (lineage + dictionary)
|
+──> Embedding Pipeline ──> Cerebro KB (#9)
#9 ──> Context Aggregator (#5) ──> Orchestrator (#2)
╔══════════════════════════════════════════════════════════════════╗
║ PIPELINE 2: COMPLETE DATA (batch, persistent) — EXISTS ║
╚══════════════════════════════════════════════════════════════════╝
Apache Airflow DAGs (1h active, 6h all)
|
+── Auth via Marketplace Provider (#12)
+── Extract ──> Transform ──> Load
|
v
DATA LAKE (GCS — Medallion)
+── bronze/ Raw marketplace responses (persistent)
+── silver/ Normalized unified schemas
+── gold/ Brand Health + aggregations
| +── brand_health.parquet (legacy rules TBD)
| +── daily_summary.parquet
| +── competitor_prices.parquet
|
+──> Open Metadata (lineage + dictionary)
|
+──> Embedding Pipeline ──> Cerebro KB (#9)
#9 ──> Context Aggregator (#5) ──> Orchestrator (#2)
╔══════════════════════════════════════════════════════════════════╗
║ DATA API (FastAPI on Cloud Run) — serves ALL layers ║
╠══════════════════════════════════════════════════════════════════╣
║ Bronze: /data/{user}/fast/{tool} (near real-time) ║
║ Silver: /data/{user}/products (normalized) ║
║ Gold: /data/{user}/brand-health (aggregated) ║
╚══════════════════════════════════════════════════════════════════╝
Data Sync has two complementary components. The Fast Data layer is a FastAPI service that reads Parquet files directly from GCS via pyarrow, with no Redis and no intermediate cache. It exposes the 11 tools defined in the #3 Tool Registry contract (10 READ + 1 ANALYSIS). Before any write tool executes, a snapshot of the current Parquet state is captured to GCS (bronze/snapshots/) for audit and rollback. The Complete Data pipeline (existing) runs Airflow DAGs with configurable periodicity (default 1h active, 6h all), feeding the full Medallion architecture. The Gold layer produces the Brand Health report (rules are being migrated from legacy). Both components register lineage and data dictionaries in Open Metadata, and both generate embeddings: Bronze fast data gives the Orchestrator near-real-time context, while Gold Brand Health gives it deep analytical context. Auth tokens are managed via Marketplace Provider (#12).
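As a rough illustration of the Fast Data contract, a caller such as the Tool Registry could derive request paths from the tool list. This is a minimal sketch: `fastDataPath` and the `FastTool` type are hypothetical names, and only the tool names and the `/data/{user_id}/fast/{tool}` path shape come from the spec above.

```typescript
// Tool names copied from the #3 Tool Registry contract in this section.
const READ_TOOLS = [
  "get_product", "get_product_metrics", "get_orders",
  "get_buyer_questions", "get_product_reviews",
  "get_category_requirements", "get_campaigns",
  "get_campaign_metrics", "get_store", "get_store_metrics",
] as const;
const ANALYSIS_TOOLS = ["get_product_fee_estimate"] as const;

type FastTool =
  | (typeof READ_TOOLS)[number]
  | (typeof ANALYSIS_TOOLS)[number];

// Hypothetical helper: builds the Fast Data request path for one tool call.
function fastDataPath(
  userId: string,
  tool: FastTool,
  query: Record<string, string | number>,
): string {
  const qs = Object.keys(query)
    .sort() // stable order keeps paths comparable in logs and tests
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(String(query[k]))}`)
    .join("&");
  const base = `/data/${encodeURIComponent(userId)}/fast/${tool}`;
  return qs ? `${base}?${qs}` : base;
}
```

Typing the tool name as a union means a call with an unknown tool fails at compile time rather than as a 404 from the Data API.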
Implementation Plan
Phase 1: Fast Data Pipeline + Tool Alignment (Week 1-2)
Build the FastAPI Data API: it exposes the 11 tools defined in the #3 Tool Registry contract via direct pyarrow reads from GCS Parquet. No Redis. READ tools: get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics. ANALYSIS: get_product_fee_estimate. Pre-write snapshot: write the current Parquet state to bronze/snapshots/ before every write tool. snapshot_cleanup DAG (daily, 24h retention). Auth token management via Marketplace Provider (#12).
Phase 2: Verify + Adapt Complete Data DAGs (Week 2-3)
Verify that the existing MeLi DAGs still work in production and adapt them from daily batch to hourly incremental. Add Amazon SP-API + Shopify GraphQL DAGs following the same pipeline structure. Migrate Auth Vault token resolution to the Marketplace Provider scheme. Validate Bronze layer Parquet output for all 3 marketplaces.
Phase 3: Silver + Gold + Brand Health (Week 4-5)
Silver layer: unify all marketplace data into normalized schemas. Gold layer: Brand Health report; analyze the legacy project and migrate its calculation rules (overall score, dimension scores, alerts). Bronze+Silver for Fast Data with a temporal cleanup strategy. Pre-compute daily_summary and competitor_prices aggregations.
Phase 4: Open Metadata + Embeddings + API (Week 6-7)
Integrate Open Metadata for both pipelines: lineage tracking and data dictionaries. Build embedding sub-pipelines: Bronze fast data → Cerebro KB for real-time context, Gold Brand Health → Cerebro KB for analytical context. Extend the Data API to serve all 3 layers (Bronze fast reads, Silver normalized, Gold aggregated). Redis cache for API with 1h TTL.
Risk Analysis
Fast Data TTL Misconfiguration
Impact: High
Mitigation: A TTL that is too short causes excessive calls to the marketplace API (rate-limiting risk); a TTL that is too long serves stale data as "real-time". Default to 15min with per-data-type overrides. Monitor the cache hit ratio, targeting >70%. The cleanup DAG runs every 15min to enforce TTL expiration. Pre-write snapshots have a separate TTL (24h) for auditability.
Brand Health Legacy Migration
Impact: Medium
Mitigation: The legacy calculation rules are defined and tested in TypeScript, clearly structured by dimension, so they are not undocumented. The TS source files will be provided as input resources. Approach: run a spike analysis of the legacy TS code as the first step of Phase 3, document the rules per dimension (products, pricing, stock, reputation), implement them in the Gold layer, and validate parity with legacy output before replacing it.
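The parity validation step in the Brand Health mitigation could look roughly like the check below. It is only a sketch: the `BrandHealthReport` shape mirrors the `overall_score` and `dimension_scores` fields of brand_health.parquet, while `checkParity` and the 0.5-point `tolerance` are assumptions, since the blueprint does not define an acceptance threshold.

```typescript
interface BrandHealthReport {
  overallScore: number; // 0-100
  dimensionScores: Record<string, number>; // e.g. products, pricing, stock, reputation
}

interface ParityResult {
  ok: boolean;
  mismatches: string[];
}

// Compares a legacy report with the migrated Gold-layer report.
// `tolerance` is an assumed threshold, not a value from the blueprint.
function checkParity(
  legacy: BrandHealthReport,
  migrated: BrandHealthReport,
  tolerance = 0.5,
): ParityResult {
  const mismatches: string[] = [];
  if (Math.abs(legacy.overallScore - migrated.overallScore) > tolerance) {
    mismatches.push("overall_score");
  }
  // Union of dimension names: legacy first, then any that only the migration has.
  const legacyDims = Object.keys(legacy.dimensionScores);
  const extraDims = Object.keys(migrated.dimensionScores)
    .filter((d) => legacyDims.indexOf(d) === -1);
  for (const dim of legacyDims.concat(extraDims)) {
    const a = legacy.dimensionScores[dim];
    const b = migrated.dimensionScores[dim];
    // A dimension missing on either side counts as a mismatch.
    if (a === undefined || b === undefined || Math.abs(a - b) > tolerance) {
      mismatches.push(dim);
    }
  }
  return { ok: mismatches.length === 0, mismatches };
}
```

Running this over a sample of real sellers before the cutover would give a concrete list of dimensions whose rules were migrated incorrectly.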
Embedding Pipeline Latency
Impact: Medium
Mitigation: Embedding generation adds latency to both pipelines. For Fast Data, generate embeddings asynchronously so the read response is never blocked. For Complete Data, embeddings run as a post-sync DAG step. If the embedding service is down, the data pipelines continue; embeddings are eventually consistent.
Sync Failure Corrupting Data
Impact: High
Mitigation: Writes are append-only, so a failed sync never overwrites previous data. Each Parquet file is dated and immutable. Airflow retries with exponential backoff (3 attempts). Alert on 3 consecutive failures.
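To make the append-only guarantee concrete, here is a small sketch of the write rule and the retry schedule. `AppendOnlyStore` is a hypothetical in-memory stand-in for the GCS bucket, `backoffDelaysMs` and its base delay are illustrative, and the key pattern follows the Bronze path shown earlier. The `date` segment is treated as an opaque partition stamp. In production, Airflow's own retry settings would presumably handle the backoff.

```typescript
// Object key pattern from the Bronze layer spec:
// bronze/{marketplace}/{user_id}/{date}.parquet
function bronzeKey(marketplace: string, userId: string, date: string): string {
  return `bronze/${marketplace}/${userId}/${date}.parquet`;
}

// Hypothetical in-memory stand-in for a bucket; real code would call the GCS client.
class AppendOnlyStore {
  private readonly objects = new Map<string, Uint8Array>();

  // Writes a new object and refuses to overwrite an existing key.
  put(key: string, body: Uint8Array): void {
    if (this.objects.has(key)) {
      throw new Error(`refusing to overwrite immutable object: ${key}`);
    }
    this.objects.set(key, body);
  }

  has(key: string): boolean {
    return this.objects.has(key);
  }
}

// Delay before each retry, doubling from `baseMs`.
function backoffDelaysMs(attempts: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(baseMs * Math.pow(2, i));
  }
  return delays;
}
```

Because `put` rejects existing keys, a retried or partially failed sync can only add a new object and never replace a good one.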
Key Decisions
Two pipelines, not one: The existing batch pipeline cannot serve real-time Tool reads. A separate Fast Data pipeline handles on-demand reads aligned with Marketplace Provider Tools. The batch pipeline is preserved as the authoritative historical source for the Silver and Gold layers.
Pre-read before every write Tool: Every write operation in Marketplace Provider requires caching the current state first. This enables audit trails (before/after) and rollback, and it feeds the ConfirmationFlow preview in the Orchestrator (#2).
Embeddings as a cross-cutting output: Both pipelines produce embeddings for Cerebro KB. Bronze fast data gives the Orchestrator simple "real-time" context. Gold Brand Health gives it deep analytical context. This separation ensures the agent always has both fresh and historical data.
Fast Data aligned with Tool Registry (#3), not Marketplace Provider (#12) directly: The Fast Data pipeline reads are aligned 1:1 with the Tools defined in Tool Registry (#3). This decouples Data Sync from specific marketplace adapters. #3 defines what Tools exist; #10 generates the corresponding fast queries.
Open Metadata for governance: Both pipelines register lineage and data dictionaries in Open Metadata. This provides visibility into data flow, schema evolution, and dependencies across the system.
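The pre-read decision above can be sketched as two small helpers: one that names the snapshot object and one that produces the before/after diff a confirmation preview might show. Everything here is illustrative. `snapshotKey`, `diffState`, the ISO-8601 timestamp format, and the `update_price` tool name in the usage note are assumptions; only the `bronze/snapshots/{tool}/{user_id}/{ts}.parquet` pattern comes from the spec.

```typescript
type ListingState = Record<string, string | number | boolean | null>;

interface FieldChange {
  field: string;
  before: unknown;
  after: unknown;
}

// Path pattern from the Fast Data spec; the ISO-8601 `ts` format is an assumption.
function snapshotKey(tool: string, userId: string, ts: Date): string {
  return `bronze/snapshots/${tool}/${userId}/${ts.toISOString()}.parquet`;
}

// Field-level before/after diff for the fields a write tool proposes to change.
function diffState(before: ListingState, proposed: Partial<ListingState>): FieldChange[] {
  const changes: FieldChange[] = [];
  for (const field of Object.keys(proposed)) {
    if (before[field] !== proposed[field]) {
      changes.push({ field, before: before[field], after: proposed[field] });
    }
  }
  return changes;
}
```

For a hypothetical `update_price` tool, `diffState({ price: 100, stock: 5 }, { price: 120, stock: 5 })` yields a single change for `price`, and the same captured `before` state is what a rollback would restore.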
MVP Scope
Fast Data pipeline aligned with #12 Tools + pre-write snapshots. Complete Data DAGs (MeLi+Amazon+Shopify). Brand Health in Gold (legacy rules TBD). Embedding sub-pipelines to Cerebro KB. API serving all 3 layers. Open Metadata integration.
Inspired by
Data Orchestrator (existing). Marketplace Provider Tool alignment. Legacy Brand Health system.
📝 Project Changelog
Enrichment Layer
Knowledge — Mateo
External capabilities gateway for the Coach. Two domains: Market Intelligence (competitors, pricing, keywords) + Content Analysis (image analysis, video analysis, image enhancement). 7 of 8 ANALYSIS tools depend on this service. The Tool Registry does not know which external API is behind each tool — it only knows IEnrichmentService. Adding a new adapter does not require changes to Tool Registry or Coach. Repo: core-knowledge-enrichment.
Beautonomous governance: Enrichment data feeds ANALYSIS tools only; these are read-only and never trigger ConfirmationFlow. Tool Registry's ToolPolicyFilter ensures all ANALYSIS tools sourced from Enrichment are gated by Core's permission matrix before the Coach can invoke them.
IEnrichmentService: internal router + cache, single contract with Tool Registry
MeLi Search API + Items API (public, free)
Rainforest API (Amazon proxy) + Amazon Ads API
Claude Vision / GPT-4V for image and video analysis
Specialized APIs (Magnific, Topaz) for image enhancement
Mandatory cache with TTL per data type
Current State
Operational
Nothing yet; this is a new project.
To Build
IEnrichmentService + EnrichmentService (Phase 1). MeliMarketIntelligenceAdapter (Phase 1). VisionLLMContentAdapter (Phase 1). RedisEnrichmentCache (Phase 1). AmazonMarketIntelligenceAdapter (Phase 2). ExternalEnhancementAdapter (Phase 2). get_keyword_data (Phase 2).
Not This Project
Seller data (orders, stock, metrics) → Data Sync (#10). WRITE operations on marketplace → Marketplace Provider (#12). Image generation from scratch → out of scope. Seller marketplace authentication → Marketplace Provider (#12, TokenManager).
Tech Stack (TypeScript, Data Layer)
Deep Spec
interface IEnrichmentService {
executeAnalysis(tool: AnalysisTool, params: Record<string, unknown>): Promise<EnrichmentResult>;
}
type AnalysisTool =
| 'search_market_products' | 'get_competitor_product'
| 'get_market_pricing' | 'get_keyword_data'
| 'analyze_product_image' | 'enhance_product_image'
| 'analyze_product_video';
interface IMarketIntelligenceAdapter {
searchProducts(params: MarketSearchParams): Promise<MarketProduct[]>;
getProductDetail(externalId: string, marketplace: Marketplace): Promise<MarketProduct>;
getKeywordData(keyword: string, marketplace: Marketplace): Promise<KeywordData>;
}
interface IContentAnalysisAdapter {
analyzeImage(imageUrl: string, context: ImageAnalysisContext): Promise<ImageAnalysisResult>;
enhanceImage(imageUrl: string, params: EnhancementParams): Promise<EnhancementResult>;
analyzeVideo(videoUrl: string, context: VideoAnalysisContext): Promise<VideoAnalysisResult>;
}
// + MarketProduct, MarketSearchParams, KeywordData,
// ImageAnalysisResult, ImageIssue, EnhancementParams,
// EnhancementResult, VideoAnalysisResult, EnrichmentCacheConfig
// Internal invocation by Tool Registry (#3) — NO public REST endpoint
// Tool Registry calls IEnrichmentService.executeAnalysis()
// for the 7 ANALYSIS tools
executeAnalysis(tool: AnalysisTool, params: Record<string, unknown>)
→ Promise<EnrichmentResult>
// EnrichmentResult { data, source, cached, latencyMs }
- 7 ANALYSIS tools resolve via IEnrichmentService (only get_product_fee_estimate goes direct)
- Mandatory Redis cache with TTL per tool (15min search, 30min detail, 1h image/video, 24h keywords, 0 enhance)
- MeLi adapter works without OAuth (public Search API)
- External provider failure → EnrichmentResult with error, Coach reasons about it, never blocks response
- Adding a new adapter does not require changing Tool Registry or Coach
- enhance_product_image enhances real photos, does NOT generate from scratch
- External API credentials in SSM (not Marketplace Provider; those are seller OAuth tokens managed by TokenManager)
- get_market_pricing computes distribution (min, max, p25, p75, median) over search results, not an external API
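The last criterion describes get_market_pricing as a local computation over search results. One way to sketch it is below. The linear-interpolation percentile method is an assumption, since the blueprint names the statistics but not how they are computed, and `priceDistribution` and `percentile` are hypothetical names.

```typescript
interface PriceDistribution {
  min: number;
  max: number;
  p25: number;
  median: number;
  p75: number;
}

// Linear-interpolation percentile over an ascending-sorted, non-empty array (p in [0, 1]).
function percentile(sorted: number[], p: number): number {
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// Summarizes the prices found in a market search result set.
function priceDistribution(prices: number[]): PriceDistribution {
  if (prices.length === 0) {
    throw new Error("no prices to summarize");
  }
  const sorted = prices.slice().sort((a, b) => a - b); // copy, then numeric sort
  return {
    min: sorted[0],
    max: sorted[sorted.length - 1],
    p25: percentile(sorted, 0.25),
    median: percentile(sorted, 0.5),
    p75: percentile(sorted, 0.75),
  };
}
```

Because the function only needs a list of prices, it works unchanged on results from any market intelligence adapter.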
How It Works
Coach (LLM loop)
| needs external data
v
Tool Registry (#3) -> handler ANALYSIS tool
|
v
IEnrichmentService.executeAnalysis(tool, params)
|
+-- RedisEnrichmentCache -> hit? return cached
|
+-- Market Intelligence --> MeLi Search API (free)
| --> Rainforest API (Amazon)
| --> Amazon Ads API / Helium 10
|
+-- Content Analysis --> Vision LLM (Claude / GPT-4V)
--> Enhancement API (Magnific, Topaz)
|
v
EnrichmentResult { data, source, cached, latencyMs }
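The flow in the diagram can be sketched as a small router: check the cache, dispatch to the bound adapter, and turn provider failures into a result the Coach can reason about. This is a simplified illustration. `EnrichmentServiceSketch`, `ToolBinding`, and the optional `error` field are assumptions, the cache is an in-memory map standing in for Redis, and TTL handling is left out.

```typescript
type AnalysisTool =
  | "search_market_products" | "get_competitor_product"
  | "get_market_pricing" | "get_keyword_data"
  | "analyze_product_image" | "enhance_product_image"
  | "analyze_product_video";

interface EnrichmentResult {
  data: unknown;
  source: string;
  cached: boolean;
  latencyMs: number;
  error?: string; // assumed field, not part of the blueprint's result shape
}

// Hypothetical binding: which provider serves a tool and how to call it.
interface ToolBinding {
  source: string;
  run: (params: Record<string, unknown>) => Promise<unknown>;
}

class EnrichmentServiceSketch {
  private readonly bindings: Partial<Record<AnalysisTool, ToolBinding>>;
  private readonly cache = new Map<string, { data: unknown; source: string }>();

  constructor(bindings: Partial<Record<AnalysisTool, ToolBinding>>) {
    this.bindings = bindings;
  }

  async executeAnalysis(
    tool: AnalysisTool,
    params: Record<string, unknown>,
  ): Promise<EnrichmentResult> {
    const started = Date.now();
    const key = `${tool}:${JSON.stringify(params)}`;

    const hit = this.cache.get(key);
    if (hit) {
      return { data: hit.data, source: hit.source, cached: true, latencyMs: Date.now() - started };
    }

    const binding = this.bindings[tool];
    if (!binding) {
      return { data: null, source: "none", cached: false, latencyMs: 0, error: `no adapter bound for ${tool}` };
    }
    try {
      const data = await binding.run(params);
      this.cache.set(key, { data, source: binding.source });
      return { data, source: binding.source, cached: false, latencyMs: Date.now() - started };
    } catch (err) {
      // Provider failures are returned as data, never thrown at the caller.
      const message = err instanceof Error ? err.message : String(err);
      return { data: null, source: binding.source, cached: false, latencyMs: Date.now() - started, error: message };
    }
  }
}
```

Failed calls are deliberately not cached, so a provider outage does not keep serving errors after the provider recovers.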
File Structure
core-knowledge-enrichment/
+-- src/
|   +-- domain/interfaces/
|   |   +-- IMarketIntelligenceAdapter.ts
|   |   +-- IContentAnalysisAdapter.ts
|   |   +-- IEnrichmentService.ts
|   +-- domain/models/
|   |   +-- MarketProduct.ts, KeywordData.ts
|   |   +-- ImageAnalysisResult.ts, VideoAnalysisResult.ts
|   |   +-- EnrichmentResult.ts
|   +-- application/
|   |   +-- EnrichmentService.ts (router + cache)
|   +-- infrastructure/
|       +-- market/ (MeliAdapter, AmazonAdapter)
|       +-- content/ (VisionLLMAdapter, EnhancementAdapter)
|       +-- cache/ (RedisEnrichmentCache.ts)
+-- test/
Implementation Plan
Phase 0: Research
Evaluate platforms and providers before writing code. Market Intelligence: compare MeLi Search API (free, rate limits?), Rainforest API (pricing tiers, reliability), Amazon Ads API (access, latency), Helium 10 / Jungle Scout (API availability, cost). Content Analysis: compare Claude Vision vs GPT-4V (cost per image, accuracy on marketplace photos) and evaluate image enhancement APIs (Magnific AI, Topaz Photo AI, Remove.bg) on pricing, quality, and latency. Cache: confirm Redis (Cloud Memorystore) specs and pricing for the required TTLs. Deliverable: comparison table with a recommendation per domain + estimated monthly cost.
Phase 1: First ANALYSIS tools active
MeliMarketIntelligenceAdapter (MeLi Search + Items API, no cost). VisionLLMContentAdapter (analyzeImage + analyzeVideo via Claude Vision). Redis cache with basic TTL. Tools operative: search_market_products, get_competitor_product, get_market_pricing, analyze_product_image, analyze_product_video.
Phase 2: Amazon + Enhancement
AmazonMarketIntelligenceAdapter (Rainforest API). get_keyword_data operative (Amazon Ads API or Helium 10). ExternalEnhancementAdapter (enhance_product_image via external API). Cache with key normalization.
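"Cache with key normalization" in Phase 2 is not specified further. One plausible reading is that equivalent requests should map to the same cache key. The sketch below assumes that reading: `cacheKey` is a hypothetical helper, and the parameter names (`query`, `marketplace`, `imageUrl`) are illustrative. Only fields explicitly listed as case-insensitive are lower-cased, so case-sensitive values such as image URLs are left intact.

```typescript
// Builds a stable cache key so equivalent requests share one cache entry.
// Rules (all assumptions): sort parameter names, drop undefined values,
// trim and collapse whitespace in strings, lower-case only listed fields.
function cacheKey(
  tool: string,
  params: Record<string, unknown>,
  caseInsensitive: string[] = [],
): string {
  const parts: string[] = [];
  for (const name of Object.keys(params).sort()) {
    const raw = params[name];
    if (raw === undefined) continue; // absent and undefined are equivalent
    let value: unknown = raw;
    if (typeof raw === "string") {
      const cleaned = raw.trim().replace(/\s+/g, " ");
      value = caseInsensitive.indexOf(name) !== -1 ? cleaned.toLowerCase() : cleaned;
    }
    parts.push(`${name}=${JSON.stringify(value)}`);
  }
  return `${tool}|${parts.join("&")}`;
}
```

With this rule, a search for `"  Yerba   Mate "` and one for `"yerba mate"` resolve to the same entry, which should raise the hit ratio for paid providers.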
Phase 3+: Extensibility
Support for new adapters via a registry (no EnrichmentService modification). Rate limiting per provider. Fallback between providers.
Risk Analysis
External APIs with variable latency (200ms–3s): The mandatory cache reduces calls, with TTL set per data volatility. On failure, return a clear error and let the Coach continue.
Paid API costs scale with usage: Phase 0 Research evaluates pricing. MeLi is free. Rainforest and the image APIs have tiers. Monitor with Billing & Credit Economy (#13).
Inconsistent visual analysis quality between LLMs: Phase 0 Research compares Claude Vision vs GPT-4V on real marketplace photos.
Key Decisions
D1: One project, two domains (market + content), because they share the same pattern (adapters, cache, routing) and the same client (Tool Registry).
D2: Enhancement vs generation. enhance_product_image enhances real photos and does NOT generate from scratch (product decision).
D3: The Coach decides, Enrichment provides. Enrichment never makes business decisions; it only returns data.
D4: Credentials separate from Marketplace Provider. External APIs (vision LLM, market intelligence) use SSM directly; seller OAuth tokens are managed by TokenManager in #12.
D5: Mandatory cache as part of the contract, so Tool Registry can call without worrying about latency.
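The per-tool TTLs from the acceptance criteria can be written down as a table keyed by tool. The sketch below uses an in-memory map as a stand-in for Redis, and `TtlCache` is a hypothetical name. Mapping "search", "detail", "image/video", "keywords", and "enhance" onto specific tools is an interpretation, and the get_market_pricing TTL is an assumption (it reuses the search TTL because it is computed over search results).

```typescript
type AnalysisTool =
  | "search_market_products" | "get_competitor_product"
  | "get_market_pricing" | "get_keyword_data"
  | "analyze_product_image" | "enhance_product_image"
  | "analyze_product_video";

const MINUTE_MS = 60 * 1000;

const TTL_MS: Record<AnalysisTool, number> = {
  search_market_products: 15 * MINUTE_MS,  // search
  get_market_pricing: 15 * MINUTE_MS,      // assumed: follows the search TTL
  get_competitor_product: 30 * MINUTE_MS,  // detail
  analyze_product_image: 60 * MINUTE_MS,   // image
  analyze_product_video: 60 * MINUTE_MS,   // video
  get_keyword_data: 24 * 60 * MINUTE_MS,   // keywords
  enhance_product_image: 0,                // never cached
};

// In-memory stand-in for the Redis cache; time is passed in to keep it testable.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  set(tool: AnalysisTool, key: string, value: V, nowMs: number): void {
    const ttl = TTL_MS[tool];
    if (ttl === 0) return; // a TTL of 0 means "do not cache"
    this.entries.set(key, { value, expiresAt: nowMs + ttl });
  }

  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}
```

Typing the table as `Record<AnalysisTool, number>` makes the compiler reject a new ANALYSIS tool that ships without a TTL, which keeps the cache mandatory in practice.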
MVP Scope
Phase 0 Research + Phase 1: MeLi market intelligence + image/video analysis via Claude Vision + Redis cache. 5 of 7 ANALYSIS tools operative.
Source
New project, no existing source.
📝 Project Changelog
Layer 4: ACTION
What the Coach can do in the marketplace
Marketplace Provider
Action — Andrés
Absorbs the former #10 Auth & Credentials Vault: token management is now an internal module
Unified execution and credential management layer for all marketplace operations. Abstracts marketplace APIs behind a single IMarketplaceAdapter contract using Strategy pattern — each marketplace is a pluggable adapter implementing the same interface. Each request carries two fields: userId and marketplaceSlug — the adapter resolves OAuth2 tokens internally via an ITokenManager module. No auth token ever crosses the public interface. DynamoDB stores encrypted OAuth2 credentials (AES-256-GCM), AWS Secrets Manager holds static secrets (client_id, client_secret, encryption keys). A cron refreshes tokens proactively 30min before expiry. MVP ships all 3 marketplaces from Phase 1: MeLi REST + Amazon SP-API + Shopify GraphQL, covering 17 write tools across 4 domains (Catalog, Engagement, Advertising, Enrollment). SKU is the primary identifier — each adapter internally resolves SKU to the marketplace-native product ID. Reads are limited to capturing pre-transaction state for rollback; the primary read path lives in Data Sync (#10). Includes explicit onboarding flow for first-time marketplace connection.
Beautonomous governance: all 17 WRITE tools execute through Core's ConfirmationFlow (PENDING → CONFIRMED/REJECTED/EXPIRED) and are gated by Core's permission matrix: El Artesano proposes, El Mago or El Capitán confirms. No marketplace write operation can execute without explicit human confirmation.
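The confirmation gate described above can be pictured as a small state machine. This is only a sketch: the names `PendingWrite`, `Decision`, `transition`, and `canExecute` are hypothetical and are not taken from Core's actual ConfirmationFlow API, and the rule that only El Mago or El Capitán may confirm is encoded here as an assumption about the permission matrix.

```typescript
// Hypothetical types: illustrative only, not Core's real ConfirmationFlow contract.
type ConfirmationState = "PENDING" | "CONFIRMED" | "REJECTED" | "EXPIRED";
type Role = "artesano" | "mago" | "capitan";

interface PendingWrite {
  tool: string;             // e.g. "update_price"
  proposedBy: Role;
  state: ConfirmationState;
  expiresAt: number;        // epoch milliseconds (assumed unit)
}

interface Decision {
  kind: "confirm" | "reject";
  by: Role;
  at: number;               // epoch milliseconds
}

// Applies one human decision to a pending write and returns the new record.
function transition(write: PendingWrite, decision: Decision): PendingWrite {
  if (write.state !== "PENDING") {
    throw new Error(`Cannot transition from terminal state ${write.state}`);
  }
  // A decision that arrives at or after the deadline expires the request.
  if (decision.at >= write.expiresAt) {
    return { ...write, state: "EXPIRED" };
  }
  // Assumption: El Artesano may propose but never confirm.
  if (decision.kind === "confirm" && decision.by === "artesano") {
    throw new Error("El Artesano can propose but not confirm");
  }
  return { ...write, state: decision.kind === "confirm" ? "CONFIRMED" : "REJECTED" };
}

// A write tool may only run once its request is CONFIRMED.
function canExecute(write: PendingWrite): boolean {
  return write.state === "CONFIRMED";
}
```

Keeping the transition function pure makes it easy to unit-test the gate without touching any marketplace API.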
Raw HTTP, no SDK available
@sp-api-sdk + LWA OAuth2
@shopify/shopify-api (GraphQL)
Strategy, 23 methods
DynamoDB + AES-256 (ex #10)
MeLi + Amazon LWA + Shopify
SKU → marketplace product ID
Connect → OAuth → first sync
Recommended Tech Stack
MeLi has no maintained SDK (archived 2022), so the adapter uses raw HTTP via axios. Amazon @sp-api-sdk is the most active TS SDK. Shopify REST was deprecated in Oct 2024, so the adapter is GraphQL only. TypeScript was chosen for consistency with core-intelligence services and for strong typing of adapter interfaces.
Data Models, API Contracts & Acceptance Criteria
// MarketplaceRequest — NO auth_token (adapter resolves internally via TokenManager)
interface MarketplaceRequest {
userId: string; // Shopilot user ID
marketplaceSlug: MarketplaceSlug; // 'mercadolibre' | 'amazon' | 'shopify'
}
interface NormalizedProduct {
sku: string; // Primary key — seller's SKU
productId: string; // Marketplace-native (MLA123, ASIN, GID)
marketplace: MarketplaceSlug;
country: string; // ISO 3166-1 alpha-2
title: string;
description: string;
price: Money; // { amount: number, currency: string }
stock: number;
condition: 'new' | 'used' | 'refurbished';
status: 'active' | 'paused' | 'closed';
category: Category; // { id, name, path }
images: string[];
video: string | null;
attributes: Record<string, unknown>;
url: string;
raw: Record<string, unknown>; // Raw unnormalized response
lastSynced: Date;
}
interface MarketplaceAction {
id: string; // UUID
userId: string;
sku: string;
marketplace: MarketplaceSlug;
actionType: 'create' | 'update' | 'delete';
domain: 'catalog' | 'engagement' | 'advertising' | 'enrollment';
fieldChanged: string | null;
beforeValue: unknown; // Snapshot from cache for rollback
afterValue: unknown;
riskLevel: 'reversible' | 'irreversible';
rollbackToken: string | null;
status: 'pending' | 'confirmed' | 'executed' | 'rolled_back' | 'failed';
executedAt: Date;
executionTimeMs: number;
apiResponseCode: number;
}
interface IMarketplaceAdapter {
// —— ENROLLMENT (3) ——
connectMarketplace(req: MarketplaceRequest, credentials: OAuthTokens): Promise<ConnectionResult>;
disconnectMarketplace(req: MarketplaceRequest): Promise<ConnectionResult>;
getConnectionStatus(req: MarketplaceRequest): Promise<ConnectionStatus>;
// —— CATALOG — Create/Modify/Delete (9) ——
publishProduct(req: MarketplaceRequest, sku: string, draft: ProductDraft): Promise<MarketplaceAction>; // IRREVERSIBLE
updateProductContent(req: MarketplaceRequest, sku: string, content: ContentUpdate): Promise<MarketplaceAction>; // REVERSIBLE
updateProductImages(req: MarketplaceRequest, sku: string, images: string[]): Promise<MarketplaceAction>; // REVERSIBLE
updateProductVideo(req: MarketplaceRequest, sku: string, video: string): Promise<MarketplaceAction>; // REVERSIBLE
updatePrice(req: MarketplaceRequest, sku: string, price: Money): Promise<MarketplaceAction>; // REVERSIBLE
updateStock(req: MarketplaceRequest, sku: string, qty: number, locationId?: string): Promise<MarketplaceAction>; // REVERSIBLE
pauseProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>; // REVERSIBLE
activateProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>; // REVERSIBLE
closeProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>; // IRREVERSIBLE
// —— ENGAGEMENT (4) ——
answerQuestion(req: MarketplaceRequest, questionId: string, answer: string): Promise<MarketplaceAction>; // IRREVERSIBLE
hideQuestion(req: MarketplaceRequest, questionId: string): Promise<MarketplaceAction>; // REVERSIBLE (MeLi)
sendBuyerMessage(req: MarketplaceRequest, orderId: string, msg: string): Promise<MarketplaceAction>; // IRREVERSIBLE
requestReview(req: MarketplaceRequest, orderId: string): Promise<MarketplaceAction>; // IRREVERSIBLE
// —— ADVERTISING (4) ——
createCampaign(req: MarketplaceRequest, draft: CampaignDraft): Promise<MarketplaceAction>; // REVERSIBLE
updateCampaign(req: MarketplaceRequest, campaignId: string, changes: CampaignUpdate): Promise<MarketplaceAction>; // REVERSIBLE
pauseCampaign(req: MarketplaceRequest, campaignId: string): Promise<MarketplaceAction>; // REVERSIBLE
activateCampaign(req: MarketplaceRequest, campaignId: string): Promise<MarketplaceAction>; // REVERSIBLE
// —— PRE-TRANSACTION READ (1) ——
snapshotProduct(req: MarketplaceRequest, sku: string): Promise<NormalizedProduct>;
// —— INFRA (1) ——
getRateLimits(): RateLimitInfo;
// INTERNAL: Token resolved by adapter, not caller
// Each adapter constructor receives ITokenManager
// On each API call: token = await tokenManager.getToken(req.userId, req.marketplaceSlug)
}
// Replaces ICredentialsVault from eliminated #10
interface ITokenManager {
getToken(userId: string, marketplace: MarketplaceSlug): Promise<string>;
storeCredentials(userId: string, marketplace: MarketplaceSlug, tokens: OAuthTokens): Promise<void>;
revokeCredentials(userId: string, marketplace: MarketplaceSlug): Promise<void>;
getConnectedMarketplaces(userId: string): Promise<MarketplaceConnection[]>;
forceRefresh(userId: string, marketplace: MarketplaceSlug): Promise<string>;
}
// —— DynamoDB Table: marketplace_credentials ——
// PK: userId#marketplace (e.g., "user_123#mercadolibre")
// SK: "CREDENTIAL"
{
pk: string; // userId#marketplace
sk: 'CREDENTIAL';
accessToken: string; // encrypted (AES-256-GCM)
refreshToken: string; // encrypted (AES-256-GCM)
expiresAt: number; // Unix timestamp
scopes: string[]; // Granted permissions
marketplaceUserId: string;
marketplaceNickname: string;
marketplaceCountry: string; // AR, MX, BR, US...
connectedAt: string; // ISO 8601
lastRefreshedAt: string;
lastUsedAt: string;
status: 'active' | 'expired' | 'revoked' | 'disconnected';
refreshFailures: number; // Consecutive failure counter
encryptionKeyId: string; // AWS Secrets Manager key ref
ttl: number; // DynamoDB TTL (24 months)
}
// —— AWS Secrets Manager stores ——
// shopilot/marketplace/mercadolibre → { clientId, clientSecret, redirectUri }
// shopilot/marketplace/amazon → { clientId, clientSecret, redirectUri }
// shopilot/marketplace/shopify → { apiKey, apiSecret, redirectUri }
// shopilot/encryption/token-key → AES-256-GCM encryption key
// —— Token Refresh Cron (EventBridge every 5min) ——
// Query: expiresAt < NOW() + 30min AND status = 'active'
// For each: call marketplace OAuth refresh endpoint
// On success: update DynamoDB, reset refreshFailures = 0
// On failure: increment refreshFailures
// If refreshFailures >= 3: status = 'expired', notify user, pause Data Sync (#10)
// —— Onboarding & Auth ——
POST /auth/connect/:marketplace // Starts OAuth2 flow, returns redirect URL
GET /auth/callback/:marketplace // OAuth2 callback → exchange code → store encrypted tokens
GET /auth/marketplaces/:userId // List connected marketplaces + status
DELETE /auth/disconnect/:userId/:marketplace // Revoke tokens at provider + DynamoDB cleanup
POST /auth/refresh/:userId/:marketplace // Force manual token refresh
// —— Internal (called by adapter, not exposed to frontend) ——
GET /internal/token/:userId/:marketplace // Returns fresh decrypted token (<100ms, direct DynamoDB read, no cache layer)
// —— Marketplace Operations (called by Orchestrator tools) ——
POST /marketplace/execute // { action, req, params } → MarketplaceAction
GET /marketplace/snapshot/:userId/:marketplace/:sku // Pre-transaction state capture
// Response shape for all write operations:
interface ExecuteResponse {
action: MarketplaceAction;
warnings: string[]; // e.g., "Rate limit at 80%"
}
- publishProduct(sku, draft) creates product in target marketplace and returns MarketplaceAction with productId in <2s
- updatePrice(sku, price) changes price, stores beforeValue for rollback, verifies change applied
- snapshotProduct(sku) captures full pre-transaction state in <500ms
- Rate limiting respects MeLi 1,500 req/min, Amazon per-endpoint limits, Shopify cost-point bucket — zero 429 errors
- Token resolution is internal — adapter calls tokenManager.getToken() automatically, caller never provides auth_token
- If OAuth2 token expired, TokenManager auto-refreshes before retry. MeLi token refresh serialized with mutex to prevent concurrent invalidation
- requestReview on MeLi returns NotSupportedError with descriptive message
- connectMarketplace completes OAuth2 flow and stores encrypted credentials in DynamoDB
- All 17 write tools work against MeLi + Amazon + Shopify (minus N/A per matrix)
- Complete onboarding: user clicks “Connect MeLi” → OAuth2 → tokens stored encrypted → first Data Sync (#10) triggers automatically
- Token auto-refresh works without user intervention (cron every 5min, pre-refresh 30min before expiry)
- If refresh fails 3 consecutive times, user gets reconnection notification and Data Sync pauses gracefully
- getToken() returns valid decrypted token in <100ms (DynamoDB direct, no cache layer)
- Credentials encrypted at rest (AES-256-GCM) — not readable in DynamoDB directly
- Disconnecting marketplace revokes tokens at provider level and stops Data Sync
MeLi rate: 1,500 req/min per seller · Amazon: per-endpoint burst/restore · Shopify: cost-point leaky bucket (1000pts, 50pts/s) · MeLi token: 6h expiry · Refresh cron: 5min · getToken: <100ms (DynamoDB direct)
17 Write Tools: Marketplace Support Matrix
| Domain | Write Tool | MeLi | Amazon | Shopify | Risk |
|---|---|---|---|---|---|
| Catalog | publish_product | ✓ | ✓ | ✓ | IRREVERSIBLE |
| Catalog | update_product_content | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | update_product_images | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | update_product_video | ✓ | ✓ (A+) | ✓ | REVERSIBLE |
| Catalog | update_price | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | update_stock | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | pause_product | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | activate_product | ✓ | ✓ | ✓ | REVERSIBLE |
| Catalog | close_product | ✓ | ✓ | ✓ | IRREVERSIBLE |
| Engagement | answer_question | ✓ | ✓ | N/A | IRREVERSIBLE |
| Engagement | hide_question | ✓ | N/A | N/A | REVERSIBLE |
| Engagement | send_buyer_message | ✓ | ✓ | ✓ | IRREVERSIBLE |
| Engagement | request_review | ✗ | ✓ | ✓ | IRREVERSIBLE |
| Advertising | create_campaign | ✓ | ✓ (SP-Ads) | ✓ | REVERSIBLE |
| Advertising | update_campaign | ✓ | ✓ | ✓ | REVERSIBLE |
| Advertising | pause_campaign | ✓ | ✓ | ✓ | REVERSIBLE |
| Advertising | activate_campaign | ✓ | ✓ | ✓ | REVERSIBLE |
MeLi does not support request_review: the adapter returns NotSupportedError with a descriptive message. IRREVERSIBLE = requires user approval | REVERSIBLE = pauses for user confirmation.
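One way an adapter layer could enforce the matrix is a small lookup that lists only the unsupported pairs. The sketch below is illustrative: `UNSUPPORTED` and `assertSupported` are hypothetical names, and the real `NotSupportedError` class may carry different fields.

```typescript
type MarketplaceSlug = "mercadolibre" | "amazon" | "shopify";

// Assumed shape: the blueprint only says the error carries a descriptive message.
class NotSupportedError extends Error {
  constructor(tool: string, marketplace: MarketplaceSlug) {
    super(`${tool} is not supported on ${marketplace}`);
    this.name = "NotSupportedError";
  }
}

// Only the exceptions from the support matrix (N/A and ✗ cells).
// Every tool/marketplace pair not listed here is treated as supported.
const UNSUPPORTED: Record<string, MarketplaceSlug[]> = {
  answer_question: ["shopify"],
  hide_question: ["amazon", "shopify"],
  request_review: ["mercadolibre"],
};

// Throws before any network call when the pair is unavailable.
function assertSupported(tool: string, marketplace: MarketplaceSlug): void {
  if (UNSUPPORTED[tool]?.includes(marketplace)) {
    throw new NotSupportedError(tool, marketplace);
  }
}
```

Calling `assertSupported("request_review", "mercadolibre")` throws, while `assertSupported("update_price", "mercadolibre")` returns normally.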
How It Works
WRITE OPERATION ONBOARDING FLOW
============== ===============
Orchestrator tool call: 1. User clicks "Connect MeLi"
updatePrice({ userId, marketplaceSlug }, |
sku="PROD-001", price=29.99) v
| 2. POST /auth/connect/mercadolibre
v → returns OAuth2 redirect URL
+---------------------------+ |
| MarketplaceProvider | v
| 1. Resolve token | 3. User accepts permissions
| tokenManager | |
| .getToken(userId, | v
| marketplaceSlug) | 4. GET /auth/callback/mercadolibre
| 2. Resolve SKU → ID | exchange code → tokens
| 3. Snapshot pre-state | |
| 4. Route to adapter | v
+----------+----------------+ 5. TokenManager.storeCredentials()
| encrypt(AES-256-GCM) → DynamoDB
+------+------+--------+ |
v v v v
+-----------+ +---------+ +-----------+ 6. Trigger first Data Sync (#10)
| MeLi | | Amazon | | Shopify |
| Adapter | | Adapter | | Adapter |
| | | | | | TOKEN REFRESH (automatic)
| REST API | | SP-API | | GraphQL | =========================
| OAuth2 | | LWA | | OAuth2 |
| 1.5K/min | | varies | | cost-pts | EventBridge cron (every 5min):
+-----------+ +---------+ +-----------+ Query: expiresAt < NOW()+30min
| | | → refresh at marketplace
v v v → update DynamoDB
+---------------------------------------+ If fails 3x: expire + notify
| MarketplaceAction |
| actionType: update |
| domain: catalog | getToken() FLOW:
| beforeValue: {price: 39.99} | 1. DynamoDB (direct read)
| afterValue: {price: 29.99} | 2. Decrypt AES-256-GCM
| rollbackToken: "rt_abc123" | 3. Latency: <100ms
+---------------------------------------+ (no cache layer)
The Strategy pattern routes each call to the correct adapter based on marketplaceSlug. Tokens are resolved internally by the adapter via TokenManager.getToken(), so the caller never provides auth credentials. Before each write, snapshotProduct() captures current state for rollback. IRREVERSIBLE operations require user approval; REVERSIBLE operations pause for confirmation. The adapter returns NotSupportedError when an operation is unavailable (e.g., requestReview on MeLi). MeLi token refresh is serialized with a mutex to prevent concurrent invalidation (only the last refresh_token is valid).
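The routing and snapshot-before-write steps could look roughly like the following. This is a sketch under stated assumptions: `PriceAdapter` is a hypothetical, narrowed slice of `IMarketplaceAdapter`, `PriceAction` is a trimmed stand-in for `MarketplaceAction`, and `InMemoryAdapter` replaces a real marketplace client so the example runs without network access or tokens.

```typescript
type MarketplaceSlug = "mercadolibre" | "amazon" | "shopify";
interface MarketplaceRequest { userId: string; marketplaceSlug: MarketplaceSlug }
interface Money { amount: number; currency: string }

// Hypothetical narrowed interface: only what a price update needs.
interface PriceAdapter {
  snapshotPrice(req: MarketplaceRequest, sku: string): Promise<Money>;
  updatePrice(req: MarketplaceRequest, sku: string, price: Money): Promise<void>;
}

// Trimmed stand-in for MarketplaceAction.
interface PriceAction {
  sku: string;
  marketplace: MarketplaceSlug;
  actionType: "update";
  domain: "catalog";
  beforeValue: Money;
  afterValue: Money;
  status: "executed" | "failed";
}

class MarketplaceProvider {
  private readonly adapters: Map<MarketplaceSlug, PriceAdapter>;

  constructor(adapters: Map<MarketplaceSlug, PriceAdapter>) {
    this.adapters = adapters;
  }

  async updatePrice(req: MarketplaceRequest, sku: string, price: Money): Promise<PriceAction> {
    // Strategy selection: the slug alone decides which adapter runs.
    const adapter = this.adapters.get(req.marketplaceSlug);
    if (!adapter) throw new Error(`No adapter registered for ${req.marketplaceSlug}`);

    // Capture pre-transaction state so the write can be rolled back later.
    const beforeValue = await adapter.snapshotPrice(req, sku);
    const base = {
      sku,
      marketplace: req.marketplaceSlug,
      actionType: "update" as const,
      domain: "catalog" as const,
      beforeValue,
      afterValue: price,
    };
    try {
      await adapter.updatePrice(req, sku, price);
      return { ...base, status: "executed" };
    } catch {
      return { ...base, status: "failed" };
    }
  }
}

// In-memory stand-in for a real adapter (no HTTP, no token resolution).
class InMemoryAdapter implements PriceAdapter {
  private readonly prices: Map<string, Money>;

  constructor(prices: Map<string, Money>) {
    this.prices = prices;
  }

  async snapshotPrice(_req: MarketplaceRequest, sku: string): Promise<Money> {
    const price = this.prices.get(sku);
    if (!price) throw new Error(`Unknown SKU ${sku}`);
    return { ...price };
  }

  async updatePrice(_req: MarketplaceRequest, sku: string, price: Money): Promise<void> {
    this.prices.set(sku, { ...price });
  }
}
```

Adding a marketplace in this shape means registering one more entry in the map; `MarketplaceProvider` itself does not change.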
Marketplace Developer Credentials
| Marketplace | Process | Time | Required Docs |
|---|---|---|---|
| MercadoLibre | Create app on developers.mercadolibre.com, request write permissions | 1-2 weeks | Company name, callback URL, usage description |
| Amazon SP-API | Register on developer.amazonservices.com, LWA client ID (no longer requires AWS IAM) | 2-4 weeks | Company name, address, tax data, use case |
| Shopify | Create app in Partners Dashboard, request write scopes | 1-3 days | Company name, callback URL, privacy policy |
Responsible: El Capitán reviews process status weekly. Timeline: start all 3 in parallel at Week 0 (pre-sprint). Blocker if not completed before Week 2. Amazon SP-API no longer requires AWS IAM or Signature v4 (removed Oct 2023); auth is standard OAuth2/LWA.
API Documentation Monitoring
Owned by #16 Eval Suite (api_monitor pipeline). Daily changelog checks + canary tests against live marketplace endpoints. Breaking changes create a Linear issue tagged api-change with the affected adapter and recommended action. This project consumes the alerts and acts on them via adapter patches.
Implementation Plan
Phase 0: Developer Credentials + Setup (Week 0)
Start developer account applications on all 3 marketplaces in parallel. El Capitán is the weekly tracking owner. Prioritize MeLi (1-2 weeks), since its adapter is the first one built in Phase 2. Set up the TypeScript project scaffold, DynamoDB table, and AWS Secrets Manager secrets. CDK stacks are defined in #14.
Phase 1: TokenManager + OAuth2 Flows (Week 1-2)
Implement ITokenManager with DynamoDB backend + AES-256-GCM encryption. Build OAuth2FlowManager for all 3 marketplaces: MeLi standard OAuth2, Amazon LWA, Shopify OAuth2. Implement /auth/connect, /auth/callback, /auth/disconnect endpoints. Build token refresh cron (EventBridge every 5min, pre-refresh 30min). getToken() reads directly from DynamoDB (<100ms, no cache layer). MeLi token refresh serialized with mutex. This was formerly #10 and is now an internal module.
Phase 2: IMarketplaceAdapter + MeLi Adapter (Week 2-3)
Define IMarketplaceAdapter interface with 23 methods (17 write + 3 enrollment + 2 read + 1 infra). Implement SKUResolver (SKU → productId). Implement MeLiAdapter via raw axios (no maintained SDK exists). MarketplaceRequest carries userId + marketplaceSlug only; the adapter resolves tokens internally via TokenManager.
Phase 3: Amazon + Shopify Adapters (Week 3-4)
Implement AmazonAdapter using @sp-api-sdk with LWA auth (standard OAuth2, no longer requires AWS SigV4). Implement ShopifyAdapter using @shopify/shopify-api with GraphQL exclusively (REST deprecated Oct 2024). Handle NotSupportedError for N/A operations. Both normalize to MarketplaceAction.
Phase 4: Rate Limiting + Rollback + Onboarding (Week 5-6)
Redis cache for pre-transaction snapshots via snapshotProduct(). Per-marketplace rate limiter: MeLi 1,500 req/min token bucket, Amazon per-endpoint burst/restore, Shopify cost-point leaky bucket. Rollback tokens for REVERSIBLE operations. Onboarding flow: connectMarketplace() → OAuth → store credentials → trigger first Data Sync (#10). Integration with Observability (#8).
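A rollback for a REVERSIBLE operation can be derived from the stored beforeValue. The sketch below assumes a hypothetical `RollbackPlan` shape and `planRollback` helper; how `rollbackToken` is issued and validated is not specified in this blueprint, so it is left out.

```typescript
type RiskLevel = "reversible" | "irreversible";
type ActionStatus = "pending" | "confirmed" | "executed" | "rolled_back" | "failed";

// Subset of MarketplaceAction that a rollback needs.
interface ExecutedAction {
  id: string;
  sku: string;
  fieldChanged: string | null;
  beforeValue: unknown;
  afterValue: unknown;
  riskLevel: RiskLevel;
  status: ActionStatus;
}

// Hypothetical description of the compensating write.
interface RollbackPlan {
  sourceActionId: string;
  sku: string;
  fieldChanged: string | null;
  restoreValue: unknown;
}

// Builds the compensating write, refusing actions that cannot be undone.
function planRollback(action: ExecutedAction): RollbackPlan {
  if (action.riskLevel !== "reversible") {
    throw new Error(`Action ${action.id} is irreversible and cannot be rolled back`);
  }
  if (action.status !== "executed") {
    throw new Error(`Action ${action.id} has status ${action.status}; only executed actions roll back`);
  }
  return {
    sourceActionId: action.id,
    sku: action.sku,
    fieldChanged: action.fieldChanged,
    restoreValue: action.beforeValue, // the snapshot taken before the write
  };
}
```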
Risk Analysis
Rate Limit Exhaustion
Impact: H
Mitigation: Per-marketplace rate limiter with backoff. MeLi: 1,500 req/min per seller (token bucket). Amazon: per-endpoint burst/restore. Shopify: cost-point leaky bucket (1000pts, 50pts/s restore); monitor extensions.cost.throttleStatus in each GraphQL response. Queue overflow requests.
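Both the MeLi request budget and the Shopify cost-point budget can be modeled with one capacity-plus-restore-rate limiter. The class below is a minimal sketch, not the project's limiter: `TokenBucket` and its method names are hypothetical, the clock is passed in explicitly to keep it deterministic, and Shopify's leaky bucket is approximated as a token bucket with the same capacity and restore rate.

```typescript
// Capacity plus restore-rate limiter with an injected clock (milliseconds).
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Adds the tokens restored since the last call, capped at capacity.
  private refill(nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
  }

  // Returns true and deducts the cost if the budget allows the call now.
  tryConsume(cost: number, nowMs: number): boolean {
    this.refill(nowMs);
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }

  // How long a queued request should wait before retrying, in milliseconds.
  msUntilAvailable(cost: number, nowMs: number): number {
    this.refill(nowMs);
    const deficit = cost - this.tokens;
    return deficit <= 0 ? 0 : Math.ceil((deficit / this.refillPerSec) * 1000);
  }
}

// Factory helpers using the figures quoted in this section.
function meliBucket(nowMs: number): TokenBucket {
  return new TokenBucket(1500, 1500 / 60, nowMs); // 1,500 requests per minute
}
function shopifyBucket(nowMs: number): TokenBucket {
  return new TokenBucket(1000, 50, nowMs); // 1000 points, 50 points/s restore
}
```

A caller that gets `false` from `tryConsume` can enqueue the request and sleep for `msUntilAvailable` instead of sending it and risking a 429.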
Token Refresh Failure Cascade
Impact: H
Mitigation: If a marketplace OAuth endpoint goes down, all of its tokens expire within 6h (MeLi). TokenManager implements exponential backoff and a circuit breaker. After 3 consecutive failures per credential, mark it expired and notify the user. MeLi gotcha: only the last refresh_token is valid, so refresh calls are serialized with a mutex to prevent concurrent invalidation.
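Serializing refresh calls per credential can be done with a keyed promise chain. This sketch assumes a single Node.js process; `KeyedMutex` and `runExclusive` are hypothetical names, and a deployment with several concurrent workers would additionally need a distributed lock, which is not shown here.

```typescript
// In-process mutex: tasks sharing a key run strictly one after another.
class KeyedMutex {
  // Tail of the queue for each key; tails are built so they never reject.
  private readonly tails = new Map<string, Promise<unknown>>();

  runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start this task only after the previous one for the same key settles.
    const run = prev.then(() => task());
    // Swallow failures on the tail so one failed refresh does not block the queue.
    const tail = run.catch(() => undefined);
    this.tails.set(key, tail);
    // Drop the entry once the queue for this key is idle.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return run;
  }
}
```

With a key such as `"user_123#mercadolibre"`, two overlapping refresh attempts for the same credential run back to back, so the second one can read the refresh_token written by the first instead of reusing an invalidated one.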
Marketplace API Breaking Changes
Impact: M
Mitigation: Each adapter is isolated. #16 Eval Suite (api_monitor pipeline) checks changelogs every 24h, and canary tests run daily against real APIs; it creates a Linear issue tagged api-change when a breaking change is detected. A breaking change in MeLi only affects MeLiAdapter. Schema validation on responses catches unexpected fields.
Developer Account Approval Delays
Impact: H
Mitigation: Amazon SP-API can take 2-4 weeks. Start all applications at Week 0. El Capitán tracks weekly. If blocked, prioritize MeLi (1-2 weeks) as the first functional adapter.
Credential Security Breach
Impact: H
Mitigation: Tokens encrypted at rest (AES-256-GCM). Encryption key in AWS Secrets Manager (not an env var). DynamoDB access restricted via IAM policy. No token ever appears in logs or traces. Post-MVP: AWS KMS with key rotation.
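Encrypting a token with AES-256-GCM before it is written to DynamoDB might look like this, using Node's built-in `crypto` module. It is a sketch under assumptions: the `iv || tag || ciphertext` base64 layout and the function names `encryptToken` and `decryptToken` are illustrative, and the 32-byte key is assumed to have been loaded from AWS Secrets Manager by the caller.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit nonce, the size recommended for GCM
const TAG_BYTES = 16; // default GCM authentication tag length

// Encrypts a token and packs iv, auth tag, and ciphertext into one base64 string.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverses encryptToken; throws if the key is wrong or the payload was modified.
function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a record edited directly in DynamoDB fails decryption instead of yielding a corrupted token.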
Key Decisions
SKU as primary identifier over Publication ID: The same product can have multiple Publication IDs across marketplaces. IDs are assigned by the marketplace, not by the seller. Using SKU lets the agent operate without knowing marketplace-internal IDs. Each adapter resolves SKU → productId internally.
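The SKU → productId resolution can be sketched as a lookup keyed by user, marketplace, and SKU. Everything here is illustrative: the `SkuResolver` class shape is an assumption (the blueprint only names an SKUResolver), the index is held in memory rather than in the real data store, and the Amazon ID used in the usage note is a made-up placeholder.

```typescript
type MarketplaceSlug = "mercadolibre" | "amazon" | "shopify";

// In-memory stand-in for the SKU index; a real resolver would read synced data.
class SkuResolver {
  private readonly index = new Map<string, string>();

  // One seller SKU maps to a different native ID on each marketplace.
  private key(userId: string, marketplace: MarketplaceSlug, sku: string): string {
    return `${userId}#${marketplace}#${sku}`;
  }

  register(userId: string, marketplace: MarketplaceSlug, sku: string, productId: string): void {
    this.index.set(this.key(userId, marketplace, sku), productId);
  }

  // Returns the marketplace-native ID or fails loudly when the SKU is unknown.
  resolve(userId: string, marketplace: MarketplaceSlug, sku: string): string {
    const productId = this.index.get(this.key(userId, marketplace, sku));
    if (productId === undefined) {
      throw new Error(`SKU ${sku} is not mapped on ${marketplace} for ${userId}`);
    }
    return productId;
  }
}
```

For example, after registering `"PROD-001"` as `"MLA123"` on MeLi and as a placeholder ASIN on Amazon, the agent keeps addressing the product as `"PROD-001"` and each adapter looks up its own native ID.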
Strategy Pattern over Factory: Each marketplace is a pluggable adapter implementing IMarketplaceAdapter. Adding a new marketplace means adding one class, with zero changes to existing code.
No auth_token in MarketplaceRequest: The adapter resolves tokens internally via ITokenManager. Callers (Orchestrator, tools) only provide userId + marketplaceSlug. This eliminates token leakage risk and simplifies the public interface. Formerly #10 handled this externally; now it is an internal concern.
Write-first, reads only for rollback: This module is the execution layer (Create/Modify/Delete). Reads are limited to snapshotProduct() for pre-transaction state capture. The primary data read path lives in Data Sync (#10).
DynamoDB for tokens, AWS Secrets Manager for static secrets: Secrets Manager is for values that change rarely (client_id, client_secret, encryption keys). DynamoDB handles high-frequency token reads/writes with conditional updates and TTL. This separation matches the access pattern of each secret type.
Proactive refresh (30min before expiry) over on-demand: MeLi tokens expire every 6h. Refreshing only when a request needs a token creates latency spikes and race conditions. The 5-minute cron proactively refreshes any token expiring within 30 minutes, so getToken() almost always finds a still-valid token on its direct DynamoDB read (<100ms) instead of refreshing on the request path.
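The selection and failure-counting rules of the refresh cron can be expressed as two pure functions. This is a sketch, not the cron itself: `needsRefresh` and `applyRefreshResult` are hypothetical helpers, `expiresAt` is assumed to be in Unix seconds, and the actual OAuth refresh call and DynamoDB update are omitted.

```typescript
type CredentialStatus = "active" | "expired" | "revoked" | "disconnected";

// Subset of the marketplace_credentials item that the cron inspects.
interface CredentialRecord {
  pk: string;              // userId#marketplace
  expiresAt: number;       // Unix seconds (assumed unit)
  status: CredentialStatus;
  refreshFailures: number; // consecutive failures
}

type RefreshResult = { ok: true; expiresAt: number } | { ok: false };

const REFRESH_WINDOW_SEC = 30 * 60; // refresh tokens expiring within 30 minutes
const MAX_FAILURES = 3;             // then mark expired and notify the user

// Mirrors the cron query: expiresAt < NOW() + 30min AND status = 'active'.
function needsRefresh(cred: CredentialRecord, nowSec: number): boolean {
  return cred.status === "active" && cred.expiresAt < nowSec + REFRESH_WINDOW_SEC;
}

// Returns the updated record after one refresh attempt.
function applyRefreshResult(cred: CredentialRecord, result: RefreshResult): CredentialRecord {
  if (result.ok) {
    return { ...cred, expiresAt: result.expiresAt, refreshFailures: 0 };
  }
  const refreshFailures = cred.refreshFailures + 1;
  const status: CredentialStatus = refreshFailures >= MAX_FAILURES ? "expired" : cred.status;
  return { ...cred, refreshFailures, status };
}
```

With a 6h token lifetime and a 5-minute schedule, each token gets several refresh attempts inside its 30-minute window before it would actually expire.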
All 3 marketplaces from Phase 1: MeLi + Amazon + Shopify ship in MVP. No phased rollout per marketplace. The IMarketplaceAdapter interface ensures each adapter is isolated; one can be developed and tested independently of the others.
TypeScript/Node.js over Python: Consistency with core-intelligence services (TypeScript). Amazon @sp-api-sdk is the most active TS SDK. MeLi has no maintained SDK in any language (raw HTTP regardless). Shopify's official SDK is JS-first. The module is I/O-bound; marketplace API latency (100-500ms) dominates, not the runtime. Strong typing via TypeScript catches adapter interface violations at compile time.
MVP Scope
[v4] MeLi REST + Amazon SP-API (LWA) + Shopify GraphQL. 17 write tools across 4 domains + 3 enrollment methods. SKU as primary key. TypeScript/Node.js. Internal TokenManager with DynamoDB + AWS Secrets Manager + AES-256-GCM. OAuth2 for all 3 marketplaces. Token refresh cron 5min. Onboarding flow. Absorbs former #10.
Inspired by
Existing data orchestrator connectors. OAuth2 rotation from Data Orchestrator.
📝 Project Changelog
Layer 5: PLATFORM
What sustains the business and infrastructure
Billing & Credit Economy
core-platform-billing — Sergio
The economics engine of Shopilot: it unifies metering and billing in a single project (core-platform-billing). Credit tracking already works in production via PostgreSQL triggers on agent_costs. The application never calculates credits; it only inserts costs, and the triggers handle deduction from clients.credits. This project extends that foundation with plan management (Free/Pro), Stripe integration for payments, Credit Packs, and the ICreditsGate contract that the Orchestrator (#2) calls before every tool execution. The Orchestrator receives allowed: boolean and never knows about plans, Stripe, or pricing rules. Prompt caching (Anthropic cache_control) reduces LLM input token costs by 60-80% on layers 1-3 of the SystemPromptComposer. Absorbs former #14 (Billing & Subscription Management).
Beautonomous governance: ICreditsGate enforces resource limits per Core's permission matrix. The Orchestrator cannot execute any tool if the seller's credit budget is exhausted, regardless of role. Free plan limits align with Core's tier-based access rules.
What this project does NOT do
- Calculate credits per execution: the PostgreSQL triggers on agent_costs already do this. This project does not duplicate that logic.
- Purchase prompts in the chat: the Coach never interrupts a conversation with "buy more credits". Alerts are data that the Shell (#1) consumes via GET /billing/status.
- LLM model routing: which model to use (Haiku/Sonnet/Opus) is decided by the LLMClientFactory in the Orchestrator (#2), not billing.
ICreditsGate: allowed: boolean before each tool
Upgrade to Pro + Credit Pack purchases
5 events, idempotent via stripe_webhook_events
FREE → PRO → PAST_DUE → GRACE → FREE
GET /billing/status <100ms for Shell (#1)
cache_control on layers 1-3: 60-80% savings
Plans & Credit Packs
MVP Plans
| | Free | Pro |
|---|---|---|
| Price | $0 | $49/mo |
| Credits/mo | 50 | 500 |
| Tools READ | ✓ | ✓ |
| Tools ANALYSIS | ✓ | ✓ |
| Tools WRITE | ✗ | ✓ |
| Proactivity | ✗ | ✓ |
| Credit Packs | ✗ | ✓ |
| At 0 credits | HARD BLOCK | SOFT BLOCK |
Credit Packs (Pro only)
| Pack | Credits | Price |
|---|---|---|
| Basic | 100 | $5.00 |
| Popular | 500 | $20.00 |
| Power | 1,000 | $35.00 |
Pack credits expire 12 months from purchase. Plan credits reset monthly. Deduction order: plan first, then packs.
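The deduction order can be illustrated with a small pure function. This is a sketch under stated assumptions: in the blueprint the ordering lives in a PostgreSQL trigger, while `Balances` and `applyDeduction` here are hypothetical names, and rejecting an uncovered deduction (instead of clamping at zero) is an assumption of this example.

```typescript
// Balances are split into monthly plan credits and purchased pack credits.
interface Balances {
  creditsFromPlan: number
  creditsFromPacks: number
}

// Applies a deduction in the documented order: plan credits first, then packs.
// Returns null when the combined balance cannot cover the deduction
// (assumption: an uncovered deduction is rejected, not partially applied).
function applyDeduction(b: Balances, deduction: number): Balances | null {
  if (deduction < 0) throw new RangeError('deduction must be non-negative')
  if (b.creditsFromPlan + b.creditsFromPacks < deduction) return null
  const fromPlan = Math.min(b.creditsFromPlan, deduction)
  const fromPacks = deduction - fromPlan
  return {
    creditsFromPlan: b.creditsFromPlan - fromPlan,
    creditsFromPacks: b.creditsFromPacks - fromPacks,
  }
}

console.log(applyDeduction({ creditsFromPlan: 3, creditsFromPacks: 100 }, 5))
// { creditsFromPlan: 0, creditsFromPacks: 98 }
```

Spending plan credits first means the credits that reset monthly are consumed before the purchased ones, which only expire after 12 months.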
Blocking Logic
- Free, credits = 0 → HARD BLOCK: all tools blocked
- Pro, credits = 0 → SOFT BLOCK: writes blocked, reads + analysis continue
- Pro, credits > 0 → everything enabled
- Free, credits > 0 → reads + analysis enabled, writes always blocked
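The four blocking scenarios above can be expressed as one pure decision function. A minimal sketch: `decideGate` and `GateDecision` are hypothetical names, the mapping of each scenario to a `reason` value is an assumption, and so is treating the `system` category like a read.

```typescript
type Plan = 'free' | 'pro'
type ToolCategory = 'read' | 'analysis' | 'write' | 'system'
type BlockReason = 'insufficient_credits' | 'writes_blocked_no_credits' | 'plan_restriction'

interface GateDecision {
  allowed: boolean
  reason?: BlockReason
}

// Encodes the blocking matrix: Free hard-blocks at 0 credits and never writes;
// Pro soft-blocks (writes only) at 0 credits.
function decideGate(plan: Plan, credits: number, tool: ToolCategory): GateDecision {
  if (plan === 'free') {
    if (credits <= 0) return { allowed: false, reason: 'insufficient_credits' } // HARD BLOCK
    if (tool === 'write') return { allowed: false, reason: 'plan_restriction' } // writes never on Free
    return { allowed: true }
  }
  // Pro: only writes are blocked when credits run out (SOFT BLOCK).
  if (credits <= 0 && tool === 'write') {
    return { allowed: false, reason: 'writes_blocked_no_credits' }
  }
  return { allowed: true }
}

console.log(decideGate('pro', 0, 'analysis')) // { allowed: true }
console.log(decideGate('free', 20, 'write')) // { allowed: false, reason: 'plan_restriction' }
```

Keeping the matrix in a pure function makes the acceptance scenarios cheap to unit-test without a database or Stripe.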
Tech Stack
Schema, Contracts, Endpoints & Acceptance Criteria
-- Hierarchy already in production:
-- clients → agents_clients → agent_executions → agent_costs (FK charge_types)
-- charge_types (configurable pricing):
--   TOKENS: CEIL((input+output)/1000) * 1 credit
--   EMBEDDING: 1 credit flat
--   VECTOR_SEARCH: 1 credit flat
--   BRAND_HEALTH: 1 credit flat
--   EXTERNAL_COST: CEIL(cost_usd/0.01) * 1 credit
-- Triggers (already working):
--   trg_calculate_agent_credits BEFORE INSERT on agent_costs → calculates credits from charge_type
--   trg_apply_credits_to_client AFTER INSERT on agent_costs → decrements clients.credits
--   after_insert_agent_cost_sync AFTER INSERT on agent_costs → accumulates agents_clients.credits_used
-- The app NEVER calculates credits. It inserts into agent_costs with correct charge_type_id.
-- Triggers do the rest.
-- Extend clients with subscription fields:
ALTER TABLE clients
ADD COLUMN plan VARCHAR(20) DEFAULT 'free', -- free | pro
ADD COLUMN stripe_customer_id VARCHAR(100), -- cus_xxxxx
ADD COLUMN stripe_subscription_id VARCHAR(100), -- sub_xxxxx
ADD COLUMN subscription_status VARCHAR(30) DEFAULT 'active', -- active | past_due | grace_period | canceled
ADD COLUMN billing_period_start DATE,
ADD COLUMN billing_period_end DATE,
ADD COLUMN credits_from_plan NUMERIC(10,2) DEFAULT 50, -- resets monthly
ADD COLUMN credits_from_packs NUMERIC(10,2) DEFAULT 0; -- 12-month expiry
-- credits = credits_from_plan + credits_from_packs
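-- Trigger deduction order: plan first, then packs.
-- Every right-hand side reads the pre-update row values, so one UPDATE is enough:
-- UPDATE clients SET
--   credits_from_plan = GREATEST(0, credits_from_plan - deduction),
--   credits_from_packs = GREATEST(0, credits_from_packs - GREATEST(0, deduction - credits_from_plan)),
--   credits = credits - deduction
-- WHERE client_id = $1
--   AND credits >= deduction; -- guard: concurrent tool calls cannot overdraw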
-- New table: stripe_webhook_events (idempotency)
CREATE TABLE stripe_webhook_events (
stripe_event_id VARCHAR(100) PRIMARY KEY, -- evt_xxxxx
event_type VARCHAR(100) NOT NULL,
processed_at TIMESTAMP DEFAULT now(),
payload JSONB,
status VARCHAR(20) DEFAULT 'processed' -- processed | failed
);
-- New table: credit_pack_purchases
CREATE TABLE credit_pack_purchases (
purchase_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id VARCHAR(50) NOT NULL REFERENCES clients(client_id),
stripe_payment_intent VARCHAR(100) NOT NULL,
pack_type VARCHAR(20) NOT NULL, -- basic | popular | power
credits_added NUMERIC(10,2) NOT NULL, -- 100 | 500 | 1000
amount_usd NUMERIC(8,2) NOT NULL, -- 5.00 | 20.00 | 35.00
purchased_at TIMESTAMP DEFAULT now(),
expires_at TIMESTAMP -- purchased_at + 12 months
);
// --- In core-intelligence-conversation-api (domain layer) ---
// domain/ports/ICreditsGate.ts
interface ICreditsGate {
canProceed(params: {
userId: string
toolCategory: 'read' | 'analysis' | 'write' | 'system'
}): Promise<CreditGateResult>
}
interface CreditGateResult {
allowed: boolean
reason?: 'insufficient_credits' | 'writes_blocked_no_credits' | 'plan_restriction'
creditsRemaining: number
plan: 'free' | 'pro'
}
// infrastructure/billing/HttpCreditGate.ts
class HttpCreditGate implements ICreditsGate {
// Calls POST /internal/gate on core-platform-billing
// Auth: x-internal-key header (SSM SecureString, rotated)
// Fail-open: if billing unavailable, returns allowed: true
// (triggers still deduct credits independently)
}
// --- In core-platform-billing ---
// POST /internal/gate handler (not public, internal API key required)
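async function creditGateHandler(req): Promise<CreditGateResult> {
  const { userId, toolCategory } = req.body
  const { credits, plan } = await db.queryOne(
    'SELECT credits, plan FROM clients WHERE client_id = $1', [userId]
  )
  // Free, 0 credits → HARD BLOCK (all tools)
  if (plan === 'free' && credits <= 0)
    return { allowed: false, reason: 'insufficient_credits', creditsRemaining: 0, plan }
  // Free, credits > 0 → writes are still blocked by the plan
  if (plan === 'free' && toolCategory === 'write')
    return { allowed: false, reason: 'plan_restriction', creditsRemaining: credits, plan }
  // Pro, 0 credits → SOFT BLOCK (writes only)
  if (plan === 'pro' && credits <= 0 && toolCategory === 'write')
    return { allowed: false, reason: 'writes_blocked_no_credits', creditsRemaining: 0, plan }
  return { allowed: true, creditsRemaining: credits, plan }
}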
// Public endpoints (authenticated user):
// POST /billing/checkout → Stripe Checkout Session (mode=subscription) for Pro upgrade
// POST /billing/packs/checkout → Stripe Checkout Session (mode=payment) for Credit Pack (Pro only)
// POST /billing/portal → Stripe Customer Portal redirect (upgrade/cancel/invoices)
// GET /billing/status → BillingStatus (<100ms, direct read from clients)
// Internal endpoint (service-to-service):
// POST /internal/gate → CreditGateResult (x-internal-key auth)
// Webhook endpoint (Stripe signature verification):
// POST /billing/webhook → Handles 5 Stripe events with idempotency
interface BillingStatus {
plan: 'free' | 'pro'
subscriptionStatus: 'active' | 'past_due' | 'grace_period' | 'canceled'
creditsRemaining: number // = credits_from_plan + credits_from_packs
creditsFromPlan: number
creditsFromPacks: number
billingPeriodEnd: Date | null
percentUsed: number // for UI alerts in Shell (#1)
}
// All webhooks: INSERT into stripe_webhook_events first.
// If PK violation → already processed → return 200 (idempotent).
// invoice.payment_succeeded → renew subscription + reset credits_from_plan
// invoice.payment_failed → subscription_status = 'past_due' (UI in #1 notifies)
// customer.subscription.deleted → start 7-day grace period, then downgrade to Free
// customer.subscription.updated → sync plan, price, billing dates
// checkout.session.completed (mode=payment) → credit pack: INSERT credit_pack_purchases + add credits
// System prompt layers with cache_control: { type: "ephemeral" }
// Layer 1: Personality base ~1200 tokens cache hit ~95%
// Layer 2: Marketplace context ~400 tokens cache hit ~70%
// Layer 3: Tool definitions ~800 tokens cache hit ~90%
// Layer 4: User profile ~200 tokens NOT cached (dynamic)
//
// Reduction: 60-80% input token cost on layers 1-3.
// 5+ turn conversations see cumulative savings.
// Implementation lives in SystemPromptComposer (core-intelligence-conversation-api)
// Billing documents it because it directly impacts plan operating cost.
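The 60-80% figure can be sanity-checked with a back-of-the-envelope estimate from the layer sizes and hit rates listed above. This is a rough sketch, not a billing formula: the two price multipliers are assumptions to verify against current Anthropic pricing, and treating each layer's hit rate as independent simplifies how prefix caching behaves.

```typescript
interface PromptLayer {
  label: string
  tokens: number
  hitRate: number // fraction of requests served from cache
}

// Assumed price multipliers relative to the base input-token price.
const CACHE_READ_MULTIPLIER = 0.1 // assumption: cached reads cost 10% of base
const CACHE_WRITE_MULTIPLIER = 1.25 // assumption: cache writes cost 125% of base

// Expected cost of one layer, in "base input token" units.
function expectedLayerCost(layer: PromptLayer): number {
  const perToken =
    layer.hitRate * CACHE_READ_MULTIPLIER + (1 - layer.hitRate) * CACHE_WRITE_MULTIPLIER
  return layer.tokens * perToken
}

// Fraction saved on the cached layers versus sending them uncached every turn.
function estimatedSavings(layers: PromptLayer[]): number {
  const uncached = layers.reduce((sum, l) => sum + l.tokens, 0)
  const cached = layers.reduce((sum, l) => sum + expectedLayerCost(l), 0)
  return 1 - cached / uncached
}

const cachedLayers: PromptLayer[] = [
  { label: 'Layer 1: personality base', tokens: 1200, hitRate: 0.95 },
  { label: 'Layer 2: marketplace context', tokens: 400, hitRate: 0.7 },
  { label: 'Layer 3: tool definitions', tokens: 800, hitRate: 0.9 },
]
console.log((estimatedSavings(cachedLayers) * 100).toFixed(1) + '%')
```

Under these assumptions the three layers cost roughly 539 token-equivalents instead of 2,400, which lands inside the stated 60-80% range. Lower hit rates push the result toward the bottom of that range.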
- ICreditsGate.canProceed() returns correct allowed/reason for all 4 blocking scenarios (Free 0cr, Pro 0cr write, Pro 0cr read, Free >0cr write)
- GET /billing/status returns BillingStatus in <100ms (direct PK read from clients)
- Stripe webhooks are idempotent: re-delivered events do not duplicate credits or re-trigger state changes
- invoice.payment_succeeded resets credits_from_plan to plan quota (50 or 500) and preserves credits_from_packs
- Credit Pack checkout adds credits_from_packs without affecting credits_from_plan
- Subscription state machine: FREE → PRO → PAST_DUE → GRACE_PERIOD → FREE works end-to-end
- Grace period: 7 days post-cancellation, Pro features remain active, then downgrade to Free with plan credits lost and pack credits preserved
- PostgreSQL trigger deduction order: plan credits consumed before pack credits
- Race condition: concurrent tool calls with 1 credit remaining — only one passes (WHERE credits >= deduction)
- Fail-open: if billing service is unavailable, HttpCreditGate returns allowed: true (triggers still deduct independently)
- Prompt caching: layers 1-3 of SystemPromptComposer carry cache_control markers (a request-body field on the prompt blocks, not an HTTP header), achieving 60-80% input token cost reduction
How It Works
ReActOrchestrator (#2)
|
| (before each tool call)
v
ICreditsGate.canProceed({ userId, toolCategory })
|
|— allowed: true → execute tool normally
|
+— allowed: false → append tool_result("No credits for this operation")
→ loop continues, LLM explains to user
Credit deduction (independent path):
tool executes → INSERT into agent_costs
→ trg_calculate_agent_credits (BEFORE INSERT)
→ trg_apply_credits_to_client (AFTER INSERT)
→ clients.credits decremented automatically
Subscription lifecycle:
SIGNUP → FREE (50cr/mo, internal reset)
|
| upgrade (Stripe Checkout)
v
PRO (500cr/mo, Stripe invoice reset) —————+
| | buy pack
| payment_failed v
v credit_pack_purchases
PAST_DUE (3 Stripe retries) + credits_from_packs
|
| all retries fail
v
GRACE_PERIOD (7 days, Pro still active)
|
| expires
v
FREE (downgrade, plan credits lost, pack credits preserved)
The Orchestrator calls ICreditsGate.canProceed() before each tool. It receives allowed: boolean and never knows about plans, Stripe, or pricing rules. Credit deduction happens independently via PostgreSQL triggers on agent_costs: the app only inserts costs, and the triggers handle the math. Stripe webhooks synchronize subscription state (renewals, failures, cancellations) into PostgreSQL. The billing service is the single source of truth for plan/credit state, while Stripe is the source of truth for payments.
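The gate call and the fail-open rule described above might look like this on the Orchestrator side. A minimal sketch: `failOpen` and `runGatedTool` are hypothetical helpers, the real HttpCreditGate would call POST /internal/gate, and the placeholder values returned during an outage are assumptions.

```typescript
type ToolCategory = 'read' | 'analysis' | 'write' | 'system'

interface CreditGateResult {
  allowed: boolean
  reason?: string
  creditsRemaining: number
  plan: 'free' | 'pro'
}

interface ICreditsGate {
  canProceed(params: { userId: string; toolCategory: ToolCategory }): Promise<CreditGateResult>
}

// Wraps any gate so a billing outage does not stop the Coach (fail-open).
function failOpen(inner: ICreditsGate): ICreditsGate {
  // Placeholder result used only when billing is unreachable (assumption).
  const fallback: CreditGateResult = { allowed: true, creditsRemaining: 0, plan: 'free' }
  return {
    async canProceed(params) {
      try {
        return await inner.canProceed(params)
      } catch {
        return fallback
      }
    },
  }
}

// Orchestrator-side step: ask the gate, then either run the tool or hand the
// LLM a tool_result explaining the block so the loop can continue.
async function runGatedTool(
  gate: ICreditsGate,
  userId: string,
  toolCategory: ToolCategory,
  execute: () => Promise<string>
): Promise<string> {
  const result = await gate.canProceed({ userId, toolCategory })
  if (!result.allowed) return 'No credits for this operation'
  return execute()
}

const billingDown: ICreditsGate = {
  canProceed: async () => {
    throw new Error('billing unavailable')
  },
}
runGatedTool(failOpen(billingDown), 'seller-1', 'read', async () => 'tool output').then((out) =>
  console.log(out)
) // tool output
```

Because the Orchestrator only reads `allowed`, a new plan or pricing rule changes the gate implementation and leaves this call site untouched.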
Implementation Plan
Phase 1: Credit Gate + Schema (Orchestrator prerequisite)
In core-platform-billing: extend clients with subscription columns (plan, stripe_customer_id, subscription_status, credits_from_plan, credits_from_packs, billing_period_*). Create stripe_webhook_events and credit_pack_purchases tables. Update deduction trigger for plan-vs-pack ordering. Build POST /internal/gate endpoint with Free/Pro decision logic. In core-intelligence-conversation-api: define ICreditsGate interface (domain) + HttpCreditGate implementation (HTTP call to billing service). All users start as Free with 50 credits/month.
Phase 2: Stripe Checkout + Webhooks
Checkout flow for upgrade to Pro (Stripe-hosted page: full PCI compliance, no card data touches our servers). Webhooks: invoice.payment_succeeded, invoice.payment_failed, customer.subscription.deleted, customer.subscription.updated. Customer Portal for self-service (upgrades, cancellations, invoices). Cron for monthly reset of Free users (check billing_period_end < now() for users without a Stripe subscription).
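The idempotency rule for these webhooks can be sketched in memory. Assumptions are loud here: the `Set` stands in for the primary key on stripe_webhook_events, `StripeEventLike`, `Account`, and `handleWebhook` are hypothetical names, and signature verification and the database transaction are left out.

```typescript
// Stands in for the stripe_webhook_events table: the Set plays the role of
// the PRIMARY KEY on stripe_event_id.
const processedEventIds = new Set<string>()

interface StripeEventLike {
  id: string // evt_xxxxx
  type: string // e.g. 'invoice.payment_succeeded'
}

interface Account {
  subscriptionStatus: 'active' | 'past_due' | 'grace_period' | 'canceled'
  creditsFromPlan: number
  creditsFromPacks: number
}

const PRO_MONTHLY_CREDITS = 500

// Returns true if the event changed state, false if it was a re-delivery.
// Either way the HTTP handler would answer 200 so Stripe stops retrying.
function handleWebhook(evt: StripeEventLike, account: Account): boolean {
  if (processedEventIds.has(evt.id)) return false // "PK violation": already processed
  processedEventIds.add(evt.id)

  switch (evt.type) {
    case 'invoice.payment_succeeded':
      account.subscriptionStatus = 'active'
      account.creditsFromPlan = PRO_MONTHLY_CREDITS // reset plan credits, keep packs
      break
    case 'invoice.payment_failed':
      account.subscriptionStatus = 'past_due'
      break
  }
  return true
}

const sellerAccount: Account = { subscriptionStatus: 'active', creditsFromPlan: 12, creditsFromPacks: 40 }
const renewal: StripeEventLike = { id: 'evt_demo_1', type: 'invoice.payment_succeeded' }
console.log(handleWebhook(renewal, sellerAccount), handleWebhook(renewal, sellerAccount))
// true false
```

In the real service the insert into stripe_webhook_events and the state change would share one transaction, so a crash between the two cannot mark an event as processed without applying it.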
Phase 3: Credit Packs + Prompt Caching
Credit Pack checkout (3 options, Pro only) via Stripe mode=payment. GET /billing/status endpoint for Shell (#1) with BillingStatus response. Prompt caching with cache_control: { type: "ephemeral" } on layers 1-3 of SystemPromptComposer. Quota alerts visible to frontend via billing status (percentUsed field).
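As an illustration of the prompt-caching step in Phase 3, the layers could be turned into system blocks like this. A sketch only: `composeSystemBlocks` and `LayerInput` are hypothetical and are not the SystemPromptComposer API; the block shape follows the `cache_control: { type: "ephemeral" }` form quoted in the text.

```typescript
// Shape of a system-prompt text block carrying an optional cache marker.
interface SystemBlock {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

// Hypothetical input: one entry per prompt layer, in prompt order.
interface LayerInput {
  text: string
  cacheable: boolean
}

// Marks cacheable layers with cache_control and leaves dynamic layers plain.
function composeSystemBlocks(layers: LayerInput[]): SystemBlock[] {
  return layers.map((layer): SystemBlock =>
    layer.cacheable
      ? { type: 'text', text: layer.text, cache_control: { type: 'ephemeral' } }
      : { type: 'text', text: layer.text }
  )
}

const blocks = composeSystemBlocks([
  { text: 'Layer 1: personality base', cacheable: true },
  { text: 'Layer 2: marketplace context', cacheable: true },
  { text: 'Layer 3: tool definitions', cacheable: true },
  { text: 'Layer 4: user profile', cacheable: false }, // dynamic, not cached
])
console.log(blocks.filter((b) => b.cache_control !== undefined).length) // 3
```

The dynamic user-profile layer goes last on purpose: caching matches on an identical prefix, so any content that changes per request should come after the cached layers.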
Risk Analysis
Stripe ↔ PostgreSQL Drift
Impact: High
Mitigation: Stripe retries webhooks up to 72h with exponential backoff. Nightly reconciliation cron compares subscription_status in Stripe vs PostgreSQL. If they diverge, Stripe wins (source of truth for payments).
Double Credit Deduction (race condition)
Impact: High
Mitigation: The PostgreSQL UPDATE with WHERE credits >= deduction prevents overdrawing. If the condition fails, the trigger returns an error. The gate implementation handles that error as allowed: false.
Free Plan Abuse (multi-account)
Impact: Medium
Mitigation: 50 credits/month is enough to evaluate but not to operate a business. Account creation requires Memberstack verification. Multi-account abuse patterns detected by IP/email in Memberstack.
Stripe Failure During Checkout
Impact: Medium
Mitigation: Checkout is non-critical (Coach works without it). Circuit breaker on checkout endpoint. Stripe has 99.99% historical uptime.
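For the double-deduction risk above, the guarded UPDATE can be sketched from the application's point of view. This is illustrative only: in the blueprint the deduction runs inside a trigger, and `Db`, `tryDeduct`, and `fakeDb` are hypothetical stand-ins with no real database client involved.

```typescript
// Minimal stand-in for a database client (hypothetical interface).
interface Db {
  // Executes a statement and returns the number of rows it affected.
  execute(sql: string, params: unknown[]): number
}

const DEDUCT_SQL =
  'UPDATE clients SET credits = credits - $2 WHERE client_id = $1 AND credits >= $2'

// Returns true when the deduction was applied, false when the guard rejected it.
function tryDeduct(db: Db, clientId: string, deduction: number): boolean {
  return db.execute(DEDUCT_SQL, [clientId, deduction]) === 1
}

// In-memory fake that mimics the guarded UPDATE for a single client row.
function fakeDb(initialCredits: number): Db & { credits(): number } {
  let credits = initialCredits
  return {
    execute(_sql: string, params: unknown[]): number {
      const deduction = params[1] as number
      if (credits < deduction) return 0 // WHERE clause matched no row
      credits -= deduction
      return 1
    },
    credits: () => credits,
  }
}

const demoDb = fakeDb(1)
console.log(tryDeduct(demoDb, 'client-1', 1), tryDeduct(demoDb, 'client-1', 1), demoDb.credits())
// true false 0
```

The check and the decrement happen in one statement, so two concurrent calls with one credit left cannot both succeed: the second one matches no row.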
Key Decisions
Unify metering and billing in one project (#13 + #14): The original separation created circular dependencies, since billing needs to know credit consumption and metering needs to know plan limits. Unified in core-platform-billing, the source of truth is a single project.
Stripe Checkout + Customer Portal instead of custom payment UI: Full PCI compliance handled by Stripe. No card data touches our servers. Customer Portal provides upgrade, downgrade, invoice history, and cancellation without building those screens (that's Shell #1's job).
Credits as user abstraction, not raw tokens: Sellers understand "this action costs 3 credits", not "3,247 input tokens". Credits allow changing underlying LLM costs without altering the user-facing pricing experience.
Soft block (Pro) instead of hard block when credits run out: Blocking everything creates terrible UX. Pro can still query data and receive analysis; only marketplace mutations are blocked. This keeps the user in the product while showing the value of buying more credits.
Blocking logic lives in billing, not in the Orchestrator: If tomorrow we add a Business plan with different rules, the Orchestrator doesn't change. Only the ICreditsGate implementation changes. This boundary is intentional.
Existing PostgreSQL schema as source of truth, not replaced: The trigger system is already correct. The migration extends clients with subscription fields. We don't rewrite what works. No Redis and no extra cache for the credit gate: a primary-key lookup on clients is a single index probe, typically sub-millisecond.
MVP Scope
Free ($0, 50cr/mo) + Pro ($49/mo, 500cr). Credit Packs (3 tiers, Pro only, 12-month expiry). ICreditsGate contract with Orchestrator (#2). Stripe Checkout + Customer Portal + 5 webhooks. Subscription state machine (FREE → PRO → PAST_DUE → GRACE → FREE). Prompt caching on layers 1-3 (60-80% input cost reduction). No admin dashboard (Stripe Dashboard). No Business plan (Phase 2).
Inspired by
AgentTracking + Stripe (Sellerfy). Claude Code prompt caching patterns. Anthropic cache_control documentation.
📝 Project Changelog
DevOps (IaC)
Platform — Andrés
Infrastructure as Code for all Shopilot cloud resources. Governing principle: data projects → GCP (Terraform) · backend / API / microservices → AWS (CloudFormation/CDK). Exceptions are explicit and project-specific (e.g. DynamoDB stays on AWS even for data it stores, because it is already defined as the backend for #12, #2, and other service projects). A GCP Terraform project already exists for Data Sync (#10): Cloud Composer (Airflow), GCS buckets, Cloud Run (FastAPI Data API), BigQuery. This project formalizes and extends that foundation to cover all modules. AWS CloudFormation/CDK is new: DynamoDB tables, Lambda functions, API Gateway, Secrets Manager, SSM. Every infrastructure change goes through version-controlled IaC, with no manual console provisioning.
Beautonomous governance: GitHub Actions workflows for production deployments are subject to Core's role gates. Only El Mago (Mateo) can approve and trigger production deployments. IaC authoring is El Artesano scope (Andrés); production promotion is El Mago scope (Mateo). No manual console provisioning: every change is version-controlled and auditable.
Data projects: #10, #9, #11, Open Metadata
Backend/API/microservices: #12, #8, #13, #2, #3, #15, #4–#6, #7
GitHub Actions: plan/apply
dev / staging / prod
GCS backend (TF) + S3 (CF)
Grows with each project deploy
Tech Stack
Modules, Coverage & Acceptance Criteria
# ═══════════════════════════════════════════════════════════
# GOVERNING RULE:
# DATA projects → GCP (Terraform)
# BACKEND/API/services → AWS (CloudFormation / CDK)
# Exceptions: DynamoDB and other AWS-native services stay
# on AWS even for data they store, as defined per project.
# ═══════════════════════════════════════════════════════════
# ─── GCP (Terraform) — DATA layer ───────────────────────────
# Projects: #10 Data Sync · #9 Cerebro KB · #11 Enrichment
# Repo: shopilot-infra/terraform/
modules/
├── data-sync/ # EXISTS — Cloud Composer (Airflow), GCS,
│ # Cloud Run (FastAPI Data API), BigQuery → #10
├── cerebro-kb/ # BigQuery kb_embeddings, Vertex AI embeddings → #9
├── enrichment/ # Cloud Run (Enrichment service) → #11
├── open-metadata/ # Open Metadata server (Cloud Run)
├── networking/ # VPC, subnets, firewall rules (shared GCP)
├── iam/ # GCP Service accounts, roles, bindings
└── monitoring/ # Cloud Monitoring dashboards + alerts (GCP)
envs/
├── dev.tfvars
├── staging.tfvars
└── prod.tfvars
# State: gs://shopilot-tf-state/{env}/terraform.tfstate
# ─── AWS (CloudFormation / CDK) — BACKEND/API/services ──────
# Projects: #12 · #8 · #13 · #2 · #3 · #15 · #4 · #5 · #6 · #7
# Repo: shopilot-infra/cloudformation/ (or CDK app)
stacks/
├── dynamodb.yaml # Conversation, Session, Token tables → #12, #2, #3
├── marketplace.yaml # Lambda + API GW + Secrets Manager → #12
├── orchestrator.yaml # Lambda (ReAct loop, tool dispatch) → #2
├── intelligence.yaml # Lambda (Personality #4, Context #5,
│ # Proactive #6, Guardrails #7)
├── billing.yaml # RDS PostgreSQL, Lambda, Stripe webhooks → #13
├── feedback.yaml # Lambda + EventBridge (impact tracking) → #15
├── observability.yaml # CloudWatch dashboards, X-Ray, alarms → #8
├── ssm.yaml # Parameter Store configs (all services)
├── iam.yaml # Lambda execution roles, IAM policies
└── eventbridge.yaml # Cron rules (token refresh, sync triggers)
# State: S3 bucket (CF native) / CDK bootstrap
# Environments: dev / staging / prod (stack suffixes / CDK context)
- All GCP resources provisioned via Terraform — zero manual console changes
- All AWS resources provisioned via CloudFormation — zero manual console changes
- CI/CD: terraform plan on PR, terraform apply on merge to main
- 3 environments (dev/staging/prod) with isolated state per env
- New project infra added by creating a new Terraform module or CF stack
- Drift detection: weekly check for manual changes, alert if found
Envs: 3 (dev/staging/prod) · GCP: Terraform · AWS: CloudFormation · CI: GitHub Actions · Drift: weekly
How It Works
Developer pushes infra change
|
v
GitHub Actions CI/CD
+-- terraform plan / cfn validate / cdk diff (on PR)
+-- terraform apply / cfn deploy / cdk deploy (on merge to main)
|
+── GCP (Terraform) ← DATA projects
| +── data-sync module EXISTS — Airflow, GCS, BigQuery, Cloud Run
| +── cerebro-kb module BigQuery embeddings + Vertex AI
| +── enrichment module Cloud Run (#11)
| +── open-metadata module
| +── networking / iam / monitoring
|
+── AWS (CloudFormation / CDK) ← BACKEND / API / microservices
+── dynamodb stack #12 (tokens) · #2 (sessions) · #3
+── marketplace stack #12 Lambda + API GW + Secrets Manager
+── orchestrator stack #2 Lambda
+── intelligence stack #4 #5 #6 #7 Lambda
+── billing stack #13 RDS + Lambda + Stripe
+── feedback stack #15 Lambda + EventBridge
+── observability stack #8 CloudWatch + X-Ray
+── ssm / iam / eventbridge
Environments: dev → staging → prod
State: GCS (Terraform) / S3 or CDK bootstrap (CloudFormation)
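The routing in the diagram above (project type picks the cloud, CI event picks the command) can be summarized in a few lines. A hedged sketch: `cloudFor`, `ciCommand`, and `terraformStatePath` are hypothetical helpers, not part of shopilot-infra, and the AWS branch assumes the CDK variant of the pipeline.

```typescript
type Cloud = 'gcp' | 'aws'
type Env = 'dev' | 'staging' | 'prod'
type Trigger = 'pull_request' | 'merge_to_main'

// Governing rule from the text: data projects go to GCP, everything else to AWS.
function cloudFor(projectKind: 'data' | 'backend'): Cloud {
  return projectKind === 'data' ? 'gcp' : 'aws'
}

// Read-only command on pull requests, mutating command on merge to main.
function ciCommand(cloud: Cloud, trigger: Trigger): string {
  if (cloud === 'gcp') return trigger === 'pull_request' ? 'terraform plan' : 'terraform apply'
  return trigger === 'pull_request' ? 'cdk diff' : 'cdk deploy'
}

// Terraform state path, following gs://shopilot-tf-state/{env}/terraform.tfstate.
function terraformStatePath(env: Env): string {
  return `gs://shopilot-tf-state/${env}/terraform.tfstate`
}

console.log(ciCommand(cloudFor('data'), 'pull_request')) // terraform plan
console.log(ciCommand(cloudFor('backend'), 'merge_to_main')) // cdk deploy
console.log(terraformStatePath('staging')) // gs://shopilot-tf-state/staging/terraform.tfstate
```

Keeping one state path per environment is what gives dev, staging, and prod isolated state, as the acceptance criteria require.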
Key Decisions
Cloud split driven by project type, not tooling preference: Data projects (#10 Data Sync, #9 Cerebro KB, #11 Enrichment) run on GCP because that's where the data infrastructure already lives (Cloud Composer, GCS, BigQuery, Vertex AI). All backend/API/microservice projects (#12, #8, #13, #2, #3, #15, #4–#6, #7) run on AWS because they use Lambda, DynamoDB, API Gateway, and Secrets Manager, AWS-native services already specified per project. Exceptions are explicit: DynamoDB stays on AWS even for data it stores, because it is the runtime store for service state (tokens, sessions, conversations), not the analytical data layer. This split is a governing rule, not a default.
Terraform for GCP, CloudFormation/CDK for AWS: Terraform is already in use for Data Sync GCP resources. CloudFormation/CDK is native to AWS and the right fit for Lambda/DynamoDB/API Gateway stacks. No need for a single tool when each cloud has a mature native option. CDK generates CloudFormation, so both are valid; the choice is made per project.
Extend, don't rewrite existing Terraform: The GCP Terraform project for Data Sync already works. New modules are added alongside it. Same patterns, same state backend, same CI/CD pipeline.
Transversal project that grows with every module: DevOps is not a one-time deliverable. Each project that needs cloud resources adds a module/stack here. The cloud split rule determines where it goes. The project scope expands organically.
MVP Scope
Formalize existing GCP Terraform. Add AWS CloudFormation stacks (DynamoDB, Lambda, API Gateway). CI/CD with GitHub Actions. 3 environments. Formalizar Terraform GCP existente. Agregar stacks AWS CloudFormation (DynamoDB, Lambda, API Gateway). CI/CD con GitHub Actions. 3 ambientes.
Inspired byInspirado en
Existing GCP Terraform for Data Sync. Standard IaC practices. Terraform GCP existente para Data Sync. Practicas estandar de IaC.
📝 Project ChangelogChangelog del Proyecto
Go to Market & Analytics
core-platform-gtm-analytics — Pablo · External Team
Go-to-market strategy and user activity tracking. Defines the launch playbook (positioning, channels, early adopter acquisition, onboarding funnels) and the analytics infrastructure to measure product usage, retention, conversion, and growth metrics from day one. Owned by Pablo and an external GTM + analytics team, with no engineering tasks for the internal dev team in the MVP sprints.
Owned by Pablo + external GTM & analytics team: no sprint tasks for Sergio, Andrés, or Mateo. Internal engineers may integrate analytics SDKs when specs are ready.
Detailed spec pending: project structure created, content will be added when the GTM strategy is defined.
📝 Project Changelog
Layer 6: QUALITY
What measures whether the Coach works well
Feedback Loop
Quality — Sergio
[Phase 2 MVP] core-quality-feedback measures the real business impact of actions the Coach executes in the marketplace. When the Coach changes a product title, this project waits 7 days, compares before/after metrics, and calculates a weighted impact score. Also manages when to ask the seller for explicit feedback (anti-fatigue gate) and collects implicit signal (accepted/rejected/edited proposals). The Coach emits raw signal via conversation-api — it does not process, measure, or decide when to ask.
Beautonomous governance: every action measured by Feedback Loop first passed through Core's governance gates. Only CONFIRMED WRITE actions (via ConfirmationFlow) generate FeedbackEntry records. Impact measurement is an audit of Core-governed changes, linking accountability to outcomes.
What this project does NOT do
Capture raw signal → conversation-api
Cron every 6h: closes pending entries after 7 days
Anti-fatigue: decides if/when to show the feedback prompt
6 REST endpoints: summary, history, should-prompt, explicit, implicit, implicit/summary
Computes impact summaries and topWins per user
Responsibility Split with conversation-api
Lives in conversation-api
FeedbackCapture: a before_tool / after_tool hook in HookLifecycle. On a WRITE tool it snapshots ProductMetrics before execution, then writes a raw FeedbackEntry to DynamoDB with status: pending. It has write-only IAM on the table.
Lives in this project
DynamoDB table (owned here), FeedbackMeasurer (cron), FeedbackGate (anti-fatigue), REST endpoints, explicit/implicit collection. The Shell queries GET /should-prompt; there is no anti-fatigue logic in the Shell.
Tech Stack
Data Models, API Signatures & Acceptance Criteria
interface FeedbackEntry {
id: string; // ULID
userId: string;
executionId: string; // AgentExecution id for the turn
toolName: string; // e.g. "update_product_content"
productId: string;
marketplace: 'meli' | 'amazon';
fieldChanged?: string; // e.g. "title", "description"
valueBefore?: string;
valueAfter?: string;
executedAt: Date;
metricsBefore?: ProductMetrics; // pre-execution snapshot
metricsAfter?: ProductMetrics; // filled in by FeedbackMeasurer
impactScore?: number; // -100 to 100
impactClass?: 'positive' | 'neutral' | 'negative';
measuredAt?: Date;
retryCount: number; // measurement attempts (max 3)
status: 'pending' | 'measured' | 'unmeasurable';
}
interface ProductMetrics {
visits7d: number;
sales7d: number;
conversionRate: number; // 0-1
searchPosition: number; // average position (lower = better)
capturedAt: Date;
}
interface ExplicitFeedbackEntry {
id: string; // ULID
userId: string;
executionId?: string; // optional: may be general session feedback
trigger: 'post_write' | 'post_reject' | 'post_session';
sentiment?: 'positive' | 'neutral' | 'negative';
rating?: number; // 1-5
reason?: string; // optional free text
createdAt: Date;
sessionId: string;
}
interface ImplicitFeedbackEntry {
id: string; // ULID
userId: string;
sessionId: string;
skillProposed: string; // e.g. "update_product_content"
action: 'accepted' | 'rejected' | 'edited' | 'ignored';
context: {
category?: string;
marketplace?: 'meli' | 'amazon';
productId?: string;
};
timeToActionMs?: number;
createdAt: Date;
}
interface FeedbackThrottle {
userId: string;
sessionId: string;
promptsByType: Record<string, number>; // trigger → count in this session
lastPromptAt?: Date;
consecutiveIgnores: number;
suppressed: boolean;
ttl: number; // 48h TTL in DynamoDB
}
// ── IFeedbackGate ──
type FeedbackTrigger = 'post_write' | 'post_reject' | 'post_session';
interface GateResult {
shouldPrompt: boolean;
type?: FeedbackTrigger;
reason?: string; // internal logging
}
interface IFeedbackGate {
shouldPrompt(userId: string, sessionId: string, trigger: FeedbackTrigger): Promise<GateResult>;
recordIgnore(userId: string, sessionId: string): Promise<void>;
}
// Rules: max 1/type/session · cooldown 15min · suppress after 2 ignores
// backoff after 3 sessions with all-ignored
// ── Impact Score ──
function calculateImpactScore(before: ProductMetrics, after: ProductMetrics): ImpactResult;
// Weights: visits7d × 0.2, sales7d × 0.4, conversionRate × 0.3, searchPosition × 0.1 (inverted)
// Classification: > +20 → positive · -20 to +20 → neutral · < -20 → negative
// Range: clamped -100 to 100
// ── REST Endpoints ──
GET /feedback/:userId/summary // counts by class + topWins
GET /feedback/:userId/history // paginated list. ?productId=
GET /feedback/:userId/should-prompt // { shouldPrompt, type }. ?trigger=&sessionId=
POST /feedback/explicit // creates ExplicitFeedbackEntry
POST /feedback/implicit // creates ImplicitFeedbackEntry
GET /feedback/:userId/implicit/summary // { acceptanceRateBySkill, totalProposals, totalAccepted }
// ── FeedbackMeasurer (cron) ──
// EventBridge rate(6 hours) → FeedbackMeasurerHandler
// 1. Query entries with status: pending
// 2. Skip if fewer than 7 days have passed since executedAt
// 3. Fetch current metrics from Data Sync (#10)
// 4. If metrics unavailable: retryCount++ (max 3 → status: unmeasurable)
// 5. If metrics available: calculateImpactScore → status: measured
// Table: core-feedback
// ┌───────────────────┬────────────────────┬──────────────────────┐
// │ Entity │ pk │ sk │
// ├───────────────────┼────────────────────┼──────────────────────┤
// │ FeedbackEntry │ User#{userId} │ Feedback#{ULID} │
// │ FeedbackThrottle │ User#{userId} │ Throttle#{sessionId} │ TTL 48h
// │ ExplicitFeedback │ User#{userId} │ Explicit#{ULID} │
// │ ImplicitFeedback │ User#{userId} │ Implicit#{ULID} │
// └───────────────────┴────────────────────┴──────────────────────┘
//
// GSI1: pk = status, sk = executedAt
// Usage: FeedbackMeasurer queries status='pending' ordered by executedAt
//
// conversation-api has IAM write-only on this table (PutItem only)
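The key layout in the table above can be sketched as a few pure helpers. This is a minimal illustration, not the project's repository code: the function names (`primaryKey`, `gsi1Key`, `throttleTtl`) and the choice of an ISO-8601 string for the GSI1 sort key are assumptions. Only the `User#{userId}` / `{Entity}#{id}` formats and the 48h TTL come from the spec.

```typescript
// Hypothetical helper names; only the key formats come from the table above.
type EntityPrefix = 'Feedback' | 'Throttle' | 'Explicit' | 'Implicit';
type EntryStatus = 'pending' | 'measured' | 'unmeasurable';

interface PrimaryKey {
  pk: string;
  sk: string;
}

const THROTTLE_TTL_SECONDS = 48 * 60 * 60; // 48h

// pk = User#{userId}, sk = {Entity}#{ULID or sessionId}
function primaryKey(prefix: EntityPrefix, userId: string, id: string): PrimaryKey {
  return { pk: `User#${userId}`, sk: `${prefix}#${id}` };
}

// GSI1 key for the measurer query: partition on status, sort on executedAt.
// Assumption: executedAt is stored as an ISO-8601 UTC string, which sorts
// chronologically when compared as a string.
function gsi1Key(status: EntryStatus, executedAt: Date): PrimaryKey {
  return { pk: status, sk: executedAt.toISOString() };
}

// DynamoDB TTL attributes hold an epoch timestamp in seconds.
function throttleTtl(now: Date): number {
  return Math.floor(now.getTime() / 1000) + THROTTLE_TTL_SECONDS;
}
```

Keeping key construction in one place means the repository and any test fixtures cannot drift apart on prefix spelling.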
// FeedbackAPIHandler → Lambda → serves REST endpoints
// FeedbackMeasurerHandler → Lambda → entry point for measurement cron
// EventBridge Rule → Rule → rate(6 hours) → FeedbackMeasurerHandler
// core-feedback → DynamoDB → table with GSI1 (status/executedAt)
// IAM Grant → Policy → conversation-api Lambda → PutItem on core-feedback
core-quality-feedback/
├── src/
│   ├── domain/
│   │   ├── interfaces/
│   │   │   ├── IFeedbackRepository.ts
│   │   │   └── IFeedbackGate.ts
│   │   └── models/
│   │       ├── FeedbackEntry.ts
│   │       ├── ExplicitFeedbackEntry.ts
│   │       ├── ImplicitFeedbackEntry.ts
│   │       └── FeedbackThrottle.ts
│   ├── application/
│   │   ├── FeedbackMeasurerService.ts
│   │   ├── FeedbackGate.ts
│   │   └── FeedbackSummaryService.ts
│   └── infrastructure/
│       ├── repositories/
│       │   └── DynamoFeedbackRepository.ts
│       └── lambda/
│           ├── FeedbackAPIHandler.ts
│           └── FeedbackMeasurerHandler.ts
├── lib/
│   └── feedback-stack.ts
└── test/
- [Ph2 MVP] FeedbackCapture (in conversation-api) writes raw FeedbackEntry with metricsBefore on every WRITE tool execution
- [Ph2 MVP] FeedbackMeasurer cron closes pending entries after 7 days with <1% unmeasurable rate
- [Ph2 MVP] GET /feedback/:userId/summary returns counts by impactClass + topWins
- [Ph2 MVP] GET /feedback/:userId/history returns paginated FeedbackEntry list with ?productId filter
- [Ph2 Full] FeedbackGate enforces: max 1 prompt/type/session, 15min cooldown, suppression after 2 consecutive ignores
- [Ph2 Full] GET /feedback/:userId/should-prompt returns correct gate decision for Shell
- [Ph2 Full] POST /feedback/explicit creates ExplicitFeedbackEntry with trigger, sentiment, rating
- [Ph2 Full] POST /feedback/implicit creates ImplicitFeedbackEntry with action and context
- [Ph2 Full] GET /feedback/:userId/implicit/summary returns acceptanceRateBySkill
- [Ph2 Full] FeedbackThrottle TTL 48h — auto-expires in DynamoDB
- [Ph2 Full] conversation-api has IAM write-only on core-feedback table — cannot read or manage
Measurement delay: 7 days · Cron: every 6h · Weights: visits 0.2, sales 0.4, conv 0.3, position 0.1 (inverted) · Thresholds: ±20 · Retry: max 3 → unmeasurable · Gate: 1/type/session, 15min cooldown, suppress after 2 ignores · Table: single-table core-feedback with GSI1
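The gate parameters in the summary line (1 prompt per type per session, 15-minute cooldown, suppression after 2 ignores) could be evaluated by a pure function over the throttle state. The sketch below is an assumption about how FeedbackGate might decide, not its specified implementation: `shouldPromptFor`, `ThrottleState`, and the `reason` strings are hypothetical, and the cross-session backoff rule is left out because it needs history beyond a single FeedbackThrottle record.

```typescript
type FeedbackTrigger = 'post_write' | 'post_reject' | 'post_session';

// Subset of FeedbackThrottle needed for the decision (hypothetical shape).
interface ThrottleState {
  promptsByType: Partial<Record<FeedbackTrigger, number>>;
  lastPromptAt?: Date;
  consecutiveIgnores: number;
  suppressed: boolean;
}

interface GateResult {
  shouldPrompt: boolean;
  type?: FeedbackTrigger;
  reason?: string; // internal logging only
}

const COOLDOWN_MS = 15 * 60 * 1000;
const MAX_PROMPTS_PER_TYPE = 1;
const MAX_CONSECUTIVE_IGNORES = 2;

function shouldPromptFor(state: ThrottleState, trigger: FeedbackTrigger, now: Date): GateResult {
  // Rule 1: stop asking once the seller has ignored two prompts in a row.
  if (state.suppressed || state.consecutiveIgnores >= MAX_CONSECUTIVE_IGNORES) {
    return { shouldPrompt: false, reason: 'suppressed_after_ignores' };
  }
  // Rule 2: at most one prompt per trigger type per session.
  if ((state.promptsByType[trigger] ?? 0) >= MAX_PROMPTS_PER_TYPE) {
    return { shouldPrompt: false, reason: 'max_per_type_reached' };
  }
  // Rule 3: 15-minute cooldown since the last prompt of any type.
  if (state.lastPromptAt && now.getTime() - state.lastPromptAt.getTime() < COOLDOWN_MS) {
    return { shouldPrompt: false, reason: 'cooldown_active' };
  }
  return { shouldPrompt: true, type: trigger };
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes the acceptance criteria for the gate straightforward to unit test.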
How It Works
Coach executes a WRITE tool (e.g. update_product_content)
↓
FeedbackCapture (in conversation-api)
hook before_tool: snapshot ProductMetrics
hook after_tool: write FeedbackEntry raw → DynamoDB (status: pending)
conversation-api has IAM write-only on the table
↓
... 7 days later ...
↓
FeedbackMeasurer (cron every 6h, in THIS project)
1. Query entries with status: pending, age >= 7d
2. Fetch current metrics from Data Sync (#10)
3. If metrics unavailable → retryCount++ (max 3 → unmeasurable)
4. Calculate impact score:
visits7d × 0.2 = +10.8
sales7d × 0.4 = +40.0
convRate × 0.3 = +9.0
searchPos × 0.1 = +4.7 (inverted: lower position = better)
─────────────────────────
impactScore = +64.5 → POSITIVE (>+20)
5. Update entry: metricsAfter, impactScore, impactClass, status: measured
↓
Shell queries GET /feedback/:userId/summary
→ shows seller: "+54% visits after title change on MLA123456"
↓
Shell queries GET /feedback/:userId/should-prompt
→ FeedbackGate checks throttle rules → { shouldPrompt: true, type: 'post_write' }
→ Shell renders inline feedback prompt
→ Seller responds → POST /feedback/explicit
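The scoring step in the flow above can be sketched as follows. The weights, the inverted search position, the ±20 thresholds, and the clamp to [-100, 100] come from the spec; everything else is an assumption made for illustration: each component is taken to be the percentage change versus the baseline, a zero baseline contributes 0, the result is rounded to one decimal, and `ImpactResult` and `pctChange` are hypothetical names.

```typescript
interface ProductMetrics {
  visits7d: number;
  sales7d: number;
  conversionRate: number; // 0-1
  searchPosition: number; // lower = better
  capturedAt: Date;
}

type ImpactClass = 'positive' | 'neutral' | 'negative';

// Hypothetical result shape; the spec names ImpactResult without defining it.
interface ImpactResult {
  impactScore: number;
  impactClass: ImpactClass;
}

// Assumption: percentage change relative to the baseline, 0 when the baseline is 0.
function pctChange(before: number, after: number): number {
  return before === 0 ? 0 : ((after - before) / before) * 100;
}

function calculateImpactScore(before: ProductMetrics, after: ProductMetrics): ImpactResult {
  const raw =
    pctChange(before.visits7d, after.visits7d) * 0.2 +
    pctChange(before.sales7d, after.sales7d) * 0.4 +
    pctChange(before.conversionRate, after.conversionRate) * 0.3 +
    // Inverted: a drop in search position is an improvement.
    -pctChange(before.searchPosition, after.searchPosition) * 0.1;

  const clamped = Math.max(-100, Math.min(100, raw));
  const impactScore = Math.round(clamped * 10) / 10;
  const impactClass: ImpactClass =
    impactScore > 20 ? 'positive' : impactScore < -20 ? 'negative' : 'neutral';
  return { impactScore, impactClass };
}
```

Under these assumptions, a product going from 1000 to 1540 visits, 20 to 40 sales, 0.020 to 0.026 conversion, and position 10 to 5.3 yields the same component contributions as the worked example (10.8, 40.0, 9.0, 4.7).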
The Coach's job is to respond in real time. Measuring impact happens days later. The logic for measurement windows, marketplace delay retries (24-48h), weighted scoring, and anti-fatigue has no place in the request path. Once the Coach is in production, adding or changing feedback logic must not touch the conversation loop. They are two distinct lifecycles: the loop converses in milliseconds; feedback measures in days.
Implementation Plan
Phase 2 MVP: Scheduled Post-Core Build
This project is scheduled for Phase 2 of the MVP build. FeedbackCapture lives in conversation-api as a hook, not in this project. Full implementation begins after the 10-week core build, when there is real user data to measure against.
Phase 2 MVP: Capture + Measurement (Post-MVP Week 1-2)
FeedbackCapture hook in conversation-api (before_tool / after_tool). FeedbackMeasurerService + Lambda + EventBridge cron. GET /feedback/:userId/summary and GET /feedback/:userId/history endpoints.
Phase 2 Full: Gate + Feedback Collection (Post-MVP Week 3-4)
FeedbackGate + FeedbackThrottle in DynamoDB. GET /should-prompt endpoint. POST /feedback/explicit and /implicit endpoints. GET /implicit/summary endpoint. Shell integration: the Shell queries the gate, renders the prompt, and posts the response.
Phase 3: FeedbackLearner (Post-MVP Week 4+)
Reads FeedbackEntry records with impactClass: positive and triggers the KB pipeline in core-knowledge-semantic-base (#9). It does NOT write to the KB directly; it fires the external pipeline. Requires sufficient measured data volume to avoid contaminating the KB with noisy signal.
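Of the Phase 2 Full endpoints listed above, GET /implicit/summary is the one that aggregates rather than stores. A minimal sketch of that aggregation is shown here, assuming the response fields from the endpoint list (`acceptanceRateBySkill`, `totalProposals`, `totalAccepted`). The function name `summarizeImplicit` is hypothetical, and treating only `accepted` (not `edited`) as an acceptance is an assumption the spec does not settle.

```typescript
type ImplicitAction = 'accepted' | 'rejected' | 'edited' | 'ignored';

// Only the fields of ImplicitFeedbackEntry that the summary needs.
interface ImplicitSignal {
  skillProposed: string;
  action: ImplicitAction;
}

interface ImplicitSummary {
  acceptanceRateBySkill: Record<string, number>; // 0-1 per skill
  totalProposals: number;
  totalAccepted: number;
}

function summarizeImplicit(entries: ImplicitSignal[]): ImplicitSummary {
  const proposed: Record<string, number> = {};
  const accepted: Record<string, number> = {};

  for (const entry of entries) {
    proposed[entry.skillProposed] = (proposed[entry.skillProposed] ?? 0) + 1;
    // Assumption: 'edited' proposals are not counted as accepted.
    if (entry.action === 'accepted') {
      accepted[entry.skillProposed] = (accepted[entry.skillProposed] ?? 0) + 1;
    }
  }

  const acceptanceRateBySkill: Record<string, number> = {};
  let totalAccepted = 0;
  for (const skill of Object.keys(proposed)) {
    const acceptedCount = accepted[skill] ?? 0;
    totalAccepted += acceptedCount;
    acceptanceRateBySkill[skill] = acceptedCount / proposed[skill];
  }

  return { acceptanceRateBySkill, totalProposals: entries.length, totalAccepted };
}
```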
Risk Analysis
Confounding Factors
Impact: H
Mitigation: A title change may coincide with a competitor going out of stock or a seasonal surge. The system reports correlation, not causation. Users see "after changing the title, visits increased +54%", not "your change caused +54%".
7-Day Measurement Window
Impact: M
Mitigation: 7 days may be too short for SEO changes (2-4 weeks) and too long for price changes (24h). Keep 7 days as default. Future: configurable per skill type.
False Attribution
Impact: H
Mitigation: Multiple concurrent changes on the same product make attribution impossible. Track concurrent FeedbackEntry per productId and flag in impact report. Never claim causation.
Delayed Metrics Availability
Impact: M
Mitigation: Marketplace APIs report metrics with a 24-48h delay. FeedbackMeasurer retries up to 3 times (retryCount). After 3 failed attempts: status → unmeasurable. This does not block the pipeline.
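The retry behaviour described for delayed metrics can be expressed as a small state transition that each cron run applies per pending entry. This is a sketch under stated assumptions: `decideMeasurement` and `MeasureOutcome` are hypothetical names, and an entry is assumed to become unmeasurable on its third failed attempt.

```typescript
interface PendingEntry {
  executedAt: Date;
  retryCount: number;
}

// What the cron run should do with one entry (hypothetical shape).
type MeasureOutcome =
  | { kind: 'skip' } // measurement window not elapsed yet
  | { kind: 'measure' } // metrics available: score and mark as measured
  | { kind: 'retry'; retryCount: number } // stay pending, try again next run
  | { kind: 'unmeasurable'; retryCount: number };

const MEASUREMENT_DELAY_MS = 7 * 24 * 60 * 60 * 1000;
const MAX_RETRIES = 3;

function decideMeasurement(entry: PendingEntry, metricsAvailable: boolean, now: Date): MeasureOutcome {
  // Entries younger than 7 days are left untouched.
  if (now.getTime() - entry.executedAt.getTime() < MEASUREMENT_DELAY_MS) {
    return { kind: 'skip' };
  }
  if (metricsAvailable) {
    return { kind: 'measure' };
  }
  // Metrics missing: count the failed attempt and give up after the third.
  const retryCount = entry.retryCount + 1;
  return retryCount >= MAX_RETRIES
    ? { kind: 'unmeasurable', retryCount }
    : { kind: 'retry', retryCount };
}
```

With a 6-hour cron, three failed attempts span roughly 12 to 18 hours after the window closes. Whether that covers the 24-48h marketplace delay depends on how much of the delay the 7-day wait already absorbs.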
Key Decisions
Separated from conversation-api: The Coach emits a raw signal and doesn't know what happens with it. Measurement, scoring, anti-fatigue, and learning logic must not be in the request path. Two distinct lifecycles: the loop converses in milliseconds; feedback measures in days.
FeedbackGate lives here, Shell queries it: The Shell has no anti-fatigue logic. The gate is this project's responsibility. It owns the complete state of prompts per session and the ignore history.
Correlation, not causation: "After changing the title, visits increased +54%", not "your change caused the increase". The system cannot control other variables affecting the product at the same time.
FeedbackLearner deferred to Phase 3: Automating KB updates requires real data at scale. Until there are enough measured FeedbackEntry records with impactClass: positive, the learner is premature and risks contaminating the KB with noisy signals.
MVP Scope
Phase 2 MVP. FeedbackCapture hook in conversation-api writes raw entries. This project measures, scores, and exposes results via REST.
Inspired by
A/B testing frameworks, Shopilot Data Sync pipeline
📝 Project Changelog
Eval Suite
Quality — Pablo
core-quality-stack-evaluation is the quality evaluation platform for every project in the stack. It runs automated suites on every PR (Coach, Shell), on schedule (Figma), and on demand. It evaluates Coach response quality via an LLM Judge, validates API contracts between projects, checks KB chunk retrievability, validates Electron builds for macOS and Windows (compilation, signing, notarization, startup, bundle size), and audits Design System Figma files against 15 quality checks. It also runs the api_monitor pipeline: daily checks against marketplace API changelogs + canary tests against live endpoints. It never runs in production — its role is to block merges that introduce regressions, validate that desktop builds are distributable, and ensure Figma is MCP-compatible before implementation.
Beautonomous governance: Eval Suite is the quality gate that validates Core-governed changes before they reach production. It blocks merges that introduce regressions in ConfirmationFlow enforcement, permission matrix adherence, or governance rule compliance across all projects in the stack. Desktop builds are validated as distributable artifacts; Figma is validated as MCP-compatible input for the design-to-code pipeline.
What this project does NOT do
7 Evaluation Pipelines
llm_judge: Coach response quality vs golden dataset. Judge LLM (Haiku/Sonnet) scores relevance, accuracy, tone, actionability
contract: API contracts between projects. Consumer-driven: Tool Registry defines what it expects from Data Sync. The provider can't break the consumer without knowing
kb_quality: Are KB chunks relevant and retrievable for expected queries? Detects knowledge gaps before they're visible in production
e2e: Full Shell flows: proposal → confirmation → action → response. Cross-project regressions in a single report
desktop_build: Electron builds for macOS (arm64+x64) and Windows (x64). 11 checks: compilation, code signing, notarization, app startup, bundle size, native modules, auto-updater, deep links, window rendering, IPC channels. Runs on native runners per platform
figma_quality: 15 automated checks against Design System (#18) requirements via Figma REST API: variable architecture, Code Syntax, Auto Layout, naming, states, color hardcoding, spacing, semantic aliasing, Light/Dark modes, WCAG contrast, MCP compatibility. Scheduled weekly + on-demand. Blocks implementation, not merges
api_monitor: Daily checks against marketplace API changelogs (MeLi, Amazon SP-API, Shopify) + canary tests against live endpoints. When a breaking change or new capability is detected, it creates a Linear issue tagged api-change with the affected adapter and recommended action. Runs on a cron schedule, not triggered by code changes
Orchestrates pipelines: runs configured suite against target
External evaluator (Haiku/Sonnet): scores relevance, accuracy, tone, actionability
Consumer-driven contract validation between projects
EvalReport with per-case scores + regression delta vs baseline
Specialized runner for llm_judge pipeline
Validates chunk retrievability for expected queries
11 checks per platform: compilation, signing, startup, bundle, IPC
15 checks against DS requirements: variables, Auto Layout, WCAG, MCP
Reads Figma files via REST API: nodes, variables, components, styles
Sandbox Isolation: How it evaluates without affecting production
Invokes the conversation-api Lambda directly (not via HTTP) in a sandboxed staging environment. Uses a fixed snapshot of KB and brand health (reproducible between runs).
Tech Stack (TypeScript, CI/CD Tooling)
Data Models, Interfaces & Acceptance Criteria
interface IEvalPipeline {
run(config: EvalConfig): Promise<EvalReport>;
}
interface EvalConfig {
projectId: string;
pipelineType: 'llm_judge' | 'contract' | 'kb_quality' | 'e2e' | 'desktop_build' | 'figma_quality';
datasetId: string;
blockOnFailure: boolean;
thresholds: EvalThresholds;
}
interface EvalThresholds {
minPassRate: number; // 0-1, e.g. 0.85
maxRegressionDelta: number; // e.g. -0.05 = no more than 5% regression vs baseline
}
interface EvalReport {
pipelineId: string;
projectId: string;
passRate: number;
cases: EvalCase[];
regressionDelta?: number;
blocksDeployment: boolean;
generatedAt: Date;
}
interface EvalCase {
id: string;
input: unknown;
expectedOutput: unknown;
actualOutput: unknown;
score: number; // 0-1
passed: boolean;
judgeRationale?: string; // Judge LLM explanation
}
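How the thresholds turn per-case results into a blocking decision is not spelled out in the interfaces above. One plausible reading is sketched below, assuming `regressionDelta` is the current pass rate minus the baseline pass rate and that `blockOnFailure` acts as a master switch. `decideEvalGate`, `CaseResult`, and `GateDecision` are hypothetical names introduced for the example.

```typescript
interface EvalThresholds {
  minPassRate: number; // 0-1, e.g. 0.85
  maxRegressionDelta: number; // e.g. -0.05
}

interface CaseResult {
  id: string;
  passed: boolean;
}

// Subset of EvalReport produced by the gate (hypothetical shape).
interface GateDecision {
  passRate: number;
  regressionDelta?: number;
  blocksDeployment: boolean;
}

function decideEvalGate(
  cases: CaseResult[],
  thresholds: EvalThresholds,
  blockOnFailure: boolean,
  baselinePassRate?: number,
): GateDecision {
  const passedCount = cases.filter((c) => c.passed).length;
  // An empty suite is treated as failing rather than vacuously passing.
  const passRate = cases.length === 0 ? 0 : passedCount / cases.length;

  // Assumption: delta is measured on pass rate; negative means regression.
  const regressionDelta =
    baselinePassRate === undefined ? undefined : passRate - baselinePassRate;

  const belowPassRate = passRate < thresholds.minPassRate;
  const regressed =
    regressionDelta !== undefined && regressionDelta < thresholds.maxRegressionDelta;

  return {
    passRate,
    regressionDelta,
    blocksDeployment: blockOnFailure && (belowPassRate || regressed),
  };
}
```

Checking the regression delta separately from the absolute pass rate lets a suite that still clears 0.85 block a merge when it drops sharply against its own baseline.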
interface GoldenDataset {
id: string;
projectId: string;
version: string;
cases: GoldenCase[];
}
interface GoldenCase {
id: string;
description: string;
input: unknown;
expectedOutput: unknown;
evaluationCriteria: string[]; // passed to Judge LLM as scoring rubric
tags: string[]; // e.g. ['write_tool', 'meli', 'high_priority']
}
interface LLMJudgeScore {
relevance: number; // 0-1: does it answer the question?
accuracy: number; // 0-1: is the information correct?
tone: number; // 0-1: matches Personality Engine?
actionability: number; // 0-1: can the seller act on this?
overall: number; // weighted: relevance 0.3 · accuracy 0.4 · tone 0.15 · actionability 0.15
}
// Judge uses Claude Haiku for most cases. Claude Sonnet for cases tagged 'critical'.
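The weighted `overall` score and the tag-based judge selection described in the comments above could look like this. It is a minimal sketch: `overallScore` and `selectJudgeModel` are hypothetical names, and `'haiku'` / `'sonnet'` are placeholder labels rather than real model identifiers.

```typescript
interface JudgeDimensions {
  relevance: number; // 0-1
  accuracy: number; // 0-1
  tone: number; // 0-1
  actionability: number; // 0-1
}

// Weights from the LLMJudgeScore comment: 0.3 / 0.4 / 0.15 / 0.15.
const JUDGE_WEIGHTS: JudgeDimensions = {
  relevance: 0.3,
  accuracy: 0.4,
  tone: 0.15,
  actionability: 0.15,
};

function overallScore(d: JudgeDimensions): number {
  return (
    d.relevance * JUDGE_WEIGHTS.relevance +
    d.accuracy * JUDGE_WEIGHTS.accuracy +
    d.tone * JUDGE_WEIGHTS.tone +
    d.actionability * JUDGE_WEIGHTS.actionability
  );
}

// Placeholder labels; map these to concrete model ids in configuration.
type JudgeModel = 'haiku' | 'sonnet';

// Cases tagged 'critical' go to the stronger judge, everything else to the cheaper one.
function selectJudgeModel(tags: string[]): JudgeModel {
  return tags.indexOf('critical') !== -1 ? 'sonnet' : 'haiku';
}
```

Because the weights sum to 1, `overall` stays in the same 0-1 range as the individual dimensions and can be compared directly against a per-case pass threshold.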
interface ContractTest {
  consumer: string;   // e.g. 'tool-registry'
  provider: string;   // e.g. 'data-sync'
  endpoint: string;   // e.g. 'GET /products/:id/metrics'
  requestSchema: JSONSchema;
  responseSchema: JSONSchema;
  slaMs: number;      // max expected response time
}
// Consumer-driven: the consumer defines what it expects, not the provider.
// If Data Sync changes its response schema and removes a field that
// Tool Registry uses, the contract fails BEFORE deploy.
// Contracts: Tool Registry ↔ Data Sync, Tool Registry ↔ Enrichment
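A consumer-driven check can be illustrated without a JSON Schema library. The sketch below is an assumption-heavy simplification: it replaces `requestSchema`/`responseSchema` with a flat map of required fields, and `ConsumerExpectation`, `checkContract`, and the 500 ms SLA are hypothetical. The field names (`visits7d`, `sales7d`, `conversionRate`) are the ones the document uses for ProductMetrics.

```typescript
// Hypothetical simplified contract: required top-level fields instead of full JSON Schema.
type FieldType = 'string' | 'number' | 'boolean';

interface ConsumerExpectation {
  consumer: string;
  provider: string;
  endpoint: string;
  requiredFields: Record<string, FieldType>;
  slaMs: number;
}

interface ContractResult {
  passed: boolean;
  violations: string[];
}

// Checks a provider response against what the consumer declared it needs.
function checkContract(
  expectation: ConsumerExpectation,
  response: Record<string, unknown>,
  elapsedMs: number,
): ContractResult {
  const violations: string[] = [];
  for (const [field, type] of Object.entries(expectation.requiredFields)) {
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== type) {
      violations.push(`wrong type for ${field}: expected ${type}, got ${typeof response[field]}`);
    }
  }
  if (elapsedMs > expectation.slaMs) {
    violations.push(`SLA exceeded: ${elapsedMs}ms > ${expectation.slaMs}ms`);
  }
  return { passed: violations.length === 0, violations };
}

// Tool Registry (consumer) declares what it expects from Data Sync (provider).
const metricsExpectation: ConsumerExpectation = {
  consumer: 'tool-registry',
  provider: 'data-sync',
  endpoint: 'GET /products/:id/metrics',
  requiredFields: { visits7d: 'number', sales7d: 'number', conversionRate: 'number' },
  slaMs: 500, // illustrative value, not from the blueprint
};
```

Extra fields in the provider response are deliberately ignored: the consumer only constrains what it actually reads, which is what lets the provider evolve without breaking the contract.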
interface DesktopBuildConfig {
  platforms: ('darwin' | 'win32')[];
  arch: ('x64' | 'arm64')[];
  checks: DesktopBuildCheck[];
  maxBundleSizeMB: number;  // e.g. 250
  maxStartupMs: number;     // e.g. 5000
}

type DesktopBuildCheck =
  | 'compilation'       // build completes without errors
  | 'code_signing'      // binary is signed (codesign / Authenticode)
  | 'notarization'      // Apple notarization passes (macOS only)
  | 'app_startup'       // app starts without crash in <5s
  | 'bundle_size'       // artifact < maxBundleSizeMB
  | 'native_modules'    // keytar, better-sqlite3, etc. load correctly
  | 'auto_updater'      // update feed URL resolves
  | 'deep_links'        // shopilot:// protocol registered
  | 'window_rendering'  // WebContentsView loads without critical errors
  | 'ipc_channels';     // registered IPC channels respond to ping

interface DesktopBuildReport extends EvalReport {
  platform: 'darwin' | 'win32';
  arch: 'x64' | 'arm64';
  checks: DesktopCheckResult[];
  bundleSizeMB: number;
  bundleSizeDeltaMB: number;  // vs baseline
  startupTimeMs: number;
}
// Blocks merge if: compilation fails, signing fails, notarization fails,
// app crashes on startup, bundle > 250MB, native modules fail to load,
// window rendering has critical errors, IPC channels don't respond.
// Warning only (release branches): auto_updater, deep_links.
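The blocking rule in the comments above can be expressed as a classification over check results. This is a sketch under assumptions: the shape of `DesktopCheckResult` is not given in the source, so the two-field version here is hypothetical, as are `WARNING_ONLY_CHECKS` and `decideDesktopGate`. It also treats `auto_updater` and `deep_links` as warning-only on every branch, ignoring the "release branches" qualifier.

```typescript
type DesktopBuildCheck =
  | 'compilation' | 'code_signing' | 'notarization' | 'app_startup' | 'bundle_size'
  | 'native_modules' | 'auto_updater' | 'deep_links' | 'window_rendering' | 'ipc_channels';

// Hypothetical result shape; the blueprint references DesktopCheckResult without defining it.
interface DesktopCheckResult {
  check: DesktopBuildCheck;
  passed: boolean;
}

// Per the blueprint, these two checks only warn; every other failure blocks the merge.
const WARNING_ONLY_CHECKS: ReadonlySet<DesktopBuildCheck> =
  new Set<DesktopBuildCheck>(['auto_updater', 'deep_links']);

interface DesktopGateDecision {
  blocksMerge: boolean;
  blockingFailures: DesktopBuildCheck[];
  warnings: DesktopBuildCheck[];
}

// Splits failed checks into blocking failures and warnings.
function decideDesktopGate(results: DesktopCheckResult[]): DesktopGateDecision {
  const failed = results.filter((r) => !r.passed).map((r) => r.check);
  const warnings = failed.filter((c) => WARNING_ONLY_CHECKS.has(c));
  const blockingFailures = failed.filter((c) => !WARNING_ONLY_CHECKS.has(c));
  return { blocksMerge: blockingFailures.length > 0, blockingFailures, warnings };
}
```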
interface FigmaQualityConfig {
  fileKeys: FigmaFileKey[];
  checks: FigmaQualityCheck[];
  minComplianceRate: number;  // 0-1, e.g. 0.95
}

type FigmaQualityCheck =
  | 'variable_architecture'   // 3 collections (Primitives, Semantic, Component)
  | 'code_syntax'             // all variables have Code Syntax (Web) configured
  | 'auto_layout'             // all components use Auto Layout
  | 'naming_convention'       // slash naming, no generic names (Frame 1, Group)
  | 'states_coverage'         // all interactive states present per type
  | 'color_hardcoding'        // no hardcoded hex in components
  | 'spacing_hardcoding'      // no hardcoded spacing values
  | 'semantic_aliasing'       // Semantic tokens alias Primitives
  | 'light_dark_modes'        // Semantic has Light + Dark modes
  | 'component_properties'    // Component Properties used to reduce variants
  | 'descriptions'            // published components have descriptions
  | 'cover_pages'             // each file has cover page
  | 'base_components_hidden'  // . or _ prefix components are hidden
  | 'wcag_contrast'           // WCAG AA contrast verified
  | 'mcp_compatibility';      // semantic names in all layers

interface FigmaQualityReport extends EvalReport {
  files: FigmaFileReport[];
  overallComplianceRate: number;
  criticalViolations: FigmaViolation[];
  warnings: FigmaViolation[];
}

interface FigmaViolation {
  check: FigmaQualityCheck;
  severity: 'critical' | 'warning';
  componentName?: string;
  nodeName?: string;
  detail: string;
  suggestion: string;  // e.g. "Bind fill to variable color/interactive/primary"
}
// Critical (blocks implementation): variable_architecture, code_syntax,
// auto_layout, color_hardcoding, naming_convention, states_coverage,
// light_dark_modes, wcag_contrast, mcp_compatibility.
// Warning (no block): spacing_hardcoding, semantic_aliasing,
// component_properties, descriptions, cover_pages, base_components_hidden.
// Reads Figma via REST API, NOT MCP. MCP is for interactive agent use.
// Scheduled weekly + on-demand. Does not block merges; it blocks implementation.
You are a quality evaluator for a conversational agent for marketplace sellers.
User query: {query}
Retrieved context (KB + tool results): {context}
Coach response: {response}
Evaluate the response against the following criteria:
{evaluationCriteria}
For each criterion, assign a score from 0 to 1 and briefly explain why.
Respond with JSON:
{
  "scores": {
    "relevance": 0.0,
    "accuracy": 0.0,
    "tone": 0.0,
    "actionability": 0.0
  },
  "judgeRationale": "..."
}
# .github/workflows/eval-on-pr.yml
name: Eval Suite
on:
  pull_request:
    branches: [main, develop]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run eval suite
        run: npm run eval -- --project=coach --dataset=v2
        env:
          STAGING_ENDPOINT: ${{ secrets.STAGING_ENDPOINT }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Check threshold
        run: npm run eval:check
        # Fails job if EvalReport.blocksDeployment === true
      - name: Post report to PR
        uses: actions/github-script@v7
# Flow:
# 1. GitHub Actions triggers EvalRunner on PR
# 2. EvalRunner runs configured pipelines against new version
# 3. Compares passRate with stored baseline
# 4. If regressionDelta < maxRegressionDelta → blocksDeployment: true → blocks merge
# 5. Reports stored in S3 or DynamoDB for historical comparison
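Steps 3 and 4 of the flow can be sketched as a pure gate function. This is an illustration under one explicit assumption: `regressionDelta` is taken to be `passRate - baselinePassRate`, which the source implies ("-0.05 = no more than 5% regression vs baseline") but never defines. `computeGate` and `GateDecision` are hypothetical names; the gate also blocks when `passRate` falls below `minPassRate`, as the acceptance criteria state.

```typescript
interface EvalThresholds {
  minPassRate: number;        // 0-1, e.g. 0.85
  maxRegressionDelta: number; // e.g. -0.05
}

// Hypothetical result type for this sketch.
interface GateDecision {
  regressionDelta?: number;
  blocksDeployment: boolean;
}

// Blocks when the pass rate is below the minimum OR the drop vs baseline exceeds the allowance.
function computeGate(
  passRate: number,
  baselinePassRate: number | undefined,
  thresholds: EvalThresholds,
): GateDecision {
  // Assumption: delta = current pass rate minus baseline; negative means regression.
  const regressionDelta =
    baselinePassRate === undefined ? undefined : passRate - baselinePassRate;
  const belowMinimum = passRate < thresholds.minPassRate;
  const regressed =
    regressionDelta !== undefined && regressionDelta < thresholds.maxRegressionDelta;
  return { regressionDelta, blocksDeployment: belowMinimum || regressed };
}

const thresholds: EvalThresholds = { minPassRate: 0.85, maxRegressionDelta: -0.05 };
const improved = computeGate(0.92, 0.90, thresholds);  // +2% vs baseline
const regressedRun = computeGate(0.86, 0.93, thresholds); // -7% vs baseline
```

With no stored baseline (first run), only the `minPassRate` check applies, so the first report can become the baseline without being blocked by a missing comparison.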
# .github/workflows/desktop-build-eval.yml
name: Desktop Build Eval
on:
  pull_request:
    paths: ['core-product-desktop-client/**']
    branches: [main, develop]
jobs:
  build-macos:
    runs-on: macos-14  # Apple Silicon runner
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build:mac  # arm64 + x64
        env:
          CSC_LINK: ${{ secrets.MAC_CERTIFICATE }}
          APPLE_ID: ${{ secrets.APPLE_ID }}
      - run: npm run eval -- --pipeline=desktop_build --platform=darwin
  build-windows:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build:win  # x64
        env:
          WIN_CSC_LINK: ${{ secrets.WIN_CERTIFICATE }}
      - run: npm run eval -- --pipeline=desktop_build --platform=win32
  report:
    needs: [build-macos, build-windows]
    runs-on: ubuntu-latest
    # Aggregates reports from both platforms, posts to PR
# .github/workflows/figma-quality-eval.yml
name: Figma Quality Eval
on:
  workflow_dispatch:  # manual trigger
  schedule:
    - cron: '0 8 * * 1'  # every Monday 8:00 UTC
jobs:
  figma-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run eval -- --pipeline=figma_quality
        env:
          FIGMA_ACCESS_TOKEN: ${{ secrets.FIGMA_ACCESS_TOKEN }}
      # Posts report to Slack #engineering or as GitHub issue
# Does NOT block merges; it blocks implementation.
# Figma has no PRs/webhooks. Gate is pre-implementation, not pre-merge.
- [Ph 1] Golden dataset has 20-30 curated cases covering fees, metrics, scope, and tool activation
- [Ph 1] CoachEvalRunner + AnthropicLLMJudge score each response with relevance (0.3), accuracy (0.4), tone (0.15), actionability (0.15)
- [Ph 1] CI gate blocks merge when passRate < minPassRate OR regressionDelta exceeds maxRegressionDelta
- [Ph 1] EvalReport published as PR comment with per-case scores and judge rationale
- [Ph 1] Baseline stored for regression comparison across runs
- [Ph 2] ContractEvalRunner validates Tool Registry ↔ Data Sync contract (request/response schemas + SLA)
- [Ph 2] ContractEvalRunner validates Tool Registry ↔ Enrichment contract
- [Ph 2] Contract tests integrated in CI/CD of core-knowledge-data-synchronizator and core-knowledge-enrichment
- [Ph 3] KBQualityRunner validates chunk retrievability for expected queries
- [Ph 3] E2E flows: proposal → confirmation → action → response validated end-to-end
- [Ph 3] Multi-project regression suite: cross-project regressions in a single report
- [Ph 4] DesktopBuildRunner validates macOS (arm64+x64) and Windows (x64) builds with 11 checks per platform
- [Ph 4] Code signing verified: macOS codesign + Apple notarization, Windows Authenticode
- [Ph 4] App startup validated <5s, bundle size <250MB, native modules load, IPC channels respond
- [Ph 4] Desktop build eval runs on native runners (macos-14 + windows-latest) only for PRs touching desktop-client
- [Ph 5] FigmaQualityRunner validates 15 checks against Design System requirements via Figma REST API
- [Ph 5] Critical checks (9) block implementation: variable_architecture, code_syntax, auto_layout, color_hardcoding, naming, states, light_dark, wcag_contrast, mcp_compatibility
- [Ph 5] Figma eval runs weekly on schedule + on-demand. The report identifies component, check, node, and suggestion per violation
CI/CD only · Not runtime · 7 pipelines (llm_judge, contract, kb_quality, e2e, desktop_build, figma_quality, api_monitor) · Judge weights: relevance 0.3, accuracy 0.4, tone 0.15, actionability 0.15 · Consumer-driven contracts · 11 desktop checks (native runners) · 15 Figma checks (REST API, weekly) · Baseline regression tracking
How It Works
Developer opens a PR to conversation-api
↓
GitHub Actions triggers eval suite
↓
EvalRunner runs 30 golden cases against new Coach version
Judge LLM scores each response: relevance · accuracy · tone · actionability
↓
Compares with baseline of previous version
↓
+2% improvement vs baseline
→ merge allowed, report published as PR comment
-7% regression vs baseline
→ merge blocked, report shows which cases failed and why
───────────────────────────────────────
PIPELINE: contract (Phase 2)
Tool Registry defines: "I expect ProductMetrics with visits7d, sales7d, conversionRate"
Data Sync changes response → contract test fails BEFORE deploy
→ Provider can't break consumer without knowing
PIPELINE: kb_quality (Phase 3)
Query: "fees for Electronics in MercadoLibre"
→ KBQualityRunner checks: are there retrievable chunks covering this?
→ If not → knowledge gap detected before production
—————————————————————
PIPELINE: desktop_build (Phase 4)
Developer opens PR to core-product-desktop-client
↓
Two jobs in parallel:
→ macOS runner: build arm64+x64, code sign, notarize, 11 checks
→ Windows runner: build x64, Authenticode sign, 11 checks
↓
Each job validates: compilation, signing, startup <5s, bundle <250MB,
native modules, IPC channels, window rendering, deep links
↓
All green → merge allowed
Something red → merge blocked + report with failed check and platform
—————————————————————
PIPELINE: figma_quality (Phase 5)
UX/UI team publishes a library in Figma
↓
Trigger: manual or weekly cron (every Monday 8:00 UTC)
↓
FigmaQualityRunner reads files via Figma REST API
↓
15 automated checks:
variables ✓ | Code Syntax ✓ | Auto Layout ✓ | naming ✓ | states ✓
color hardcoding ✓ | spacing ✓ | semantic aliasing ✓ | Light/Dark ✓
WCAG contrast ✓ | MCP compatibility ✓ | descriptions | covers | hidden
↓
95%+ compliance → implementation allowed
Critical violations → agent does NOT implement until fixed
e.g. "Button/Primary/Default: fill #3B82F6 is hardcoded hex
→ bind to variable color/interactive/primary"
The Eval Suite runs entirely in CI/CD and never touches production. It evaluates the Coach, validates API contracts, checks KB quality, validates desktop builds as distributable artifacts, and audits Figma for MCP compatibility. Its users are PRs, CI/CD pipelines, and internal quality reports, not sellers. It cannot live inside any project it evaluates: it must sit above them, independent of their release cycles and internal architecture.
Implementation Plan (5 Phases)
Phase 1: LLM Judge for the Coach
Golden dataset v1: 20-30 cases (fees, sales metrics, scope, tool activation). CoachEvalRunner + AnthropicLLMJudge. CI/CD integration in conversation-api as merge gate. Baseline stored for regression comparison.
Phase 2: Contract Testing
Contracts Tool Registry ↔ Data Sync. Contracts Tool Registry ↔ Enrichment. ContractEvalRunner + schemas in datasets/contracts/. CI/CD integrated in core-knowledge-data-synchronizator and core-knowledge-enrichment.
Phase 3: KB Quality + E2E Shell
KBQualityRunner: validates KB chunks are retrievable for expected queries. E2E Shell flows: proposal → confirmation → action → response. Multi-project regression suite: cross-project regressions in a single report.
Phase 4: Desktop Build Eval
DesktopBuildRunner for macOS (arm64+x64) and Windows (x64). GitHub Actions with native runners per platform. 11 checks: compilation, code signing, notarization, app startup, bundle size, native modules, auto-updater, deep links, window rendering, IPC channels. Merge gate for PRs to core-product-desktop-client.
Phase 5: Figma Quality Eval
FigmaQualityRunner + FigmaRESTClient. 15 checks against Design System requirements (doc 72). Scheduled weekly + on-demand + pre-implementation gate. Report published to Slack #engineering or as GitHub issue. Each violation identifies: component, check, node, and actionable suggestion.
Risk Analysis
Judge LLM inconsistent scoring
Impact: M
Mitigation: criteria are specific and verifiable, not subjective. "Response includes exact fee percentage" is more stable than "response is useful". For critical cases, Claude Sonnet provides higher consistency.
Golden dataset becomes stale
Impact: M
Mitigation: golden datasets are versioned code, so dataset PRs go through review. Every feature PR includes golden cases, and every bug fix becomes a permanent regression test. Quality depends on dataset quality, not volume.
Eval suite too slow for CI
Impact: L
Mitigation: parallelism in EvalRunner (10 concurrent). Lightweight Judge (Claude Haiku). Initial dataset small (20-30 cases). Target: <5 minutes per pipeline.
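The "10 concurrent" mitigation can be illustrated with a small bounded-concurrency helper. This is a generic sketch, not EvalRunner's actual code: `mapWithConcurrency` is a hypothetical name, and the worker would be whatever evaluates a single golden case.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  // Each lane repeatedly claims the next item until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

A call such as `mapWithConcurrency(cases, 10, evaluateCase)` would match the stated limit, with `evaluateCase` standing in for the per-case judge call. If any worker rejects, the returned promise rejects, which suits a CI gate where a failed judge call should fail the run.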
Contract schema drift
Impact: M
Mitigation: consumer-driven, so the consumer defines expectations. If the provider changes its response, the contract fails in the provider's CI. The provider must update the contract explicitly.
macOS runner cost
Impact: M
Mitigation: macOS runners are ~10x more expensive than Linux. Desktop build eval only triggers on PRs that touch core-product-desktop-client, not on every PR. Cost estimated at ~$5-15/month with typical PR volume.
Figma API rate limits
Impact: L
Mitigation: Figma REST API has generous rate limits for reading files. Weekly schedule + on-demand keeps request volume low. Personal Access Token stored as GitHub Actions secret.
Key Decisions
Does not live in conversation-api: Today it evaluates the Coach; tomorrow it evaluates the entire stack. A project that evaluates multiple projects cannot live inside one of them. It has its own lifecycle, golden dataset, and CI infrastructure.
LLM-as-judge, not rules: Scoring rules break when the Coach evolves. The Judge LLM evaluates semantic quality (relevance, accuracy, tone, actionability) like a human reviewer would. The golden dataset criteria are the rubric; the Judge applies judgment.
Golden datasets are versioned code: Evaluation cases live in the repo as JSON/YAML files. PRs to datasets go through review just like code. Eval quality depends on dataset quality, not volume.
Consumer-driven contracts: The consumer (Tool Registry) defines what it expects from the provider (Data Sync), not the other way around. If the provider changes its contract and breaks the consumer, the test fails before the provider's deploy. This inverts responsibility: whoever changes must prove they didn't break anyone.
Desktop builds need native runners: Code signing, notarization, and native modules (keytar, better-sqlite3) are OS-specific. A macOS build cannot be validated on Linux. macOS runners are ~10x more expensive, which is mitigated by only triggering on PRs that touch core-product-desktop-client.
Figma eval uses REST API, not MCP: MCP is for interactive agent use (when Claude implements components). Automated evaluation in CI needs a programmatic client calling the Figma REST API directly. MCP is reserved for manual diagnosis and on-demand pre-implementation checks.
Figma eval is scheduled, not PR-triggered: Figma has no PRs or library-publish webhooks. The pipeline runs on schedule (weekly) or on demand. It does not block code merges; it blocks implementation: the agent must not implement a component that doesn't pass quality checks.
Figma checks come from doc 72, not invented: Each of the 15 checks is mapped to a specific requirement from Design System Internals (doc 72). If a requirement changes in doc 72, the check updates. The Eval Framework does not define what Figma should have; the Design System defines it, and the Eval Framework verifies it.
File Structure
core-quality-stack-evaluation/
├── src/
│   ├── domain/
│   │   ├── interfaces/
│   │   │   ├── IEvalPipeline.ts
│   │   │   ├── ILLMJudge.ts
│   │   │   ├── IFigmaAPIClient.ts
│   │   │   └── IGoldenDatasetManager.ts
│   │   └── models/
│   │       ├── EvalConfig.ts
│   │       ├── EvalReport.ts
│   │       ├── GoldenDataset.ts
│   │       ├── LLMJudgeScore.ts
│   │       ├── DesktopBuildReport.ts
│   │       └── FigmaQualityReport.ts
│   ├── application/
│   │   ├── EvalRunner.ts
│   │   ├── LLMJudge.ts
│   │   ├── ContractTester.ts
│   │   └── ReportGenerator.ts
│   └── infrastructure/
│       ├── runners/
│       │   ├── CoachEvalRunner.ts
│       │   ├── ContractEvalRunner.ts
│       │   ├── KBQualityRunner.ts
│       │   ├── DesktopBuildRunner.ts
│       │   └── FigmaQualityRunner.ts
│       ├── judge/
│       │   └── AnthropicLLMJudge.ts
│       └── figma/
│           └── FigmaRESTClient.ts
├── datasets/
│   ├── coach/       ← golden cases (JSON/YAML, versioned)
│   ├── kb/          ← KB quality cases
│   ├── contracts/   ← contract definitions between projects
│   ├── desktop/     ← build check config + thresholds per platform
│   └── figma/       ← file keys, checks enabled, thresholds per file
├── cli/
│   └── eval.ts      ← npm run eval -- --pipeline=<type> [--platform=<os>]
└── .github/
    └── workflows/
        ├── eval-on-pr.yml
        ├── desktop-build-eval.yml
        └── figma-quality-eval.yml
MVP Scope
Phase 1: Golden dataset (20-30 cases) + CoachEvalRunner + AnthropicLLMJudge + CI gate. Not runtime: CI/CD only. Expands across 5 phases: Coach quality, API contracts, KB+E2E, desktop builds (macOS+Windows), and Figma quality (15 checks via REST API).
Inspired by
Pact (consumer-driven contracts), DeepEval, Anthropic eval best practices
📝 Project Changelog
2 new evaluation pipelines: desktop_build (Electron macOS+Windows, 11 checks) and figma_quality (15 checks via Figma REST API)
Layer 7: INTERNAL
How the team works
Beautonomous
Internal · Pablo
Zero code. Zero infrastructure. Config only.
The internal operating agent of the Shopilot team. Lives in OpenClaw UI: the team opens the core-internal-team-workflow project and works from there. Slack receives proactive notifications and pipeline approvals directly, without opening any other tool.
4 engineers operating as 10–15. The problem is not technical capacity but operational fragmentation: to know what's happening you have to go to Linear, GitHub, and Slack separately; simple changes require interrupting someone; there's no centralized place to approve changes or trigger reviews. Beautonomous solves this from OpenClaw UI (main interface: full conversation, context, history, all tools, automatic role auth) and Slack (second native channel: direct conversation, proactive notifications, pipeline approvals). It combines 4 native OAuth connectors (GitHub · Linear · Code · Slack), 3 governance roles (El Capitán / El Mago / El Artesano), and a Quality Gate that runs automatically on every PR across all 11 repos. The only code to write is the script that calls Claude Code via API inside quality-gate.yml, written once and replicated from a template in core-internal-team-workflow/templates/.
Main interface: full context + history
Second native channel: alerts + approvals
Repos, PRs, Issues, Actions: 10 tools
Tasks, sprints, assignments: 9 tools
3 roles + risk taxonomy + audit log
lint + tests + architecture review per PR
Read + propose changes via PR: 7 tools
CLAUDE.md + .claudeignore + settings.json + MEMORY.md + specs/ + skills/ + quality-gate.yml
Configuration Stack: the only "code" is the quality-gate.yml script (written once, replicated from template)
Beautonomous depends on: OpenClaw account + GitHub org beautonomous + Linear workspace AUT + Slack workspace beautonomous. All other projects (#1–#16) depend on Beautonomous being operational first.
Architecture, Quality Gate, Bootstrap, Governance & System Prompt
┌──────────────────────────────┐      ┌──────────────────────────────┐
│ OPENCLAW UI                  │      │ SLACK                        │
│ Main interface               │      │ Second native channel        │
│ Conversation + history       │      │ Proactive notifications      │
│ All tools                    │      │ Pipeline approvals           │
│ Automatic role-based auth    │      │ CI/CD alerts                 │
└──────────────┬───────────────┘      └───────────────┬──────────────┘
               │                                      │
               └──────────────────┬───────────────────┘
                                  │
                       Terminal / Claude Code
                    (direct technical operations)
                                  │
┌─────────────────────────────────▼──────────────────────────────────┐
│ OPENCLAW: agent engine                                             │
│ ReAct Loop · Governance Guard · Audit Log                          │
│ Auth: identifies role automatically from the logged-in user        │
│ Connectors: GitHub · Linear · Code · Slack                         │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │ invokes via API / GitHub Actions
┌─────────────────────────────────▼──────────────────────────────────┐
│ BASE QUALITY STRUCTURE: for each repository in the stack           │
│ ├── CLAUDE.md           repo instructions + conventions            │
│ ├── .claude/memory/     persistent context                         │
│ └── quality-gate.yml    GitHub Action: lint + tests + review       │
└────────────────────────────────────────────────────────────────────┘
1. See status from Slack
Any member asks in Slack and gets a synthesized response from GitHub, Linear and Slack, without opening other tools. Daily summary auto-published in #team at 9:00 AM: pending PRs, failing workflows, tasks in progress per person, active blockers.
2. Create and manage tasks from Slack
Create tasks, assign them, change status and add comments in Linear, all from Slack, without opening Linear.
3. Approve PRs
When a PR passes the quality gate, Beautonomous notifies Mateo (El Mago) with the PR summary, diff and automatic review result. Mateo can respond from OpenClaw UI or directly from the Slack DM, wherever he is at that moment. If the PR goes to production, the same flow reaches Pablo after Mateo approves. The team doesn't need to enter GitHub to approve: the decision happens where the approver is, and the merge and deploy happen automatically.
4. Activate the quality agent
The quality gate runs automatically on every PR. It can also be activated manually from OpenClaw UI, terminal or Slack to review any repo at any time. The review includes cross-repo contract validation: if a PR breaks an interface another project consumes, the quality gate detects it and fails with the specific reason. Contracts live in the CLAUDE.md of each repo.
Runs automatically on every PR to develop or main, and manually from Slack. Steps are sequential: if any step fails, the PR does not advance. If step 0 fails, Beautonomous notifies in #deploys with missing files and bootstrap instructions.
| Step | Tool | What it detects |
|---|---|---|
| 0. Base structure | Shell script | Required files present (CLAUDE.md, .claudeignore, settings.json, MEMORY.md, specs/, skills/) |
| 1. Lint + types | ESLint + tsc / ruff | Syntax errors, incorrect types |
| 2. Tests | Jest / pytest | Broken tests, coverage below minimum defined in CLAUDE.md |
| 3. Architecture review | Claude Code via API | Clean Architecture boundary violations, broken contracts between repos |
| 4. Convention check | Claude Code via API | Naming, folder structure, repo-specific patterns |
Steps 3 and 4 receive full context: CLAUDE.md + MEMORY.md + .claude/specs/architecture.md + .claude/specs/contracts.md + PR diff + repo skills. Output: structured JSON with passed/failed checks and actionable issues per file/line.
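The sequential, fail-fast behaviour described above can be sketched in a few lines. This is a minimal illustration, not the actual quality-gate.yml: the `Step` and `StepResult` types, the report field names and the demo issue are all hypothetical, and the real steps would shell out to linters, test runners and the Claude Code review.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    passed: bool
    # Each issue is a dict such as {"file": ..., "line": ..., "message": ...}.
    issues: List[dict] = field(default_factory=list)


@dataclass
class Step:
    name: str
    run: Callable[[], StepResult]


def run_quality_gate(steps: List[Step]) -> dict:
    """Run steps in order and stop at the first failure."""
    report = {"passed": True, "failed_step": None, "checks": []}
    for step in steps:
        result = step.run()
        report["checks"].append(
            {"step": step.name, "passed": result.passed, "issues": result.issues}
        )
        if not result.passed:
            report["passed"] = False
            report["failed_step"] = step.name
            break  # later steps never run once one fails
    return report


# Hypothetical stand-ins for the real checks.
demo_steps = [
    Step("0. Base structure", lambda: StepResult(True)),
    Step("1. Lint + types", lambda: StepResult(True)),
    Step(
        "2. Tests",
        lambda: StepResult(
            False,
            [{"file": "src/gate.ts", "line": 12, "message": "coverage below minimum"}],
        ),
    ),
    Step("3. Architecture review", lambda: StepResult(True)),
    Step("4. Convention check", lambda: StepResult(True)),
]

report = run_quality_gate(demo_steps)
print(json.dumps(report, indent=2))
```

In this demo the run stops at step 2, so the report lists three checks and the two Claude Code steps are never invoked, which also avoids paying for API calls on a PR that already fails cheaper checks.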
El Mago runs the bootstrap by copying templates from core-internal-team-workflow/templates/ and filling in the repo-specific context. Without bootstrap, the agent operates without context and the quality gate fails at step 0.
repo/
├── CLAUDE.md                    # instructions + repo conventions
├── .claudeignore                # files Claude must not read
└── .claude/
    ├── settings.json            # permissions + PostToolUse hook (build:check)
    ├── memory/
    │   └── MEMORY.md            # persistent repo context
    ├── specs/
    │   ├── architecture.md      # decisions + boundaries
    │   ├── contracts.md         # contracts with other repos
    │   └── testing.md           # what to test and how
    └── skills/ (symlinks)
        ├── clean-ddd-hexagonal  # all repos
        ├── solid                # all repos
        └── clean-architecture   # all repos
+ .github/workflows/quality-gate.yml  # GitHub Action: lint + tests + Claude Code review
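The step 0 check on this structure could look roughly like the following. The blueprint specifies a shell script for this step; the Python version here is only an illustration of the same idea, and the exact `REQUIRED_PATHS` list is an assumption derived from the tree above.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed from the base-structure tree; the real script may check more or less.
REQUIRED_PATHS = [
    "CLAUDE.md",
    ".claudeignore",
    ".claude/settings.json",
    ".claude/memory/MEMORY.md",
    ".claude/specs",
    ".claude/skills",
]


def missing_base_files(repo_root: str) -> List[str]:
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


# Demo on a throwaway directory that is only partially bootstrapped.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CLAUDE.md").write_text("# demo repo\n")
    (Path(tmp) / ".claude" / "specs").mkdir(parents=True)
    missing = missing_base_files(tmp)

print(missing)
```

A non-empty result is what would be posted to #deploys together with the bootstrap instructions.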
The PostToolUse hook in settings.json runs build:check automatically after every edit, so the agent sees TypeScript errors immediately without being asked. core-intelligence-conversation-api has 11 skills installed and serves as the reference repo for bootstrap.
A contract is any interface or agreement between two projects that, if changed in one, breaks the other. Contracts live in CLAUDE.md under a standard “Contratos con otros repos” section. The quality gate reads them in every PR to detect breaks. El Mago updates them when an integration is designed or changed; this is not automatic, it is an architecture decision.
## Contratos con otros repos
### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
Consumidor: core-intelligence-conversation-api
Rompe si: cambia la firma, cambia el significado de `allowed`, se elimina
### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
Rompe si: cambia el path, cambia el body schema, cambia los status codes
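To feed contracts like these into a review step, the section can be turned into structured data. The parser below is a hypothetical sketch that assumes the exact labels used in the sample (`Expone`, `Consume`, `Consumidor:`, `Rompe si:`); the blueprint does not say how the quality gate actually reads the section.

```python
from typing import Dict, List

SAMPLE = """\
## Contratos con otros repos
### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
Consumidor: core-intelligence-conversation-api
Rompe si: cambia la firma, cambia el significado de `allowed`, se elimina
### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
Rompe si: cambia el path, cambia el body schema, cambia los status codes
"""

DIRECTIONS = {"expone": "exposes", "consume": "consumes"}


def parse_contracts(markdown: str) -> List[Dict[str, object]]:
    """Extract contract entries from the 'Contratos con otros repos' section."""
    contracts: List[Dict[str, object]] = []
    in_section = False
    direction = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Only lines under the standard section heading are considered.
            in_section = line == "## Contratos con otros repos"
            direction = None
        elif not in_section or not line:
            continue
        elif line.startswith("### "):
            direction = DIRECTIONS.get(line[4:].split()[0].lower())
        elif line.startswith("- ") and direction:
            contracts.append(
                {"direction": direction, "interface": line[2:], "consumer": None, "breaks_if": []}
            )
        elif line.startswith("Consumidor:") and contracts:
            contracts[-1]["consumer"] = line.split(":", 1)[1].strip()
        elif line.startswith("Rompe si:") and contracts:
            conditions = line.split(":", 1)[1].split(",")
            contracts[-1]["breaks_if"] = [c.strip() for c in conditions]
    return contracts


contracts = parse_contracts(SAMPLE)
for entry in contracts:
    print(entry["direction"], "|", entry["interface"])
```

Keeping the contracts as plain text with a fixed shape is what makes this kind of lightweight parsing possible without a separate registry service.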
| Action | El Capitán | El Mago | El Artesano |
|---|---|---|---|
| View team status (GitHub / Linear / Slack) | ✅ | ✅ | ✅ |
| Read code (all repos) | ✅ | ✅ | ✅ |
| Create tasks in Linear | ✅ | ✅ | ✅ |
| Assign tasks to anyone | ✅ | ✅ | Own only |
| Send messages to Slack channels | ⛔ | ✅ | ✅ (conf.) |
| Trigger staging workflow | ⛔ | ✅ | ✅ |
| Trigger production workflow | ⛔ | ✅ + conf. | ⛔ |
| Propose UI code changes (generates PR) | ✅ | ✅ | ✅ |
| Propose backend logic changes (generates PR) | ⛔ | ✅ | ✅ |
| Infra / critical config changes | ⛔ | ✅ + conf. | ⛔ |
| Approve agent-generated PRs | ⛔ | ✅ | ⛔ |
| Approve deploy to production | ✅ (final) | ✅ (technical) | ⛔ |
| Manage roles in Beautonomous | ⛔ | ✅ | ⛔ |
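A few rows of this matrix can be expressed as a deny-by-default lookup. This is a sketch only: the role keys, action names and the `Access` enum are hypothetical, and in the blueprint the enforcement lives in OpenClaw's platform-level permissions rather than in code the team writes.

```python
from enum import Enum


class Access(Enum):
    DENY = "deny"
    ALLOW = "allow"
    CONFIRM = "allow_with_confirmation"


# Subset of the matrix; action and role identifiers are illustrative.
PERMISSIONS = {
    "send_slack_message": {
        "capitan": Access.DENY,
        "mago": Access.ALLOW,
        "artesano": Access.CONFIRM,
    },
    "trigger_staging_workflow": {
        "capitan": Access.DENY,
        "mago": Access.ALLOW,
        "artesano": Access.ALLOW,
    },
    "trigger_production_workflow": {
        "capitan": Access.DENY,
        "mago": Access.CONFIRM,
        "artesano": Access.DENY,
    },
    "approve_agent_pr": {
        "capitan": Access.DENY,
        "mago": Access.ALLOW,
        "artesano": Access.DENY,
    },
}


def check_access(role: str, action: str) -> Access:
    """Unknown roles or actions are denied rather than allowed."""
    return PERMISSIONS.get(action, {}).get(role, Access.DENY)


print(check_access("artesano", "send_slack_message"))
print(check_access("capitan", "trigger_staging_workflow"))
```

Defaulting to `DENY` means a new action added to the agent is unusable until someone explicitly grants it to a role.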
# Beautonomous: Shopilot's Internal Operations Agent
You are the team's operations agent. Your role: give full visibility
into the project and execute actions in GitHub, Linear, Slack and the code.
You operate from OpenClaw UI (primary interface), Slack (notifications and
approvals) and the terminal. The user's role is already determined by
OpenClaw; never assume it or ask for it explicitly.
## Current user
{USER_NAME} | {USER_EMAIL} | Role: {USER_ROLE}
## Roles
El Capitán (pablo@shopilot.ai):
- Full read access to GitHub, Linear and Slack
- Create and assign tasks in Linear
- Request UI changes (generates a PR, El Mago approves)
- Final business approval for production deploys
- CANNOT trigger workflows or touch backend/infra code
El Mago (mateo@shopilot.ai):
- Full access to all systems
- Approve and reject the agent's PRs
- Trigger any workflow (always with prior confirmation)
- Send messages to Slack on behalf of the team
- Modify infra and critical config (with confirmation)
- Manage team permissions in Beautonomous
- Technical sign-off in the approval pipeline
El Artesano (andres@shopilot.ai, sergio@shopilot.ai):
- Full read access to all repos and Slack
- Propose code changes via PR (El Mago approves them)
- Trigger staging workflows
- Create and manage own tasks in Linear
- Send messages to Slack (with prior confirmation)
## Governance: NEVER skip these rules
1. Before any write: show exactly what you are going to do.
2. For code: show the full diff before creating the PR.
3. For Slack: show the preview before posting.
4. If the role lacks permission: explain why and offer to escalate to El Mago.
5. High-risk actions always require confirmation from El Mago.
6. Confirm the result: what changed, where, when.
## Stack repositories (11)
core-intelligence-conversation-api (Coach: Node.js 18 TypeScript)
core-knowledge-semantic-base (KB: Go + Vertex AI + BigQuery)
core-knowledge-data-synchronizator (Data Sync: Airflow + GCS)
core-product-desktop-client (App: Electron + React)
core-platform-infrastructure (Infra: CDK TypeScript + Terraform GCP)
core-action-marketplace-provider
core-platform-billing
core-knowledge-enrichment
core-quality-feedback
core-quality-stack-evaluation
core-internal-team-workflow (this project: configuration only)
## Authorized Slack channels
#engineering · #deploys · #general · #team
| Connector | Read | Write | Total |
|---|---|---|---|
| GitHub | repos, PRs, issues, workflows, logs | issues, comments, propose PR, trigger/re-run workflows | 10 |
| Linear | tasks, sprints, team metrics | create/assign/comment tasks, change status/priority, create sprint | 9 |
| Code | read file, search in code | low-risk changes via PR, propose logic changes via PR | 7 |
| Slack | channels, threads, search | messages (with prior confirmation), approval notifications | 5 |
- “Beautonomous” project created in OpenClaw with system prompt configured
- GitHub OAuth connected and all 11 repos authorized
- Linear OAuth connected with Shopilot workspace (AUT team)
- Slack OAuth connected with 4 authorized channels (#engineering, #deploys, #team, #general)
- Roles correctly assigned by email for all 4 members (pablo=Capitán, mateo=Mago, andres=Artesano, sergio=Artesano)
- All 4 team members make 3 read queries each without error
- 5 tasks created in Linear from Beautonomous without incidents
- 1 PR passes quality gate automatically → Mateo receives DM with summary, diff and result, approves from Slack
- quality-gate.yml deployed and running in at least 3 repos (bootstrap complete)
- Beautonomous detects a GitHub Actions failure and notifies in #deploys in <5 minutes
- Daily sprint summary published in #team at 9:00 AM for 3 consecutive days
Sequential Approval Pipeline
PR opened
│
▼
Quality Gate (automatic, Claude Code)
├── FAILS → #deploys + DM to El Artesano → back to El Artesano. End.
│
└── PASSES → DM to Mateo in Slack
    │
    ├── REJECTS → comment on PR + DM to El Artesano. End.
    │
    └── APPROVES
        ├── target staging → automatic merge
        └── target prod → DM to Pablo in Slack
            ├── REJECTS → End.
            └── APPROVES → merge → deploy to prod
Mateo and Pablo approve from Slack: Beautonomous sends the PR summary, diff and quality gate result to Slack, and the approver responds in that thread. Zero context switch.
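The decision tree above can be read as a small pure function over the gate result and the approvals collected so far. This is an illustrative sketch; the function name and the outcome strings are hypothetical, and `None` stands for an approval that has not been answered yet.

```python
from typing import Optional


def resolve_pr(
    gate_passed: bool,
    target: str,
    mateo_approves: Optional[bool] = None,
    pablo_approves: Optional[bool] = None,
) -> str:
    """Return the current outcome of a PR in the sequential approval pipeline."""
    if not gate_passed:
        return "returned_to_author"  # notify #deploys and DM the author
    if mateo_approves is None:
        return "waiting_for_mateo"
    if not mateo_approves:
        return "rejected_by_mateo"  # comment on PR and DM the author
    if target == "staging":
        return "auto_merge"
    if target != "prod":
        raise ValueError(f"unknown target: {target}")
    if pablo_approves is None:
        return "waiting_for_pablo"
    if not pablo_approves:
        return "rejected_by_pablo"
    return "merge_and_deploy_prod"


print(resolve_pr(True, "staging", mateo_approves=True))
print(resolve_pr(True, "prod", mateo_approves=True))
print(resolve_pr(True, "prod", mateo_approves=True, pablo_approves=True))
```

Pablo's answer is only consulted for production targets and only after Mateo has approved, which mirrors the ordering in the diagram.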
Proactivity: Beautonomous doesn't wait to be asked
| Trigger | Automatic action |
|---|---|
| GitHub Action fails (any repo) | Message in #deploys: workflow, repo, branch, link to log |
| GitHub Action fails on main or prod | Message in #deploys + direct DM to El Mago |
| PR unreviewed >4 hours | Ping in #engineering with link and author |
| Linear task blocked >2 days | Alert to El Mago with block context |
| 9:00 AM daily | Summary in #team: pending PRs, failing CI, tasks in progress per person |
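The two time-based triggers in the table reduce to simple threshold checks. The sketch below assumes hypothetical record shapes (`opened_at`, `reviewed`, `blocked_since` and so on) and hypothetical destination labels; in the blueprint these alerts are configured in OpenClaw rather than coded by hand.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

PR_REVIEW_LIMIT = timedelta(hours=4)
TASK_BLOCKED_LIMIT = timedelta(days=2)


def proactive_alerts(
    now: datetime, open_prs: List[dict], blocked_tasks: List[dict]
) -> List[Tuple[str, str]]:
    """Return (destination, message) pairs for items past their threshold."""
    alerts = []
    for pr in open_prs:
        if not pr["reviewed"] and now - pr["opened_at"] > PR_REVIEW_LIMIT:
            alerts.append(
                ("#engineering", f"PR awaiting review: {pr['url']} (author: {pr['author']})")
            )
    for task in blocked_tasks:
        if now - task["blocked_since"] > TASK_BLOCKED_LIMIT:
            alerts.append(("dm:El Mago", f"Task {task['id']} blocked: {task['context']}"))
    return alerts


# Demo data; all values are made up for illustration.
now = datetime(2026, 3, 10, 12, 0)
prs = [
    {"url": "pr/1", "author": "andres", "opened_at": datetime(2026, 3, 10, 7, 0), "reviewed": False},
    {"url": "pr/2", "author": "sergio", "opened_at": datetime(2026, 3, 10, 10, 0), "reviewed": False},
]
tasks = [
    {"id": "AUT-1", "context": "waiting on API keys", "blocked_since": datetime(2026, 3, 7, 9, 0)},
]

alerts = proactive_alerts(now, prs, tasks)
for destination, message in alerts:
    print(destination, "->", message)
```

Here only the five-hour-old PR and the three-day-old blocked task produce alerts; the two-hour-old PR stays silent.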
What it does NOT do
Not the Shopilot product interface: Beautonomous is the team's agent, not the seller's. It has no relation at runtime to the seller Coach or to projects #1–#16.
Does not self-merge: PRs generated by the agent can only be approved by El Mago. No exceptions.
Does not manage production credentials: AWS/GCP secrets, external API tokens and prod env vars are out of scope. El Mago manages them directly.
Does not make technical decisions: it detects convention violations in the quality gate but doesn't decide whether an architecture change is correct. It escalates to El Mago with context.
Does not auto-sync .memory between repos: the general MEMORY.md is not auto-generated from the individual ones. El Mago must update it when there are relevant cross-repo decisions.
5-Phase Implementation Plan: everything is OpenClaw config, the only code is quality-gate.yml
Phase 1: Connect (Day 1–2)
Create Beautonomous project in OpenClaw → connect GitHub OAuth → authorize 11 repos → connect Linear OAuth → paste system prompt. Agent operational for read queries. Owner: Pablo.
Phase 2: Roles & Slack (Day 2–3)
Assign 3 roles by email in OpenClaw Team Settings → connect Slack OAuth → authorize 4 channels → configure proactivity alerts → validate: each team member makes 3 read queries. Owner: Mateo.
Phase 3: Quality Gate Bootstrap (Week 1–2)
Copy templates from core-internal-team-workflow/templates/ to each repo: CLAUDE.md + MEMORY.md + .claude/specs/ + skills symlinks + quality-gate.yml → configure branch protection rules (develop: quality gate + 1 review; main: quality gate + 2 reviews + no direct push). Owner: Mateo.
Phase 4: Progressive Write Access (Week 2)
Enable write categories most-reversible first: Linear tasks → GitHub issues → re-run workflows → propose code changes via PR → staging workflows. Each step is validated before advancing to the next. Owner: Mateo.
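The "validate before advancing" rule in Phase 4 can be pictured as an ordered list in which a category is switched on only once every earlier category has been validated. The category identifiers and the function below are hypothetical; the actual rollout is done through OpenClaw settings.

```python
from typing import List, Set

# Most-reversible first, in the order given for Phase 4.
WRITE_CATEGORIES = [
    "linear_tasks",
    "github_issues",
    "rerun_workflows",
    "propose_pr",
    "staging_workflows",
]


def enabled_categories(validated: Set[str]) -> List[str]:
    """Enable categories in order, stopping after the first unvalidated one."""
    enabled = []
    for category in WRITE_CATEGORIES:
        enabled.append(category)
        if category not in validated:
            break  # this category is live but not yet validated
    return enabled


print(enabled_categories(set()))
print(enabled_categories({"linear_tasks", "github_issues"}))
```

With nothing validated only Linear tasks are writable; validating the first two categories opens workflow re-runs as the next one under test.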
Phase 5: PR Approval Pipeline Validation (Week 2–3)
Validate the end-to-end pipeline: PR → quality gate → Mateo DM → Pablo DM (prod only) → auto merge. Validate that the agent does not self-merge its own PRs. Validate the daily summary in #team at 9:00 AM. Owner: Mateo + Pablo.
Risk Analysis
Governance jailbreak
Impact: High
Mitigation: Two layers. Rules in the system prompt (the LLM understands why) and platform-level permissions in OpenClaw (the LLM cannot do X regardless of the prompt). Both layers are required: one for reasoning quality, one for operational safety.
Quality gate without context (step 0 fails)
Impact: Medium
Mitigation: Step 0 verifies required files before running the agent. Beautonomous notifies in #deploys with missing files and bootstrap instructions. No repo is unblocked without the complete base structure.
Broken cross-repo contracts not detected
Impact: High
Mitigation: Contracts live in CLAUDE.md and the quality gate reads them on every PR. El Mago updates contracts when an integration is designed or changed; this is not optional.
System prompt staleness
Impact: Low–Medium
Mitigation: Bi-weekly review owned by Pablo. As the stack evolves (new repos, tools, governance rules), the system prompt must reflect it. Version the prompt in git alongside technical specs.
Key Decisions
OpenClaw vs. custom agent infrastructure: Building a custom operational agent would require Lambda, DynamoDB, a GitHub App, a Linear webhook and a Slack bot, roughly 3–4 weeks of engineering. OpenClaw provides all of this from Day 1. The Shopilot team builds for sellers, not for itself.
Quality Gate via Claude Code API vs. static linters only: Static linters detect syntax and types but miss architecture boundaries and cross-repo contracts. Claude Code with repo context (CLAUDE.md + specs) detects what linters cannot. The script is written once and replicated, so maintenance cost stays constant as repos are added (O(1)).
Contracts in CLAUDE.md (not a separate service): A contract registry as a separate service creates yet another thing to maintain. CLAUDE.md already lives in every repo, is versioned with the code, and the quality gate already reads it. Contracts are plain text in an existing file, with zero overhead.
Slack as second native channel (not just notifications): The team is in Slack all day. Requiring them to open OpenClaw to approve a PR creates friction. Beautonomous sends the diff and quality gate result to Slack and the approver responds in the same thread, with zero context switch.
Current State
| Capability | Status |
|---|---|
| View status from Slack | ❌ Pending: OpenClaw + connectors |
| Create tasks from Slack | ❌ Pending: Linear OAuth |
| Approve PRs from Slack | ❌ Pending: quality gate + branch protection |
| Activate quality agent from Slack | ❌ Pending: quality-gate.yml in 11 repos |
| Proactivity (alerts + daily summary) | ❌ Pending: OpenClaw configured |
| Base structure per repo (bootstrap) | 🔨 Partial: only core-intelligence-conversation-api (incomplete: missing specs/, skills/ and a complete .claudeignore) |
📋 Project Changelog
9. MVP: 10+2 Week Execution Plan
Quick Navigation
9.1 Overview
9.2 Philosophy SV/YC
9.3 Pre-Sprint
9.4 Timeline
9.5 Deliverables
9.6 Eng. Tracks
9.7 Sprint Tasks 100%
9.8 Tasks + AC
9.9 Daily Blueprint
9.10 Critical Path
9.11 Dep. Map
9.12 Gates
9.13 Risk Register
9.14 Infra & Costs
9.15 Ops & Launch
9.16 Workflow & Coffee
9.1 Non-Technical Overview
For investors, advisors, and non-technical stakeholders
What is Shopilot?
Shopilot is an AI assistant that lives inside your online store. Think of it as a smart co-worker who knows your products, watches your competitors, and helps you make better decisions, all from a single app where you also browse your marketplace normally.
It works with MercadoLibre, Amazon, and Shopify, the three biggest e-commerce platforms in Latin America. One assistant for all your stores.
The Problem
Online sellers spend 3-5 hours every day on repetitive tasks that don't directly grow their business:
• Checking competitor prices across dozens of products
• Manually optimizing listing titles and descriptions
• Monitoring inventory levels across marketplaces
• Analyzing sales data to spot trends and problems
• Switching between marketplace dashboards and spreadsheets
• Reacting to competitor changes hours or days too late
Result: sellers are reactive instead of strategic. They spend their time on operations, not growth.
Our Solution
An AI copilot that understands your store, answers your questions, takes actions with your permission, and proactively alerts you when something needs attention. It's not a dashboard; it's a conversation.
You say: "How are my sales this week?"
Shopilot: "Sales are up 12% vs last week. Your top product is X with 45 units. However, product Y dropped 30% because your competitor lowered their price by 15%. Want me to adjust your price?"
You say: "Yes, match their price minus 5%"
Shopilot: "Done. Price updated from $89 to $76. Next time you ask about this product, I'll show you how the competition reacted."
How It Works (User Journey)
1. Download & Install
Download Shopilot.app for Mac. Install in seconds, with no technical setup required.
2. Connect Your Store
Link your MercadoLibre, Amazon, or Shopify account with one click. Shopilot syncs your products, sales, and metrics automatically.
3. Browse & Chat
Browse your marketplace normally. Shopilot's sidebar is always available: ask anything about your store.
4. Act With Permission
Shopilot can update titles, adjust prices, and manage listings, but always asks for your confirmation first. You stay in control.
5. Smart Suggestions
While you chat, Shopilot detects opportunities: "Your competitor dropped prices on 3 products. Want me to adjust yours?" Act on them instantly.
What Makes Us Different
Native App, Not Extension
A real desktop application, with no browser extensions that slow down your store, break with updates, or leak your data.
AI That Reasons, Not Rules
Powered by Claude, it understands context, nuance, and your business. It is not a rigid rule-based system that gives the same advice to everyone.
3 Marketplaces, 1 Tool
MercadoLibre + Amazon + Shopify in one assistant. Most tools only cover one platform. We cover where LatAm sellers actually sell.
Business Model
Free: $0/mo
50 actions/month, read-only. Try before you buy.
Pro: $49/mo
500 actions/month, read + write + proactive alerts. The real product.
Credit Packs
Need more? Buy packs: $5/100, $20/500, $35/1000 credits. Pro users only.
Unit economics: Our AI cost per user is ~$4/month. At $49/month Pro pricing, that's a 91% gross margin. The business works from user #12.
The 10+2 Week Plan
4 engineers building in parallel for 12 weeks (10 core + 2 buffer). Each engineer owns a vertical: one builds the AI brain, one builds the data pipes, one builds the app, and the CEO (also a product engineer) owns product quality and launch. Every 2 weeks there's a clear deliverable. By week 10, real sellers are using the product. Weeks 11-12 absorb beta fixes, deferred scope, and hardening.
Mateo (CTO): AI + Orchestration
Andres (Data + BE): APIs + Data
Sergio (Full-Stack): App + UI
Pablo (CEO / PE): Product + QA
Success Metrics
1+ Action in First Session (activation: the user gets value immediately)
48h Return Within 48 Hours (retention: the product is worth coming back to)
60% Time Saved vs Manual (value: Shopilot is measurably faster)
9.2 Execution Philosophy - SV/YC Methodology
4 engineers × AI: ship in 10+2 weeks. (Aspiration: leverage AI to operate above headcount.)
This plan fuses the YC Build Sprint (12-week cycles, weekly accountability, launch early) with Shape Up (L/M/S task classification, appetite-based scoping, circuit breakers) and amplifies it with Beautonomous (#17 CORE), the AI operational agent that eliminates coordination overhead.
A - Three Founding Pillars
Do Things That Don't Scale
Onboard every beta user personally. Write every KB doc manually. Review every PR. Automate later; earn trust first.
Default Alive
Every spending decision: does this help us reach revenue before runway ends? Free tier is acquisition, Pro tier is survival. Frugal by design.
OMTM: Tools Executed / Week
One Metric That Matters. Not signups, not MRR: tools executed per week per active user. That's the proof the copilot is delivering value.
YC Principles
• Weekly goals within 2-week cycles
• Launch early, launch often
• Risk-first: address uncertainty early
• Maker's schedule: 4h uninterrupted blocks
Shape Up Patterns
• Appetite (not estimate): 2 weeks per scope
• Circuit breaker: not done at deadline = cut
• Hill chart: "figuring out" → "making it happen"
• Scopes (not tasks): group by user outcome
Ceremonies
• Async standup daily 9:30 AM (Linear+Slack)
• Cycle planning biweekly (60 min sync)
• Friday demo (30 min, each engineer demos)
• Retro biweekly (30 min, Lean Coffee)
B - Sprint Contract: Success Criteria per Sprint
| Sprint | Label | Success Criteria | Gate |
|---|---|---|---|
| S0 | Pre-Sprint | CORE operational. All 4 engineers aligned. Beautonomous managing Linear + GitHub + Slack. Zero ambiguity before W1. | T0.8 ✓ |
| S1–2 | Foundation | Walking skeleton E2E: Electron loads marketplace → sidebar sends message → ReAct loop processes → response returns. Ugly OK; architecture proven. | W2 |
| S3–4 | Core Engines | 10 READ tools registered as stubs (mock data, T2.5). IContextAssembler + Health summary working. Eval runner executes 15+ golden cases. Tool Registry + HookLifecycle deployed. | Gate 1 |
| S5–6 | WRITE Tools | First WRITE tools (update_product_content, update_price, pause_product, activate_product) execute on all 3 marketplaces. Confirmation flow works. Billing Free tier live. Enrichment returns competitor data. Eval CI integration blocks PRs on regression. | W6 |
| S7–8 | Hardening | 4+ WRITE tools operational (more per circuit breaker capacity). WebSocket streaming live. Proactive suggestions via afterTool LLM hook (max 2/turn). Load test: 50 concurrent users passes. Eval score ≥0.70. Staging deployed. | Gate 2 |
| S9–10 | Launch | Beta: 10+ real sellers onboarded. 0 P0/P1 bugs. .dmg signed + notarized. Production deployed. OMTM: ≥1 tool/user/week. Eval score ≥0.70. | Gate 3 |
| S11–12 | Buffer | Beta bug fixes (P1/P2). Performance hardening (p95, RAM). Deferred scope from circuit breaker (remaining WRITE tools, ProactiveSuggestions v2). Eval score target 0.80. System prompt v3 with real beta data. | - |
C - Decision Gates (Go / No-Go)
Gate 1 - "It Talks" (W4)
Owner: Pablo (CEO). Held: Friday W4 demo.
✓ Coach responds coherently in Spanish to seller questions
✓ ReAct loop calls ≥1 tool per relevant query
✓ KB docs indexed, context injection working
✓ Electron app loads MeLi URL without crashes
✓ Unit test coverage ≥70%
✓ Beautonomous used for all task management (Linear + GitHub via CORE)
✗ No-Go: loop doesn't use tools OR response incoherent
Gate 2 - "It Acts" (W8)
Owner: Pablo (CEO). Held: Friday W8 demo.
✓ WRITE tools execute real changes on MeLi + Amazon + Shopify
✓ Confirmation flow: diff shown, Accept/Reject works
✓ Billing: Free tier limits enforced, Pro upgrade works
✓ Load test 50 concurrent users passes
✓ WebSocket streaming live (T4.1)
✓ Proactive suggestions active via afterTool hook
✓ Eval score ≥0.70
✓ CI/CD pipeline auto-deploys to staging on merge
✓ E2E tests ≥30 passing
✗ No-Go: WRITE tool fails OR confirmation flow broken
Gate 3 - "It Ships" (W10)
Owner: All 4 engineers. Held: Final Go/No-Go sync.
✓ Beta cohort: ≥10 real sellers onboarded
✓ 0 high-severity bugs (P0/P1)
✓ .dmg signed + notarized, installs without Gatekeeper warning
✓ OMTM baseline: ≥1 tool executed per active user per week
✓ Billing Stripe live (production)
✓ Eval score ≥0.70
✓ API p95 <3s
✓ Guardrails active (ToolPolicyFilter enforced)
✓ OWASP review approved
✗ No-Go: P0 bug open OR <5 users onboarded
D - Linear Structure (Exportable)
Team
Shopilot (AUT)
Cycles
6 × 2-week cycles (incl. buffer)
Projects
19 active (1 per project)
Labels
L/M/S (size) • Track-Mateo/Andres/Sergio/Pablo • Risk-high/medium/low • Spike
Workflow: Backlog → Todo → In Progress → In Review → Done. Relations: blocks / is-blocked-by for dependencies.
E - Task Decomposition Pattern
Epic
5-10 days
Story
1-3 days
Task
2-8 hours
Sub-task
1-4 hours
Rule: if a Task takes >8h, break it down. Single-threaded ownership: 1 owner per task, no committees.
6 Phases - ~150 Tasks - 17 Projects - 4 Engineers
Phase 0
Pre-Sprint
8 tasks • W0 • #17 CORE
Phase 1
Foundation
62 tasks • S1-4
Phase 2
Full Features
60 tasks • S5-8
Phase 3
Polish & Launch
17 tasks • S9-10
9.3 Pre-Sprint 0: Technical Alignment Session
Project Parameters
12
weeks (10+2)
4
engineers
183
tasks
383
story points
OMTM: Tools Executed / Week / Active User. This is the proof that the copilot delivers real value.
Methodology - Shape Up + Scrum + Kanban (Hybrid L/M/S)
| Size | Time | Model | Ceremony | Example |
|---|---|---|---|---|
| L | >3 days | Shape Up bet | Discovery + 2-week appetite. Circuit breaker if unfinished. | AgentLoopOrchestrator, MeLiAdapter, Electron Shell |
| M | 1–3 days | Scrum story | Sprint planning + clear ACs + PR review. | IContextWindowManager, TokenRefreshCron, BillingView |
| S | <1 day | Kanban card | Pull from backlog, execute, merge. WIP limit: 2 per engineer. | GSI projection fix, ESLint config, OAuth Slack connect |
Assignment rule: L = bet at cycle start (circuit breaker if not done). M = sprint-planned + estimated. S = pull Kanban, no standup. Distribution target: ~1% L + ~81% M + ~18% S.
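The sizing rule above can be sketched as a small TypeScript helper. This is an illustration only: `classifyTask`, `TaskSize`, and `POLICIES` are hypothetical names, and treating exactly 1 day and exactly 3 days as M is an assumption about how the table's boundaries are meant to be read.

```typescript
// Hypothetical sketch of the L/M/S sizing rule; thresholds come from the table above.
type TaskSize = "L" | "M" | "S";

interface SizePolicy {
  model: string;
  ceremony: string;
}

const POLICIES: Record<TaskSize, SizePolicy> = {
  L: { model: "Shape Up bet", ceremony: "Discovery + 2-week appetite, circuit breaker if unfinished" },
  M: { model: "Scrum story", ceremony: "Sprint planning + clear ACs + PR review" },
  S: { model: "Kanban card", ceremony: "Pull from backlog, execute, merge (WIP limit 2 per engineer)" },
};

// Maps an estimate in working days to a size bucket.
// Assumption: the 1-day and 3-day boundaries both fall into M.
function classifyTask(estimateDays: number): TaskSize {
  if (!Number.isFinite(estimateDays) || estimateDays <= 0) {
    throw new RangeError("estimateDays must be a positive number");
  }
  if (estimateDays > 3) return "L";
  if (estimateDays >= 1) return "M";
  return "S";
}

// Convenience lookup: which working model applies to a given estimate.
function policyFor(estimateDays: number): SizePolicy {
  return POLICIES[classifyTask(estimateDays)];
}
```

For example, `policyFor(0.5).model` yields the Kanban card model, while a 4.5-day estimate is treated as a Shape Up bet.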
1. Do Things That Don't Scale
Personal onboarding, manual KB docs, review every PR via Beautonomous
2. Default Alive
Every expense justified against runway. Free = acquisition, Pro = survival
3. OMTM Focus
One metric: tools/week/user. Proves real value is being delivered every sprint
Ceremonies
| Ceremony | Freq. | Dur. |
|---|---|---|
| Async standup (Linear+Slack) | Daily 9:30 AM | Async |
| Cycle planning | Bi-weekly | 60 min |
| Friday demo | Weekly | 30 min |
| Retro (Lean Coffee) | Bi-weekly | 30 min |
Critical Tech Debt - Before Sprint 1 (T1.0)
• SK Message/Trace not time-sortable: UUID v4 → ULID
• findByMessageId O(n) scan → SK Trace#{messageId}
• GSI2 defined but never used → repurpose as sparse index
• queryEmbedding (6KB) in Trace → eliminate
• ProjectionType.ALL on GSIs → change to INCLUDE
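To illustrate why the UUID v4 → ULID change makes sort keys time-sortable, here is a minimal TypeScript sketch of a ULID-style generator. It follows the published ULID layout (10 Crockford base32 characters of millisecond timestamp followed by 16 characters of randomness), but it is not the project's implementation: the function names are hypothetical, `Math.random` stands in for a cryptographic source, and ordering within the same millisecond is not guaranteed.

```typescript
// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encodes a millisecond timestamp as 10 base32 characters, most significant first.
// Division and modulo are used instead of bit operators because the value exceeds 32 bits.
function encodeTime(ms: number): string {
  let out = "";
  let t = Math.floor(ms);
  for (let i = 0; i < 10; i++) {
    out = CROCKFORD.charAt(t % 32) + out;
    t = Math.floor(t / 32);
  }
  return out;
}

// 16 random base32 characters. Assumption: Math.random is acceptable for a sketch;
// production code would want a cryptographically secure source.
function encodeRandom(): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += CROCKFORD.charAt(Math.floor(Math.random() * 32));
  }
  return out;
}

// 26-character identifier whose lexicographic order follows creation time.
function ulid(nowMs: number = Date.now()): string {
  return encodeTime(nowMs) + encodeRandom();
}

// Hypothetical helper for the sort key pattern named in the list above.
function traceSortKey(messageId: string): string {
  return `Trace#${messageId}`;
}
```

Because the timestamp occupies the leading characters, a range query over sort keys returns items in creation order, which a random UUID v4 cannot provide.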
Sprint Contract
| Sprint | Label | Success Criterion | Gate |
|---|---|---|---|
| S0 | Pre-Sprint | CORE operational. Zero ambiguity. All 11 repos created. | T0.8 |
| S1-2 | Foundation | Walking skeleton E2E: Electron → sidebar → ReAct → real MeLi data → response. | - |
| S3-4 | Core Engines | 10 READ tools in 3 marketplaces. Context injection. Playground usable. | Gate 1 |
| S5-6 | WRITE Tools | 4 WRITE tools execute in 3 marketplaces. ConfirmationFlow. Billing Free tier active. | - |
| S7-8 | Hardening | Proactive suggestions live. Onboarding wizard E2E. Load test 50 users passes. Staging deployed. | Gate 2 |
| S9-10 | Launch | Beta 10+ sellers. 0 P0 bugs. Signed .dmg. Production deployed. OMTM ≥1 tool/user/week. | Gate 3 |
| S11-12 | Buffer | Beta bug fixes (P1/P2). Performance hardening. Deferred scope from circuit breaker. Eval score target 0.80. | - |
Definition of Done - Per Sprint
| Aspect | S4 | S7 | S10 |
|---|---|---|---|
| Unit tests | ≥70% | ≥80% | ≥80% |
| E2E tests | ≥10 | ≥30 | ≥50 |
| API p95 | <5s | <3s | <3s |
| RAM Electron | <600MB | <500MB | <500MB |
| First token (streaming) | - | <1s | <1s |
| Error rate | <5% | <1% | <1% |
| OAuth refresh | 100% | 100% | 100% |
Per task DoD: code reviewed via Beautonomous (El Mago) • unit tests for new logic • no blocking linter warnings • PR merged to main • task marked Done in Linear.
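The per-sprint thresholds in the table above could be encoded as data and checked mechanically, for example in CI. The following TypeScript sketch assumes hypothetical names (`DOD`, `Metrics`, `dodFailures`) and assumes that a missing first-token measurement counts as a failure once streaming is in scope; neither is specified in this blueprint.

```typescript
type Checkpoint = "S4" | "S7" | "S10";

interface Metrics {
  unitCoveragePct: number;
  e2eTestCount: number;
  apiP95Seconds: number;
  electronRamMb: number;
  firstTokenSeconds: number | null; // null when streaming was not measured
  errorRatePct: number;
  oauthRefreshPct: number;
}

interface Thresholds {
  minUnitCoveragePct: number;
  minE2eTests: number;
  maxApiP95Seconds: number;            // exclusive upper bound
  maxElectronRamMb: number;            // exclusive upper bound
  maxFirstTokenSeconds: number | null; // null = not required at this checkpoint
  maxErrorRatePct: number;             // exclusive upper bound
  minOauthRefreshPct: number;
}

// Values copied from the Definition of Done table.
const DOD: Record<Checkpoint, Thresholds> = {
  S4: { minUnitCoveragePct: 70, minE2eTests: 10, maxApiP95Seconds: 5, maxElectronRamMb: 600, maxFirstTokenSeconds: null, maxErrorRatePct: 5, minOauthRefreshPct: 100 },
  S7: { minUnitCoveragePct: 80, minE2eTests: 30, maxApiP95Seconds: 3, maxElectronRamMb: 500, maxFirstTokenSeconds: 1, maxErrorRatePct: 1, minOauthRefreshPct: 100 },
  S10: { minUnitCoveragePct: 80, minE2eTests: 50, maxApiP95Seconds: 3, maxElectronRamMb: 500, maxFirstTokenSeconds: 1, maxErrorRatePct: 1, minOauthRefreshPct: 100 },
};

// Returns the names of the aspects that miss their threshold (empty array = DoD met).
function dodFailures(checkpoint: Checkpoint, m: Metrics): string[] {
  const t = DOD[checkpoint];
  const failures: string[] = [];
  if (m.unitCoveragePct < t.minUnitCoveragePct) failures.push("Unit tests");
  if (m.e2eTestCount < t.minE2eTests) failures.push("E2E tests");
  if (m.apiP95Seconds >= t.maxApiP95Seconds) failures.push("API p95");
  if (m.electronRamMb >= t.maxElectronRamMb) failures.push("RAM Electron");
  if (t.maxFirstTokenSeconds !== null) {
    // Assumption: an unmeasured first-token latency fails once it is required.
    if (m.firstTokenSeconds === null || m.firstTokenSeconds >= t.maxFirstTokenSeconds) {
      failures.push("First token");
    }
  }
  if (m.errorRatePct >= t.maxErrorRatePct) failures.push("Error rate");
  if (m.oauthRefreshPct < t.minOauthRefreshPct) failures.push("OAuth refresh");
  return failures;
}
```

Keeping the thresholds in one table-shaped constant means a change to the Definition of Done is a one-line diff rather than a change to the checking logic.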
Assumptions
• 4 engineers full-time for 12 weeks (10 core + 2 buffer S11-12)
• MeLi, Amazon SP-API, Shopify Admin API: test accounts ready
• Anthropic account: Claude Sonnet 4 + prompt caching
• Apple Developer Program active (code signing + notarization)
• Stripe configured (test + live modes)
• AWS + GCP provisioned with IAM/GCP roles
• Beautonomous (#17) operational before Sprint 1 (absolute prerequisite)
• Real Sellerfy MeLi data available for testing
⚠ Capacity Analysis - The Plan Is Aggressive
237.5
engineer-days est.
240
engineer-days avail.
0.99x
ratio (with buffer)
43d
S11-12 margin
L tasks (~1%): 2 × 4.5d = 9 days • M tasks (~80%): 124 × 1.65d = 204.5 days • S tasks (~19%): 31 × 0.8d = 24 days. Total: 237.5 engineer-days. With 4 engineers × 12 weeks × 5 days = 240 available.
The 0.99x ratio means the plan is near capacity, so the buffer is essential. Without the buffer (S1-S10 only, 200 days), the ratio is 1.19x: aggressive, and feasible only with the buffer. S11-12 provide 2.5d slack + 40d buffer ≈ 43d for circuit breaker overflow and beta fixes.
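The two ratios quoted above can be reproduced with a few lines of TypeScript. This sketch takes the per-size subtotals exactly as stated in the text (9, 204.5, and 24 engineer-days) and assumes a 5-day working week, which is what 4 engineers × 12 weeks = 240 implies; all identifiers are hypothetical.

```typescript
interface SizeBucket {
  label: "L" | "M" | "S";
  engineerDays: number; // subtotal as stated in the capacity analysis
}

const ENGINEERS = 4;
const WORKDAYS_PER_WEEK = 5; // assumption implied by 4 × 12 weeks = 240 days

const estimated: SizeBucket[] = [
  { label: "L", engineerDays: 9 },
  { label: "M", engineerDays: 204.5 },
  { label: "S", engineerDays: 24 },
];

// Engineer-days available over a number of calendar weeks.
function availableDays(weeks: number): number {
  return ENGINEERS * weeks * WORKDAYS_PER_WEEK;
}

// Sum of the estimated engineer-days across all size buckets.
function totalEstimate(buckets: SizeBucket[]): number {
  return buckets.reduce((sum, b) => sum + b.engineerDays, 0);
}

// Estimated load divided by capacity, rounded to two decimals.
function loadRatio(estimateDays: number, capacityDays: number): number {
  return Math.round((estimateDays / capacityDays) * 100) / 100;
}

const total = totalEstimate(estimated);                  // 237.5
const withBuffer = loadRatio(total, availableDays(12));  // 0.99
const coreOnly = loadRatio(total, availableDays(10));    // 1.19
```

A ratio above 1.0 means the estimate exceeds capacity, which is why the plan only closes when S11-12 are counted.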
9.4 Sprint-by-Sprint Visual Timeline
6 two-week sprints (10 core weeks + 2 buffer weeks). 4 parallel tracks. 3 integration gates. Each cell shows the primary deliverable.
| Track | S1-2: Foundation | S3-4: Core Engines | S5-6: WRITE Tools + Auth | S7-8: Proactive + Polish | S9-10: Beta + Ship | S11-12: Buffer |
|---|---|---|---|---|---|---|
| Mateo (CTO) | ReAct Loop: #2 + multi-turn history + REST API + DynamoDB fix (ULID, GSI) + UserProfile + SystemPromptComposer L1+L2 | Tools + Context + Caching: Tool Registry + IContextAssembler + prompt caching + WRITE stubs + update_user_profile + contextSummary | WRITE Tools + Enrichment: #3 WRITE tools + #7 Guardrails + #11 Enrichment + HttpCreditGate | Proactive + Streaming + Feedback: #6 ProactiveSugg + WS streaming + FeedbackCapture + ActionLog + OutputGuard + SystemPromptComposer L3 | Bug Fix + QA: Monitoring + Observability | Hardening + WRITE defer: P1/P2 + advertising tools + p95 + ProactiveSugg v2 |
| Andres (Data+BE) | Adapters + OAuth + Infra: #12 MeLi + Amazon scaffold + OAuth2 + SellerConnection + MarketplaceAction + Terraform GCP verify + WRITE API docs + user mgmt research | Shopify + Data + CI: #12 Shopify + AmazonAds OAuth + ISKUResolver + TokenRefreshCron + #10 Clean Arch + DAGs verify + #14 CDK base + CI multi-repo | Fast Data + Rate Limit + CI: #10 Fast Data 11 endpoints + GCS snapshots + DAG Amazon + #12 IRateLimiter + onboarding trigger + CI/CD 11 repos | Staging + Load Test + WebSocket: #14 Staging deploy + load test + CloudWatch + WebSocket CDK + #10 Silver/Gold | Prod Deploy + Data Pipeline: #14 CDK + Terraform prod + rollback testing + #10 OpenMetadata + embeddings DAGs | Prod Hardening: CloudWatch + adapter fixes + Silver→Gold DAG |
| Sergio (Full-Stack) | Electron Shell + MK1: #1 + WebContentsView + Tabs (with T0.BB tokens) + Mockup shell container | Chat UI + Mockups: Chat UI (T1.BB) + WebSocket + OnboardingWizard (T1.BB) + MK1 ChatView + MK2 Onboarding | Billing + Views + Mockups: #13 Stripe + Confirmations (T2.BB) + Cards + ProfileView + MK1 Billing + MK2 Profile + MK3 ConfirmDialog | Enrollment + Feedback + Mockups: #1 WS client (T3.BB) + EnrollmentView + #15 FeedbackLoop + MK1 Enrollment + MK2 WRITE flow | Ship .dmg + MK Dashboard: Code signing + Security + Bug fixes (T4.BB) + MK1 Dashboard view | Beta Fixes + Windows: P1/P2 UI + auto-updater S3 + FeedbackThrottle + Windows build |
| #18 UX/UI | Foundations + Atoms: T0.BB Brand book + Foundations + Icons + T1.BB Atoms + AI-native + Molecules + Chat organisms | Molecules + Organisms: T2.BB remaining Molecules + ConfirmDialog + ToolAccordion + MarketplaceKPI + CreditEconomy + EnrollmentCard | Advanced Organisms: T3.BB ReActStream + DataTable + AuditLog + RollbackPanel + FraudAlert + ErrorRecovery. Publish [LIB] Pattern Components | Quality Audit: T4.BB All frames "Ready for development", zero generic names, variables verified, annotations | Pipeline Closed: Point queries only | - |
| Pablo (CEO/PE) | KB + Beautonomous + UX/UI: #17 bootstrap + Eval Setup + brand reg + Apple/Win auth + #18 approves T0.BB + T1.BB | Eval + Quality + UX/UI: #16 LLM Judge + EvalRunner + E2E testing + #17 Linear + Quality gate + #18 approves T2.BB | QA + Eval + UX/UI: #16 LLM-as-Judge + Real data QA + #18 approves T3.BB | Eval + Beta + UX/UI: #16 Eval CI + proactive suggestions testing + beta selection + contract testing + #18 approves T4.BB | Launch + E2E Eval: Beta + Feedback + Security + #16 E2E eval pipeline + #17 Beautonomous prompt v2 + Go/No-Go | Eval 0.80 + KB v3: Golden cases from beta + KB from gaps + 2nd feedback round |
S4 Demo - "It Talks"
User asks a question in the Electron sidebar → ReAct loop processes → 10 READ tool stubs respond → answer with KB context appears in chat.
S8 Demo - "It Acts"
User says "change this price" → Shopilot shows preview → user confirms → price updated on MeLi → billing deducted → proactive suggestion appears in conversation.
S10 Demo - "It Ships"
Seller downloads .dmg → installs → connects 3 marketplaces → asks, acts, receives proactive suggestions during the conversation → billing works → production-ready.
9.5 Week-by-Week Deliverables Matrix
Each cell is a concrete, testable deliverable. Bold = demo day deliverable. Color-coded by engineer.
| Week | Mateo (CTO) | Andres (Data+BE) | Sergio (Full-Stack) | Pablo (CEO/PE) |
|---|---|---|---|---|
| 0 | Pre-Sprint: Technical alignment session (2h, all 4). Pablo: Project #17 CORE bootstrap (OpenClaw + roles + system prompt) | | | |
| 1 | DynamoDB fix ULID + UserProfile + ILLMClient | Scaffold Marketplace Provider + IMarketplaceAdapter + AES256GCMCipher + SellerConnection + IOAuth2Flow + WRITE API docs (MeLi 3, AmazonAds 5, Amazon 2, Shopify 9) + user mgmt provider research | Electron scaffold + WebContentsView + MarketplaceDetector + Auth Memberstack + canary build (week 1, no Figma) | Eval Setup + golden dataset 15-20 cases + brand registration + Apple/Win auth. #18 UX/UI: T0.BB Brand book + Foundations Figma (approves end wk1) |
| 2 | AgentLoopOrchestrator ReAct + RestResponseEventEmitter + verify Observability | Amazon scaffold + MeLiOAuth2Flow + Terraform GCP verify + external deps + MarketplaceAction entity | Tabs + Sidebar 2.5d (with T0.BB tokens) + Mockup shell container (T1.MK1) | KB: 15-20 docs + 10 READ tool specs. #18 UX/UI: T1.BB Atoms + Molecules + Chat organisms (approves end wk2) |
| 3 | Tool definitions (ToolDefinition class + HookLifecycle) | ShopifyOAuth2Flow + ShopifyAdapter + Data Sync Clean Arch refactor | Chat UI 2.5d (T1.BB components) + WebSocket client + URL context injection | KB incremental processing + batch embeddings + Eval LLM Judge + EvalRunner. #18 UX/UI: T2.BB Molecules + Organisms (delivery S3-4) |
| 4 | 10 READ stubs + WRITE stubs + SYSTEM tool + IContextAssembler + Health summary + prompt caching | AmazonAdapter complete (if E1) + TokenRefreshCron + CDK base AWS + CI multi-repo + AmazonAdsOAuth + ISKUResolver | OnboardingWizard 2.5d (T1.BB) + react-router views + MK1 ChatView + MK2 OnboardingWizard + Gate 1 signed build | E2E testing Playground + bootstrap ~150 tasks Linear + Quality gate 5-step Beautonomous. #18 approves T2.BB |
| GATE 1 - "It Talks" | | | | |
| 5 | 10 real READ handlers + ConfirmationFlow + InputGuard + HttpCreditGate | Fast Data Layer 11 endpoints + GCS snapshots + DAG Amazon | BillingView 2.5d (T2.BB) + ProfileView (T2.BB) + Stripe Checkout + billing backend | KB BigQuery indexing + Eval CI + golden dataset 50. #18 UX/UI: T3.BB Advanced Organisms (delivery S5-6) |
| 6 | 4 WRITE tools + ProactiveSuggestionService + Enrichment scaffold + MeLi market intelligence + VisionLLM + 8 ANALYSIS handlers | IRateLimiter per marketplace + onboarding trigger + CI/CD 11 repos | Confirmation dialogs (T2.BB) + suggestion cards 1.5d (T2.BB) + MK1 BillingView + MK2 ProfileView + MK3 ConfirmDialog | QA conversation flows 3 marketplaces + golden dataset edge cases. #18 approves T3.BB |
| 7 | WebSocket streaming + SystemPromptComposer L3 + OutputGuard + FeedbackCapture in HookLifecycle | Load testing (50 users) + staging deploy | WS client 2.5d (T3.BB: ReActStream + RollbackPanel) + EnrollmentView + Sentry + Feedback Loop scaffold | Proactive suggestions testing + KB batch v2 + Eval automated CI. #18 UX/UI: T4.BB Quality Audit (delivery S7-8) |
| 8 | Remaining WRITE tools (circuit breaker) + ActionLog entity + p95 optimization | #12 + #10 integration tested + CloudWatch dashboard + PagerDuty alerts + WebSocket CDK + Silver/Gold circuit breaker | FeedbackMeasurer + FeedbackGate + explicit/implicit + grace 7d + MK1 EnrollmentView + MK2 WRITE flow + Gate 2 build | Contract testing + KB quality eval + beta selection + onboarding prep. #18 approves T4.BB |
| GATE 2 - "It Acts" | | | | |
| 9 | Bug fixes + agent quality tuning | Prod deploy (Lambda CDK) + RDS backups | Code signing + .dmg + bug fixes 3.5d (post T4.BB audit) + Billing Stripe live | Beta onboarding (10-15 sellers) |
| 10 | System Prompt v3 final + P1/P2 intelligence bug fixes | #14 CDK + Terraform + SSL + domain + rollback testing + Data Sync OpenMetadata | Security hardening + telemetry + MK1 Dashboard view | Feedback calls + security review + E2E eval pipeline + Go/No-Go checklist |
| LAUNCH GATE - "It Ships" | | | | |
9.6 Engineer Deep Dive - 4 Tracks
9.6.1 - Mateo Quintero - CTO
Orchestration + Tools + Intelligence + Knowledge Base + Enrichment + Observability
Mateo owns the AI brain of Shopilot. He builds the ReAct orchestrator (#2), the tool registry (#3), the personality engine (#4), the context aggregator (#5), the proactive suggestion engine (#6), the guardrails layer (#7), the observability system (#8), the Cerebro Knowledge Base (#9, Go 1.24 + Vertex AI + BigQuery vectors), and the enrichment layer (#11) for competitive analysis tools.
Sprint 1-2 — ReAct Loop + REST + DynamoDB Fix
Goal: A working orchestrator that receives a user message, calls Claude with tools, executes tool calls, and returns the full response via REST (WebSocket upgrade is T4.1 in S7-8).
• T1.1 — DynamoDB schema fix: SK UUID → ULID (time-sortable), findByMessageId O(n) → SK Trace#{messageId}, remove queryEmbedding (6KB), fix GSIs
• T1.2 — UserProfile entity: pk User#{userId}, sk Profile
• T1.3 — Conversation history in prompt: last N messages, findWindowForPrompt, token budget 200K
• T1.4 — ILLMClient update: chat() accepts toolDefinitions, returns ContentBlock[]
• T1.5 — SystemPromptComposer L1+L2: base identity (cached) + session (UserProfile + alerts)
• T1.6 — Implement AgentLoop (ReAct): user_message → LLM (with tools) → tool_use? → execute → observe → repeat. MAX_ROUNDS=10, cost guard 50K tokens
• T1.7 — RestResponseEventEmitter: full response post-rounds, no streaming
• T1.8 — Verify Observability with ReAct: ConversationTrace + AgentTracking compatible with multi-step loop
• Unit tests for the loop (mock Claude responses, MAX_ROUNDS cutoff)
• T1.21 — KB Phase 0 Fix duplicates: TRUNCATE before embed, embedded_at timestamp, CI Go 1.21→1.24
• T1.22 — KB Phase 1 Contextual Retrieval: contextual prefix per chunk, Markdown section chunking, 150-char overlap
• T1.23 — KB content: 15-20 curated docs — MeLi best practices, Amazon policies, Shopify guidelines, pricing, photos, metrics, seller FAQ
• T1.25 — 10 READ tool specs: name, LLM description, input_schema JSON Schema, risk level, credit cost per tool
Dependencies: None — Mateo starts first. Sergio depends on HTTP endpoint being stable by end of W2.
Deliverable: POST /conversation → send message → ReAct loop processes with multi-turn history → full REST response.
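The T1.6 loop can be sketched as follows. This is a minimal illustration, not the actual AgentLoop: the blueprint names ContentBlock, ILLMClient.chat(), MAX_ROUNDS=10 and the 50K-token cost guard, but the field names, the LoopDeps shape, the stop reasons, and the synchronous signatures (chosen for brevity) are all assumptions.

```typescript
// Hypothetical shapes: only the names ContentBlock, MAX_ROUNDS and the
// 50K cost guard come from the blueprint; every field here is assumed.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ContentBlock = { type: "text"; text: string } | ToolUseBlock;

interface LLMTurn {
  content: ContentBlock[];
  tokensUsed: number; // tokens consumed by this single LLM call
}

type Message = { role: "user" | "assistant" | "tool"; content: string };

interface LoopDeps {
  chat: (history: Message[]) => LLMTurn; // stands in for ILLMClient.chat()
  executeTool: (name: string, input: Record<string, unknown>) => string;
}

interface LoopResult {
  text: string;
  rounds: number;
  stopReason: "completed" | "max_rounds" | "cost_guard";
}

const MAX_ROUNDS = 10;
const TOKEN_BUDGET = 50000;

function runAgentLoop(userMessage: string, deps: LoopDeps): LoopResult {
  const history: Message[] = [{ role: "user", content: userMessage }];
  let tokens = 0;
  let lastText = "";

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const turn = deps.chat(history);
    tokens += turn.tokensUsed;

    // Split the model output into text and requested tool calls.
    const toolCalls: ToolUseBlock[] = [];
    for (const block of turn.content) {
      if (block.type === "text") {
        lastText = block.text;
        history.push({ role: "assistant", content: block.text });
      } else {
        toolCalls.push(block);
      }
    }

    // No tool_use means the model has produced its final answer.
    if (toolCalls.length === 0) return { text: lastText, rounds: round, stopReason: "completed" };
    // Cost guard: stop before executing more tools once the budget is spent.
    if (tokens >= TOKEN_BUDGET) return { text: lastText, rounds: round, stopReason: "cost_guard" };

    // Execute each tool and feed the observation back for the next round.
    for (const call of toolCalls) {
      const observation = deps.executeTool(call.name, call.input);
      history.push({ role: "tool", content: `${call.name}: ${observation}` });
    }
  }
  return { text: lastText, rounds: MAX_ROUNDS, stopReason: "max_rounds" };
}
```

Injecting chat and executeTool as plain functions keeps the loop easy to unit-test with mocked Claude responses, including the MAX_ROUNDS cutoff.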
Sprint 3-4 — Tool Registry + Context + Stubs
Goal: ToolRegistry with 10 READ stubs + 17 WRITE stubs + 1 SYSTEM tool registered. Policy filtering. HookLifecycle.
• Define tools as Anthropic tool_use — name, description, input_schema (JSON Schema)
• T2.2 — IToolExecutor + ToolExecutor: execute(toolName, args, context) → ToolResult. T2.3 — ToolPolicyFilter: risk gate + marketplace gate. T2.4 — HookLifecycle: before_tool → execute → after_tool
• Policy filtering: Free users — READ only; Pro users — READ + WRITE + ANALYSIS tools
• READ tools (10): get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics
• T2.5a — ToolResult domain model. T2.5b — update_user_profile SYSTEM tool. T2.5c — contextSummary. T2.5d — 17 WRITE tool stubs registered (no real execution in S3-4)
• T2.6 — IContextAssembler: KB + Brand Health RAG in parallel. T2.7 — structured health summary injected in system prompt
• T2.8 — Anthropic prompt caching. T2.9 — Tool result in-memory caching
• Integration tests: question → tool call → stub handler → result → response
• T2.22 — KB Phase 2 Incremental processing: content hash SHA-256, is_current flag, only re-embed docs that changed
• T2.23 — KB Phase 3 Batch embeddings: up to 250 texts per Vertex AI call, goroutine pool max 5
Dependencies: T1.25 tool specs (own task). Handlers are stubs in S3-4 — do not depend on real adapters.
Deliverable: "How are my metrics?" → Claude calls get_product_metrics → stub responds with mock data → response with KB context.
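A minimal sketch of how the registry, the plan-based policy filter, and the hook lifecycle from T2.2 to T2.4 could fit together. The blueprint fixes the tool fields (name, description, input_schema), the Free/Pro rule, and the before_tool → execute → after_tool order; the class layout, the ToolResult fields, and the choice to expose SYSTEM tools to every plan are assumptions. The marketplace gate is left out.

```typescript
type RiskLevel = "READ" | "WRITE" | "ANALYSIS" | "SYSTEM";
type Plan = "free" | "pro";

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>; // JSON Schema object
  risk: RiskLevel;
}

// Hypothetical result shape; the blueprint only names ToolResult.
interface ToolResult { ok: boolean; data?: unknown; error?: string }
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

interface HookLifecycle {
  beforeTool(toolName: string): void;
  afterTool(toolName: string, result: ToolResult): void;
}

// Free: READ only. Pro: READ + WRITE + ANALYSIS.
// Assumption: SYSTEM tools (e.g. update_user_profile) are open to both plans.
const ALLOWED: Record<Plan, readonly RiskLevel[]> = {
  free: ["READ", "SYSTEM"],
  pro: ["READ", "WRITE", "ANALYSIS", "SYSTEM"],
};

class ToolRegistry {
  private tools = new Map<string, { def: ToolDefinition; handler: ToolHandler }>();
  private hooks: HookLifecycle;

  constructor(hooks: HookLifecycle) {
    this.hooks = hooks;
  }

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(def.name, { def, handler });
  }

  // Definitions the LLM is allowed to see for this plan.
  definitionsFor(plan: Plan): ToolDefinition[] {
    return Array.from(this.tools.values())
      .map((entry) => entry.def)
      .filter((def) => ALLOWED[plan].includes(def.risk));
  }

  // The policy gate runs first, then before_tool → execute → after_tool.
  execute(toolName: string, args: Record<string, unknown>, plan: Plan): ToolResult {
    const entry = this.tools.get(toolName);
    if (!entry) return { ok: false, error: `unknown tool: ${toolName}` };
    if (!ALLOWED[plan].includes(entry.def.risk)) {
      return { ok: false, error: `${toolName} is not available on the ${plan} plan` };
    }
    this.hooks.beforeTool(toolName);
    const result = entry.handler(args);
    this.hooks.afterTool(toolName, result);
    return result;
  }
}
```

Filtering both the advertised definitions and the execution path means a Free user never sees a WRITE tool, and a call that names one anyway is still rejected.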
Sprint 5-6 — WRITE Tools + Guardrails + Credits Gate + Enrichment + Billing + Token Pipeline
Goal: First WRITE tools working E2E with confirmation flow, InputGuard pre-LLM, HttpCreditGate, ProactiveEvaluator afterTool hook, and full Enrichment layer (8 ANALYSIS tools).
• T3.1 — 10 READ handlers connected to real Fast Data Layer or Marketplace Provider (replaces stubs from S3-4)
• T3.2 — ConfirmationFlow: when risk > read-only → pause loop → send preview → persist OrchestrationSession (DynamoDB, TTL 35min) → resume on confirm
• WRITE tools (phase 1): update_product_content, update_price, pause_product, activate_product — for all 3 marketplaces
• T3.5 — IGuardService + InputGuard: pattern matching + out-of-scope filtering pre-LLM
• T3.5a — HttpCreditGate: POST /internal/gate before each tool call
• T3.14 — GCS pre-write snapshots for ConfirmationFlow (Andrés provides endpoint)
• T3.4 — ProactiveSuggestionService via afterTool hook: LLM inference post-tool — output: { hasSuggestion, message, suggestionType, priority, productId }
• T3.25 — KB BigQuery indexing: index 15-20 docs via Go pipeline, verify top-5 semantic search for 5 test queries
• T3.6–T3.11 — Enrichment complete: scaffold + MeLi market intelligence + Vision LLM + Redis cache + CDK + 8 ANALYSIS handlers (search_market_products, get_competitor_product, get_market_pricing, get_keyword_data, analyze_product_image, enhance_product_image, analyze_product_video, get_product_fee_estimate)
#18 Design System
• T3.32 — Token pipeline + Style Dictionary: Figma Variables → design-tokens.json → Style Dictionary build → CSS :root + tailwind.config.ts. CI validates token file on each PR
Dependencies: Sergio's confirmation UI (S5-6) for the confirmation flow UX. Andrés's Fast Data Layer for T3.1 real handlers.
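The pause-and-resume behaviour of T3.2 might look like the sketch below. The 35-minute TTL and the OrchestrationSession name come from the blueprint; the session fields, the FlowOutcome type, and the in-memory Map (standing in for the DynamoDB table) are assumptions made for illustration.

```typescript
type Risk = "read" | "write";

// Hypothetical session shape; the real entity lives in DynamoDB with a TTL.
interface OrchestrationSession {
  sessionId: string;
  toolName: string;
  args: Record<string, unknown>;
  expiresAt: number; // epoch milliseconds
}

type FlowOutcome =
  | { status: "executed"; result: string }
  | { status: "awaiting_confirmation"; preview: string }
  | { status: "expired" };

const SESSION_TTL_MS = 35 * 60 * 1000;

type ToolRunner = (toolName: string, args: Record<string, unknown>) => string;

class ConfirmationFlow {
  // In-memory stand-in for the persisted OrchestrationSession table.
  private sessions = new Map<string, OrchestrationSession>();
  private runTool: ToolRunner;

  constructor(runTool: ToolRunner) {
    this.runTool = runTool;
  }

  // Read-only tools run immediately; anything riskier pauses the loop
  // and returns a preview for the user to confirm.
  request(sessionId: string, toolName: string, risk: Risk, args: Record<string, unknown>, now: number): FlowOutcome {
    if (risk === "read") {
      return { status: "executed", result: this.runTool(toolName, args) };
    }
    this.sessions.set(sessionId, { sessionId, toolName, args, expiresAt: now + SESSION_TTL_MS });
    return { status: "awaiting_confirmation", preview: `${toolName} ${JSON.stringify(args)}` };
  }

  // Resume on confirm; a missing or expired session is never executed.
  confirm(sessionId: string, now: number): FlowOutcome {
    const session = this.sessions.get(sessionId);
    this.sessions.delete(sessionId);
    if (!session || now > session.expiresAt) return { status: "expired" };
    return { status: "executed", result: this.runTool(session.toolName, session.args) };
  }
}
```

Passing the clock in as `now` keeps TTL behaviour deterministic in tests; a production version would rely on the DynamoDB TTL attribute plus an explicit expiry check like the one in confirm().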
Sprint 7-8 — Streaming + Proactive + OutputGuard + ActionLog + FeedbackCapture
Goal: WebSocket streaming (T4.1), SystemPromptComposer L3 with WRITE guardrails, OutputGuard, ActionLog, FeedbackCapture, remaining WRITE tools, and p95 <3s optimization.
• T4.1 — WebSocket streaming: 8 server events, 4 client events (API Gateway WebSocket → Lambda → Electron WS client)
• T4.2 — SystemPromptComposer L3: conditional WRITE guardrails. Hard cap 1200 tokens
• T4.3 — OutputGuard: cross-user data leak prevention + dangerous content filtering
• T4.4 — Remaining WRITE tools (circuit breaker): update_product_images, update_stock, publish_product, answer_question, etc.
• T4.16 — KB batch + v2: batch Vertex AI embeddings 250/call, target >80% hit rate on 20 eval queries
• T4.5 — Performance optimization: target p95 <3s
• T4.5a — FeedbackCapture in HookLifecycle: after_tool writes FeedbackEntry via HTTP to #15. Fire-and-forget.
• T4.5b — ActionLog entity + DynamoActionLogRepository: record of every WRITE executed
Dependencies: T4.1 WS server must be ready before Sergio builds T4.19 WS Electron client. T4.9a — API Gateway WebSocket CDK from Andrés must be ready. All WRITE handlers stable from S5-6.
Sprint 9-10 — Bug Fixes + Monitoring + Security Support
Goal: Production-stable agent. Security review support. System prompt v3 from real conversations. All P1/P2 intelligence bugs resolved.
• LLMGuardChecker: Claude Haiku as classifier for ambiguous inputs (Phase 2 of InputGuard)
• Bug fixes P1/P2 across the intelligence stack from beta feedback
• System prompt v3 final: iteration based on real beta conversations, adjusted few-shot examples
• Security review support: OWASP top 10, injection path review, OutputGuard validation
Sprint 11-12 — Buffer: Intelligence Hardening + Deferred WRITE Tools
Goal: Clear intelligence P1/P2 backlog. Ship any WRITE tools cut by circuit breaker (advertising campaigns). p95 optimization if >3s. ProactiveSuggestions v2 if deferred.
• Bug fixes P1/P2 for intelligence reported by beta users
• WRITE tools cut by circuit breaker (advertising campaigns if not in S7-S8)
• p95 optimization if >3s: profiling hot paths, DynamoDB query optimization, prompt size reduction
• ProactiveSuggestions v2 (if deferred): afterToolWithContext() parallel to streaming, gate <40% turns
Circuit breaker output: Advertising WRITE tools + ProactiveSuggestions v2 are the most likely candidates to be cut from S7-8.
Key Technical Decisions
• Claude Sonnet 4 as primary LLM — tool_use native, fast, cost-effective
• ToolRegistry with register(def, handler) / registerRemote(def, dispatcher). HookLifecycle: before_tool → execute → after_tool.
• All 3 marketplace adapters (MeLi, Amazon, Shopify) owned by Andrés (#12) — single owner policy [CORRECTED: Shopify back to Andrés]
9.6.2 — Andrés León — Data + Backend
APIs + Data Sync + Auth + Infrastructure
Andrés owns the data backbone. He builds the marketplace adapters for MeLi + Amazon + Shopify (#12), the data sync pipelines (#10), auth/token management — including SellerConnection (5-state machine), MarketplaceAction (action log), and IOAuth2Flow (generic OAuth2 port) — and the DevOps infrastructure (#14).
Sprint 1-2 — Marketplace Adapters + OAuth2 + Domain Entities
Goal: MeLi and Amazon scaffold adapters returning data via IMarketplaceAdapter, OAuth2 flows for MeLi + Amazon LWA, AES-256-GCM token encryption, domain entities SellerConnection and MarketplaceAction.
• T1.9 — Scaffold Marketplace Provider: Clean Architecture + DDD, Value Objects, Error types, DI container
• T1.10 — IMarketplaceAdapter: 23 methods, 4 domains (Catalog, Engagement, Advertising, Enrollment). ISKUResolver: SKU → native marketplace ID
• T1.11 — AES256GCMCipher + ITokenManager: encrypt tokens at rest, DynamoDB marketplace-credentials table
• T1.12 — MeLiOAuth2Flow + MeLiAdapter: OAuth2 code flow, REST API, standardized error mapping
• T1.13 — AmazonLWAFlow + AmazonAdapter scaffold: OAuth2 LWA, SP-API SDK. Scaffold only — full impl in S3-4
• T1.14 — Verify existing Terraform GCP: GCS, Cloud Run, Airflow, BigQuery operational
• T1.15 — Request external dependencies: Amazon SP-API, MeLi dev portal, Shopify Partners, Apple Developer
• T1.15a — SellerConnection aggregate: 5-state machine (disconnected → pending → active → expired → revoked)
• T1.15b — MarketplaceAction entity + IMarketplaceActionRepository
• T1.15c — IOAuth2Flow interface: generic OAuth2 port (authorize, exchangeCode, refreshToken)
• T1.28 — Collect missing WRITE API docs: MeLi 3, Amazon Ads 5, Amazon 2, Shopify 9 — required for #3 Tool Registry WRITE action mapping
• T1.29 — Collect user management provider docs: evaluate external auth provider (Auth0, Clerk, Memberstack), document service methods for consumer layers
Build Pipeline
• T1.33 — GitHub Actions CI: electron-builder on release/* branch. Upload .dmg + .exe artifacts. Notify #deploys Slack
Dependencies: None — Andrés starts in parallel with Mateo. T1.33 depends on Sergio’s T1.32 canary build.
Deliverable: MeLiAdapter returns real data via IMarketplaceAdapter. AmazonAdapter scaffold ready. Tokens encrypted AES-256-GCM with auto-refresh.
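The SellerConnection aggregate from T1.15a could be modelled as an explicit transition table. The five states and the forward chain (disconnected → pending → active → expired → revoked) come from the blueprint; the two extra edges (expired → active after a successful refresh, active → revoked when the seller revokes access) and the method names are assumptions.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

// Allowed transitions. The forward chain is from the blueprint; the
// expired → active and active → revoked edges are assumed for illustration.
const TRANSITIONS: Record<ConnectionState, readonly ConnectionState[]> = {
  disconnected: ["pending"],
  pending: ["active"],
  active: ["expired", "revoked"],
  expired: ["revoked", "active"],
  revoked: [],
};

class SellerConnection {
  private current: ConnectionState = "disconnected";

  get state(): ConnectionState {
    return this.current;
  }

  canTransition(next: ConnectionState): boolean {
    return TRANSITIONS[this.current].includes(next);
  }

  // Rejects any move that is not listed in the table, so invalid
  // lifecycle jumps fail loudly instead of corrupting the aggregate.
  transition(next: ConnectionState): void {
    if (!this.canTransition(next)) {
      throw new Error(`invalid transition: ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```

Keeping the rules in one table makes it straightforward to review the lifecycle and to add the unexpected OAuth states that surface during beta.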
Sprint 3-4 — Shopify + Amazon + TokenRefreshCron + Data Sync + CDK + CI
Goal: Shopify adapter complete, Amazon adapter complete (if E1 approved), TokenRefreshCron, Data Sync Clean Architecture, CDK base AWS, CI multi-repo.
• T2.10 — ShopifyOAuth2Flow + ShopifyAdapter: OAuth2, GraphQL Admin API, cost-based rate limiting
• T2.11 — AmazonAdapter complete (if E1 approved): SP-API SDK, Reports, Catalog Items, Orders
• T2.12 — TokenRefreshCron: EventBridge every 5min, pre-refresh 30min, DynamoDB mutex, 3 failures → Slack alert
• T2.13 — Data Sync Phase 0.5: refactor Clean Architecture in services/api/ — no behavior change
• T2.14 — Verify existing DAGs: MeLi + Shopify @hourly. Fix if needed
• T2.15 — CDK base AWS: DynamoDB conversation-api (corrected GSI) + Lambda + API Gateway v2 HTTP + VPC + NAT. Marketplace Provider: DynamoDB marketplace-credentials, Secrets Manager, EventBridge
• T2.16 — GitHub Actions CI multi-repo: lint + type-check + tests on each PR, 4 active repos
• T2.16a — marketplace-actions DynamoDB table in CDK. T2.16b — AmazonAdsOAuth2Flow: separate OAuth2 for Amazon Ads API. T2.16c — ISKUResolver implementations: MeLi (ML prefix), Amazon (ASIN), Shopify (numeric ID)
Dependencies: MeLi adapter from S1-2 stable. E1 Amazon approval determines if T2.11 executes or defers to S5.
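The decision logic inside one TokenRefreshCron tick (T2.12) can be sketched as below. The 30-minute pre-refresh window and the three-failure alert threshold come from the blueprint; the Credential shape, the callback signatures, and counting failures consecutively are assumptions. The DynamoDB mutex and the EventBridge schedule are not modelled.

```typescript
// Hypothetical record; the real one lives in the marketplace-credentials table.
interface Credential {
  sellerId: string;
  expiresAt: number; // epoch milliseconds
  consecutiveFailures: number;
}

const PRE_REFRESH_WINDOW_MS = 30 * 60 * 1000;
const MAX_FAILURES = 3;

// One tick of the cron (scheduled every 5 minutes in the blueprint).
// `refresh` returns the new expiry or throws; `alert` stands in for Slack.
function runRefreshTick(
  credentials: Credential[],
  now: number,
  refresh: (credential: Credential) => number,
  alert: (sellerId: string) => void,
): void {
  for (const credential of credentials) {
    // Skip tokens that are not yet inside the pre-refresh window.
    if (credential.expiresAt - now > PRE_REFRESH_WINDOW_MS) continue;
    try {
      credential.expiresAt = refresh(credential);
      credential.consecutiveFailures = 0;
    } catch {
      credential.consecutiveFailures += 1;
      // Alert exactly once, when the threshold is first reached.
      if (credential.consecutiveFailures === MAX_FAILURES) alert(credential.sellerId);
    }
  }
}
```

With a 5-minute schedule and a 30-minute window, each token gets several refresh attempts before it actually expires, which is what makes a three-failure threshold meaningful.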
Sprint 5-6 — Fast Data Layer + GCS Snapshots + DAG Amazon + Rate Limiting + CI/CD
Goal: Fast Data Layer with 11 operational endpoints, GCS snapshots for ConfirmationFlow, Amazon DAG, IRateLimiter per marketplace, onboarding trigger, CI/CD for 11 repos.
• T3.13 — Fast Data Layer 11 endpoints: FastAPI 1:1 with Tool Registry, GCS Parquet via pyarrow, <500ms
• T3.14 — GCS pre-write snapshots for ConfirmationFlow + snapshot_cleanup_dag
• T3.15 — DAG Amazon: IExtractor + ILoader + AmazonAuthManager + AmazonExtractor + AmazonLoader
• T3.16 — IRateLimiter per marketplace: MeLi token bucket 1500/min, Amazon burst/restore, Shopify leaky bucket. Redis counter
• T3.17 — Onboarding trigger: first sync post-onboarding when user connects marketplace
• T3.18 — CI/CD multi-repo complete: 11 repos with GitHub Actions, auto-deploy to staging
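For the MeLi limiter in T3.16, a token bucket at 1500 requests per minute might be sketched like this. Only the algorithm and the 1500/min figure come from the blueprint; the class and method names are illustrative, state is kept in process memory rather than in the Redis counter, and the clock is passed in explicitly.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerMinute: number;

  constructor(capacity: number, refillPerMinute: number, now: number) {
    this.capacity = capacity;
    this.refillPerMinute = refillPerMinute;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Returns true and consumes one token if a request may proceed at `now` (ms).
  tryAcquire(now: number): boolean {
    const elapsedMs = now - this.lastRefill;
    // Refill proportionally to elapsed time, never above capacity.
    const refilled = (elapsedMs * this.refillPerMinute) / 60000;
    this.tokens = Math.min(this.capacity, this.tokens + refilled);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// MeLi limit from the blueprint: 1500 requests per minute.
const meliBucket = new TokenBucket(1500, 1500, 0);
```

A caller that gets `false` would queue or back off. The Amazon burst/restore and Shopify leaky-bucket variants would sit behind the same IRateLimiter interface with different refill rules.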
Sprint 7-8 — Staging Deploy + Load Test + CloudWatch + WebSocket CDK + Silver/Gold
• T4.6 — Staging deploy full stack: CDK AWS + Terraform GCP. Health-check green
• T4.7 — Load testing 50 users: Artillery/k6, target p95 <2s
• T4.8 — CloudWatch dashboard + alerts: PagerDuty p95 >2s, Slack cost >$50/day
• T4.9 — Data Sync Silver + Gold (circuit breaker): INormalizer, SilverNormalizer, IAggregator, Brand Health spike
• T4.9a — API Gateway v2 WebSocket CDK: routes $connect/$disconnect/$default, DynamoDB connection-ids
• #12 + #10 integration testing: marketplace adapters + data sync E2E
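The three WebSocket routes in T4.9a map naturally onto a small dispatcher that maintains the connection-ids table. The route keys and the table name come from the blueprint; the event shape, the ConnectionStore interface, the in-memory store, and the 410 response for unknown connections are assumptions made for this sketch.

```typescript
type RouteKey = "$connect" | "$disconnect" | "$default";

// Hypothetical, trimmed-down view of the Lambda event.
interface WsEvent {
  routeKey: RouteKey;
  connectionId: string;
  body?: string;
}

// Port over the connection-ids table; DynamoDB would back this in practice.
interface ConnectionStore {
  put(connectionId: string, connectedAt: number): void;
  remove(connectionId: string): void;
  has(connectionId: string): boolean;
}

class InMemoryConnectionStore implements ConnectionStore {
  private connections = new Map<string, number>();
  put(connectionId: string, connectedAt: number): void {
    this.connections.set(connectionId, connectedAt);
  }
  remove(connectionId: string): void {
    this.connections.delete(connectionId);
  }
  has(connectionId: string): boolean {
    return this.connections.has(connectionId);
  }
}

function handleWsEvent(event: WsEvent, store: ConnectionStore, now: number): { statusCode: number } {
  switch (event.routeKey) {
    case "$connect":
      // Register the connection so the server can push events to it later.
      store.put(event.connectionId, now);
      return { statusCode: 200 };
    case "$disconnect":
      store.remove(event.connectionId);
      return { statusCode: 200 };
    case "$default":
      // Assumption: messages from unregistered connections are rejected with 410.
      return { statusCode: store.has(event.connectionId) ? 200 : 410 };
  }
}
```

Hiding the table behind a ConnectionStore port keeps the handler testable without AWS and matches the ports-and-adapters style used elsewhere in the blueprint.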
Sprint 9-10 — Production Deploy + IaC + Rollback + OpenMetadata
• T5.4 — Production deploy: CDK + Terraform prod, SSL + domain api.shopilot.ai
• T5.5 — IaC production complete: DynamoDB PITR 35d, PostgreSQL RDS backups, GCS lifecycle policies
• T5.6 — Rollback testing: Lambda <1min, Cloud Run <1min, document runbook
• T5.6a — Data Sync Phase 4: OpenMetadata FQNs + embedding DAGs → Cerebro KB
Sprint 11-12 — Buffer: Prod Hardening + Adapter Fixes + Monitoring
Goal: Harden production based on real traffic data. Fix adapter edge cases found in beta. Expand monitoring dashboards.
• Production hardening: refine CloudWatch alerts (based on real S9-S10 data), update runbooks, rollback drills
• Fix marketplace adapter bugs from beta: rate limit edge cases, OAuth unexpected states, marketplace API quirks
• Monitoring dashboards expanded: cost breakdown per tool, latency per marketplace, error rate per adapter
• DAG Silver→Gold (if cut in S7-8 by circuit breaker): cross-marketplace normalization complete
• Rate limiter optimization with real production data: adjust thresholds, backoff policies
Circuit breaker output: DAG Silver→Gold + advanced monitoring are the candidates to be cut from S7-8 if staging slips.
Key Technical Decisions
• AWS Secrets Manager for backend (#2,#3,#12,#13) / GCP Secret Manager for data services (#9,#10,#11). AES-256-GCM for marketplace tokens in DynamoDB
• Redis (ElastiCache) from S5-6 for rate limiting (#12) and enrichment cache (#11). Cache TTL: 15min-24h by data type
• Andrés owns #10 Data Sync, #12 Marketplace Provider (MeLi + Amazon + Shopify adapters), and #14 DevOps IaC — single owner policy for all marketplace adapters
9.6.3 — Sergio Murillo — Full-Stack
Native Shell + UI + Billing + Ship + Mockups
Sergio owns everything the user sees and touches. He builds the Electron desktop app with WebContentsView for marketplace browsing, the React sidebar with chat, the Stripe billing integration, and the Feedback Loop (#15), which measures the impact of Coach actions after 7 days. He also ships the final .dmg and creates integration Mockups that validate the UX/UI team's Figma components in a real React context. He is the single point of failure for the native shell; as mitigation, Pablo cross-trains on React/Electron basics by S4.
Sprint 1-2 — Electron Shell + WebContentsView + Auth
Goal: Working Electron app with WebContentsView loading marketplace URLs, tab system, marketplace detector, sidebar container, and Memberstack auth.
• T1.16 — Scaffold Electron + electron-builder: Electron 28+, preload scripts with contextBridge, hot reload in dev
• T1.18 — MarketplaceDetector: URL patterns MeLi/Amazon/Shopify, detect page type, extract IDs, remote config JSON with local fallback
• T1.19 — Tab system + Sidebar container (+0.5d for token setup): marketplace tabs + React sidebar 360px with design tokens from T0.BB, IPC main↔renderer, toggle Cmd+B
• T1.20 — Auth Memberstack: JWT in electron-store encrypted with OS key, login/logout flow, AuthService in main process
Internal Beta Build
• T1.32 — First .dmg + .exe canary build (unsigned): run electron-builder, verify packaging, team install test
Mockups
• T1.MK1 — Mockup shell container: assemble sidebar + tabs with tokens from T0.BB (0.5d)
Dependencies: None — Sergio starts in parallel. REST endpoint from Mateo (T1.7) needed by end of W2. T0.BB (Figma foundations) needed for T1.19 in week 2.
Week 1 (no Figma): T1.16, T1.17, T1.18, T1.20, T1.32. Week 2 (with T0.BB): T1.19 + T1.MK1.
Sprint 3-4 — Chat UI + WebSocket + Context Injection + Onboarding + Mockups
• T2.17 — Chat UI + Markdown rendering +0.5d integration T1.BB components. Total: 2.5d. User/assistant bubbles, thinking/executing/done indicators, syntax highlighting
• T2.18 — CoachWebSocketService: WebSocket client in main process, exponential backoff reconnect, heartbeat 30s, REST polling fallback
• T2.19 — URL context injection: MarketplaceDetector → extract marketplace, page type, IDs → metadata with each message
• T2.20 — react-router views: /chat, /profile, /billing, /enrollment, /onboarding. Bottom tab bar
• T2.21 — OnboardingWizard +0.5d T1.BB components. Total: 2.5d. 5 steps: (1) Welcome, (2) Connect marketplace (OAuth inline), (3) Setup profile, (4) Guided first query, (5) Success + next steps. First launch only (localStorage flag). Skip from step 3
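The reconnect policy named in T2.18 (exponential backoff, 30 s heartbeat, REST polling fallback) can be sketched as pure decision functions, independent of any WebSocket library. The base delay, cap, retry limit, and the "two missed beats" staleness rule below are assumptions chosen for illustration; the plan only fixes the 30 s heartbeat.

```typescript
// Assumed tuning values; only the 30 s heartbeat comes from the plan.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound on any single delay
  maxAttempts: number; // after this many failures, stop retrying
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 1000, maxMs: 30000, maxAttempts: 6 };
const HEARTBEAT_INTERVAL_MS = 30000;

type ReconnectAction =
  | { kind: "retry"; delayMs: number }
  | { kind: "fallback_to_polling" };

// Decide what to do after `failedAttempts` consecutive connection failures.
function nextReconnectAction(failedAttempts: number, opts: BackoffOptions = DEFAULT_BACKOFF): ReconnectAction {
  if (failedAttempts >= opts.maxAttempts) {
    return { kind: "fallback_to_polling" };
  }
  // Delay doubles with each failure: base, 2x base, 4x base, ... capped at maxMs.
  const delayMs = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, failedAttempts));
  return { kind: "retry", delayMs };
}

// Treat the connection as stale when no pong arrived for `missedBeats` heartbeat intervals.
function isConnectionStale(lastPongAtMs: number, nowMs: number, missedBeats: number = 2): boolean {
  return nowMs - lastPongAtMs > HEARTBEAT_INTERVAL_MS * missedBeats;
}
```

Keeping the policy separate from the socket code makes it testable without a server. A production client would usually add random jitter to each delay so that many clients do not reconnect in lockstep; that is omitted here to keep the sketch deterministic.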
Mockups
• T2.MK1 — Mockup ChatView: MessageBubbles + ContextBar + ChatInputBar + AgentStatusBar assembled in React (1d, depends T1.BB)
• T2.MK2 — Mockup OnboardingWizard: 5 navigable steps with OnboardingStep (0.5d, depends T1.BB)
Internal Beta Build
• T2.40 — Gate 1 signed build: Apple codesign + notarytool .dmg, Windows signed .exe, distribute to team. Gate 1 build milestone
Sprint 5-6 — Billing + Lifecycle + Confirmations + Cards + Mockups
• T3.19 — BillingView +0.5d T2.BB components. Total: 2.5d. Current plan, remaining credits, usage stats. Uses CreditEconomy + MarketplaceKPI + CreditDisplay. Buttons → Stripe Checkout in system browser
• T3.20 — WRITE confirmation dialogs: depends T2.BB, uses ConfirmDialog REVERSIBLE + IRREVERSIBLE. Red/green diff, 35-min timeout with 5-min reminder
• T3.21 — Suggestion cards + tool progress +0.5d T2.BB components. Total: 1.5d. Uses ProactiveCard. Clickable cards, click opens pre-contextualized conversation, spinner with tool name
• T3.22 — ProfileView: depends T2.BB, uses EnrollmentCard + Toggle labeled. Connected marketplaces, stats, preferences, settings
• T3.23 — Stripe Checkout + Customer Portal: Pro $49/mo checkout, webhooks checkout.session.completed
• T3.24 — ICreditsGate + backend credits: POST /internal/gate, credit matrix READ=1 / ANALYSIS=2 / WRITE=3
• T3.24a — Billing schema migration: ALTER TABLE clients + tables credit_packs, subscription_events, credit_transactions
• T3.24b — SubscriptionLifecycleService: activate, cancel, upgrade, downgrade, grace period 7d
• T3.24c — Monthly credit reset cron: EventBridge + Lambda, reset plan credits monthly
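The credit matrix in T3.24 (READ=1, ANALYSIS=2, WRITE=3) lends itself to a small pure function. The sketch below is an assumption about how such a gate might decide; `checkCreditsGate` and the `GateDecision` shape are hypothetical names, and the real `POST /internal/gate` contract is not given in this section.

```typescript
type ActionType = "READ" | "ANALYSIS" | "WRITE";

// Credit cost per action type, as listed in T3.24.
const CREDIT_COST: Record<ActionType, number> = { READ: 1, ANALYSIS: 2, WRITE: 3 };

// Hypothetical response shape for the gate.
interface GateDecision {
  allowed: boolean;
  cost: number;
  remaining: number; // balance after the action if allowed, unchanged otherwise
  reason?: "insufficient_credits";
}

function checkCreditsGate(balance: number, action: ActionType): GateDecision {
  const cost = CREDIT_COST[action];
  if (balance < cost) {
    // Deny without touching the balance.
    return { allowed: false, cost, remaining: balance, reason: "insufficient_credits" };
  }
  return { allowed: true, cost, remaining: balance - cost };
}
```

For example, a seller with 2 credits could still run a READ or an ANALYSIS, but a WRITE would be denied with the balance left untouched. A real backend would also need to make the check and the deduction atomic, which this sketch does not address.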
Mockups
• T3.MK1 — Mockup BillingView: current plan + CreditEconomy + ProgressBar labeled + CreditDisplay + Stripe buttons (0.5d, depends T2.BB)
• T3.MK2 — Mockup ProfileView: EnrollmentCard list + Toggles labeled + InputFields (0.5d, depends T2.BB)
• T3.MK3 — Mockup ConfirmDialog in chat context: ChatView + ConfirmDialog overlay (slide up + backdrop) × REVERSIBLE and IRREVERSIBLE (0.5d, depends T2.BB + T3.20)
Sprint 7-8 — WebSocket Client + EnrollmentView + Sentry + Feedback Loop + Mockups
• T4.10 — WebSocket client progressive +0.5d T3.BB components for ReActStream + RollbackPanel. Total: 2.5d. Consume 8 server→client events (tool_start, tool_end, token_stream, suggestion_ready, confirmation_required, credits_updated, feedback_request, session_expired), update UI state machine accordingly
• T4.11 — EnrollmentView standalone: dedicated BrowserWindow for OAuth redirect flows per marketplace. Reuses OAuth tokens from T2.21 OnboardingWizard
• T4.12 — Sentry crash reporting: init() in main + renderer, source maps, unhandledRejection + uncaughtException hooks
• T4.13 — Feedback Loop scaffold (#15): FeedbackEvent model, IPC channel feedback:record, POST /feedback/explicit + /feedback/implicit endpoints wired
• T4.14 — calculateImpactScore: metric delta (listing_views, conversion_rate, revenue_7d) between action_date and +7d snapshot, normalized score 0–100
• T4.15 — FeedbackMeasurerService: cron job +7d after each recorded action, fetch snapshot, compute score, persist FeedbackResult
• T4.15a — FeedbackGate anti-fatigue: max 1 explicit feedback request per session, suppress if user dismissed in last 3 days
• T4.15b — Explicit feedback endpoint: POST /feedback/explicit — thumbs up/down + optional note, triggers FeedbackEvent immediately
• T4.15c — Implicit feedback endpoint: POST /feedback/implicit — captures re-visits to changed listing, time-on-page events, re-run of same tool
• T4.15d — Grace period billing UI: banner when subscription expired but within 7d grace — "Your plan expired, actions paused. Renew to continue."
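The anti-fatigue rule in T4.15a (at most one explicit feedback request per session, none if the user dismissed one in the last 3 days) can be expressed as a single predicate. The state shape and function name below are hypothetical; only the two limits come from the task description.

```typescript
// Suppression window from T4.15a: 3 days, in milliseconds.
const SUPPRESSION_WINDOW_MS = 3 * 24 * 60 * 60 * 1000;
const MAX_REQUESTS_PER_SESSION = 1;

// Hypothetical state the gate would need to track.
interface FeedbackGateState {
  requestsThisSession: number;      // explicit requests already shown in this session
  lastDismissedAtMs: number | null; // epoch ms of the most recent dismissal, if any
}

function shouldRequestFeedback(state: FeedbackGateState, nowMs: number): boolean {
  // Rule 1: never exceed the per-session cap.
  if (state.requestsThisSession >= MAX_REQUESTS_PER_SESSION) return false;
  // Rule 2: stay quiet while a recent dismissal is inside the suppression window.
  if (state.lastDismissedAtMs !== null && nowMs - state.lastDismissedAtMs < SUPPRESSION_WINDOW_MS) {
    return false;
  }
  return true;
}
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the rule easy to unit test and makes the window simple to tune later.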
Mockups
• T4.MK1 — Mockup EnrollmentView standalone: full marketplace list × all states Connected/Syncing/Error/Disconnected (0.5d, depends T3.BB)
• T4.MK2 — Mockup full WRITE flow: ChatView with AgentStatusBar in ToolUse → ToolAccordion expanded → ReActStream (3 phases) → ConfirmDialog → RollbackPanel post-execution (1d, depends T3.BB)
Internal Beta Build
• T4.24 — Gate 2 signed build: full .dmg notarized + .exe signed with ALL S7-8 features. Team smoke test. Gate 2 build milestone — candidate for beta distribution
Sprint 9-10 — Code Signing + .dmg + Auto-updater + Stripe Live + Mockups
• T5.7 — Code signing + .dmg + auto-updater: Apple Developer certificate, notarization, electron-updater pointing to releases.shopilot.ai, silent update flow
• T5.8 — Electron security hardening: CSP headers, sandbox: true, nodeIntegration: false, contextIsolation: true, allowRunningInsecureContent: false
• T5.9 — Beta bug fixes + RAM profiling +0.5d post-audit T4.BB alignment. Total: 3.5d. Fix P1/P2 bugs from beta cohort, Chrome DevTools memory snapshots, lazy-load views, target RAM <500MB
• T5.10 — Billing Stripe live: switch from test keys to live keys, verify webhooks prod, smoke test checkout + portal + cancellation flows
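The four `webPreferences` flags listed in T5.8 can be captured as one shared constant plus a small audit helper, so every window is created from the same hardened baseline. This is a sketch under assumptions: the interface and `findHardeningViolations` are hypothetical names, the object is deliberately not typed against Electron so it stays self-contained, and CSP headers are not covered here.

```typescript
// Field names mirror the Electron webPreferences options named in T5.8.
interface HardenedWebPreferences {
  sandbox: boolean;
  nodeIntegration: boolean;
  contextIsolation: boolean;
  allowRunningInsecureContent: boolean;
}

// The baseline every window is expected to use.
const HARDENED_WEB_PREFERENCES: HardenedWebPreferences = {
  sandbox: true,
  nodeIntegration: false,
  contextIsolation: true,
  allowRunningInsecureContent: false,
};

// Report every flag that differs from the baseline. A missing flag counts as a
// violation so that windows must state each setting explicitly (an assumed policy).
function findHardeningViolations(prefs: Partial<HardenedWebPreferences>): string[] {
  const violations: string[] = [];
  const keys = Object.keys(HARDENED_WEB_PREFERENCES) as (keyof HardenedWebPreferences)[];
  for (const key of keys) {
    if (prefs[key] !== HARDENED_WEB_PREFERENCES[key]) {
      violations.push(`${key} must be ${HARDENED_WEB_PREFERENCES[key]}`);
    }
  }
  return violations;
}
```

In an Electron main process the constant would be spread into the `webPreferences` of each window, and the helper could run in a unit test or at startup to catch a window that quietly re-enables `nodeIntegration`.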
Mockups
• T5.MK1 — Mockup Dashboard view: MarketplaceKPI grid + FraudAlert + AuditLog of recent actions + quick access to chat (1d, depends T4.BB)
Sprint 11-12 — Buffer: Beta Bug Fixes + Auto-updater + Windows
Goal: Clear the P1/P2 backlog from beta. Ship auto-updater pipeline. Windows build if deferred. Feedback UI with impact visualization.
• Bug fixes UI/UX reported by beta users (P1/P2 priority)
• Auto-updater S3 pipeline: push .dmg → S3 bucket → app detects update → downloads + installs silently
• Windows build (if deferred): electron-builder Windows exe, code signing, E2E tests
• Feedback UI: visualize impact of past actions based on FeedbackSummary (ImpactScore per action)
• RAM optimization if >500MB: profiling with Chrome DevTools, lazy loading views, cleanup WebContentsView
• FeedbackThrottle anti-fatigue refinement: tune suppression window from 3d to optimal based on beta engagement data, add per-action-type caps
Circuit breaker output: Windows build + Feedback UI were candidates for cut in S7-10.
Risk: Single Point of Failure
Sergio is the only Electron/React engineer. Mitigation: Pablo cross-trains on basics by S4. If Sergio is blocked, Pablo covers UI fixes.
9.6.4 — Pablo Estrada — CEO / Product Engineer
Product + QA + Eval + UX/UI Approval + Launch + GTM
Pablo wears three hats: CEO (strategy, beta users, launch, GTM), Product Engineer (system prompt, QA with real data), and Project Manager (sprint gates, decisions, team coordination). He also owns Project #17 CORE (Beautonomous), the operational agent that makes the 4-person team operate as 10-15 engineers. His first task is to bootstrap CORE before any product code is written. He also serves as the approval gate for UX/UI's Design System (#18) deliverables, reviewing and signing off Figma components every 2 sprints.
Sprint 0 (Pre-Sprint) — Project #17 CORE Bootstrap
Goal: Beautonomous operational agent running in OpenClaw — all 4 engineers using it for task management, code review, and workflow orchestration before writing product code.
• Create OpenClaw project + authorize GitHub, Linear, Slack connectors
• Write system prompt: role mapping (El Capitan, El Mago, El Artesano), governance rules, repos, Slack channels, risk taxonomy
• Configure 3 roles: Pablo=El Capitan, Mateo=El Mago, Andres+Sergio=El Artesano
• Validation: each team member runs 3 test queries successfully
Code Signing Certificates
• T0.9 — Apple Developer Program enrollment ($99/yr) + Developer ID Application certificate for .dmg code signing + notarization
• T0.10 — Windows code signing certificate procurement (EV/OV) for SmartScreen trust on .exe builds
Brand Book
• T0.11 — Brand Book delivery from external design team. Request following guidelines in core-product-design-system repo. Deliverable: complete visual identity (logo, colors, typography, usage rules). Required before T0.BB (Figma foundations delivery)
Dependencies: None — this is the FIRST thing that happens, before any product code.
CORE Governance: All subsequent projects must pass through Beautonomous for task tracking, PR review, and workflow execution.
Sprint 1-2 — Eval Scaffold + Brand Registration
Goal: Eval Suite scaffold with initial golden dataset, brand registrations started, Apple/Windows store authorization.
• T1.24 — Eval Fase 0 Setup + Golden Dataset: interfaces (IEvalPipeline, ILLMJudge, IGoldenDatasetManager), domain models, golden dataset 15-20 cases YAML
• T1.26 — Brand registration in marketplaces: Amazon Brand Registry, Amazon Ads, MercadoLibre, Shopify. Weekly tracking of approval status. Coordinate with Andrés for API account alignment
• T1.27 — Authorize app in Apple & Windows stores: Apple Developer Program enrollment ($99/yr) + code signing certificate. Microsoft Partner Center registration. Goal: verified publisher on both platforms
#18 Design System (UX/UI approves)
• T0.BB — Approve Brand book + Foundations Figma delivery (end of week 1): HEX/RGB/HSL palette, typography .woff2, logo SVG, [LIB] Foundations & Tokens, [LIB] Iconography, [LIB] Core Components partial (Button, Icon, StatusDot, Spinner, Divider, TabBar)
• T1.BB — Approve Atoms + AI-native Atoms + Molecules base + Chat Organisms delivery (end of week 2): Input, Badge, Toggle, Tooltip, AvatarInitials, CreditBadge, ProgressBar, KbdShortcut + StreamingCursor, ThinkingPulse, ToolBadge, AgentStatusBar, RiskBadge, TTLCountdown + MessageBubble, ChatInputBar, ContextBar, OnboardingStep
Sprint 3-4 — LLM Judge + Linear Bootstrap + E2E
• T2.24 — Eval Fase 1 LLM Judge + EvalRunner: AnthropicLLMJudge (Haiku standard, Sonnet critical), YamlDatasetLoader, CLI eval.ts + check-threshold.ts, 20 golden cases minimum
• T2.25 — E2E testing via Playground: full flows with real Sellerfy data, document QA → Linear via Beautonomous
• T2.26 — Bootstrap ~150 tasks in Linear: 6 cycles, L/M/S labels, critical path dependencies
• T2.26a — Quality gate 5-step Beautonomous: structure → lint → tests → architecture review → convention check
#18 Design System (UX/UI approves)
• T2.BB — Approve remaining Molecules + Data & Flow Organisms (end of S4): Select, Dropdown, Toggle labeled, Tooltip rich, ConfirmDialog REVERSIBLE/IRREVERSIBLE, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard. Publish [LIB] Core Components complete
Sprint 5-6 — Eval CI + Golden Dataset 50 + QA
• T3.26 — Eval Fase 2 CI integration: eval-on-pr.yml in GitHub Actions, Coach staging → LLM Judge → EvalReport, PR blocked if !passed
• T3.27 — Golden dataset 50 cases: 15 product, 10 pricing, 8 WRITE, 7 proactive, 10 edge cases
• T3.28 — QA conversation flows 3 marketplaces: test all flows with Sellerfy data, document issues → Linear
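The T3.27 split (15 product, 10 pricing, 8 WRITE, 7 proactive, 10 edge cases) adds up to the 50-case target, and a small helper can report how many cases each category still lacks as the dataset grows. The category keys, the `GoldenCase` shape, and both function names are assumptions for illustration; the real YAML schema is not shown in this section.

```typescript
type GoldenCategory = "product" | "pricing" | "write" | "proactive" | "edge";

const CATEGORIES: GoldenCategory[] = ["product", "pricing", "write", "proactive", "edge"];

// Target counts per category, taken from T3.27.
const TARGET_COUNTS: Record<GoldenCategory, number> = {
  product: 15,
  pricing: 10,
  write: 8,
  proactive: 7,
  edge: 10,
};

// Hypothetical minimal shape of a golden case.
interface GoldenCase {
  id: string;
  category: GoldenCategory;
}

// Sum of all category targets: 15 + 10 + 8 + 7 + 10.
function targetTotal(): number {
  return CATEGORIES.reduce((sum, cat) => sum + TARGET_COUNTS[cat], 0);
}

// How many cases each category still needs to reach its target (never negative).
function missingPerCategory(cases: GoldenCase[]): Record<GoldenCategory, number> {
  const missing: Record<GoldenCategory, number> = { product: 0, pricing: 0, write: 0, proactive: 0, edge: 0 };
  for (const cat of CATEGORIES) {
    const have = cases.filter((c) => c.category === cat).length;
    missing[cat] = Math.max(0, TARGET_COUNTS[cat] - have);
  }
  return missing;
}
```

A check like this could run alongside the dataset loader so that a lopsided dataset (for example, plenty of product cases but no WRITE cases) is flagged before it skews the eval score.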
Eval Extension — Figma Quality Pipeline (7.5d)
• T3.40 — Extend EvalConfig + CLI: add desktop_build and figma_quality as pipelineType. Models: DesktopBuildReport, FigmaQualityReport. CLI flags --pipeline=desktop_build / --pipeline=figma_quality (1d)
• T3.41 — FigmaRESTClient: IFigmaAPIClient with getFile, getFileVariables, getFileComponents, getFileStyles. Auth via FIGMA_ACCESS_TOKEN (1.5d)
• T3.42 — FigmaQualityRunner + variable checks: variable_architecture (3 collections), code_syntax (Code Syntax Web), semantic_aliasing (Semantic aliases Primitives), light_dark_modes (2 modes in Semantic) (2d)
• T3.43 — Component checks: auto_layout, naming_convention (slash naming), states_coverage (min states per type), color_hardcoding (no direct hex), spacing_hardcoding (no numeric values) (2d)
• T3.44 — Quality checks + report: wcag_contrast (4.5:1 text, 3:1 UI), descriptions (published components), mcp_compatibility (semantic layer names). Generate compliance report per file (1d)
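The `wcag_contrast` check in T3.44 relies on the WCAG 2.x contrast ratio, which is computed from the relative luminance of the two colors. The sketch below implements that published formula for 6-digit hex colors and applies the two thresholds named in the task (4.5:1 for text, 3:1 for UI elements). The function names are hypothetical, and how the runner would pull color pairs out of a Figma file is not covered.

```typescript
// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color, from 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (m === null) throw new Error("Expected a 6-digit hex color, got: " + hex);
  const r = channelToLinear(parseInt(m[1] ?? "0", 16));
  const g = channelToLinear(parseInt(m[2] ?? "0", 16));
  const b = channelToLinear(parseInt(m[3] ?? "0", 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1:1 up to 21:1.
function contrastRatio(hexA: string, hexB: string): number {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Thresholds from T3.44: 4.5:1 for text, 3:1 for UI elements.
function passesWcagContrast(foreground: string, background: string, kind: "text" | "ui"): boolean {
  const minimum = kind === "text" ? 4.5 : 3;
  return contrastRatio(foreground, background) >= minimum;
}
```

As a sanity check, black on white yields the maximum ratio of 21:1, while the gray `#777777` on white lands just under 4.5:1, so it would pass as a UI color but fail as body text.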
#18 Design System (UX/UI approves)
• T3.BB — Approve advanced Organisms + close Pattern Components (end of S6): ReActStream, DataTable, AuditLog, RollbackPanel, FraudAlert, ErrorRecovery A/B/C. Publish [LIB] Pattern Components complete
Sprint 7-8 — Eval Automated + Beta Prep + Contract Testing
• T4.17 — Automated Eval in CI: 50 golden cases on every push to main, fails if score <0.70
• T4.18 — Proactive suggestions testing with real data: verify triggers, message quality, dedup, max 2/turn
• T4.19 — Beta user selection + onboarding prep: 10-15 Sellerfy sellers, 2-min video, setup doc, 1-on-1 calls
• T4.19a — Eval contract testing pipeline: consumer-driven contracts between repos
• T4.19b — KB quality eval pipeline: precision@5, recall, hit rate, CI fails if <80%
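The three retrieval metrics named in T4.19b have standard definitions, sketched below for a list of document IDs per query. Two things here are assumptions rather than facts from the plan: recall is measured over the same top-k window, and the 80% gate is applied to each averaged metric, since the task does not say which metric the threshold refers to. The type and function names are hypothetical.

```typescript
// One evaluated query: what the KB returned (ranked) and what was actually relevant.
// Retrieved IDs are assumed to be unique.
interface RetrievalCase {
  retrieved: string[];
  relevant: string[];
}

// Number of relevant documents among the top-k results.
function hitsAtK(c: RetrievalCase, k: number): number {
  return c.retrieved.slice(0, k).filter((id) => c.relevant.indexOf(id) !== -1).length;
}

// precision@k: share of the top-k results that are relevant.
function precisionAtK(c: RetrievalCase, k: number = 5): number {
  return hitsAtK(c, k) / k;
}

// recall@k: share of the relevant documents found in the top-k (0 if none are labeled).
function recallAtK(c: RetrievalCase, k: number = 5): number {
  return c.relevant.length === 0 ? 0 : hitsAtK(c, k) / c.relevant.length;
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

interface KbEvalReport {
  precision: number;
  recall: number;
  hitRate: number; // share of queries with at least one relevant result in the top-k
  passed: boolean;
}

function evaluateKb(cases: RetrievalCase[], k: number = 5, threshold: number = 0.8): KbEvalReport {
  const precision = mean(cases.map((c) => precisionAtK(c, k)));
  const recall = mean(cases.map((c) => recallAtK(c, k)));
  const hitRate = mean(cases.map((c) => (hitsAtK(c, k) > 0 ? 1 : 0)));
  // Assumed gate: every averaged metric must reach the threshold.
  const passed = precision >= threshold && recall >= threshold && hitRate >= threshold;
  return { precision, recall, hitRate, passed };
}
```

A CI step would call `evaluateKb` on the labeled queries and exit non-zero when `passed` is false. If the team intends the 80% bar to apply only to hit rate, the `passed` expression is the single line to change.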
Eval Extension — Desktop Build Pipeline (7d)
• T4.25 — Code signing secrets: configure macOS certificates (Developer ID + Apple notarization) and Windows (Authenticode) in GitHub Secrets. Verify electron-builder recognizes them (1d)
• T4.26 — DesktopBuildRunner + core checks: compilation (build + artifact exists), code signing (codesign/signtool verify), notarization (spctl, macOS only), app startup (headless <5s), bundle size (<250MB), native modules (require without error) (3d)
• T4.27 — Secondary checks: auto-updater (feed URL resolves), deep links (shopilot:// in Info.plist/Windows registry), window rendering (console.error), IPC channels (ping/pong). Warnings, not blockers (1d)
• T4.28 — GitHub Actions desktop-build-eval.yml: 3 jobs — build-macos (macos-14 runner), build-windows (windows-latest), report (aggregate + PR comment). Trigger: PRs touching desktop-client (1.5d)
• T4.29 — GitHub Actions figma-quality-eval.yml: triggers workflow_dispatch + weekly cron (Monday 8:00 UTC). Publishes report as GitHub issue or Slack #engineering message (0.5d)
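T4.26 and T4.27 separate core checks, which block the build, from secondary checks, which only warn. The sketch below shows one way a report step might aggregate both kinds. The `CheckResult` shape and function names are hypothetical, only two of the core checks are modeled, and "250MB" is interpreted as 250 × 1024 × 1024 bytes, which is an assumption.

```typescript
type Severity = "blocker" | "warning";

// Hypothetical result shape shared by core and secondary checks.
interface CheckResult {
  name: string;
  severity: Severity;
  passed: boolean;
}

// Limits from T4.26; the byte interpretation of "MB" is assumed.
const MAX_BUNDLE_BYTES = 250 * 1024 * 1024;
const MAX_STARTUP_MS = 5000;

function bundleSizeCheck(bundleBytes: number): CheckResult {
  return { name: "bundle size", severity: "blocker", passed: bundleBytes < MAX_BUNDLE_BYTES };
}

function startupCheck(startupMs: number): CheckResult {
  return { name: "app startup", severity: "blocker", passed: startupMs < MAX_STARTUP_MS };
}

interface BuildSummary {
  passed: boolean;    // false only when at least one blocker failed
  blockers: string[]; // names of failed core checks
  warnings: string[]; // names of failed secondary checks
}

function summarizeBuild(results: CheckResult[]): BuildSummary {
  const failed = results.filter((r) => !r.passed);
  const blockers = failed.filter((r) => r.severity === "blocker").map((r) => r.name);
  const warnings = failed.filter((r) => r.severity === "warning").map((r) => r.name);
  return { passed: blockers.length === 0, blockers, warnings };
}
```

With this split, a failing auto-updater feed check would show up in the PR comment as a warning, while a 6-second startup would fail the job outright.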
#18 Design System (UX/UI approves)
• T4.BB — Approve Figma quality audit + corrections (end of S8): all frames “Ready for development”, zero generic names, variables verified DevMode, all interactive states, changelog updated, Figma annotations
Sprint 9-10 — Beta Onboarding + Feedback + Security + Go/No-Go
• T5.11 — Beta onboarding 10-15 sellers: .dmg → connect marketplace → first query → first action. 30-min 1-on-1 calls
• T5.12 — Feedback calls + iteration: 15-min with each beta user, top 5 issues → Linear
• T5.13 — OWASP top 10 security review: document findings + fix P1s
• T5.14 — Beautonomous System Prompt v2: iteration based on 10 weeks real usage
• T5.15 — Go/No-Go: 60-min final sync, full checklist, Pablo signs off Go
• T5.15a — E2E eval pipeline: full query→tools→response flow, 10+ scenarios
#18 Design System
No BB task in S9-10 — Figma pipeline closed. UX/UI available only for ad-hoc queries.
Sprint 11-12 — Buffer: Eval Iteration + DocumentationSprint 11-12 — Buffer: Iteración Eval + Documentación
Goal: Push Eval score from 0.70 → 0.80 using real beta conversation data. Second feedback round. Technical documentation and postmortem.Objetivo: Subir Eval score de 0.70 → 0.80 usando conversaciones reales de beta. Segundo round de feedback. Documentación técnica y postmortem.
• Eval iteration: new golden cases derived from observed failures in beta conversationsIteración Eval: nuevos golden cases derivados de fallos observados en conversaciones de beta
• Eval score target 0.80: refine rubrics, add edge cases, calibrate LLM JudgeTarget Eval score 0.80: refinar rubrics, agregar edge cases, calibrar LLM Judge
• Second beta feedback round: 15-min calls with active users, document usage patterns, most-requested featuresSegundo round feedback beta: calls 15 min con usuarios activos, documentar patrones, features más pedidas
• Technical documentation + postmortem: architecture decisions, lessons learned, runbook for v2Documentación técnica + postmortem: decisiones de arquitectura, lecciones aprendidas, runbook para v2
Circuit breaker output: Any eval work cut from S7-10 lands here.Output del circuit breaker: Cualquier trabajo de eval cortado de S7-10 llega aquí.
Key Role: Three Hats
CEO (decisions, strategy, beta, GTM) + Product Engineer (prompt, QA) + PM (sprint gates, go/no-go calls, team coordination). Only person with full product+technical+business context.
9.7 Sprint Execution — 100% Task Breakdown
CTO + PM perspective. Every task from all 19 active projects. Linear-exportable. Project #17 CORE governance referenced per project.
Phase 0 — Pre-Sprint: #17 CORE Bootstrap
Week 0 • 11 tasks • Pablo (lead) + Mateo (support)
Beautonomous is the operational agent that makes 4 engineers operate as 10-15. Provides task management (Linear), code review (GitHub), and governance — all via OpenClaw. No product code until CORE is operational. Source: core-internal-team-workflow/.claude/specs/development-plan.md Phases 0–2.

| ID | Task | Owner | Proj | Time | Depends |
|---|---|---|---|---|---|
| T0.1 | Create OpenClaw project: project ‘Beautonomous’, operational-agent type, 4 members invited | Pablo | #17 | 30m | — |
| T0.2 | Connect GitHub OAuth: authorize organization, select 11 repos | Mateo | #17 | 30m | T0.1 |
| T0.3 | Connect Linear OAuth: workspace ‘beautonomous’, team AUT, read+write | Pablo | #17 | 30m | T0.1 |
| T0.4 | Connect Slack OAuth: channels #engineering, #deploys, #general. Read + send | Mateo | #17 | 30m | T0.1 |
| T0.5 | Beautonomous System Prompt v1: identity, roles (Capitán/Mago/Artesano), 6 governance rules, repos, Slack channels. ~500 words. NOT the Coach prompt | Pablo | #17 | 4h | T0.1 |
| T0.6 | Configure role mapping: pablo→Capitán, mateo→Mago, andres/sergio→Artesano. Per-role permissions per spec F2.1–F2.3 | Pablo | #17 | 1h | T0.5 |
| T0.7 | Create Linear structure: 17 projects, 6 cycles (2 weeks each, incl. buffer), labels L/M/S + Track-{engineer} + Risk-level. Workflow: Backlog→Todo→In Progress→In Review→Done | Pablo | #17 | 2h | T0.3 |
| T0.8 | Validation: 4 members × 3 test queries (1 GitHub/Linear read, 1 task creation, 1 code read). Verify role permissions | All 4 | #17 | 1h | T0.2–T0.6 |
| T0.9 | Apple Developer Program enrollment: Enroll in Apple Developer Program ($99/yr). Request Developer ID Application certificate for code signing + notarization. Required for signed .dmg builds at Gate 1 (S4) | Pablo | #1 | 1h | — |
| T0.10 | Windows code signing certificate procurement: Procure EV or OV code signing certificate for Windows .exe builds. Required for SmartScreen trust at Gate 1 (S4). Vendor options: DigiCert, Sectigo, GlobalSign | Pablo | #1 | 1h | — |
| T0.11 | Brand Book delivery from external design team: Request brand book from external design team following the guidelines documented in | Pablo | #18 | 1d | — |

✓ Checkpoint: Beautonomous operational. From here everything is tracked in Linear via Beautonomous.
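
The role mapping in T0.6 can be pictured as a small lookup table. The sketch below only encodes the four member→role pairs stated in the table; the `roleFor` and `unmappedMembers` helpers are hypothetical names, and the per-role permissions of spec F2.1–F2.3 are deliberately left out because they are not described here.

```typescript
type Role = "Capitán" | "Mago" | "Artesano";

// Member → role pairs as listed in T0.6.
const ROLE_BY_MEMBER: ReadonlyMap<string, Role> = new Map<string, Role>([
  ["pablo", "Capitán"],
  ["mateo", "Mago"],
  ["andres", "Artesano"],
  ["sergio", "Artesano"],
]);

// Hypothetical helper: resolves a role, ignoring case and surrounding spaces.
function roleFor(member: string): Role | undefined {
  return ROLE_BY_MEMBER.get(member.trim().toLowerCase());
}

// Hypothetical helper for a T0.8-style validation: which members have no role yet?
function unmappedMembers(members: string[]): string[] {
  return members.filter((m) => roleFor(m) === undefined);
}
```

A validation pass such as T0.8 could call `unmappedMembers` with the four team handles and expect an empty list before moving on to permission checks.
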
Sprints 1-2 — Walking Skeleton
Weeks 1-2 • 37 tasks

| ID | Task | Proj | Time | Depends |
|---|---|---|---|---|
| Mateo: #2 Orchestrator · #4 Personality · #5 Context · #8 Observability · #9 Cerebro KB — 12 tasks | ||||
| T1.1 | DynamoDB correction (Phase -1): IDs UUID→ULID, Trace SK to | #2 | 3d | T0.8 |
| T1.2 | UserProfile entity pk: | #2 | 1d | T1.1 |
| T1.3 | History in the prompt (Phase 0.1): last N messages in the prompt. Method | #2 | 2d | T1.1 |
| T1.4 | ILLMClient update | #2 | 2d | — |
| T1.5 | SystemPromptComposer L1+L2: L1 base identity (~500 tok, | #4 | 2d | T1.2 |
| T1.6 | AgentLoopOrchestrator (Phase 0.3): ReAct loop | #2 | 3d | T1.3, T1.4, T1.5 |
| T1.7 | RestResponseEventEmitter: REST mode (Phase 0.3, no streaming). Full response after all rounds. Internal events for logging | #2 | 4h | T1.6 |
| T1.8 | Verify Observability with ReAct: confirm the existing ConversationTrace + AgentTracking are compatible with the multi-step loop. Add tool calls, round count, cost per turn to traces | #8 | 1d | T1.6 |
| T1.21 | KB Phase 0 — Fix duplicates: TRUNCATE before embed. | #9 | 2d | — |
| T1.22 | KB Phase 1 — Contextual Retrieval: contextual prefix | #9 | 2d | T1.21 |
| T1.23 | KB content, 15-20 curated docs: MeLi best practices, Amazon policies, Shopify guides, pricing strategies, photo optimization, metrics, seller FAQ | #9 | 5d | — |
| T1.25 | 10 READ tool specs: for each tool, name, LLM description, JSON Schema input_schema, risk level, credit cost. 10 tools: get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics | #9 | 2d | — |
| Andrés: #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 13 tasks | ||||
| T1.9 | Scaffold Marketplace Provider: Clean Architecture + DDD. Value Objects (Marketplace, SKU, MarketplaceCredential). Error types (MarketplaceAPIError, AuthenticationError, RateLimitError). DI container | #12 | 1d | T0.8 |
| T1.10 | IMarketplaceAdapter interface: 23 methods, 4 domains (Catalog, Engagement, Advertising, Enrollment). ISKUResolver: SKU → native marketplace ID | #12 | 4h | T1.9 |
| T1.11 | AES256GCMCipher + ITokenManager: token encryption at rest. DynamoDB table | #12 | 2d | T1.9 |
| T1.12 | MeLiOAuth2Flow + MeLiAdapter: OAuth2 code flow. MeLi REST API (/users/me/items, /orders/search). Mapping of MeLi errors → standardized errors. Code reused from | #12 | 3d | T1.10, T1.11 |
| T1.13 | AmazonLWAFlow + AmazonAdapter scaffold: OAuth2 LWA. SP-API SDK. Rate limiting per API family. Scaffold only — full implementation in S3-4 (depends on E1 approval, 2-4 weeks) | #12 | 2d | T1.10, T1.11 |
| T1.14 | Verify existing GCP Terraform: confirm GCS buckets, Cloud Run, Airflow, BigQuery are operational. Fix if necessary | #14 | 1d | T0.8 |
| T1.15 | Request external dependencies (E1-E5): Amazon SP-API dev account (day 1), MeLi dev portal, Shopify Partners, Apple Developer Program. Document in Linear | #14 | 4h | T0.8 |
| T1.15a | SellerConnection aggregate: state machine with 5 states (disconnected→pending→active→expired→revoked). Validated transitions. Persisted in DynamoDB | #12 | 1d | T1.9 |
| T1.15b | MarketplaceAction entity + IMarketplaceActionRepository: record of every action. Fields: actionId, sellerId, marketplace, method, status, requestPayload, responsePayload, latencyMs | #12 | 4h | T1.9 |
| T1.15c | IOAuth2Flow interface (domain port): generic port for OAuth2 flows (authorize, exchangeCode, refreshToken). Implemented by MeLi/Amazon/Shopify | #12 | 4h | T1.9 |
| T1.28 | Collect missing WRITE API docs: Map WRITE actions for #3 Tool Registry. Existing docs collected; complete remaining: MeLi: 3, Amazon Ads: 5, Amazon: 2, Shopify: 9. Organize per marketplace in shared repo | #12 | 3d | T0.8 |
| T1.29 | Collect user management provider docs: Research external auth provider (authentication, authorization, credential management). Document service methods exposed to consumer layers. Evaluate options (Auth0, Clerk, Memberstack) | #12 | 2d | T0.8 |
| T1.33 | GitHub Actions CI: electron-builder on release/* branch. GitHub Actions workflow: trigger on | #14 | 0.5d | T1.32 |
| Sergio: #1 Native Shell — 7 tasks | ||||
| T1.16 | Scaffold Electron + electron-builder: Electron 28+. Main process entry point. Preload scripts with contextBridge. Dev hot reload. Scripts: dev/build/pack | #1 | 1d | T0.8 |
| T1.17 | MainWindow + WebContentsView: WebContentsView (NOT BrowserView — deprecated E26). 70% of window width. Navigation controls. Marketplace session persistence | #1 | 2d | T1.16 |
| T1.18 | MarketplaceDetector: URL patterns for MeLi/Amazon/Shopify. Detect page type (product/dashboard/orders), extract IDs. Remote JSON config with local fallback | #1 | 1d | T1.17 |
| T1.19 | Tabs system + Sidebar container: marketplace tabs + 360px React sidebar on the right. UI components from | #1 | 2.5d | T1.17, T0.BB |
| T1.20 | Memberstack Auth: JWT in electron-store, encrypted with the OS key. Login/logout flow. AuthService in main process | #1 | 1d | T1.16 |
| T1.32 | First .dmg + .exe canary build (unsigned): Run | #1 | 1d | T1.16 |
| T1.MK1 | Mockup shell container: Assemble sidebar + tabs with T0.BB tokens in React. Validate visual integration of Figma foundations in real Electron context | #1 | 0.5d | T0.BB |
| Pablo: #16 Eval Suite · #17 Beautonomous · #10 Data Sync · #1 Native Shell — 3 tasks | ||||
| T1.24 | Eval Phase 0 — Setup + Golden Dataset: package.json (no server). Domain interfaces (IEvalPipeline, ILLMJudge, IGoldenDatasetManager). Golden dataset of 15-20 YAML cases (fees, scope, metrics) | #16 | 3d | — |
| T1.26 | Brand registration in marketplaces: Start brand registration process in Amazon Brand Registry, Amazon Ads, MercadoLibre, and Shopify. Track approval status weekly. Coordinate with Andrés for API developer account alignment | #10 | 3d | T0.8 |
| T1.27 | Authorize app in Apple & Windows stores: Apple Developer Program enrollment ($99/yr) + code signing certificate request. Microsoft Partner Center registration for Windows Store publishing. Goal: app recognized as verified publisher on both platforms | #1 | 2d | T0.8 |
| UX/UI + Pablo: #18 Design System — 2 tasks | ||||
| T0.BB | Brand book + Foundations Figma (week 1 delivery): Brand book (D1–D9 resolved, HEX/RGB/HSL palette, typography .woff2, logo SVG). [LIB] Foundations & Tokens (00 Primitives, 01 Semantic Light/Dark, Code Syntax, Text Styles). [LIB] Iconography (Lucide 40+ icons). [LIB] Core Components partial (Button, Icon, StatusDot, Spinner, Divider, TabBar). Owner: UX/UI executes + Pablo approves end week 1 | #18 | 4d | T0.11 |
| T1.BB | Atoms + AI-native Atoms + Molecules base + Chat Organisms (week 2 delivery): Atoms: Input, Badge, Toggle, Tooltip, AvatarInitials, CreditBadge, ProgressBar, KbdShortcut. AI-native: StreamingCursor, ThinkingPulse, ToolBadge, AgentStatusBar, RiskBadge, TTLCountdown. Molecules: InputField, SearchBar, CreditDisplay. Organisms: MessageBubble, ChatInputBar, ContextBar, OnboardingStep. Owner: UX/UI executes + Pablo approves end week 2 | #18 | 6d | T0.BB |
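
T1.15a in the table above describes SellerConnection as a five-state machine with validated transitions. A minimal sketch of that idea follows. Only the forward chain disconnected→pending→active→expired→revoked comes from the task description; the extra edges marked ASSUMPTION, the class shape, and the error type are hypothetical, and DynamoDB persistence is omitted.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

// Allowed transitions per state. Entries marked ASSUMPTION are illustrative
// additions that the task description does not confirm.
const ALLOWED_TRANSITIONS: Readonly<Record<ConnectionState, readonly ConnectionState[]>> = {
  disconnected: ["pending"],
  pending: ["active"],
  active: ["expired", "revoked"], // ASSUMPTION: an active connection can be revoked directly
  expired: ["revoked", "active"], // ASSUMPTION: a successful token refresh reactivates
  revoked: [],
};

class InvalidTransitionError extends Error {
  constructor(from: ConnectionState, to: ConnectionState) {
    super(`invalid SellerConnection transition: ${from} -> ${to}`);
    this.name = "InvalidTransitionError";
  }
}

class SellerConnection {
  private current: ConnectionState = "disconnected";

  constructor(public readonly sellerId: string) {}

  get state(): ConnectionState {
    return this.current;
  }

  // Moves to `next` only if the transition table allows it; otherwise throws
  // and leaves the current state untouched.
  transitionTo(next: ConnectionState): void {
    if (!ALLOWED_TRANSITIONS[this.current].includes(next)) {
      throw new InvalidTransitionError(this.current, next);
    }
    this.current = next;
  }
}
```

Keeping the table as data rather than branching logic makes it straightforward to review the allowed edges against the spec and adjust them without touching the aggregate.
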
Sprints 3-4 — Core Engines
Weeks 3-4 • 38 tasks

| ID | Task | Proj | Time | Depends |
|---|---|---|---|---|
| Mateo: #3 Tool Registry · #5 Context Agg · #2 Orchestrator · #11 Enrichment · #9 Cerebro KB — 15 tasks | ||||
| T2.1 | ToolRegistry + ToolDefinition | #3 | 2d | T1.6, T1.25 |
| T2.2 | IToolExecutor + ToolExecutor Interface: | #3 | 1d | T2.1 |
| T2.3 | ToolPolicyFilter: risk gate (irreversible → mandatory confirmation) + marketplace gate (tool unavailable if the MP is not configured). Extensible without touching the executor | #3 | 1d | T2.1 |
| T2.4 | HookLifecycle | #3 | 1d | T2.2 |
| T2.5 | 10 READ tool handlers (stubs): handlers for the 10 READ tools. HTTP stubs with mock data. Structure: | #3 | 2d | T2.1 |
| T2.5a | ToolResult domain model: toolName, args, result, isError, latencyMs, cached, creditCost. Immutable value used by HookLifecycle, caching, and traces | #3 | 4h | T2.1 |
| T2.5b | update_user_profile SYSTEM tool handler: updates UserProfile (marketplaces, categories, goals). The LLM invokes it when it detects new seller info in the conversation | #3 | 4h | T2.1, T1.2 |
| T2.5c | contextSummary: automatic conversation summary when history exceeds the token threshold. Optional fields in Conversation: | #5 | 1d | T1.3 |
| T2.5d | 17 WRITE tool stubs: register the 17 WRITE tools in ToolRegistry with ConfirmationRequired policy. No real handler — they return NotImplemented. Lets the LLM “see” them and plan | #3 | 4h | T2.1 |
| T2.6 | IContextAssembler: formalize RagOrchestrator as IContextAssembler. KB + Brand Health RAG in parallel, single embedding. Graceful degradation: a KB or brand health failure never blocks the response | #5 | 2d | T1.6 |
| T2.7 | Structured health summary | #5 | 1d | T2.6 |
| T2.8 | Anthropic prompt caching: SystemPromptBlock[] with | #2 | 1d | T1.5 |
| T2.9 | Tool result caching in-memory | #3 | 4h | T2.2 |
| T2.22 | KB Phase 2 — Incremental processing: SHA-256 content hash per document. | #9 | 2d | T1.21 |
| T2.23 | KB Phase 3 — Batch embeddings: send up to 250 texts per Vertex AI call (vs 1-by-1). Retry with backoff on 429/5xx. Goroutine pool with semaphore (max 5). ~6000 calls → ~24 calls | #9 | 2d | T1.21 |
| Andrés: #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 10 tasks | ||||
| T2.10 | ShopifyOAuth2Flow + ShopifyAdapter: Shopify OAuth2 (requires the seller's store URL). GraphQL Admin API. Rate limiting (Shopify cost-based throttling). Queries for products, orders, inventory | #12 | 3d | T1.10, T1.11 |
| T2.11 | Full AmazonAdapter (if E1 approved): full SP-API SDK. Reports, Catalog Items, Orders. Rate limit 5 req/s with exponential backoff. If E1 is not approved → defer to S5 | #12 | 3d | T1.13 |
| T2.12 | TokenRefreshCron: EventBridge rule every 5min. Pre-refresh 30min before expiration. DynamoDB mutex (avoids race condition). Threshold of 3 failures → Slack alert | #12 | 1d | T1.11, T1.12 |
| T2.13 | Data Sync Phase 0.5 — Clean Architecture API: refactor services/api/ with no behavior change. IDataReader, ITokenProvider, VOs (UserId, Marketplace, DateRange) in the domain | #10 | 2d | T0.8 |
| T2.14 | Existing DAGs verified: verify the MeLi + Shopify @hourly DAGs run without errors. Verify Bronze schemas. Fix if necessary | #10 | 1d | T2.13 |
| T2.15 | AWS CDK base: DynamoDB conversation-api (GSI corrected in T1.1), Lambda + API Gateway v2 HTTP, VPC + NAT. Marketplace Provider: DynamoDB marketplace-credentials, Secrets Manager, EventBridge | #14 | 2d | T1.1 |
| T2.16 | GitHub Actions CI multi-repo: lint + type-check + unit tests on every PR for the 4 active repos. Build cache via actions/cache. Required status checks | #14 | 1d | — |
| T2.16a | marketplace-actions DynamoDB table in CDK: table for the MarketplaceAction entity. pk sellerId, sk actionId. GSI by marketplace+status | #14 | 4h | T2.15 |
| T2.16b | AmazonAdsOAuth2Flow (dual OAuth): separate OAuth2 flow for the Amazon Ads API (distinct from LWA for SP-API). Separate credentials in Secrets Manager | #12 | 1d | T1.13 |
| T2.16c | ISKUResolver implementations: MeLi (ML prefix + item ID), Amazon (ASIN), Shopify (numeric product ID). Bidirectional mapping internal SKU ↔ native marketplace ID | #12 | 1d | T1.10 |
| Sergio: #1 Native Shell — 8 tasks | ||||
| T2.17 | Chat UI + Markdown rendering: text input + markdown in the sidebar. User/assistant bubbles. Indicators: “thinking”, “running tool X”, “done”. Syntax highlighting in code blocks. +0.5d integration of T1.BB components | #1 | 2.5d | T1.19, T1.BB |
| T2.18 | CoachWebSocketService: WebSocket client in main process. Reconnection with exponential backoff (1s→2s→4s...max 30s). Heartbeat ping/pong 30s. Fallback: REST polling every 2s | #1 | 1d | T1.7 |
| T2.19 | URL→metadata context injection: detect the current URL in WebContentsView → extract marketplace, page type, product IDs via MarketplaceDetector → send as metadata with each message | #1 | 1d | T1.18 |
| T2.20 | View navigation: react-router /chat (default), /profile, /billing, /enrollment, /onboarding. Bottom tab bar. Chat state persists across view changes | #1 | 1d | T2.17 |
| T2.21 | OnboardingWizard, 5 steps: (1) Welcome, (2) Connect marketplace (inline OAuth), (3) Profile setup, (4) First guided query, (5) Success + next steps. First launch only (localStorage flag). Skip available from step 3. +0.5d T1.BB components (OnboardingStep) | #1 | 2.5d | T2.17, T1.12, T1.BB |
| T2.40 | Gate 1 signed build: .dmg notarized + .exe signed. Apple | #1 | 1d | T0.9, T0.10 |
| T2.MK1 | Mockup ChatView: Assemble complete chat view in React: MessageBubbles + ContextBar (top) + ChatInputBar (bottom) + AgentStatusBar. Validate T1.BB components integration | #1 | 1d | T1.BB |
| T2.MK2 | Mockup OnboardingWizard: Assemble 5 navigable steps with OnboardingStep component from T1.BB. Validate step transitions and progress indicators | #1 | 0.5d | T1.BB |
| Pablo: #16 Eval Suite · #17 Beautonomous — 4 tasks | ||||
| T2.24 | Eval Phase 1 — LLM Judge + EvalRunner: AnthropicLLMJudge (Haiku standard, Sonnet critical). YamlDatasetLoader. EvalRunner orchestrates the pipeline: dataset → coach → judge → report. CLI eval.ts + check-threshold.ts. Minimum 20 golden cases | #16 | 3d | T1.24 |
| T2.25 | E2E testing via Playground: test complete flows with real Sellerfy data. Document QA findings → Linear issues via Beautonomous | #16 | 2d | T1.6, T2.1 |
| T2.26 | Bootstrap ~150 tasks in Linear: bulk-create tasks via Beautonomous. 6 cycles, labels L/M/S, critical-path dependencies. Approval from the 4 engineers before S1 | #17 | 4h | T0.7 |
| T2.26a | 5-step Beautonomous quality gate: configure the pipeline structure → lint → tests → architecture review → convention check. Runs before PRs are approved via OpenClaw | #17 | 1d | T0.5 |
| UX/UI + Pablo: #18 Design System — 1 task | ||||
| T2.BB | Remaining Molecules + Data & Flow Organisms: Molecules: Select, Dropdown, Toggle labeled, Tooltip rich, ProgressBar labeled, KbdShortcut combo. Publish [LIB] Core Components complete. Organisms: ConfirmDialog REVERSIBLE/IRREVERSIBLE, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard. Owner: UX/UI executes + Pablo approves end S4 | #18 | 6d | T1.BB |
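
The reconnection policy in T2.18 (exponential backoff 1s→2s→4s, capped at 30s) reduces to a one-line delay function. The sketch below shows that schedule only; the function name and the zero-based `attempt` convention are assumptions, and the actual WebSocket wiring, heartbeat, and REST fallback are not shown.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), doubling from
// `baseMs` and capped at `maxMs`. Defaults follow the 1s → 30s range in T2.18.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError("attempt must be a non-negative integer");
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// First seven delays of the schedule, in milliseconds.
const schedule = [0, 1, 2, 3, 4, 5, 6].map((n) => reconnectDelayMs(n));
// schedule is [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

A production client would often add random jitter to these delays so that many clients do not reconnect in lockstep; that is a common practice rather than something the task description specifies.
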
Sprints 5-6 — WRITE Tools + Billing + Enrichment
Weeks 5-6 • 42 tasks

| ID | Task | Proj | Time | Depends |
|---|---|---|---|---|
| Mateo: #3 Tool Registry · #6 Proactive · #7 Guardrails · #11 Enrichment · #2 Orchestrator · #9 Cerebro KB · #18 Design System — 15 tasks | ||||
| T3.1 | 10 real READ handlers: connect handlers to the Fast Data Layer (11 FastAPI endpoints), or directly to Marketplace Provider if FDL is unavailable. Each handler: Zod validation → HTTP call → response mapping → ToolResult | #3 | 3d | T2.5, T2.13 |
| T3.2 | ConfirmationFlow: WRITE detected → pause execution → show diff (before/after) to the user → wait for Accept/Reject → execute or cancel. Timeout 35min. OrchestrationSession in DynamoDB (TTL 35min) | #2 | 2d | T1.6, T2.3 |
| T3.3 | 4 WRITE tool handlers: update_product_content (reversible), update_price (irreversible), pause_product (reversible), activate_product (reversible). Pre-write snapshot → confirmation → execute via IMarketplaceAdapter → verify → log | #3 | 3d | T3.2 |
| T3.4 | ProactiveSuggestionService: afterTool hook. LLM evaluates result → | #6 | 2d | T2.4 |
| T3.5 | IGuardService + InputGuard: prompt injection detection (pattern matching) + out-of-scope filtering. Graceful degradation: if the guard fails → let the request through, log a warning | #7 | 1d | T1.6 |
| T3.5a | HttpCreditGate in conversation-api: HTTP client → POST /internal/gate on #13 Billing before executing each tool. Credit matrix: READ=1, ANALYSIS=2, WRITE=3. Fail-open if billing does not respond | #2 | 1d | T3.24 |
| T3.6 | Enrichment scaffold + interfaces: IEnrichmentService, IMarketIntelligenceAdapter, IContentAnalysisAdapter, IEnrichmentCache in the domain. Models: MarketProduct, ImageAnalysisResult, EnrichmentResult. EnrichmentContainer DI | #11 | 1d | T0.8 |
| T3.7 | MeliMarketIntelligenceAdapter: MeLi Search API + Items API (free, no credentials). | #11 | 2d | T3.6 |
| T3.8 | VisionLLMContentAdapter: Claude Vision for | #11 | 1d | T3.6 |
| T3.9 | RedisEnrichmentCache + EnrichmentService: cache with per-tool TTL (15min-24h). Router: marketplace → correct adapter. Provider failure → EnrichmentResult with error, never an exception | #11 | 1d | T3.7, T3.8 |
| T3.10 | Enrichment CDK Stack: Lambda + API Gateway + ElastiCache Redis + VPC | #11 | 1d | T3.9 |
| T3.11 | 8 ANALYSIS tool handlers: connect to IEnrichmentService. 5 operational (search_market, competitor, pricing, image, video) + get_keyword_data + get_product_fee_estimate + enhance_image (NotImplemented) | #3 | 2d | T3.9 |
| T3.12 | HallucinationChecker: verify numeric claims (fees, metrics) against tool results post-generation. Log but do not block (Phase 1) | #2 | 1d | T3.1 |
| T3.25 | KB BigQuery indexing: index 15-20 docs in BigQuery via the Go pipeline. Verify top-5 semantic search for 5 test queries | #9 | 1d | T1.22, T1.23 |
| T3.32 | Token pipeline + Style Dictionary: Extract | #18 | 2d | T0.BB |
| Andrés: #10 Data Sync · #12 Marketplace Provider · #14 DevOps — 6 tasks | ||||
| T3.13 | Fast Data Layer — 11 FastAPI endpoints: 1:1 with Tool Registry. GET | #10 | 3d | T2.13 |
| T3.14 | GCS snapshots for ConfirmationFlow: Router | #10 | 1d | T3.13 |
| T3.15 | Amazon DAG: IExtractor, ILoader for Amazon. AmazonAuthManager + AmazonExtractor + AmazonLoader. Verify MeLi + Shopify Bronze schemas | #10 | 3d | T2.14, T2.11 |
| T3.16 | IRateLimiter per marketplace: 3 implementations: MeLi token bucket 1500/min, Amazon burst/restore, Shopify leaky bucket cost-points. Redis counter. Returns 429 with retry-after | #12 | 1d | T1.12 |
| T3.17 | Onboarding trigger: first sync post-onboarding. When the user connects a marketplace → trigger the initial sync DAG | #12 | 1d | T1.12, T2.14 |
| T3.18 | Multi-repo CI/CD completed: 11 repos with GitHub Actions. Automatic staging deploy on merge to main. Secrets in GitHub Org Secrets | #14 | 2d | T2.16 |
| Sergio: #1 Native Shell · #13 Billing — 12 tasks | ||||
| T3.19 | BillingView: Current plan, remaining credits, usage stats. Buttons → Stripe Checkout in system browser (not in-app). Low credit alerts. +0.5d integration of T2.BB components (CreditEconomy, MarketplaceKPI, CreditDisplay) | #1 | 2.5d | T2.20, T2.BB |
| T3.20 | WRITE confirmation dialogs: Diff-style display (red/green). “I will change title from X to Y” → Accept/Reject. Timeout 35min with 5min reminder. Integrates with ConfirmationFlow T3.2. Uses ConfirmDialog REVERSIBLE + IRREVERSIBLE from T2.BB | #1 | 1d | T2.17, T3.2, T2.BB |
| T3.21 | Suggestion cards + tool progress: Clickable cards (“Review competitor prices”). Click opens pre-contextualized conversation. Tool progress: spinner with tool name. +0.5d ProactiveCard integration from T2.BB | #1 | 1.5d | T2.17, T2.18, T2.BB |
| T3.22 | ProfileView: Connected marketplaces, usage stats, preferences. Hook useProfile. Settings: language, notifications, default marketplace. Uses EnrollmentCard + Toggle labeled from T2.BB | #1 | 1d | T2.20, T2.BB |
| T3.23 | Stripe Checkout + Customer Portal: Checkout Pro ($49/month). Self-service Customer Portal (cancel, update payment). Webhook: | #13 | 3d | T3.19 |
| T3.24 | ICreditsGate + credits backend: POST /internal/gate. READ=1cr, ANALYSIS=2cr, WRITE=3cr. DynamoDB conditional write (prevents race). Free 50cr/month, Pro 500cr/month. Credit Packs ($5/100, $20/500, $35/1000). Fail-open if billing does not respond | #13 | 2d | T3.23 |
| T3.24a | Billing schema migration: ALTER TABLE clients (add Stripe fields). New tables: credit_packs, subscription_events, credit_transactions. Idempotent migration script | #13 | 1d | T3.23 |
| T3.24b | SubscriptionLifecycleService: activate (post-checkout), cancel (7d grace period), upgrade, downgrade. Event → subscription_events table. Webhook | #13 | 1d | T3.23 |
| T3.24c | Monthly credit reset cron: EventBridge + Lambda on the 1st of each month. Resets plan credits (not pack credits). Pack credits expire after 12 months. Logged in credit_transactions | #13 | 4h | T3.24 |
| T3.MK1 | Mockup BillingView: Current plan + CreditEconomy + ProgressBar labeled + CreditDisplay + Stripe buttons. Validate T2.BB component integration | #1 | 0.5d | T2.BB |
| T3.MK2 | Mockup ProfileView: EnrollmentCard list + Toggle labeled + InputFields assembled. Validate T2.BB component integration | #1 | 0.5d | T2.BB |
| T3.MK3 | Mockup ConfirmDialog in chat context: ChatView + ConfirmDialog overlay (slide up + backdrop) × REVERSIBLE and IRREVERSIBLE variants. End-to-end confirmation UX validation | #1 | 0.5d | T2.BB, T3.20 |
| Pablo: #16 Eval Suite — 8 tasks | ||||
| T3.26 | Eval Phase 2 — CI integration: eval-on-pr.yml in GitHub Actions. Coach staging → LLM Judge → EvalReport. If !passed → PR blocked. Automatic comment on the PR. Update baseline on merge to main. Target: <10 min for 20-30 cases | #16 | 2d | T2.24 |
| T3.27 | Golden dataset, 50 cases: expand to 15 product, 10 pricing, 8 WRITE, 7 proactive, 10 edge cases (injection, off-scope, empty data, ambiguous intent) | #16 | 3d | T2.24 |
| T3.28 | QA of conversation flows (3 marketplaces): test all flows with Sellerfy data. Document issues → Linear via Beautonomous | #16 | 2d | T3.1, T3.3 |
| T3.40 | Extend EvalConfig + CLI: Add | #16 | 1d | T2.24 |
| T3.41 | FigmaRESTClient | #16 | 1.5d | T3.40 |
| T3.42 | FigmaQualityRunner + variable checks: Runner iterates files and executes configured checks. 4 variable checks: variable_architecture (3 collections), code_syntax (Code Syntax Web), semantic_aliasing (Semantic aliases Primitives), light_dark_modes (2 modes in Semantic) | #16 | 2d | T3.41 |
| T3.43 | Component checks: 5 checks: auto_layout (all components use Auto Layout), naming_convention (slash naming, no generic names), states_coverage (min states per type), color_hardcoding (no direct hex), spacing_hardcoding (no direct numeric values) | #16 | 2d | T3.42 |
| T3.44 | Quality checks + report: 3 checks: wcag_contrast (4.5:1 text, 3:1 UI), descriptions (published components have description), mcp_compatibility (semantic layer names). Generate report: compliance per file, violations by severity, correction suggestion per component | #16 | 1d | T3.42 |
| UX/UI + Pablo: #18 Design System — 1 task | ||||
| T3.BB | Advanced Organisms + close Pattern Components: ReActStream (3 collapsible blocks: Thought/Action/Observation), DataTable (sortable, skeleton loading), AuditLog (dot-line + JSON accordion), RollbackPanel (TTLCountdown + revert button), FraudAlert, ErrorRecovery A (amber)/B (red)/C (blue). Publish [LIB] Pattern Components complete. Owner: UX/UI executes + Pablo approves end S6 | #18 | 5d | T2.BB |
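
The credit matrix and fail-open rule from T3.5a and T3.24 can be captured in a small pure function. The sketch below covers only the allow/deny decision; the `decideGate` name, the `GateDecision` shape, and the use of `undefined` to model "billing did not respond" are assumptions. The real deduction would go through the DynamoDB conditional write mentioned in T3.24, which is not shown here.

```typescript
type ToolKind = "READ" | "ANALYSIS" | "WRITE";

// Credit matrix from the task table: READ=1, ANALYSIS=2, WRITE=3.
const CREDIT_COST: Readonly<Record<ToolKind, number>> = {
  READ: 1,
  ANALYSIS: 2,
  WRITE: 3,
};

type GateReason = "ok" | "insufficient_credits" | "fail_open";

interface GateDecision {
  allowed: boolean;
  cost: number;
  reason: GateReason;
}

// ASSUMPTION: `balance === undefined` stands for "billing did not respond".
// In that case the gate fails open and lets the tool run.
function decideGate(kind: ToolKind, balance: number | undefined): GateDecision {
  const cost = CREDIT_COST[kind];
  if (balance === undefined) {
    return { allowed: true, cost, reason: "fail_open" };
  }
  if (balance < cost) {
    return { allowed: false, cost, reason: "insufficient_credits" };
  }
  return { allowed: true, cost, reason: "ok" };
}
```

Returning a reason alongside the boolean lets the caller log fail-open events separately, which helps when reconciling credits after a billing outage.
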
Sprints 7-8: Hardening + Staging
Weeks 7-8 • 37 tasks

| ID | Task | Proj | Time | Depends |
|---|---|---|---|---|
| Mateo: #2 Orchestrator · #4 Personality · #7 Guardrails · #3 Tool Registry · #9 Cerebro KB (8 tasks) | ||||
| T4.1 | WebSocket streaming. Replace REST. 8 server→client events: thinking, tool_start, tool_result, text_delta, suggestion, confirmation_required, error, done. 4 client→server events. Restore session on reconnect | #2 | 2d | T1.7, T3.3 |
| T4.2 | SystemPromptComposer L3. Execution block when writeCapable=true. Write guardrails injected conditionally. Hard cap of 1200 tokens total | #4 | 1d | T1.5, T3.3 |
| T4.3 | OutputGuard. Post-LLM validation: data-leak prevention (verify the response contains no other user's data), dangerous-content filtering. Critical alert if a leak is detected | #7 | 1d | T3.5 |
| T4.4 | Remaining WRITE tools (if they fit). Up to 13 additional WRITE tools: update_product_images, update_product_video, update_stock, close_product, publish_product, answer_question, hide_question, send_buyer_message, request_review. Circuit breaker: whatever does not fit is cut to S11-12 | #3 | 3d | T3.3 |
| T4.5 | Performance optimization. Target p95 <3s. Context-window compaction, prompt cache hits, tool parallelization where safe. Profile and fix bottlenecks | #2 | 2d | T4.1 |
| T4.5a | FeedbackCapture in HookLifecycle. after_tool hook in conversation-api writes a FeedbackEntry to the #15 DynamoDB via HTTP POST /feedback/capture. Only for successful WRITE tools. Fire-and-forget | #2 | 1d | T2.4, T4.13 |
| T4.5b | ActionLog entity + DynamoActionLogRepository pk | #2 | 1d | T2.4, T3.3 |
| T4.16 | KB batch + v2. Batch embeddings on Vertex AI (250 per call). If the pipeline takes >5 min, enable incremental processing. Target: >80% retrieval hit rate on 20 eval queries | #9 | 2d | T2.22, T2.23 |
| Andrés: #14 DevOps · #10 Data Sync (5 tasks) | ||||
| T4.6 | Staging deploy, full stack. AWS CDK deploy: Lambda, API Gateway v2, DynamoDB, ElastiCache Redis, RDS PostgreSQL, Secrets Manager. Terraform GCP: Cloud Run, BigQuery, GCS, Airflow. Health check green. URL: api-staging.shopilot.ai | #14 | 3d | T2.15, T3.18 |
| T4.7 | Load testing, 50 users. Artillery/k6. Target: p95 <2s on API endpoints (excluding LLM latency). Identify bottlenecks: Redis, DynamoDB, API Gateway | #14 | 2d | T4.6 |
| T4.8 | CloudWatch dashboard + alerts. API latency, error rate, LLM cost per conversation, tool executions, credits. PagerDuty alerts: p95 >2s, error rate >1%. Slack alerts: LLM cost per day >$50 | #14 | 2d | T4.6 |
| T4.9 | Data Sync Silver + Gold (if it fits). INormalizer, SilverNormalizer per marketplace. transform_to_silver_dag. IAggregator, DailySummaryAggregator. compute_gold_dag. Brand Health spike + IBrandHealthCalculator. Circuit breaker if it does not fit | #10 | 3d | T3.13 |
| T4.9a | API Gateway v2 WebSocket in CDK. Routes $connect/$disconnect/$default, DynamoDB connection-ids table, Lambda authorizer, IAM policies. Prerequisite for streaming in production | #14 | 1d | T4.6 |
| Sergio: #1 Native Shell · #15 Feedback Loop · #13 Billing (13 tasks) | ||||
| T4.10 | Progressive WebSocket client. Sidebar connects to the conversation-api WebSocket. Handles 8 server→client events: text_delta (progressive render), tool_start (spinner), tool_result, suggestion (card), confirmation_required, error, done. Reconnection backoff. +0.5d integration of T3.BB components (ReActStream + RollbackPanel) | #1 | 2.5d | T2.18, T4.1, T3.BB |
| T4.11 | EnrollmentView standalone. Dedicated component for each marketplace's OAuth redirect flows. Standalone BrowserWindow (not a popup). Reuses OAuth tokens from the T2.21 OnboardingWizard | #1 | 1d | T2.21 |
| T4.12 | Sentry crash reporting. main + renderer. Source maps uploaded at build. Error grouping. Feedback dialog on crash | #1 | 4h | T1.16 |
| T4.13 | Feedback Loop scaffold. package.json, tsconfig, domain interfaces (IFeedbackRepository, IFeedbackGate, IDataSyncClient). Models: FeedbackEntry, ExplicitFeedbackEntry, ImplicitFeedbackEntry | #15 | 1d | T0.8 |
| T4.14 | calculateImpactScore + DynamoFeedbackRepository. Pure logic (sales×0.4 + conversion×0.3 + visits×0.2 + position×-0.1). DynamoDB repo: save, findPendingEntries (GSI1 status=pending), update, findByUser | #15 | 2d | T4.13 |
| T4.15 | FeedbackMeasurerService + Lambdas. processPendingEntries (entries >7 days old). DataSyncClient over HTTP. Retry 3x, mark unmeasurable on failure. EventBridge rate(6h). FeedbackAPIHandler: GET /feedback/:userId/summary + /history. CDK stack | #15 | 2d | T4.14 |
| T4.15a | FeedbackGate anti-fatigue. should-prompt: max 1 explicit-feedback prompt per day, skip if <3 interactions in the session, 24h cooldown after feedback. Endpoint GET /feedback/:userId/should-prompt | #15 | 4h | T4.13 |
| T4.15b | Explicit feedback endpoint. POST /feedback/:userId/explicit. Payload: rating (1-5), comment?, conversationId, toolName?. Persists ExplicitFeedbackEntry | #15 | 4h | T4.14 |
| T4.15c | Implicit feedback endpoint. POST /feedback/:userId/implicit. Payload: action (accepted/rejected/edited), conversationId, toolName, originalValue?, editedValue?. Persists ImplicitFeedbackEntry | #15 | 4h | T4.14 |
| T4.15d | Grace period 7d billing. Keep Pro access for 7 days after cancellation. Webhook | #13 | 4h | T3.24b |
| T4.MK1 | Mockup EnrollmentView standalone. Complete marketplace list in a dedicated BrowserWindow × all states (Connected/Syncing/Error/Disconnected). Uses EnrollmentCard + ErrorRecovery for OAuth error states | #1 | 0.5d | T3.BB |
| T4.MK2 | Mockup complete WRITE flow. ChatView with end-to-end mock flow: AgentStatusBar in ToolUse → ToolAccordion expanded → ReActStream (3 phases) → ConfirmDialog → RollbackPanel post-execution. Full WRITE UX validation | #1 | 1d | T3.BB |
| T4.24 | Gate 2 signed build: full .dmg + .exe, all S8 features. Full .dmg notarized + .exe signed with ALL S7-8 features integrated. Team smoke test on macOS + Windows. Gate 2 build milestone, candidate for beta distribution | #1 | 0.5d | T2.40 |
| Pablo: #16 Eval Suite · #17 Beautonomous (10 tasks) | ||||
| T4.17 | Automated eval in CI. GitHub Action runs the 50 golden cases on every push to main. CI fails if score <0.70 or a critical case fails. Results → #engineering Slack via Beautonomous | #16 | 2d | T3.26, T3.27 |
| T4.18 | Proactive suggestions testing with real data. Test ProactiveSuggestionService with Sellerfy data. Verify: correct triggers, message quality, dedup, max 2/turn. Iterate on the prompt | #16 | 2d | T3.4 |
| T4.19 | Beta user selection + onboarding prep. 10-15 Sellerfy sellers (mix of small/medium/large). 2-min video walkthrough, setup doc, feedback form. 30-min 1-on-1 calls | #17 | 2d | T2.21 |
| T4.19a | Eval contract testing pipeline. Consumer-driven contracts between repos: verify that Tool Registry → Data Sync, Tool Registry → Marketplace Provider, and Tool Registry → Enrichment honor their contracts | #16 | 2d | T3.26 |
| T4.19b | KB quality eval pipeline. Automated retrieval metrics: precision@5, recall, hit rate. 20 eval queries with expected chunks. CI fails if hit rate <80% | #16 | 1d | T4.16 |
| T4.25 | Code signing secrets. Configure in GitHub: macOS certificates (Developer ID + Apple notarization) and Windows (Authenticode). Verify that electron-builder recognizes them | #16 | 1d | - |
| T4.26 | DesktopBuildRunner + core checks. 6 checks: compilation (build completes + artifact exists), code signing (codesign/signtool verify), notarization (spctl, macOS only), app startup (headless <5s), bundle size (<250MB delta vs baseline), native modules (require without error) | #16 | 3d | T3.40 |
| T4.27 | Secondary checks. Auto-updater (feed URL resolves), deep links (shopilot:// in Info.plist/Windows registry), window rendering (console.error), IPC channels (ping/pong). Warnings, not blockers | #16 | 1d | T4.26 |
| T4.28 | GitHub Actions: desktop-build-eval.yml. 3 jobs: build-macos (macos-14 runner), build-windows (windows-latest), report (aggregate + PR comment). Trigger: PRs touching | #16 | 1.5d | T4.26 |
| T4.29 | GitHub Actions: figma-quality-eval.yml. Triggers: | #16 | 0.5d | T3.42-T3.44 |
| UX/UI + Pablo: #18 Design System (1 task) | ||||
| T4.BB | Figma quality audit + corrections. Review all frames from S0–S6 against the figma-best-practices.md checklist. All [LIB] Core Components and [LIB] Pattern Components marked “Ready for development”. Zero generic layer names. All colors/spacings/radii using variables (DevMode verified). All interactive states present. Changelog updated with version + date. Annotations for hover states, transitions, responsive notes. Owner: UX/UI executes corrections + Pablo validates the complete checklist | #18 | 3d | T3.BB |
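
The Feedback Loop rows above give two rules concrete enough to sketch: the T4.14 weighted impact score and the T4.15a anti-fatigue gate. The TypeScript below is a minimal illustration under stated assumptions. The `MetricDeltas` and `PromptState` shapes are hypothetical, and the blueprint does not say how the four metrics are normalized, so the inputs are treated as already-comparable numbers.

```typescript
// Hypothetical input: metric changes observed after a WRITE action.
// Normalization is an assumption; only the weights come from T4.14.
interface MetricDeltas {
  sales: number;
  conversion: number;
  visits: number;
  position: number;
}

// Weights from T4.14: sales×0.4 + conversion×0.3 + visits×0.2 + position×-0.1.
const IMPACT_WEIGHTS: MetricDeltas = {
  sales: 0.4,
  conversion: 0.3,
  visits: 0.2,
  position: -0.1,
};

// Pure function: weighted sum of the four deltas.
function calculateImpactScore(d: MetricDeltas): number {
  return (
    d.sales * IMPACT_WEIGHTS.sales +
    d.conversion * IMPACT_WEIGHTS.conversion +
    d.visits * IMPACT_WEIGHTS.visits +
    d.position * IMPACT_WEIGHTS.position
  );
}

// Hypothetical state the gate would need to load for a user.
interface PromptState {
  promptsToday: number; // explicit-feedback prompts already shown today
  interactionsInSession: number; // interactions in the current session
  lastFeedbackAt: Date | null; // when the user last gave feedback, if ever
}

const COOLDOWN_MS = 24 * 60 * 60 * 1000;

// T4.15a rules: max 1 prompt per day, skip under 3 interactions,
// and a 24h cooldown after the last feedback.
function shouldPrompt(state: PromptState, now: Date): boolean {
  if (state.promptsToday >= 1) return false;
  if (state.interactionsInSession < 3) return false;
  if (
    state.lastFeedbackAt !== null &&
    now.getTime() - state.lastFeedbackAt.getTime() < COOLDOWN_MS
  ) {
    return false;
  }
  return true;
}
```

Keeping both functions pure, with the clock passed in as `now`, makes them straightforward to unit test without DynamoDB or timers.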
Sprints 9-10: Launch
Weeks 9-10 • 18 tasks

| ID | Task | Proj | Time | Depends |
|---|---|---|---|---|
| Mateo: #7 Guardrails · #2 Orchestrator · #3 Tool Registry · #4 Personality (3 tasks) | ||||
| T5.1 | LLMGuardChecker. Lightweight LLM classifier (Haiku) for inputs that pass pattern matching but could be injection or off-scope. Fallback: if the checker fails, let the input through | #7 | 1d | T3.5, T4.3 |
| T5.2 | Backend bug fixes. All P1/P2 bugs from beta. Edge cases: empty tool results, LLM refuses to use a tool, concurrent WRITEs, tokens expiring mid-conversation | #2, #3 | 4d | T4.1 |
| T5.3 | System Prompt v3 final. Tuning based on beta feedback. Fix tone problems, incorrect tool-selection patterns, edge cases | #4 | 1d | T5.10 |
| Andrés: #14 DevOps · #10 Data Sync (4 tasks) | ||||
| T5.4 | Production deploy. CDK deploy (Lambda + API Gateway v2 prod). Terraform apply (Cloud Run Data API prod). SSL + domain api.shopilot.ai. Health checks | #14 | 3d | T4.6 |
| T5.5 | Complete production IaC. CDK: DynamoDB point-in-time recovery 35d, Secrets Manager, IAM roles, Lambda concurrency. Terraform: GCS lifecycle policies. Redis backup. Daily PostgreSQL backups | #14 | 2d | T5.4 |
| T5.6 | Rollback testing. Revert Lambda version (<1 min), Cloud Run revision rollback (<1 min). Document the runbook | #14 | 1d | T5.4 |
| T5.6a | Data Sync Phase 4: OpenMetadata + Embeddings. Amazon + Fast Data FQNs in OpenMetadata. embed_fast_dag (Bronze → Cerebro KB). embed_health_dag (Gold → KB). Lineage visible | #10 | 2d | T4.9 |
| Sergio: #1 Native Shell · #13 Billing (5 tasks) | ||||
| T5.7 | Code signing + .dmg + auto-updater. Apple Developer cert. electron-builder macOS DMG. Notarization via notarytool. Ticket stapling. Auto-updater → releases.shopilot.ai (S3). Test on a clean Mac without dev tools | #1 | 2d | T4.12 |
| T5.8 | Electron security hardening. CSP headers, sandbox enabled, nodeIntegration=false, webSecurity=true. Basic anonymous telemetry (opt-out) | #1 | 1d | T5.7 |
| T5.9 | UI/UX beta bug fixes. All UI/UX bugs from beta feedback. RAM profiling (<500MB target). Polish: animations, transitions, loading states. +0.5d post-audit T4.BB alignment to fix visual inconsistencies accumulated over S1-8 | #1 | 3.5d | T4.19, T4.BB |
| T5.10 | Billing Stripe live. Switch test → live. Verify: checkout, webhooks, credits, packs. SSL on billing endpoints | #13 | 1d | T3.23, T5.4 |
| T5.MK1 | Mockup Dashboard view. The only view not previously built: grid of MarketplaceKPIs + FraudAlert + AuditLog of recent actions + quick access to chat. Uses all audited T4.BB components (DataTable, MarketplaceKPI, AuditLog) | #1 | 1d | T4.BB |
| Pablo: #17 Beautonomous · #16 Eval Suite (6 tasks) | ||||
| T5.11 | Beta onboarding, 10-15 sellers. Download .dmg → connect marketplace → first query → first action. 30-min 1-on-1 call. Monitor activation (1+ tool in first session) | #17 | 3d | T5.7, T5.4 |
| T5.12 | Feedback calls + iteration. 15 min with each beta user. Document what worked and what did not. Top 5 issues → Linear via Beautonomous | #17 | 2d | T5.11 |
| T5.13 | OWASP Top 10 security review. Injection, broken auth, data exposure, XSS, SSRF, etc. Document findings + fix P1s | #16 | 1d | T5.4, T5.7 |
| T5.14 | System Prompt v2 Beautonomous. Iteration based on 10 weeks of real usage. Update governance. Index technical docs in the OpenClaw KB | #17 | 1d | - |
| T5.15 | Go/No-Go. Final 60-min sync, 4 engineers. Checklist: tools responding, Stripe live, 10+ beta users, signed .dmg, OWASP P1s, p95 <3s, cost guard, eval ≥0.70. Pablo signs the Go | #17 | 4h | T5.1–T5.14 |
| T5.15a | E2E eval pipeline. Complete end-to-end flow (query → tool selection → execution → response). 10+ scenarios. Unlike the LLM Judge, which evaluates the response, this evaluates the complete flow | #16 | 2d | T4.17 |
| #18 Design System: No BB task in S9-10. Figma pipeline closed. UX/UI available only for point queries from Sergio on visual edge cases. | ||||
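
The T5.15 Go/No-Go checklist lists eight launch conditions, several with numeric thresholds. As a rough illustration, it could be encoded as a pure function that reports which items block launch. The `LaunchStatus` shape and field names below are hypothetical, and reading "OWASP P1s" as "no open P1 findings" is an assumption. Only the thresholds (10+ beta users, p95 <3s, eval ≥0.70) come from the table.

```typescript
// Hypothetical snapshot of launch readiness, gathered before the sync.
interface LaunchStatus {
  toolsResponding: boolean;
  stripeLive: boolean;
  betaUsers: number;
  dmgSigned: boolean;
  openOwaspP1s: number; // assumption: the check means zero open P1 findings
  p95Seconds: number;
  costGuardActive: boolean;
  evalScore: number;
}

interface GoNoGoResult {
  go: boolean;
  blockers: string[];
}

// Evaluate every checklist item and collect the names of failing ones.
function evaluateGoNoGo(s: LaunchStatus): GoNoGoResult {
  const checks: Array<[string, boolean]> = [
    ["tools responding", s.toolsResponding],
    ["Stripe live", s.stripeLive],
    ["10+ beta users", s.betaUsers >= 10],
    ["signed .dmg", s.dmgSigned],
    ["OWASP P1s resolved", s.openOwaspP1s === 0],
    ["p95 < 3s", s.p95Seconds < 3],
    ["cost guard", s.costGuardActive],
    ["eval >= 0.70", s.evalScore >= 0.7],
  ];
  const blockers = checks.filter(([, ok]) => !ok).map(([name]) => name);
  return { go: blockers.length === 0, blockers };
}
```

Returning the list of blockers rather than a bare boolean gives the sync a concrete agenda when the answer is No-Go.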
S11-12: Buffer (Weeks 11-12)
Circuit breaker absorbs deferred scope + beta bugs + hardening
Shape Up circuit breaker: tasks not completed at the S10 deadline are cut here rather than delaying launch. S11 = hardening + P0/P1 bugs. S12 = deferred scope that cleared the circuit breaker.

| Engineer | S11: Hardening | S12: Deferred scope |
|---|---|---|
| Mateo | P1/P2 intelligence bug fixes. p95 optimization. WRITE tools cut from S7-8. | Advertising tools Phase 5: 4 WRITE (create/update/pause/activate_campaign). Enrichment Rainforest API adapter (Amazon market intelligence). ProactiveSuggestions v2. LLMGuardChecker Phase 2. KB v3: docs for real beta questions that KB v2 did not cover. |
| Andrés | Production hardening: alerts, runbooks, rollback drills. Adapter bug fixes. | Silver→Gold DAG (if cut). Rate limiters with real data. Expanded monitoring. |
| Sergio | UI/UX beta bug fixes. RAM profiling. .dmg hotfix if needed. | Auto-updater S3. Windows build (if time allows). FeedbackThrottle anti-fatigue refinement. Improved feedback UI. |
| Pablo | Eval iteration on real conversations. Golden dataset expansion with edge cases. | Eval score target 0.80. Technical documentation + postmortem. |
Visual Gantt: 12 Weeks (10+2 buffer) × 4 Engineers + UX/UI Team × 19 Projects
| | W0 Pre-Sprint | W1–W2 Walking Skeleton | W3–W4 ★G1 Core Engines | W5–W6 WRITE + Billing | W7–W8 ★G2 Hardening | W9–W10 ★G3 Launch |
|---|---|---|---|---|---|---|
| Mateo CTO | | | | | | |
| Andrés Data+BE | | | | | | |
| Sergio Full-stack | | | | | | |
| UX/UI #18 DS | | | | | | |
| Pablo CEO | | | | | | |
| Gates | | | | | | |

Parallel infra: T1.1 → T2.15 (Andrés W3) → T4.6 (Andrés W7) → T5.4 (Andrés W9)
Linear CSV Export: All 183 Tasks (copy & import)
Import in Linear: Team Settings → Import → CSV. Columns: ID, Title, Assignee, Project, Cycle, Priority, EstimateHours, DependsOn. Source: 70-EXEC-BACKLOG-CORREGIDO.md v2.0
ID,Title,Assignee,Project,Cycle,Priority,EstimateHours,DependsOn
T0.1,Create OpenClaw Project,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,
T0.2,Connect GitHub OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.3,Connect Linear OAuth,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.4,Connect Slack OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.5,Write System Prompt v1 Beautonomous,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,4,T0.1
T0.6,Configure Role Mapping (Capitan/Mago/Artesano),Pablo,#17 Beautonomous,Pre-Sprint,High,1,T0.5
T0.7,Create Linear Structure 17 Projects 6 Cycles,Pablo,#17 Beautonomous,Pre-Sprint,High,2,T0.3
T0.8,Validation 4 Members x3 Queries Verify Permissions,All 4,#17 Beautonomous,Pre-Sprint,Urgent,1,"T0.2,T0.3,T0.4,T0.5,T0.6"
T0.9,Apple Developer Program enrollment + cert,Pablo,#14 DevOps,Pre-Sprint,High,1,-
T0.10,Windows code signing cert procurement,Pablo,#14 DevOps,Pre-Sprint,High,1,-
T0.11,Brand Book delivery from external design team — request following core-product-design-system repo guidelines,Pablo,#18 Design System,Cycle 0,High,0,
T0.BB,Figma Foundations Delivery: Brand book + [LIB] Foundations & Tokens + [LIB] Iconography + [LIB] Core Components partial,UX/UI,#18 Design System,Cycle 1,Urgent,32,T0.11
T1.1,DynamoDB Fix: IDs UUID->ULID Trace SK GSI fix CDK Stack,Mateo,#2 Orchestrator,Cycle 1,Urgent,24,T0.8
T1.2,UserProfile Entity IUserProfileRepository DynamoUserProfileRepository,Mateo,#2 Orchestrator,Cycle 1,High,8,T1.1
T1.3,Conversation History in Prompt findWindowForPrompt 200K budget,Mateo,#2 Orchestrator,Cycle 1,High,16,T1.1
T1.4,ILLMClient update: toolDefinitions thinkingBudget ContentBlock[] all clients,Mateo,#2 Orchestrator,Cycle 1,High,16,
T1.5,SystemPromptComposer L1+L2: identity base + session UserProfile cache_control,Mateo,#4 Personality,Cycle 1,High,16,T1.2
T1.6,AgentLoopOrchestrator ReAct Reason-Act-Observe MAX_ROUNDS=10 200K budget,Mateo,#2 Orchestrator,Cycle 1,Urgent,24,"T1.3,T1.4,T1.5"
T1.7,RestResponseEventEmitter REST mode no streaming,Mateo,#2 Orchestrator,Cycle 1,Medium,4,T1.6
T1.8,Verify Observability with ReAct ConversationTrace multi-step tool calls round count,Mateo,#8 Observability,Cycle 1,Medium,8,T1.6
T1.9,Scaffold Marketplace Provider Clean Architecture DDD Value Objects DI container,Andres,#12 Marketplace Provider,Cycle 1,Urgent,8,T0.8
T1.10,IMarketplaceAdapter Interface 23 methods 4 domains ISKUResolver,Andres,#12 Marketplace Provider,Cycle 1,Urgent,4,T1.9
T1.11,AES256GCMCipher + ITokenManager DynamoDB marketplace-credentials auto-refresh 15min,Andres,#12 Marketplace Provider,Cycle 1,High,16,T1.9
T1.12,MeLiOAuth2Flow + MeLiAdapter OAuth2 code flow REST API error mapping,Andres,#12 Marketplace Provider,Cycle 1,Urgent,24,"T1.10,T1.11"
T1.13,AmazonLWAFlow + AmazonAdapter scaffold SP-API SDK rate limiting stub only,Andres,#12 Marketplace Provider,Cycle 1,High,16,"T1.10,T1.11"
T1.14,Verify Terraform GCP: GCS Cloud Run Airflow BigQuery operational,Andres,#14 DevOps,Cycle 1,High,8,T0.8
T1.15,Request External Dependencies E1-E5: Amazon SP-API MeLi Shopify Apple,Andres,#14 DevOps,Cycle 1,Urgent,4,T0.8
T1.15a,SellerConnection Aggregate state machine 5 states DynamoDB persist,Andres,#12 Marketplace Provider,Cycle 1,High,8,T1.9
T1.15b,MarketplaceAction Entity + IMarketplaceActionRepository actionId sellerId latencyMs,Andres,#12 Marketplace Provider,Cycle 1,Medium,4,T1.9
T1.15c,IOAuth2Flow Interface domain port authorize exchangeCode refreshToken,Andres,#12 Marketplace Provider,Cycle 1,Medium,4,T1.9
T1.28,Collect missing WRITE API docs MeLi 3 AmazonAds 5 Amazon 2 Shopify 9 for Tool Registry mapping,Andres,#12 Marketplace Provider,Cycle 1,High,24,T0.8
T1.29,Collect user management provider docs auth external Auth0 Clerk Memberstack service methods,Andres,#12 Marketplace Provider,Cycle 1,High,16,T0.8
T1.16,Scaffold Electron + electron-builder Electron 28 main process preload contextBridge hot reload,Sergio,#1 Native Shell,Cycle 1,Urgent,8,T0.8
T1.17,MainWindow + WebContentsView 70% width navigation persistence (not BrowserView deprecated E26),Sergio,#1 Native Shell,Cycle 1,High,16,T1.16
T1.18,MarketplaceDetector URL patterns MeLi Amazon Shopify page type extraction remote config,Sergio,#1 Native Shell,Cycle 1,High,8,T1.17
T1.19,Tab System + Sidebar Container React 360px design-system tokens IPC main-renderer Toggle Cmd+B (+0.5d setup tokens T0.BB),Sergio,#1 Native Shell,Cycle 1,High,20,T1.17
T1.20,Auth Memberstack JWT electron-store OS key AuthService main process,Sergio,#1 Native Shell,Cycle 1,High,8,T1.16
T1.MK1,Mockup shell container: validate Figma tokens/components in real Electron+React context,Sergio,#1 Native Shell,Cycle 1,Medium,4,"T0.BB,T1.19"
T1.BB,Atoms + Molecules Base + Chat Organisms delivery,UX/UI,#18 Design System,Cycle 1,Urgent,48,T0.BB
T1.32,First .dmg + .exe canary build unsigned,Sergio,#1 Native Shell,Cycle 1,High,8,T1.16
T1.33,GitHub Actions CI electron-builder,Andres,#14 DevOps,Cycle 1,High,4,T1.32
T1.21,KB Phase 0 Fix Duplicates TRUNCATE before embed embedded_at CI Go 1.21->1.24,Mateo,#9 Cerebro KB,Cycle 1,High,16,
T1.22,KB Phase 1 Contextual Retrieval prefix chunking Markdown overlap 150 chars,Mateo,#9 Cerebro KB,Cycle 1,High,16,T1.21
T1.23,KB Content 15-20 Curated Docs MeLi Amazon Shopify pricing photos FAQ,Mateo,#9 Cerebro KB,Cycle 1,High,40,
T1.24,Eval Phase 0 Setup + Golden Dataset IEvalPipeline ILLMJudge 15-20 YAML cases,Pablo,#16 Eval Suite,Cycle 1,High,24,
T1.25,10 READ Tool Specs: name description inputSchema riskLevel creditCost for all 10 tools,Mateo,#9 Cerebro KB,Cycle 1,Urgent,16,
T1.26,Brand registration marketplaces Amazon Brand Registry AmazonAds MeLi Shopify weekly tracking,Pablo,#10 Data Sync,Cycle 1,High,24,T0.8
T1.27,Authorize app Apple Windows Store: Apple Developer Program $99yr code signing cert + Microsoft Partner Center,Pablo,#1 Native Shell,Cycle 1,High,16,T0.8
T2.1,ToolRegistry + ToolDefinition register registerRemote getDefinitions Zod schema categories,Mateo,#3 Tool Registry,Cycle 2,Urgent,16,"T1.6,T1.25"
T2.2,IToolExecutor + ToolExecutor execute toolName args context -> ToolResult,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.1
T2.3,ToolPolicyFilter risk gate irreversible confirmation marketplace gate extensible,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.1
T2.4,HookLifecycle before_tool -> execute -> after_tool after_tool runs even on failure,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.2
T2.5,10 READ Tool Handlers stubs HTTP mock data handlers/read/ directory,Mateo,#3 Tool Registry,Cycle 2,Urgent,16,T2.1
T2.5a,ToolResult Domain Model toolName args isError latencyMs cached creditCost immutable,Mateo,#3 Tool Registry,Cycle 2,High,4,T2.1
T2.5b,update_user_profile SYSTEM Tool Handler updates UserProfile from conversation,Mateo,#3 Tool Registry,Cycle 2,Medium,4,"T2.1,T1.2"
T2.5c,contextSummary Auto-summarize conversation when token threshold exceeded,Mateo,#5 Context Aggregator,Cycle 2,Medium,8,T1.3
T2.5d,17 WRITE Tool Stubs ConfirmationRequired policy NotImplemented handlers visible to LLM,Mateo,#3 Tool Registry,Cycle 2,High,4,T2.1
T2.6,IContextAssembler KB + Brand Health RAG parallel single embedding graceful degradation,Mateo,#5 Context Aggregator,Cycle 2,High,16,T1.6
T2.7,Health Summary BrandHealthContextService.getHealthSummary always in system prompt,Mateo,#5 Context Aggregator,Cycle 2,High,8,T2.6
T2.8,Prompt Caching Anthropic SystemPromptBlock cache_control ephemeral 90% hit rate,Mateo,#2 Orchestrator,Cycle 2,Medium,8,T1.5
T2.9,Tool Result Caching In-Memory Mapper session READ ANALYSIS only,Mateo,#3 Tool Registry,Cycle 2,Medium,4,T2.2
T2.10,ShopifyOAuth2Flow + ShopifyAdapter GraphQL Admin API rate limiting cost-based,Andres,#12 Marketplace Provider,Cycle 2,High,24,"T1.10,T1.11"
T2.11,AmazonAdapter Complete SP-API Reports Catalog Orders rate limit 5req/s (if E1 approved),Andres,#12 Marketplace Provider,Cycle 2,High,24,T1.13
T2.12,TokenRefreshCron EventBridge 5min pre-refresh 30min DynamoDB mutex 3 fails -> Slack alert,Andres,#12 Marketplace Provider,Cycle 2,High,8,"T1.11,T1.12"
T2.13,Data Sync Phase 0.5 Clean Architecture API IDataReader ITokenProvider VOs domain,Andres,#10 Data Sync,Cycle 2,High,16,T0.8
T2.14,Verify Existing DAGs MeLi + Shopify @hourly Bronze schemas fix if needed,Andres,#10 Data Sync,Cycle 2,High,8,T2.13
T2.15,CDK Base AWS: DynamoDB Lambda API Gateway v2 HTTP VPC NAT Secrets EventBridge,Andres,#14 DevOps,Cycle 2,Urgent,16,T1.1
T2.16,GitHub Actions CI Multi-Repo lint type-check unit tests build cache status checks,Andres,#14 DevOps,Cycle 2,High,8,
T2.16a,marketplace-actions DynamoDB Table CDK pk sellerId sk actionId GSI marketplace+status,Andres,#14 DevOps,Cycle 2,Medium,4,T2.15
T2.16b,AmazonAdsOAuth2Flow dual OAuth Amazon Ads API separate from LWA SP-API,Andres,#12 Marketplace Provider,Cycle 2,Medium,8,T1.13
T2.16c,ISKUResolver Implementations MeLi ASIN Shopify numeric bidirectional mapping,Andres,#12 Marketplace Provider,Cycle 2,Medium,8,T1.10
T2.17,Chat UI + Markdown Rendering bubbles indicators thinking/executing-tool/done syntax highlight (+0.5d integration T1.BB),Sergio,#1 Native Shell,Cycle 2,Urgent,20,T1.19
T2.18,CoachWebSocketService main process backoff reconnect 1s-30s heartbeat 30s REST fallback 2s,Sergio,#1 Native Shell,Cycle 2,Urgent,8,T1.7
T2.19,URL->Metadata Injection WebContentsView URL -> marketplace page-type product IDs as metadata,Sergio,#1 Native Shell,Cycle 2,High,8,T1.18
T2.20,react-router View Navigation /chat /profile /billing /enrollment /onboarding persistent chat,Sergio,#1 Native Shell,Cycle 2,Medium,8,T2.17
T2.21,OnboardingWizard 5 Steps welcome->OAuth->profile->guided-query->success localStorage skip from 3 (+0.5d T1.BB),Sergio,#1 Native Shell,Cycle 2,High,20,"T2.17,T1.12"
T2.MK1,Mockup ChatView: integrate chat organisms from T1.BB in real ChatView component,Sergio,#1 Native Shell,Cycle 2,Medium,8,"T1.BB,T2.17"
T2.MK2,Mockup OnboardingWizard: integrate onboarding components from T1.BB,Sergio,#1 Native Shell,Cycle 2,Medium,4,"T1.BB,T2.21"
T2.BB,Molecules remaining + Data/Flow Organisms delivery,UX/UI,#18 Design System,Cycle 2,Urgent,48,T1.BB
T2.40,Gate 1 signed build .dmg + .exe,Sergio,#1 Native Shell,Cycle 2,High,8,"T0.9,T0.10"
T2.22,KB Phase 2 Incremental Processing SHA-256 hash is_current flag re-embed changed only,Mateo,#9 Cerebro KB,Cycle 2,High,16,T1.21
T2.23,KB Phase 3 Batch Embeddings 250 texts/call Vertex AI goroutine pool semaphore max 5,Mateo,#9 Cerebro KB,Cycle 2,High,16,T1.21
T2.24,Eval Phase 1 LLM Judge + EvalRunner AnthropicLLMJudge YamlDatasetLoader CLI eval.ts 20 cases,Pablo,#16 Eval Suite,Cycle 2,High,24,T1.24
T2.25,E2E Testing via Playground real Sellerfy data document QA findings -> Linear Beautonomous,Pablo,#16 Eval Suite,Cycle 2,High,16,"T1.6,T2.1"
T2.26,Bootstrap ~150 Tasks in Linear via Beautonomous 6 cycles L/M/S labels critical path deps,Pablo,#17 Beautonomous,Cycle 2,Medium,4,T0.7
T2.26a,Quality Gate 5-Step Beautonomous structure->lint->tests->arch-review->convention before PRs,Pablo,#17 Beautonomous,Cycle 2,High,8,T0.5
T3.1,10 READ Handlers Real connect to Fast Data Layer 11 FastAPI endpoints Zod -> HTTP -> ToolResult,Mateo,#3 Tool Registry,Cycle 3,Urgent,24,"T2.5,T2.13"
T3.2,ConfirmationFlow WRITE pauses -> shows diff -> Aceptar/Rechazar -> 35min timeout DynamoDB TTL,Mateo,#2 Orchestrator,Cycle 3,Urgent,16,"T1.6,T2.3"
T3.3,4 WRITE Tool Handlers update_product_content update_price pause_product activate_product,Mateo,#3 Tool Registry,Cycle 3,Urgent,24,T3.2
T3.4,ProactiveSuggestionService afterTool LLM evaluates result max 2/turn dedup 7d no hardcoded rules,Mateo,#6 Proactive Suggestions,Cycle 3,High,16,T2.4
T3.5,IGuardService + InputGuard prompt injection pattern matching out-of-scope graceful degradation,Mateo,#7 Guardrails,Cycle 3,High,8,T1.6
T3.5a,HttpCreditGate in conversation-api HTTP POST /internal/gate READ=1 ANALYSIS=2 WRITE=3 fail-open,Mateo,#2 Orchestrator,Cycle 3,Urgent,8,T3.24
T3.6,Enrichment Scaffold + Interfaces IEnrichmentService IMarketIntelligenceAdapter IEnrichmentCache DI,Mateo,#11 Enrichment,Cycle 3,High,8,T0.8
T3.7,MeliMarketIntelligenceAdapter Search+Items API free search_market_products competitor pricing,Mateo,#11 Enrichment,Cycle 3,High,16,T3.6
T3.8,VisionLLMContentAdapter Claude Vision analyze_product_image analyze_product_video,Mateo,#11 Enrichment,Cycle 3,Medium,8,T3.6
T3.9,RedisEnrichmentCache + EnrichmentService TTL per tool router marketplace->adapter fail gracefully,Mateo,#11 Enrichment,Cycle 3,High,8,"T3.7,T3.8"
T3.10,Enrichment CDK Stack Lambda + API Gateway + ElastiCache Redis + VPC,Mateo,#11 Enrichment,Cycle 3,High,8,T3.9
T3.11,8 ANALYSIS Tool Handlers connect to IEnrichmentService 5 ops + keyword + fee + enhance(NotImpl),Mateo,#3 Tool Registry,Cycle 3,High,16,T3.9
T3.12,HallucinationChecker numeric claims fees metrics vs tool results post-gen log no block Phase 1,Mateo,#2 Orchestrator,Cycle 3,Medium,8,T3.1
T3.13,Fast Data Layer 11 FastAPI Endpoints GET /data/{user_id}/fast/{tool} GCS Parquet <500ms no Redis,Andres,#10 Data Sync,Cycle 3,Urgent,24,T2.13
T3.14,GCS Snapshots for ConfirmationFlow snapshot/{tool}/{ts} pre-write state rollback cleanup DAG,Andres,#10 Data Sync,Cycle 3,High,8,T3.13
T3.15,DAG Amazon IExtractor ILoader AmazonAuthManager AmazonExtractor AmazonLoader Bronze schemas,Andres,#10 Data Sync,Cycle 3,High,24,"T2.14,T2.11"
T3.16,IRateLimiter per Marketplace: MeLi token bucket 1500/min Amazon burst Shopify leaky bucket Redis,Andres,#12 Marketplace Provider,Cycle 3,High,8,T1.12
T3.17,Onboarding Trigger first sync post-onboarding marketplace connect -> trigger DAG initial sync,Andres,#12 Marketplace Provider,Cycle 3,Medium,8,"T1.12,T2.14"
T3.18,CI/CD Multi-Repo Complete all 11 repos GitHub Actions staging auto-deploy on main merge,Andres,#14 DevOps,Cycle 3,High,16,T2.16
T3.19,BillingView current plan credits remaining usage stats Stripe Checkout in system browser alerts (+0.5d T2.BB),Sergio,#1 Native Shell,Cycle 3,High,20,T2.20
T3.20,WRITE Confirmation Dialogs diff red/green Aceptar/Rechazar 35min timeout ConfirmationFlow T3.2,Sergio,#1 Native Shell,Cycle 3,Urgent,8,"T2.17,T3.2"
T3.21,Suggestion Cards + Tool Progress clickable cards pre-contextualized chat spinner tool name (+0.5d T2.BB),Sergio,#1 Native Shell,Cycle 3,High,12,"T2.17,T2.18"
T3.22,ProfileView connected marketplaces usage stats preferences language notifications,Sergio,#1 Native Shell,Cycle 3,Medium,8,T2.20
T3.23,Stripe Checkout + Customer Portal Pro $49/mo cancel update-payment checkout.session.completed 500cr,Sergio,#13 Billing,Cycle 3,Urgent,24,T3.19
T3.24,ICreditsGate + Credits Backend POST /internal/gate READ=1 ANALYSIS=2 WRITE=3 conditional write,Sergio,#13 Billing,Cycle 3,Urgent,16,T3.23
T3.24a,Billing Schema Migration ALTER TABLE + credit_packs subscription_events credit_transactions,Sergio,#13 Billing,Cycle 3,High,8,T3.23
T3.24b,SubscriptionLifecycleService activate cancel-7d-grace upgrade downgrade invoice.payment_failed,Sergio,#13 Billing,Cycle 3,High,8,T3.23
T3.24c,Monthly Credit Reset Cron EventBridge Lambda 1st of month plan credits pack credits 12mo TTL,Sergio,#13 Billing,Cycle 3,Medium,4,T3.24
T3.25,KB BigQuery Indexing 15-20 docs pipeline Go verify top-5 semantic search 5 test queries,Mateo,#9 Cerebro KB,Cycle 3,High,8,"T1.22,T1.23"
T3.MK1,Mockup BillingView: integrate CreditEconomy + billing components from T2.BB,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.19"
T3.MK2,Mockup ProfileView: integrate profile components from T2.BB,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.22"
T3.MK3,Mockup ConfirmDialog: integrate ConfirmDialog organism from T2.BB in WRITE flow,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.20"
T3.BB,Advanced Organisms + [LIB] Pattern Components complete delivery,UX/UI,#18 Design System,Cycle 3,Urgent,40,T2.BB
T3.32,Token Pipeline + Style Dictionary extract design-tokens.json via MCP -> CSS :root + tailwind.config.ts validate naming modes,Mateo,#18 Design System,Cycle 3,High,16,T0.BB
T3.26,Eval Phase 2 CI Integration eval-on-pr.yml PR blocked if !passed auto-comment <10 min 20 cases,Pablo,#16 Eval Suite,Cycle 3,High,16,T2.24
T3.27,Golden Dataset 50 Cases 15 product 10 pricing 8 WRITE 7 proactive 10 edge cases injection,Pablo,#16 Eval Suite,Cycle 3,High,24,T2.24
T3.28,QA Conversation Flows 3 Marketplaces real Sellerfy data all flows issues -> Linear Beautonomous,Pablo,#16 Eval Suite,Cycle 3,High,16,"T3.1,T3.3"
T4.1,WebSocket Streaming replace REST 8 server->client events 4 client->server session restore,Mateo,#2 Orchestrator,Cycle 4,Urgent,16,"T1.7,T3.3"
T4.2,SystemPromptComposer L3 writeCapable=true block guardrails injected conditionally hard cap 1200tok,Mateo,#4 Personality,Cycle 4,High,8,"T1.5,T3.3"
T4.3,OutputGuard post-LLM data leak prevention dangerous content filter critical alert on leak,Mateo,#7 Guardrails,Cycle 4,High,8,T3.5
T4.4,WRITE Tools Remaining up to 13 additional: images video stock close publish answer hide send,Mateo,#3 Tool Registry,Cycle 4,High,24,T3.3
T4.5,Performance Optimization p95 <3s context compaction cache hit parallelization profile bottlenecks,Mateo,#2 Orchestrator,Cycle 4,High,16,T4.1
T4.5a,FeedbackCapture Hook after_tool -> POST /feedback/capture DynamoDB #15 WRITE success fire-forget,Mateo,#2 Orchestrator,Cycle 4,Medium,8,"T2.4,T4.13"
T4.5b,ActionLog Entity + DynamoActionLogRepository pk User#{userId} sk Action#{ULID} GSI Conv#{convId},Mateo,#2 Orchestrator,Cycle 4,Medium,8,"T2.4,T3.3"
T4.6,Staging Deploy Full Stack AWS: CDK Lambda API-GW DynamoDB ElastiCache RDS + Terraform GCP CR BQ,Andres,#14 DevOps,Cycle 4,Urgent,24,"T2.15,T3.18"
T4.7,Load Testing 50 Concurrent Users Artillery/k6 p95 <2s API excl LLM latency identify bottlenecks,Andres,#14 DevOps,Cycle 4,Urgent,16,T4.6
T4.8,CloudWatch Dashboard + Alerts latency error-rate LLM-cost/conv PagerDuty Slack cost/day $50,Andres,#14 DevOps,Cycle 4,High,16,T4.6
T4.9,Data Sync Silver + Gold INormalizer SilverNormalizer DailySummaryAggregator Brand Health spike,Andres,#10 Data Sync,Cycle 4,High,24,T3.13
T4.9a,API Gateway v2 WebSocket CDK $connect $disconnect $default DynamoDB connection-ids Lambda auth,Andres,#14 DevOps,Cycle 4,Urgent,8,T4.6
T4.10,WebSocket Client Progressive Rendering 8 server->client events text_delta tool_start suggestion (+0.5d T3.BB),Sergio,#1 Native Shell,Cycle 4,Urgent,20,"T2.18,T4.1"
T4.11,EnrollmentView Standalone BrowserWindow OAuth redirect per marketplace reuses T2.21 tokens,Sergio,#1 Native Shell,Cycle 4,High,8,T2.21
T4.12,Sentry Crash Reporting main + renderer source maps error grouping crash feedback dialog,Sergio,#1 Native Shell,Cycle 4,Medium,4,T1.16
T4.13,Feedback Loop Scaffold package.json IFeedbackRepository IFeedbackGate IDataSyncClient models,Sergio,#15 Feedback Loop,Cycle 4,High,8,T0.8
T4.14,calculateImpactScore + DynamoFeedbackRepository sales*0.4 conversion*0.3 findPendingEntries GSI1,Sergio,#15 Feedback Loop,Cycle 4,High,16,T4.13
T4.15,FeedbackMeasurerService + Lambdas processEntries>7d EventBridge 6h /feedback/:userId/summary,Sergio,#15 Feedback Loop,Cycle 4,High,16,T4.14
T4.15a,FeedbackGate Anti-Fatigue max 1 explicit/day <3 interactions skip 24h cooldown should-prompt,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.13
T4.15b,Explicit Feedback Endpoint POST /feedback/:userId/explicit rating 1-5 comment conversationId,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.14
T4.15c,Implicit Feedback Endpoint POST /feedback/:userId/implicit accepted/rejected/edited originalValue,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.14
T4.15d,Grace Period 7d Billing customer.subscription.deleted -> grace_period_end cron check expired,Sergio,#13 Billing,Cycle 4,High,4,T3.24b
T4.MK1,Mockup EnrollmentView: integrate EnrollmentCard organism from T3.BB,Sergio,#1 Native Shell,Cycle 4,Medium,4,"T3.BB,T4.11"
T4.MK2,Mockup complete WRITE flow: integrate ReActStream + ConfirmDialog + ToolAccordion in real WS flow,Sergio,#1 Native Shell,Cycle 4,Medium,8,"T3.BB,T4.10"
T4.BB,Figma Quality Audit + Corrections: All frames Ready for 
development,UX/UI,#18 Design System,Cycle 4,High,24,T3.BB T4.24,Gate 2 signed build .dmg notarized + .exe,Sergio,#1 Native Shell,Cycle 4,High,4,T2.40 T4.16,KB Batch v2 Vertex AI 250/call incremental if >5min >80% hit rate 20 eval queries,Mateo,#9 Cerebro KB,Cycle 4,High,16,"T2.22,T2.23" T4.17,Eval CI Automated GitHub Action 50 golden cases every push CI fails if score <0.70 Slack notify,Pablo,#16 Eval Suite,Cycle 4,Urgent,16,"T3.26,T3.27" T4.18,Test ProactiveSuggestions Real Data verify triggers message quality dedup max 2/turn iterate prompt,Pablo,#16 Eval Suite,Cycle 4,High,16,T3.4 T4.19,Beta User Selection + Onboarding Prep 10-15 Sellerfy sellers video walkthrough 1-on-1 30min,Pablo,#17 Beautonomous,Cycle 4,High,16,T2.21 T4.19a,Eval Contract Testing Pipeline consumer-driven ToolRegistry->DataSync->MP->Enrichment contracts,Pablo,#16 Eval Suite,Cycle 4,High,16,T3.26 T4.19b,KB Quality Eval Pipeline precision@5 recall hit rate 20 queries CI fails if hit-rate <80%,Pablo,#16 Eval Suite,Cycle 4,High,8,T4.16 T5.1,LLMGuardChecker Haiku classifier for ambiguous injection edge cases fallback pass-through,Mateo,#7 Guardrails,Cycle 5,High,8,"T3.5,T4.3" T5.2,Backend Bug Fixes P1/P2 empty results LLM refuses tool concurrent WRITE expired tokens mid-conv,Mateo,#2 Orchestrator,Cycle 5,Urgent,32,T4.1 T5.3,System Prompt v3 Final beta feedback tone issues tool selection patterns edge cases,Mateo,#4 Personality,Cycle 5,High,8,T5.10 T5.4,Production Deploy CDK Lambda+API-GW prod Terraform Cloud Run Data API prod SSL api.shopilot.ai,Andres,#14 DevOps,Cycle 5,Urgent,24,T4.6 T5.5,IaC Production Complete DynamoDB PITR 35d IAM Lambda concurrency GCS lifecycle Redis PG backups,Andres,#14 DevOps,Cycle 5,High,16,T5.4 T5.6,Rollback Testing Lambda version <1min Cloud Run revision <1min document runbook,Andres,#14 DevOps,Cycle 5,High,8,T5.4 T5.6a,Data Sync Phase 4 OpenMetadata FQNs embed_fast_dag embed_health_dag Bronze->Gold->KB lineage,Andres,#10 Data Sync,Cycle 5,Medium,16,T4.9 T5.7,Code 
Signing + .dmg + Auto-Updater Apple cert notarytool stapling S3 releases test clean Mac,Sergio,#1 Native Shell,Cycle 5,Urgent,16,T4.12 T5.8,Electron Security Hardening CSP sandbox nodeIntegration=false webSecurity=true telemetry opt-out,Sergio,#1 Native Shell,Cycle 5,High,8,T5.7 T5.9,UI/UX Beta Bug Fixes all feedback bugs RAM <500MB target polish animations loading states (+0.5d post-audit T4.BB),Sergio,#1 Native Shell,Cycle 5,Urgent,28,T4.19 T5.10,Billing Stripe Live switch test->live checkout webhooks credits packs SSL billing endpoints,Sergio,#13 Billing,Cycle 5,Urgent,8,"T3.23,T5.4" T5.MK1,Mockup Dashboard view: integrate final post-audit components in a dashboard-style view,Sergio,#1 Native Shell,Cycle 5,Medium,8,"T4.BB,T5.9" T5.11,Beta Onboarding 10-15 Sellers .dmg -> marketplace -> query -> action 1-on-1 30min activation,Pablo,#17 Beautonomous,Cycle 5,Urgent,24,"T5.7,T5.4" T5.12,Feedback Calls + Iteration 15min each user top 5 issues -> Linear via Beautonomous,Pablo,#17 Beautonomous,Cycle 5,High,16,T5.11 T5.13,OWASP Top 10 Security Review injection auth XSS SSRF data exposure fix P1s,Pablo,#16 Eval Suite,Cycle 5,High,8,"T5.4,T5.7" T5.14,System Prompt v2 Beautonomous 10-week iteration update governance index docs OpenClaw KB,Pablo,#17 Beautonomous,Cycle 5,Medium,8, T5.15,Go/No-Go 60min sync 4 engineers checklist: tools Stripe 10+ beta .dmg OWASP p95 eval. 
Pablo signs,Pablo,#17 Beautonomous,Cycle 5,Urgent,4,"T5.1,T5.2,T5.3,T5.4,T5.5,T5.6,T5.7,T5.8,T5.9,T5.10,T5.11,T5.12,T5.13,T5.14" T5.15a,E2E Eval Pipeline query->tool-selection->execution->response 10+ scenarios different from LLM Judge,Pablo,#16 Eval Suite,Cycle 5,High,16,T4.17 T3.40,Extend EvalConfig+CLI add desktop_build figma_quality pipelineType DesktopBuildReport FigmaQualityReport CLI flags,Pablo,#16 Eval Suite,Cycle 3,High,8,T2.24 T3.41,FigmaRESTClient IFigmaAPIClient getFile getFileVariables getFileComponents getFileStyles FIGMA_ACCESS_TOKEN,Pablo,#16 Eval Suite,Cycle 3,High,12,T3.40 T3.42,FigmaQualityRunner + variable checks variable_architecture code_syntax semantic_aliasing light_dark_modes,Pablo,#16 Eval Suite,Cycle 3,High,16,T3.41 T3.43,Component checks auto_layout naming_convention states_coverage color_hardcoding spacing_hardcoding,Pablo,#16 Eval Suite,Cycle 3,High,16,T3.42 T3.44,Quality checks + report wcag_contrast descriptions mcp_compatibility compliance per file violations by severity,Pablo,#16 Eval Suite,Cycle 3,High,8,T3.42 T4.25,Code Signing Secrets macOS Developer-ID+notarization Windows Authenticode GitHub Secrets electron-builder verify,Pablo,#16 Eval Suite,Cycle 4,High,8, T4.26,DesktopBuildRunner + core checks compilation code-signing notarization startup<5s bundle<250MB native-modules,Pablo,#16 Eval Suite,Cycle 4,High,24,T3.40 T4.27,Secondary checks auto-updater deep-links window-rendering IPC-channels ping/pong warnings not blockers,Pablo,#16 Eval Suite,Cycle 4,Medium,8,T4.26 T4.28,GitHub Actions desktop-build-eval.yml 3 jobs build-macos build-windows report PR comment trigger desktop-client,Pablo,#16 Eval Suite,Cycle 4,High,12,T4.26 T4.29,GitHub Actions figma-quality-eval.yml workflow_dispatch + cron weekly Monday 8UTC Slack #engineering report,Pablo,#16 Eval Suite,Cycle 4,Medium,4,"T3.42,T3.43,T3.44"
9.8 Full Task Registry — Acceptance Criteria
Cross-reference with sprint tables in 9.11. 183 tasks across 6 phases (Phase 0 + S1-2 + S3-4 + S5-6 + S7-8 + S9-10) + S11-12 buffer. Each task has 2–4 measurable acceptance criteria. Source: 70-EXEC-BACKLOG-CORREGIDO.md v2.0 + 75-EVAL-EXTENSION-TAREAS.md. All tracked in Linear via Beautonomous (#17).
Phase 0 — Pre-Sprint — T0.1–T0.11 (11 tasks) — #17 Beautonomous
- ✓Project ‘Beautonomous’ exists in OpenClaw with type = operational agent
- ✓All 4 members (pablo@, mateo@, andres@, sergio@) can log in and see the project
- ✓Project is ready to receive OAuth connectors
- ✓GitHub org authorized; Beautonomous can read all 11 core-(capa)-proyecto repos
- ✓El Mago (Mateo) has full access; Artesano roles have limited read access per F2.1–F2.3
- ✓4+ core repos visible and indexed in Beautonomous project context
- ✓Workspace ‘beautonomous’, team AUT, linked with read+write for issues/cycles/members
- ✓Beautonomous can create and update issues in Linear programmatically
- ✓Team AUT visible in Beautonomous project context
- ✓#engineering, #deploys, #general channels connected with read+write
- ✓Beautonomous posts test message to #engineering successfully
- ✓All 3 channels return message history when queried
- ✓Prompt identifies as Beautonomous (NOT as Shopilot Coach — distinct identity)
- ✓Roles defined: Capitán (Pablo), Mago (Mateo), Artesano (Andrés/Sergio) with correct permissions
- ✓6 governance rules present and testable
- ✓Test: ‘quién eres?’ → responds as Beautonomous, not as Coach
- ✓pablo → Capitán, mateo → Mago, andres/sergio → Artesano verified in config
- ✓Each role has correct permission set per spec F2.1–F2.3
- ✓Role-based access test: Artesano cannot perform Capitán-only actions
- ✓17 projects visible in Linear team AUT
- ✓6 cycles created (2 weeks each, incl. S11-12 buffer)
- ✓Labels exist: L/M/S, Track-{Mateo,Andrés,Sergio,Pablo}, Risk-{low,medium,high}
- ✓Workflow states present: Backlog → Todo → In Progress → In Review → Done
- ✓All 4 engineers complete 3 queries each (12 total: GitHub read, Linear task create, code read)
- ✓Role permissions verified: each engineer sees only their allowed actions
- ✓Beautonomous checkpoint: operational — Linear task created to mark CORE live
- ✓Apple Developer account created and approved ($99/yr paid)
- ✓Developer ID Application certificate generated and downloaded
- ✓Certificate installed in team keychain, codesign --verify passes
- ✓Code signing certificate (OV or EV) purchased from trusted CA
- ✓Certificate installed and signtool verify passes on Windows
- ✓If EV: hardware token received and configured
- ✓Brand book requested from external design team following core-product-design-system repo guidelines
- ✓Visual identity deliverable received: logo, colors, typography, usage rules
- ✓Brand book assets accessible to UX/UI team for T0.BB (Figma foundations delivery)
- ✓Pablo approves delivery, tokens exported to design-tokens.json
- ✓Components match brand book visual identity
- ✓[LIB] Foundations & Tokens + [LIB] Iconography published in Figma
Sprints 1-2 — Walking Skeleton — T1.1–T1.33 (32 tasks)
Mateo — #2 Orchestrator · #4 Personality · #5 Context · #8 Observability · #9 Cerebro KB — 12 tasks
- ✓IDs are ULID format (verified in DynamoDB items — no UUID format present)
- ✓Trace SK = Trace#{messageId}; queryEmbedding/answer fields absent from Trace
- ✓GSI1 projection = INCLUDE (not ALL); GSI2 is sparse — verified in CDK stack code
- ✓Unit tests for KeyBuilders pass; CDK deploy completes without errors
- ✓pk: User#{userId}, sk: Profile — correct keys in DynamoDB
- ✓Fields present: userId, marketplaces[], productCategories[], declaredGoals[], lastUpdatedAt
- ✓DynamoUserProfileRepository.save() + findById() unit tests pass
- ✓findWindowForPrompt(convId, windowSize) returns correct last-N messages
- ✓Token budget respected: history block + system + rag + response ≤ 200K total
- ✓Test with 100-message conversation: correct window loaded, no overflow
- ✓chat() signature accepts toolDefinitions? and thinkingBudget?
- ✓Return type is { content: ContentBlock[], stopReason } (not { content: string })
- ✓All 3 clients (OpenRouter, Anthropic, Vertex) compile and pass updated interface
- ✓Tool response block parsed correctly from ContentBlock[] in unit test
- ✓L1 block has cache_control: { type: "ephemeral" } and ~500 tokens
- ✓L2 block includes UserProfile data and critical alerts when present
- ✓compose(context) returns { blocks[], estimatedTokens }
- ✓estimatedTokens ≤ 1000 for typical context
- ✓Loop runs Reason → Act → Observe cycle for multi-turn conversations
- ✓Stops at MAX_ROUNDS=10 even without done signal; cost guard triggers at 50K tokens
- ✓Context budget = 200K − system − history − tools − 4000 enforced
- ✓Tool error → tool_result with is_error: true, loop continues
- ✓POST /conversation returns complete response after all ReAct rounds finish (no streaming yet)
- ✓Full content assembled from all text ContentBlocks
- ✓Internal events logged: round count, tools used, cost per turn
- ✓ConversationTrace includes tool_calls[] with tool name, args, and result per round
- ✓round_count field populated correctly in trace
- ✓Existing trace tests still pass (no regressions from ReAct changes)
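The ReAct criteria above (Reason → Act → Observe, stop at MAX_ROUNDS=10, cost guard at 50K tokens, tool errors returned as tool_result with is_error: true) can be sketched as a small loop. This is only an illustrative sketch, not the Orchestrator's code: the ChatFn and ToolFn shapes, the tokensUsed field, and the stoppedBy values are assumptions made for the example, and message roles are omitted for brevity.

```typescript
// Hypothetical shapes for illustration; the real ILLMClient types may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error: boolean };

interface LLMTurn {
  content: ContentBlock[];
  stopReason: "end_turn" | "tool_use";
  tokensUsed: number; // assumed: tokens consumed by this round
}

type ChatFn = (history: ContentBlock[]) => Promise<LLMTurn>;
type ToolFn = (name: string, input: unknown) => Promise<string>;

const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50_000;

interface LoopResult {
  text: string;
  roundCount: number;
  stoppedBy: "done" | "max_rounds" | "cost_guard";
}

async function runReActLoop(chat: ChatFn, runTool: ToolFn): Promise<LoopResult> {
  const history: ContentBlock[] = [];
  const textParts: string[] = [];
  let totalTokens = 0;

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    // Reason: ask the model for its next step.
    const turn = await chat(history);
    totalTokens += turn.tokensUsed;
    history.push(...turn.content);
    for (const block of turn.content) {
      if (block.type === "text") textParts.push(block.text);
    }

    if (turn.stopReason === "end_turn") {
      return { text: textParts.join(""), roundCount: round, stoppedBy: "done" };
    }
    if (totalTokens >= COST_GUARD_TOKENS) {
      return { text: textParts.join(""), roundCount: round, stoppedBy: "cost_guard" };
    }

    // Act + Observe: a failing tool becomes an error result, and the loop continues.
    for (const block of turn.content) {
      if (block.type !== "tool_use") continue;
      try {
        const output = await runTool(block.name, block.input);
        history.push({ type: "tool_result", tool_use_id: block.id, content: output, is_error: false });
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        history.push({ type: "tool_result", tool_use_id: block.id, content: message, is_error: true });
      }
    }
  }
  // No done signal within the round limit.
  return { text: textParts.join(""), roundCount: MAX_ROUNDS, stoppedBy: "max_rounds" };
}
```

Returning roundCount from the loop is one way to populate the round_count field that the trace criteria expect.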
Andrés — #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 11 tasks
- ✓npm run build passes with 0 TypeScript errors
- ✓Folders: domain/, application/, infrastructure/, presentation/ all present
- ✓Value Objects (Marketplace, SKU, MarketplaceCredential) instantiate with validation
- ✓DI container wires all dependencies without circular errors
- ✓Interface declares all 23 methods across 4 domains (Catalog, Engagement, Advertising, Enrollment)
- ✓ISKUResolver.resolve(sku, marketplace) returns native marketplace ID
- ✓TypeScript compiles with 0 errors
- ✓AES-256-GCM encrypt/decrypt round-trip unit test passes
- ✓DynamoDB table marketplace-credentials created in CDK with correct schema
- ✓ITokenManager.get(sellerId, marketplace) auto-refreshes when remaining time < 15min
- ✓Token never stored in plaintext in DynamoDB (verified by direct table scan)
- ✓OAuth2 code flow completes end-to-end for test MeLi developer account
- ✓GET /users/me/items returns real seller items with correct mapping
- ✓MeLi 403 → AuthenticationError; 429 → RateLimitError (mapped correctly)
- ✓Code reused from context/marketplace-connection/ (no duplication of logic)
- ✓LWA OAuth flow scaffolded (or stubbed pending E1 approval — status documented in Linear)
- ✓SP-API SDK initialized with correct region config
- ✓Rate limiting config per API family defined (even if not yet active)
- ✓All methods return NotImplementedError with descriptive TODO comment
- ✓terraform plan returns 0 changes on existing resources
- ✓All GCS buckets, Cloud Run services, Airflow, BigQuery in terraform state list are green
- ✓Airflow health check endpoint returns OK
- ✓Amazon SP-API dev account requested (email/ticket link documented in Linear on day 1)
- ✓MeLi dev portal app created; Shopify Partners account created
- ✓Apple Developer Program enrollment submitted
- ✓Linear tasks created per dependency with expected approval date and owner
- ✓Valid state transitions work: disconnected→pending, pending→active, active→expired/revoked
- ✓Invalid transition throws DomainError (e.g., disconnected→active not allowed)
- ✓Persisted to DynamoDB and rehydrated correctly
- ✓sellerConnection.isActive() returns true only in active state
- ✓Entity has all fields: actionId, sellerId, marketplace, method, status, requestPayload, responsePayload, latencyMs
- ✓IMarketplaceActionRepository.save() and findByIdAndSeller() pass unit tests
- ✓latencyMs measured from call start to response completion
- ✓Interface declares authorize(), exchangeCode(), refreshToken()
- ✓MeLiOAuth2Flow implements interface (TypeScript compiles)
- ✓AmazonLWAFlow and ShopifyOAuth2Flow stubs implement interface without TypeScript errors
- ✓MeLi: 3 WRITE API docs collected (endpoints, auth, payloads, error codes)
- ✓Amazon Ads: 5 WRITE API docs collected with campaign management endpoints
- ✓Amazon SP-API: 2 WRITE API docs collected
- ✓Shopify: 9 WRITE API docs collected (GraphQL mutations documented)
- ✓All docs organized per marketplace in shared repo, ready for #3 Tool Registry mapping
- ✓At least 2 external auth providers evaluated (Auth0, Clerk, Memberstack) with pros/cons matrix
- ✓Service methods documented: authentication, authorization, credential management, user CRUD
- ✓Integration points with consumer layers identified (Shell, API, Billing)
- ✓GitHub Actions workflow triggers on push to release/* branches
- ✓Builds .dmg + .exe artifacts, uploads to GitHub Releases as draft
- ✓Build completes in <10min, artifacts downloadable by team
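The SellerConnection transition criteria in this block (disconnected→pending, pending→active, active→expired/revoked, anything else throws DomainError) map naturally onto a transition table. The sketch below is a minimal illustration under assumptions: only DomainError and isActive() are named in the criteria, while transitionTo, the ALLOWED table, and treating expired and revoked as terminal states are hypothetical choices for the example.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

class DomainError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DomainError";
  }
}

// Only the transitions listed in the acceptance criteria are allowed here.
// Assumption: expired and revoked are terminal in this sketch.
const ALLOWED: Record<ConnectionState, ConnectionState[]> = {
  disconnected: ["pending"],
  pending: ["active"],
  active: ["expired", "revoked"],
  expired: [],
  revoked: [],
};

class SellerConnection {
  private state: ConnectionState = "disconnected";

  // Hypothetical method name; rejects any transition not in the table.
  transitionTo(next: ConnectionState): void {
    if (ALLOWED[this.state].indexOf(next) === -1) {
      throw new DomainError(`Invalid transition: ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  isActive(): boolean {
    return this.state === "active";
  }
}
```

Keeping the rules in one table means a new transition (for example a reconnect path out of expired, if the spec allows it) is a data change rather than new branching logic.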
Sergio — #1 Native Shell — 6 tasks
- ✓npm run dev starts Electron app without errors
- ✓Main process, preload script, and renderer all load correctly
- ✓Hot reload: file change → app refreshes in <3s
- ✓npm run build produces distributable artifact
- ✓WebContentsView (NOT BrowserView, which is deprecated since Electron 30) loads MeLi URL correctly
- ✓Navigation controls (back/forward/refresh) function as expected
- ✓Marketplace session persists after app restart (electron-store verified)
- ✓Main window is ~70% of screen width
- ✓articulo.mercadolibre.com/MLM-xxx → detected as MeLi product page with item ID
- ✓amazon.com/dp/BXXXXX → detected as Amazon product page with ASIN
- ✓Remote config JSON fetched on startup; fallback to local patterns if fetch fails
- ✓Unknown URL → returns null (no crash)
- ✓Marketplace tabs render above WebContentsView and are clickable
- ✓Sidebar React panel renders at 360px on right side
- ✓IPC: renderer sends request, main responds with data (round-trip verified)
- ✓Cmd+B toggles sidebar visibility
- ✓JWT stored in electron-store with OS-level encryption key (not plaintext)
- ✓Login flow: Memberstack login → JWT stored → app navigates to /chat
- ✓Logout clears JWT from electron-store completely
- ✓AuthService.isAuthenticated() returns correct value on app start
- ✓electron-builder produces .dmg (macOS) and .exe (Windows) without errors
- ✓.dmg installs on macOS, .exe installs on Windows — app launches and shows shell
- ✓All 4 team members install and verify basic functionality
- ✓Tokens render correctly in React, component integration verified
- ✓Design system CSS variables match Figma token values
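The URL detection criteria in this block (MeLi product URL → item ID, Amazon /dp/ URL → ASIN, unknown URL → null) could be met with a small pattern table. This is a sketch under assumptions: the regular expressions and the detectPage name are illustrative, and the local PATTERNS array stands in for the fallback patterns that apply when the remote config JSON cannot be fetched.

```typescript
interface PageMatch {
  marketplace: "meli" | "amazon";
  pageType: "product";
  productId: string;
}

// Assumed local fallback patterns; the real ones would normally come from remote config.
const PATTERNS: { marketplace: PageMatch["marketplace"]; regex: RegExp }[] = [
  {
    marketplace: "meli",
    // e.g. articulo.mercadolibre.com.mx/MLM-123456-some-title
    regex: /^(?:https?:\/\/)?articulo\.mercadolibre\.com(?:\.[a-z]{2})?\/(ML[A-Z]-\d+)/i,
  },
  {
    marketplace: "amazon",
    // e.g. www.amazon.com/dp/B0ABCDEFGH or www.amazon.com/Some-Title/dp/B0ABCDEFGH
    regex: /^(?:https?:\/\/)?(?:www\.)?amazon\.[a-z.]+\/(?:[^/]+\/)?dp\/([A-Z0-9]{10})/i,
  },
];

// Returns the first matching product page, or null for unknown URLs (never throws).
function detectPage(url: string): PageMatch | null {
  for (const { marketplace, regex } of PATTERNS) {
    const m = regex.exec(url);
    if (m) return { marketplace, pageType: "product", productId: m[1] };
  }
  return null;
}
```

Because the patterns are plain data, swapping the array for the fetched config at startup would not require changing detectPage.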
UX/UI — #18 Design System — 1 task
- ✓All atoms published in [LIB] Core Components
- ✓Chat organisms ready for dev: MessageBubble, ChatInputBar, ContextBar, OnboardingStep
- ✓Pablo approves delivery
Pablo — #16 Eval Suite — 1 task
- ✓SELECT COUNT(*) on embeddings matches unique doc count (no duplicates)
- ✓embedded_at column has real timestamps (not null or epoch 0)
- ✓CI pipeline uses Go 1.24 (not 1.21)
- ✓created_at, updated_at, embedded_at are 3 separate fields in schema
- ✓Each stored chunk has [namespace/type] title prefix in text
- ✓Chunking splits on ## and ### Markdown headers
- ✓Adjacent chunks share 150-char overlap (verified in chunk boundary test)
- ✓Retrieve chunk → prefix is present in stored text (not stripped)
- ✓15+ documents indexed in BigQuery (verified by row count)
- ✓Documents cover: MeLi best practices, Amazon policies, Shopify guides, pricing strategy, photo optimization, KPIs, FAQ
- ✓Each document has: title, source URL, language metadata
- ✓SELECT COUNT(*) FROM embeddings WHERE is_current=true ≥ 15
- ✓npm run eval -- --dry-run executes without crashing
- ✓IEvalPipeline, ILLMJudge, IGoldenDatasetManager interfaces exist in domain/
- ✓15+ YAML golden cases in dataset/ directory, each with: input, expected_scope, expected_tool, min_judge_score
- ✓EvalConfig and EvalReport models compile correctly
- ✓Spec document exists for all 10 READ tools (get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics)
- ✓Each spec has: name, LLM-friendly description, valid JSON Schema inputSchema, riskLevel=READ, creditCost=1
- ✓Specs importable by TypeScript ToolRegistry without type errors
- ✓Amazon Brand Registry application submitted with required documentation
- ✓Amazon Ads developer account application submitted
- ✓MercadoLibre developer portal registration completed
- ✓Shopify Partner Program application submitted
- ✓Weekly tracking issue created in Linear with status per marketplace
- ✓Apple Developer Program enrollment completed ($99/yr paid)
- ✓Code signing certificate requested and ready for use
- ✓Microsoft Partner Center account created and verified
- ✓App recognized as verified publisher on both platforms (or application in progress with tracking)
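The chunking criteria in this block (split on ## and ### headers, 150-character overlap between adjacent chunks, a [namespace/type] title prefix stored with each chunk) can be illustrated with a short function. The backlog indicates the indexing pipeline itself is written in Go, so this TypeScript version is only a sketch of the rules; the chunkMarkdown name, the DocMeta shape, and the choice to carry the overlap forward from the end of the previous section are assumptions.

```typescript
const OVERLAP_CHARS = 150;

interface DocMeta {
  namespace: string;
  type: string;
  title: string;
}

function chunkMarkdown(markdown: string, meta: DocMeta): string[] {
  const prefix = `[${meta.namespace}/${meta.type}] ${meta.title}`;

  // Split before every line that starts with "## " or "### ".
  // The lookahead keeps each header together with its own section.
  const sections = markdown
    .split(/^(?=#{2,3} )/m)
    .map((section) => section.trim())
    .filter((section) => section.length > 0);

  const chunks: string[] = [];
  let previous = "";
  for (const section of sections) {
    // Assumed overlap rule: repeat the last 150 characters of the previous section.
    const overlap = previous.slice(-OVERLAP_CHARS);
    const body = overlap ? `${overlap}\n${section}` : section;
    // The prefix is stored as part of the chunk text, not stripped.
    chunks.push(`${prefix}\n${body}`);
    previous = section;
  }
  return chunks;
}
```

A chunk-boundary test along the lines the criteria describe would check that the tail of one section reappears at the start of the next chunk, directly after the prefix line.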
Sprints 3-4 — Core Engines — T2.1–T2.40 (38 tasks)
Mateo — #3 Tool Registry · #5 Context · #2 Orchestrator · #9 Cerebro KB — 15 tasks
- ✓registry.register(def, handler) stores and retrieves handler by name
- ✓registry.getDefinitions() returns array with all required fields per ToolDefinition schema
- ✓Zod schema validation: valid input passes, invalid input throws ZodError
- ✓TypeScript compiles with 0 errors; unit tests cover register, getDefinitions, getHandler
- ✓executor.execute('get_product', { sku: 'MLB-123' }, ctx) returns ToolResult
- ✓Unknown tool name → ToolResult with isError: true
- ✓ToolExecutor routes to the registered handler without hardcoded names
- ✓Irreversible tool (update_price) → ConfirmationRequiredError thrown before execution
- ✓Tool for unconnected marketplace → MarketplaceNotConfiguredError thrown
- ✓READ tool on connected marketplace → passes through without confirmation
- ✓Filter added via DI; ToolExecutor code not modified
- ✓before_tool hook executes before handler call (verified via execution log order)
- ✓after_tool hook executes even when handler throws exception
- ✓Hook order: before → execute → after (always, no skip on error)
- ✓Multiple hooks can be registered for same lifecycle event
- ✓All 10 READ handlers registered in ToolRegistry
- ✓Each handler returns hardcoded mock ToolResult with correct field shape
- ✓execute('get_product', { sku: 'MLB-123' }) returns product mock data without error
- ✓Files organized in handlers/read/ directory, one file per tool
- ✓ToolResult is immutable (Object.freeze or all fields readonly)
- ✓Fields present: toolName, args, result, isError, latencyMs, cached, creditCost
- ✓Instantiation unit test: all fields set correctly, mutation attempt throws
- ✓Tool registered as SYSTEM category in ToolRegistry
- ✓{ marketplaces: ['meli'] } input → updates UserProfile.marketplaces in DynamoDB
- ✓LLM can invoke tool via tool_call block (correct name and schema in definitions)
- ✓UserProfile retrieved after update reflects new values
- ✓When conversation.contextSummary is set, it replaces full history in prompt
- ✓contextSummarizedUpToMessageId tracks last summarized message ID
- ✓Test: 50-message conversation at threshold → summary generated, token count drops below limit
- ✓contextSummary stored correctly in DynamoDB Conversation entity
- ✓All 17 WRITE tools registered with ConfirmationRequired policy in ToolRegistry
- ✓Each stub returns NotImplementedError (does not call any marketplace)
- ✓getDefinitions() includes all 17 WRITE tools — LLM can “see” them
getDefinitions()includes all 17 WRITE tools — LLM can “see” them - ✓S12 scope tracking: all 13 deferred WRITE tools have Linear tasks
- ✓KB RAG and Brand Health RAG execute in parallel (verified via Promise.all or timing logs)
- ✓Single embedding call used for both lookups
- ✓KB failure → assembler returns Brand Health context alone (no exception propagated)
- ✓Brand Health failure → assembler returns KB context alone (no exception propagated)
- ✓getHealthSummary(userId, 'meli') returns { critique: [], delicate: [] }
- ✓Result always injected into system prompt (not optional)
- ✓Empty result → empty arrays (not null/undefined, no crash)
- ✓SystemPromptBlock JSON payload has cache_control: { type: "ephemeral" }
- ✓Only AnthropicLLMClient sends cache_control; OpenRouter/Vertex skip it
- ✓Second identical call: Anthropic API response shows cache_read_input_tokens > 0
- ✓Same tool+args in same session → second call returns cached ToolResult with cached: true
- ✓Different session → fresh execution (cache is per-session, not global)
- ✓WRITE and ANALYSIS tools with side effects are not cached (only READ)
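Two of the executor criteria in this block lend themselves to a short illustration: the hook order (before → execute → after, with after_tool running even when the handler throws) and the unknown-tool case returning a ToolResult with isError: true. The sketch below is a minimal, assumed design rather than the actual ToolExecutor: the reduced ToolResult shape and the onBeforeTool/onAfterTool method names are hypothetical.

```typescript
// Reduced ToolResult for the example; the spec lists more fields (latencyMs, cached, creditCost, ...).
interface ToolResult {
  toolName: string;
  result: unknown;
  isError: boolean;
}

type Handler = (args: unknown) => Promise<unknown>;
type Hook = (toolName: string) => void;

class ToolExecutor {
  private handlers = new Map<string, Handler>();
  private beforeHooks: Hook[] = [];
  private afterHooks: Hook[] = [];

  register(name: string, handler: Handler): void {
    this.handlers.set(name, handler);
  }

  // Multiple hooks may be registered for the same lifecycle event.
  onBeforeTool(hook: Hook): void {
    this.beforeHooks.push(hook);
  }

  onAfterTool(hook: Hook): void {
    this.afterHooks.push(hook);
  }

  async execute(toolName: string, args: unknown): Promise<ToolResult> {
    const handler = this.handlers.get(toolName);
    if (!handler) {
      // Unknown tool: report through the result instead of throwing.
      return { toolName, result: `Unknown tool: ${toolName}`, isError: true };
    }
    this.beforeHooks.forEach((hook) => hook(toolName));
    try {
      return { toolName, result: await handler(args), isError: false };
    } finally {
      // finally guarantees after_tool hooks run even when the handler throws.
      this.afterHooks.forEach((hook) => hook(toolName));
    }
  }
}
```

In this sketch a handler exception still propagates to the caller after the hooks have run; whether the real executor rethrows or converts it into an error result is not stated in the criteria.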
Andrés — #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 10 tasks
- ✓OAuth2 flow completes with real Shopify Partner test store
- ✓GraphQL query for products returns correct data shape
- ✓Rate limit exceeded → throttle respected (backoff logged)
- ✓Products, orders, inventory queries all return expected shape
- ✓If E1 approved: Reports API, Catalog Items, Orders API all return real data
- ✓Rate limit 5 req/s enforced with exponential backoff on 429
- ✓If E1 not approved: defer confirmed, Linear task updated with ‘blocked:E1’ label
- ✓EventBridge rule fires every 5 minutes (confirmed in CloudWatch logs)
- ✓Token refreshed when remaining time < 30min
- ✓DynamoDB mutex prevents concurrent refresh for same seller+marketplace (race test passes)
- ✓3 consecutive failures → alert posted to #deploys Slack
- ✓All API endpoints behave identically before and after refactor (integration tests pass)
- ✓IDataReader, ITokenProvider interfaces exist in domain/ with correct signatures
- ✓VOs: UserId, Marketplace, DateRange validate and reject invalid inputs
- ✓npm run test passes with 0 regressions
- ✓MeLi and Shopify DAGs both show status=Success for last @hourly run in Airflow UI
- ✓Bronze schema for both marketplaces matches expected columns (no null PKs)
- ✓Zero failed DAG runs in last 24h
- ✓cdk deploy completes with 0 errors
- ✓DynamoDB conversation-api table exists with corrected GSIs (ULID keys, INCLUDE projection)
- ✓Lambda + API Gateway v2 HTTP endpoint returns 200 on GET /health
- ✓marketplace-credentials table, Secrets Manager secret, EventBridge rule all provisioned
- ✓CI workflow file exists in all 4 active repos
- ✓PR → CI runs lint + type-check + unit tests; failing test blocks merge
- ✓Build cache via actions/cache reduces CI runtime by >30% on cache hit
- ✓Table marketplace-actions created via CDK
- ✓pk = sellerId (string), sk = actionId (ULID) — format verified
- ✓GSI: marketplace-status-index with pk=marketplace, sk=status
- ✓Separate OAuth2 flow for Amazon Ads API (distinct from LWA SP-API flow)
- ✓Credentials stored separately in Secrets Manager (not mixed with SP-API credentials)
- ✓authorize() → exchangeCode() flow completes with Amazon Ads test account
- ✓MeLiSKUResolver: internal SKU “ML-123” ↔ MeLi item ID “123”
- ✓AmazonSKUResolver: internal SKU “ASIN-B0xxx” ↔ ASIN “B0xxx”
- ✓ShopifySKUResolver: internal SKU “shop-456” ↔ numeric product ID “456”
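The three SKU resolver examples above all follow the same shape: an internal SKU is a marketplace-specific prefix plus the native ID. A minimal sketch of that mapping follows, assuming the prefixes shown in the examples are fixed per marketplace; the function names and the single shared table are illustrative and do not reflect the separate MeLiSKUResolver, AmazonSKUResolver, and ShopifySKUResolver classes.

```typescript
type Marketplace = "meli" | "amazon" | "shopify";

// Prefixes taken from the examples above; treating them as fixed is an assumption.
const SKU_PREFIX: Record<Marketplace, string> = {
  meli: "ML-",
  amazon: "ASIN-",
  shopify: "shop-",
};

// Internal SKU -> native marketplace ID, e.g. "ML-123" -> "123".
function toNativeId(sku: string, marketplace: Marketplace): string {
  const prefix = SKU_PREFIX[marketplace];
  if (sku.indexOf(prefix) !== 0 || sku.length === prefix.length) {
    throw new Error(`"${sku}" is not a valid ${marketplace} SKU`);
  }
  return sku.slice(prefix.length);
}

// Native marketplace ID -> internal SKU, e.g. "123" -> "ML-123".
function toInternalSku(nativeId: string, marketplace: Marketplace): string {
  return SKU_PREFIX[marketplace] + nativeId;
}
```

Rejecting a SKU whose prefix belongs to another marketplace keeps a MeLi SKU from being silently sent to the Amazon or Shopify provider.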
Sergio — #1 Native Shell — 6 tasks
- ✓User messages appear in right bubble; assistant messages in left bubble
- ✓Markdown bold, italic, code, lists render correctly in assistant messages
- ✓Indicators visible: ‘pensando...’ while waiting; ‘ejecutando tool X...’ during tool call
- ✓Syntax highlighting active in code blocks
- ✓WebSocket connects to conversation-api server successfully
- ✓On disconnect: reconnect with backoff 1s, 2s, 4s... max 30s
- ✓Heartbeat ping/pong every 30s (verified in network inspector)
- ✓If WebSocket unavailable: REST polling every 2s activates automatically
- ✓MeLi product URL → metadata includes { marketplace: 'meli', pageType: 'product', productId: 'MLM-xxx' }
- ✓Metadata appended to message payload on every send
- ✓Works for Amazon ASIN URLs and Shopify product URLs
- ✓/chat route loads as default; /profile, /billing, /enrollment, /onboarding all load without crash
- ✓Tab bar visible at bottom, highlights active route
- ✓Chat state (messages, input text) preserved when navigating to /profile and back
- ✓Step 1 (Welcome) shown only on first launch (localStorage flag set after)
- ✓Step 2: OAuth inline popup opens correctly and connects marketplace
- ✓Step 3: Profile form saves to backend successfully
- ✓Step 4: Guided query executes at least one tool successfully
- ✓Step 5: Success screen with next steps shown; skip available from step 3
- ✓macOS: .dmg signed with Developer ID cert, notarized with notarytool, stapled
- ✓Windows: .exe signed with code signing cert, SmartScreen doesn’t block
- ✓Both builds include all S3-4 features (chat UI, WebSocket, context injection)
- ✓All 4 team members install signed builds, no security warnings
- ✓MessageBubble, ChatInputBar render correctly with real data
- ✓Chat organisms match Figma specifications pixel-perfect
- ✓OnboardingStep component works in 5-step wizard flow
- ✓Step transitions and animations match Figma prototype
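The WebSocket reconnect criterion in the list above (backoff 1s, 2s, 4s... max 30s) amounts to a capped exponential schedule. A minimal sketch of the delay calculation, assuming a function name `reconnectDelayMs` and no jitter (neither is specified in the source):

```typescript
// Delay before the nth reconnect attempt; attempt 0 is the first retry.
// Doubles from baseMs each time and never exceeds maxMs.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// First seven delays: 1s, 2s, 4s, 8s, 16s, then capped at 30s.
const schedule = [0, 1, 2, 3, 4, 5, 6].map((attempt) => reconnectDelayMs(attempt));
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

A production client would usually reset the attempt counter after a successful connection and might add random jitter so that many clients do not reconnect at the same instant.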
UX/UI — #18 Design System — 1 task
- ✓All molecules published in [LIB] Core Components
- ✓Data organisms ready for dev: ConfirmDialog, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard
- ✓Pablo approves delivery
Pablo — #16 Eval Suite · #17 Beautonomous — 4 tasks
- ✓SHA-256 hash stored per document in BigQuery
- ✓Re-run pipeline with unchanged doc → skipped (no re-embed); changed doc → re-embedded
- ✓is_current = false set for outdated embeddings
- ✓Test: modify 1 doc → only 1 doc re-embedded (verified via embed count in logs)
- ✓250 texts sent per Vertex AI API call (verified in API logs)
- ✓429 response → retry with backoff (3 attempts max logged)
- ✓Goroutine pool: max 5 concurrent API calls (semaphore verified)
- ✓Performance: ~6000 individual texts → ~24 batch calls (10× speed improvement)
- ✓npx ts-node eval.ts runs EvalRunner end-to-end without crash
- ✓AnthropicLLMJudge scores each response 0.0–1.0 with justification text
- ✓YamlDatasetLoader loads all YAML files from dataset/ directory
- ✓ReportGenerator produces JSON report with per-case pass/fail; 20+ cases all evaluated
- ✓3 conversation flows tested with real Sellerfy data
- ✓At least 1 READ tool call and 1 proactive suggestion verified in test
- ✓Issues found documented as Linear tasks via Beautonomous
- ✓0 unhandled exceptions during test session
- ✓All ~150 tasks from backlog created in Linear with correct cycle assignment
- ✓Each task has L/M/S label, Track-{engineer} label, and dependencies linked
- ✓4 engineers confirm task list in Linear before S1 starts
- ✓Pipeline configured: structure → lint → tests → architecture review → convention check
- ✓Gate runs automatically when PR submitted via OpenClaw
- ✓Failed step blocks PR approval; result posted as comment on PR
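The embedding criteria earlier in this list state that 250 texts go into each Vertex AI call, so about 6000 texts become about 24 calls. The sketch below only illustrates that arithmetic. The pipeline itself is described as Go with a goroutine pool; TypeScript is used here purely for illustration, and `chunk` is a hypothetical helper name.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 6000 placeholder texts, batched 250 at a time.
const texts: string[] = [];
for (let i = 0; i < 6000; i++) {
  texts.push(`text-${i}`);
}
const batches = chunk(texts, 250);
console.log(batches.length); // 24
```

The limit of 5 concurrent calls and the retry on 429 would sit around whatever function sends each batch; they are left out of this sketch.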
Sprints 5-6 — WRITE Tools + Billing + Enrichment — T3.1–T3.44 (41 tasks)
Mateo — #3 Tools · #6 Proactive · #7 Guard · #11 Enrichment · #2 Orch · #9 Cerebro KB — 14 tasks
- ✓Each handler calls correct Fast Data Layer endpoint: GET /data/{user_id}/fast/{tool}
- ✓Zod schema validates FDL API response before returning ToolResult
- ✓p95 latency < 500ms for all 10 handlers (verified in load test)
- ✓FDL unavailable → fallback to direct Marketplace Provider call (fallback tested)
- ✓WRITE tool triggered → execution paused; tool_result not yet sent to LLM
- ✓Diff shown to user: { before: {...}, after: {...} }
- ✓Aceptar → execution resumes and completes; Rechazar → cancelled, no marketplace change
- ✓Timeout 35min: OrchestrationSession TTL expires in DynamoDB
- ✓Concurrent confirmations tracked independently by sessionId
- ✓update_product_content: GCS snapshot stored before write; content updated in MeLi
- ✓update_price: irreversible flag → confirmation always required even if user said ‘just do it’
- ✓pause_product / activate_product: listing status changes verified via get_product after write
- ✓All 4: verify() confirms change applied; ActionLog entry written
- ✓after_tool hook invokes service after every tool execution
- ✓LLM evaluates result and returns { hasSuggestion: bool, message, priority }
- ✓Max 2 suggestions per turn enforced (3rd suggestion dropped)
- ✓Dedup: same suggestion not shown if offered in last 7 days (UserProfile.recentSuggestions checked)
- ✓No hardcoded if/else rules — pure LLM evaluation with no hand-crafted triggers
- ✓“Ignore previous instructions” → detected as injection, guard error returned
- ✓Off-scope query (weather, sports) → polite refusal, no tool call
- ✓Guard exception → request passes through with warning logged (graceful degradation)
- ✓InputGuard injected via DI; not hardcoded in AgentLoopOrchestrator
- ✓Before each tool: POST /internal/gate called with toolCategory and sellerId
- ✓READ=1 credit, ANALYSIS=2 credits, WRITE=3 credits deducted per execution
- ✓/internal/gate returns 503 → fail-open: tool executes anyway, event logged
- ✓Seller with 0 credits → gate returns 402, tool blocked with ‘insufficient credits’ message
- ✓npm run build passes with 0 errors
- ✓IEnrichmentService, IMarketIntelligenceAdapter, IContentAnalysisAdapter, IEnrichmentCache in domain/
- ✓EnrichmentContainer.wire() resolves all dependencies without circular errors
- ✓All interfaces compile with correct method signatures
- ✓search_market_products('auriculares bluetooth') returns list of competitor products
- ✓get_competitor_product('MLB-xxx') returns product with price, title, photos
- ✓get_market_pricing('MLB-xxx') returns PriceDistribution: min, max, median, avg
- ✓No seller credentials required (MeLi public API only)
- ✓analyze_product_image(url) returns ImageAnalysisResult with quality score and improvement suggestions
- ✓analyze_product_video(url) returns VideoAnalysisResult
- ✓enhance_product_image(url) → throws NotImplementedError with descriptive message (MVP scope)
- ✓Cache TTL: pricing=30min, competitor=1h, search=15min, image=24h
- ✓Second call with same args → Redis cache hit; no adapter called
- ✓Adapter throws exception → EnrichmentResult returned with error field (not re-thrown to caller)
- ✓Redis unavailable → service falls back to direct adapter call
- ✓cdk deploy enrichment-stack completes without errors
- ✓Lambda function enrichment-api created and reachable
- ✓GET /health on API Gateway returns 200
- ✓ElastiCache Redis cluster reachable from Lambda (VPC subnet verified)
- ✓All 8 ANALYSIS handlers registered in ToolRegistry with creditCost=2
- ✓search_market_products handler calls IEnrichmentService and returns correct ToolResult shape
- ✓enhance_product_image handler returns graceful NotImplemented ToolResult (isError=false, result has message)
- ✓LLM says “comisión del 10%” but tool_result has 15% → mismatch logged with claim, actual, toolName, conversationId
- ✓Response is NOT blocked (Phase 1 = log only)
- ✓Checker exception → response passes through unchanged
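The credit gate criteria earlier in this list (READ=1, ANALYSIS=2, WRITE=3; 402 for a seller with 0 credits; fail-open when the gate returns 503) can be summarised as one decision function. This is a sketch under assumptions: `evaluateGate`, `GateDecision`, and the use of `null` to represent an unreachable billing service are hypothetical, and blocking any balance below the tool's cost (not only a balance of exactly 0) is an assumption beyond what the source states.

```typescript
type ToolCategory = "READ" | "ANALYSIS" | "WRITE";

// Credit cost per execution, as listed in the criteria.
const CREDIT_COST: Record<ToolCategory, number> = { READ: 1, ANALYSIS: 2, WRITE: 3 };

type GateDecision =
  | { allowed: true; deducted: number; failOpen: boolean }
  | { allowed: false; status: 402; message: string };

// `balance` is null when the billing service could not be reached (the 503 case).
function evaluateGate(category: ToolCategory, balance: number | null): GateDecision {
  const cost = CREDIT_COST[category];
  if (balance === null) {
    // Fail-open: the tool runs anyway and nothing is deducted; the caller should log this.
    return { allowed: true, deducted: 0, failOpen: true };
  }
  if (balance < cost) {
    // Assumption: a balance below the cost is treated like a balance of 0.
    return { allowed: false, status: 402, message: "insufficient credits" };
  }
  return { allowed: true, deducted: cost, failOpen: false };
}
```

Keeping the decision pure makes it easy to unit test; the actual deduction would still need an atomic write in the billing store so that concurrent calls cannot double-spend.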
Andrés — #10 Data Sync · #12 Marketplace Provider · #14 DevOps — 6 tasks
- ✓All 11 endpoints respond to GET /data/{user_id}/fast/{tool}
- ✓Response time p95 < 500ms (verified in k6/Locust test)
- ✓Reads GCS Parquet directly via pyarrow (no Redis intermediary)
- ✓Invalid user_id → 404; unknown tool → 422
- ✓Before WRITE: snapshot stored at gs://shopilot-snapshots/{user_id}/{tool}/{ts}.parquet
- ✓GET /data/{user_id}/snapshot/{tool}/{ts} returns the pre-write state correctly
- ✓snapshot_cleanup_dag runs daily and deletes snapshots older than 7 days
- ✓Amazon DAG runs with status=Success in Airflow (or uses test fixture if E1 pending)
- ✓AmazonExtractor fetches data from SP-API; AmazonLoader writes to GCS Bronze
- ✓Bronze schema for Amazon matches expected columns
- ✓MeLi and Shopify Bronze schemas still validate correctly (no regressions)
- ✓MeLi: 1501st request in 1-min window → 429 with retry-after header
- ✓Amazon: burst limit enforced; restore rate configured per API family
- ✓Shopify: leaky bucket cost-points respected
- ✓Redis counter incremented per request; verified via unit test with mock Redis
- ✓Seller connects first marketplace → onboarding_initial_sync DAG triggered in Airflow
- ✓DAG runs successfully with seller’s credentials
- ✓Seller’s data visible in Fast Data Layer within 5 minutes of connection
- ✓All 11 repos have GitHub Actions workflow file
- ✓PR to any repo → CI runs lint + test + build
- ✓Merge to main → staging auto-deploy triggered (confirmed in deploy logs)
- ✓Secrets stored in GitHub Org Secrets (no .env files committed)
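The MeLi rate-limit criterion earlier in this list (the 1501st request in a 1-minute window gets a 429 with a retry-after header) implies a limit of 1500 requests per minute. The sketch below uses a fixed-window counter, which is an assumption: the source does not say whether the window is fixed or sliding. The in-memory record stands in for the Redis counter, and `FixedWindowLimiter` is a hypothetical name.

```typescript
interface RateLimitResult {
  allowed: boolean;
  status: 200 | 429;
  retryAfterSec: number; // 0 when the request is allowed
}

// In-memory stand-in for the Redis counter; old buckets are never evicted here.
class FixedWindowLimiter {
  private counts: Record<string, number> = {};
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Counts one request for `key` at time `nowMs` and decides whether it may proceed.
  check(key: string, nowMs: number): RateLimitResult {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const bucket = `${key}:${windowStart}`;
    const count = (this.counts[bucket] || 0) + 1;
    this.counts[bucket] = count;
    if (count > this.limit) {
      // Seconds until the current window ends, rounded up.
      const retryAfterSec = Math.ceil((windowStart + this.windowMs - nowMs) / 1000);
      return { allowed: false, status: 429, retryAfterSec };
    }
    return { allowed: true, status: 200, retryAfterSec: 0 };
  }
}

const meliLimiter = new FixedWindowLimiter(1500, 60000);
```

Passing the clock in as `nowMs` keeps the limiter deterministic in unit tests, which matches the criterion that the counter is verified with a mock Redis. The Amazon burst/restore and Shopify leaky-bucket rules need different algorithms and are not covered here.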
Sergio — #1 Native Shell · #13 Billing — 9 tasks
- ✓Plan name (Free/Pro) and credits remaining displayed correctly from API
- ✓Usage stats (tools called today, this week) visible
- ✓‘Upgrade’ button opens Stripe Checkout in system browser (not in Electron WebContentsView)
- ✓Low credits alert (<10 credits) shows banner in sidebar
- ✓WRITE tool triggers confirmation modal automatically (no manual trigger needed)
- ✓Modal shows diff: current value in red, new value in green
- ✓Aceptar → sends confirmation to backend; Rechazar → cancels with no change
- ✓35min timer visible; reminder at 5min remaining
- ✓Suggestion card appears in sidebar after tool execution when service returns hasSuggestion=true
- ✓Click on card → new conversation opens with pre-loaded context
- ✓Tool execution shows spinner with text ‘ejecutando get_product...’
- ✓Spinner dismisses when tool_result event received
- ✓Connected marketplaces listed with status (e.g., ‘MeLi ✓’, ‘Amazon ×’)
- ✓Total credits used and tools called visible
- ✓Language setting (EN/ES) toggle persists after app restart
- ✓Default marketplace selection persists across sessions
- ✓‘Upgrade’ click → Stripe Checkout opens in system browser for Pro $49/mo
- ✓After payment: webhook checkout.session.completed received → 500 credits granted
- ✓Customer Portal: seller can cancel subscription without contacting support
- ✓Webhook customer.subscription.deleted triggers cancellation in backend
- ✓POST /internal/gate with READ tool → deducts exactly 1 credit (DynamoDB conditional write)
- ✓Race condition test: 10 concurrent gate calls → exactly 10 credits deducted (no double-spend)
- ✓Seller with 0 credits → gate returns 402; tool blocked with ‘insufficient credits’ message
- ✓Billing service down → fail-open: tool executes anyway (logged)
- ✓Migration script runs idempotently (run twice → same result, no errors)
- ✓clients table has Stripe fields: stripe_customer_id, stripe_subscription_id, plan
- ✓credit_packs, subscription_events, credit_transactions tables exist with correct schema
- ✓Migration verified in staging database before production
- ✓activate(): plan updated, credits granted, event logged in subscription_events
- ✓cancel(): grace_period_end = now+7d set; plan stays active until grace expires
- ✓invoice.payment_failed webhook → seller notification sent
- ✓All events logged in subscription_events table with timestamp and type
- ✓EventBridge rule fires on 1st of each month (CloudWatch confirms execution)
- ✓Plan credits reset: Free→50, Pro→500 each 1st of month
- ✓Pack credits NOT reset (pack expiry = 12 months from purchase)
- ✓credit_transactions log entry created for each reset (type=plan_reset)
- ✓design-tokens.json extracted from Figma variables via MCP
- ✓Style Dictionary configured, generates CSS :root variables + tailwind.config.ts
- ✓Tokens integrated in core-product-desktop-client — zero hardcoded values in config
- ✓CreditEconomy component renders correctly with real billing data
- ✓Billing view matches Figma design specifications
- ✓Profile view components render correctly with real user data
- ✓Profile layout matches Figma specifications
- ✓ConfirmDialog organism renders correctly in WRITE confirmation flow
- ✓Diff visualization (current/new values) matches Figma design
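The monthly reset criteria in the billing list above (plan credits reset to Free→50 and Pro→500, pack credits untouched, one credit_transactions entry with type=plan_reset) can be expressed as a pure function. A minimal sketch; the `CreditAccount` and `CreditTransaction` field names are hypothetical and do not come from the actual schema.

```typescript
type Plan = "Free" | "Pro";

// Monthly plan allowances from the criteria.
const PLAN_CREDITS: Record<Plan, number> = { Free: 50, Pro: 500 };

// Hypothetical shapes; the real tables may use different column names.
interface CreditAccount {
  plan: Plan;
  planCredits: number;
  packCredits: number;
}

interface CreditTransaction {
  type: "plan_reset";
  plan: Plan;
  previousPlanCredits: number;
  grantedPlanCredits: number;
  at: string; // ISO 8601 timestamp
}

// Returns the account after the reset plus the log entry describing it.
function resetPlanCredits(
  account: CreditAccount,
  now: Date
): { account: CreditAccount; transaction: CreditTransaction } {
  const granted = PLAN_CREDITS[account.plan];
  return {
    // Pack credits are carried over unchanged; only plan credits are reset.
    account: { plan: account.plan, planCredits: granted, packCredits: account.packCredits },
    transaction: {
      type: "plan_reset",
      plan: account.plan,
      previousPlanCredits: account.planCredits,
      grantedPlanCredits: granted,
      at: now.toISOString(),
    },
  };
}
```

Note that the reset replaces the plan balance instead of adding to it, so unused plan credits do not roll over; that reading of "reset" is an assumption.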
UX/UI — #18 Design System — 1 task
- ✓Pattern library published complete in [LIB] Pattern Components
- ✓All frames ready for development
- ✓Pablo approves delivery
Pablo — #16 Eval Suite — 8 tasks
- ✓SELECT COUNT(*) FROM embeddings WHERE is_current=true returns ≥ 15 in BigQuery
- ✓Top-5 semantic search for ‘comisión Mercado Libre’ returns relevant chunks
- ✓Go pipeline completes without errors; 5 test queries all return meaningful results (human-reviewed)
- ✓eval-on-pr.yml workflow exists in core-intelligence-conversation-api repo
- ✓PR → CI evaluates Coach on staging; PR blocked with auto-comment if score < 0.70
- ✓Eval completes in < 10 minutes for 20–30 cases
- ✓Baseline updated on merge to main
- ✓50+ YAML cases in dataset/ directory
- ✓Distribution: 15 product, 10 pricing, 8 WRITE confirm, 7 proactive, 10 edge cases
- ✓Edge cases include: injection attempt, off-scope query, empty tool result, ambiguous intent
- ✓Each case has min_judge_score defined
- ✓End-to-end test with real MeLi seller data: query → tool call → response verified
- ✓End-to-end test with real Shopify data: same flow verified
- ✓All issues found documented as Linear tasks via Beautonomous
- ✓EvalConfig.pipelineType accepts desktop_build and figma_quality values without error
- ✓CLI eval run --pipeline figma_quality executes without crash and prints results
- ✓Existing LLM Judge pipeline unaffected (regression test passes)
- ✓All 4 Figma REST endpoints return parsed responses for a real Figma file
- ✓Figma API token injected via env var FIGMA_TOKEN; missing token raises clear error
- ✓Rate limit exceeded → client retries with exponential backoff (max 3 attempts)
- ✓Runner executes against a Figma file and returns a structured QualityReport
- ✓Variable checks: token usage, missing bindings, and orphan variables detected correctly
- ✓Report includes per-check pass/fail status and violation count
- ✓All 5 component checks fire and report violations for a test Figma file with known issues
- ✓Clean Figma file returns 0 violations for all 5 checks
- ✓Each violation includes component name, frame path, and remediation hint
- ✓WCAG contrast check flags text layers with contrast ratio < 4.5:1 (AA)
- ✓Descriptions check flags components missing description fields
- ✓Final report is valid JSON with summary.score field (0.0–1.0); CI fails if score < 0.80
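The WCAG contrast criterion above flags text with a contrast ratio below 4.5:1. The ratio is defined by WCAG 2.x as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors. A minimal sketch of that calculation for 6-digit hex colors; the function names are hypothetical, and reading colors out of a Figma file is outside its scope.

```typescript
// Converts one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function linearize(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a color given as "#rrggbb" or "rrggbb".
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const r = linearize(parseInt(m[1], 16));
  const g = linearize(parseInt(m[2], 16));
  const b = linearize(parseInt(m[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors; the order of arguments does not matter.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// True when the pair meets the 4.5:1 threshold named in the criterion.
function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

For example, `#767676` on white passes and `#777777` on white narrowly fails, which makes that pair a convenient boundary test for the check.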
Sprints 7-8 — Hardening + Staging — T4.1–T4.29 (38 tasks)
Mateo — #2 Orchestrator · #4 Personality · #7 Guard · #3 Tools · #9 Cerebro KB — 8 tasks
- ✓Client connects to wss://api-staging.shopilot.ai/ws successfully
- ✓Server emits all 8 events: thinking, tool_start, tool_result, text_delta, suggestion, confirmation_required, error, done
- ✓text_delta events render progressively in UI (words appear as they arrive)
- ✓4 client→server events work: send_message, confirm_action, reject_action, cancel
- ✓Session restores after disconnect: in-progress conversation recoverable
- ✓writeCapable=true → WRITE guardrail block injected into system prompt
- ✓writeCapable=false → WRITE block absent from prompt
- ✓Total system prompt ≤ 1200 tokens (verified by estimatedTokens field)
- ✓WRITE guardrail text includes irreversibility warning
- ✓Response containing another user’s data → alert logged and response sanitized
- ✓Response with dangerous instructions → blocked and replaced with safety message
- ✓Guard exception → response passes through with error logged (not swallowed silently)
- ✓Data leak detection → critical alert sent to #engineering Slack
- ✓At least update_product_images and update_stock handlers implemented and working
- ✓Circuit breaker applied: remaining deferred WRITE tools tracked in Linear as S12 scope
- ✓Implemented tools pass confirmation flow end-to-end
- ✓S12 deferred tools have Linear tasks with ‘circuit-breaker’ label
- ✓p95 response time ≤ 3000ms measured in Artillery test (excluding LLM time)
- ✓Context compaction reduces token count by >20% for conversations > 30 messages
- ✓Prompt cache hit rate ≥ 80% (Anthropic API metrics)
- ✓At least 1 bottleneck identified, fixed, and documented in Linear
- ✓after_tool hook fires for every successful WRITE execution
- ✓POST /feedback/capture called with sellerId, toolName, conversationId, actionId
- ✓Fire-and-forget: hook does not block tool response (async, not awaited in critical path)
- ✓Failed POST logged but does not affect tool result or user experience
- ✓DynamoActionLogRepository.save(actionLog) writes to DynamoDB correctly
- ✓pk = User#{userId}, sk = Action#{ULID} — format verified in table
- ✓findByConversation(convId) uses GSI1 Conv#{convId} and returns correct entries
- ✓ActionLog created for every WRITE tool executed (verified in integration test)
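The ActionLog key formats above (pk = User#{userId}, sk = Action#{ULID}, GSI1 keyed by Conv#{convId}) can be built and checked in one place. A minimal sketch: `actionLogKeys` and the `gsi1pk` attribute name are hypothetical, and the pattern checks only the ULID shape (26 Crockford base32 characters, first character 0 to 7), not its timestamp.

```typescript
// ULIDs are 26 Crockford base32 characters (no I, L, O, U); the first is at most 7.
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

interface ActionLogKeys {
  pk: string;
  sk: string;
  gsi1pk: string; // hypothetical attribute name for the GSI1 partition key
}

// Builds the table and index keys for one ActionLog entry.
function actionLogKeys(userId: string, actionId: string, conversationId: string): ActionLogKeys {
  if (!ULID_PATTERN.test(actionId)) {
    throw new Error(`actionId "${actionId}" is not a valid ULID`);
  }
  return {
    pk: `User#${userId}`,
    sk: `Action#${actionId}`,
    gsi1pk: `Conv#${conversationId}`,
  };
}
```

Because ULIDs sort lexicographically by creation time, a query on `pk` with the sort key in ascending order would return a user's actions oldest first.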
Andrés — #14 DevOps · #10 Data Sync — 5 tasks
- ✓CDK deploy completes: Lambda, API GW v2, DynamoDB, ElastiCache Redis, RDS PostgreSQL, Secrets Manager
- ✓Terraform apply completes: Cloud Run (Data API), BigQuery, GCS, Airflow on GCP
- ✓GET https://api-staging.shopilot.ai/health → 200 OK
- ✓All health checks green in monitoring dashboard
- ✓Artillery/k6 simulates 50 concurrent users for 10 minutes
- ✓p95 API response time < 2000ms (excluding LLM latency, measured separately)
- ✓0 5xx errors under sustained load
- ✓Bottleneck report identifies which component is limiting (Redis / DynamoDB / API GW)
- ✓Dashboard shows: API latency p50/p95, error rate %, LLM cost/conversation, tool execution count, credits deducted
- ✓PagerDuty alert fires when p95 > 2s (tested with synthetic traffic spike)
- ✓Slack alert fires in #deploys when daily LLM cost > $50
- ✓All alerts have runbook links attached
- ✓SilverNormalizer transforms Bronze → Silver for at least MeLi dataset
- ✓transform_to_silver_dag runs without errors in Airflow
- ✓If circuit breaker applied: Gold + Brand Health deferred to S12 with Linear tasks created
- ✓IBrandHealthCalculator interface defined even if implementation deferred
- ✓cdk deploy creates WebSocket API with $connect/$disconnect/$default routes
- ✓DynamoDB connection-ids table populated on $connect, cleaned on $disconnect
- ✓Lambda authorizer validates JWT before allowing connection
- ✓IAM policies grant Lambda correct permissions to post to connections
Sergio — #1 Native Shell · #15 Feedback Loop · #13 Billing — 11 tasks
- ✓text_delta events render text progressively in chat bubble (words appear as streamed)
- ✓tool_start event shows spinner with tool name
- ✓suggestion event renders suggestion card in sidebar
- ✓confirmation_required event triggers WRITE confirmation modal
- ✓Reconnection attempt with backoff on WebSocket disconnect
- ✓BrowserWindow opens for OAuth redirect URL (not WebContentsView)
- ✓After OAuth: auth code extracted from redirect URL
- ✓Auth code sent to backend via IPC; window closes automatically on success
- ✓WebContentsView is never navigated to OAuth URL during the flow
- ✓Step 5: Success screen shown; skip available from step 3 onwards
- ✓Main process crash → Sentry event created with full stack trace
- ✓Renderer crash → Sentry event created with source map resolved (not minified)
- ✓Source maps uploaded during build (not committed to repo)
- ✓Crash dialog with ‘Send Report’ button appears on unhandled error
- ✓npm run build passes with 0 TypeScript errors
- ✓IFeedbackRepository, IFeedbackGate, IDataSyncClient interfaces exist in domain/
- ✓FeedbackEntry, ExplicitFeedbackEntry, ImplicitFeedbackEntry models compile
- ✓CDK stack skeleton created (Lambdas and tables defined, not yet deployed)
- ✓calculateImpactScore({ sales:1, conversion:1, visits:1, position:1 }) returns correct value using formula (sales×0.4 + conversion×0.3 + visits×0.2 + position×−0.1)
- ✓DynamoFeedbackRepository.save(entry) writes to DynamoDB correctly
- ✓findPendingEntries() queries GSI1 status=pending and returns correct entries
- ✓findByUser(userId) returns all entries for user across feedback types
- ✓processPendingEntries() finds entries > 7 days old and measures impact score
- ✓DataSyncClient fetches current metrics from Fast Data Layer endpoint
- ✓EventBridge triggers FeedbackMeasurer Lambda every 6h (CloudWatch logs confirm)
- ✓GET /feedback/{userId}/summary returns aggregated score; /history returns paginated entries
- ✓GET /feedback/{userId}/should-prompt → false if user had explicit feedback prompt today
- ✓Returns false if session has < 3 interactions
- ✓24h cooldown enforced after explicit feedback given
- ✓Returns true correctly for eligible users (no cooldown, 3+ interactions)
- ✓POST /feedback/{userId}/explicit with { rating: 5, conversationId } → 201 Created
- ✓Rating validation: 1–5 accepted; 0 or 6 → 422 Unprocessable
- ✓ExplicitFeedbackEntry stored in DynamoDB with timestamp and conversationId
- ✓POST /feedback/{userId}/implicit with { action: 'accepted', toolName } → 201 Created
- ✓action values accepted: accepted, rejected, edited; other values → 422
- ✓When action=edited: originalValue and editedValue both stored
- ✓ImplicitFeedbackEntry linked to conversationId correctly
- ✓customer.subscription.deleted webhook → grace_period_end = now + 7 days
- ✓Seller retains Pro access and Pro credit limits during 7-day grace window
- ✓Cron job (daily): finds sellers with grace_period_end expired → downgrades to Free
- ✓grace_period_end field visible in backend admin view
- ✓Full feature build: all S1-8 features including WS streaming, WRITE confirmation, billing
- ✓.dmg notarized + .exe signed, zero security warnings on clean install
- ✓Team smoke test passes: onboarding → chat → tool execution → billing view
- ✓EnrollmentCard organism renders correctly in EnrollmentView
- ✓OAuth redirect flow works with integrated components
- ✓ReActStream, ConfirmDialog, ToolAccordion render correctly in real WebSocket flow
- ✓Complete WRITE flow works end-to-end with Figma design system components
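The Feedback Loop criteria earlier in this list give the impact score as a weighted sum: sales×0.4 + conversion×0.3 + visits×0.2 + position×−0.1. A minimal sketch of `calculateImpactScore` with those weights; the `ImpactMetrics` interface name is an assumption, and the source does not say how the four inputs are scaled or measured.

```typescript
// Hypothetical input shape; field names follow the call shown in the criteria.
interface ImpactMetrics {
  sales: number;
  conversion: number;
  visits: number;
  position: number;
}

// Weights exactly as stated in the formula.
const IMPACT_WEIGHTS: ImpactMetrics = {
  sales: 0.4,
  conversion: 0.3,
  visits: 0.2,
  position: -0.1,
};

// Weighted sum of the four metrics.
function calculateImpactScore(metrics: ImpactMetrics): number {
  return (
    metrics.sales * IMPACT_WEIGHTS.sales +
    metrics.conversion * IMPACT_WEIGHTS.conversion +
    metrics.visits * IMPACT_WEIGHTS.visits +
    metrics.position * IMPACT_WEIGHTS.position
  );
}
```

For the all-ones input named in the criterion, the result is 0.4 + 0.3 + 0.2 − 0.1 = 0.8. Floating-point rounding can leave it a hair off, so a test should compare against 0.8 with a small tolerance instead of strict equality.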
UX/UI — #18 Design System — 1 task
- ✓Figma QA checklist passed: AutoLayout, variables, naming, states verified
- ✓All frames marked ‘Ready for development’ in Figma
- ✓Pablo approves delivery
Pablo — #16 Eval Suite · #17 Beautonomous — 10 tasks
- ✓Vertex AI batch calls: 250 texts per call verified in API logs
- ✓If pipeline duration > 5min: incremental mode activates automatically
- ✓Semantic search on 20 eval queries: expected chunk in top-5 results for >80%
- ✓GitHub Action eval-automated.yml triggers on every push to main
- ✓50 golden cases evaluated per run; JSON report generated
- ✓CI fails if overall score < 0.70 OR any critical-tagged case fails
- ✓Eval results posted to #engineering Slack channel via Beautonomous
- ✓After get_product with degraded listing: ProactiveSuggestionService generates appropriate suggestion
- ✓After get_product with healthy listing: no suggestion generated
- ✓Dedup verified: same suggestion not offered twice within 7 days in test session
- ✓Prompt iteration: at least 1 improvement made based on test results
- ✓10–15 Sellerfy sellers identified (mix: small/medium/large by GMV)
- ✓2-minute video walkthrough recorded and shareable
- ✓Setup doc prepared: download link, steps, support contact
- ✓Feedback form created; at least 1 confirmed 1-on-1 call scheduled
- ✓Consumer-driven contract tests exist for ToolRegistry → Data Sync integration
- ✓ToolRegistry → Marketplace Provider and ToolRegistry → Enrichment contracts verified
- ✓Contract failure in any integration → CI blocked
- ✓precision@5 and hit rate computed for 20 eval queries with expected chunks
- ✓Hit rate > 80% → CI passes; ≤ 80% → CI fails with failing queries listed
- ✓Metric report posted as PR comment
- ✓Apple cert + provisioning profile stored as GitHub Actions encrypted secrets
- ✓Windows code signing cert stored as GitHub Actions encrypted secret
- ✓Secrets documented in team runbook; no plaintext credentials in codebase
- ✓Runner builds macOS .app and Windows .exe artifacts without error
- ✓sign_valid: macOS codesign --verify and Windows signtool verify both pass
- ✓launch_ok: app launches headlessly and exits with code 0 in CI environment
- ✓Structured BuildReport returned with per-check pass/fail and artifact paths
- ✓bundle_size: .app ≤ 200 MB; check fails with clear message if exceeded
- ✓update_channel: auto-update endpoint reachable and returns valid version manifest
- ✓macOS notarization check passes (Apple notary service confirms ticket)
- ✓desktop-build-eval.yml workflow exists in core-product-desktop-client repo
- ✓PR → CI runs DesktopBuildRunner; PR blocked if any core check fails
- ✓macOS (macos-latest) and Windows (windows-latest) runners both complete successfully
- ✓figma-quality-eval.yml workflow exists in core-product-design-system repo
- ✓Workflow triggers on schedule (daily) and on manual dispatch
- ✓Quality report posted as workflow summary; Slack notification sent on score < 0.80
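The retrieval criteria earlier in this list compute precision@5 and hit rate over 20 eval queries, with CI passing only when the hit rate is strictly above 80%. A minimal sketch of both metrics and the gate follows. The `EvalQuery` shape and function names are hypothetical, retrieved chunk IDs are assumed to be unique per query, and a query counts as a hit when at least one expected chunk appears in the top k.

```typescript
// Hypothetical shape for one eval query and its retrieval result.
interface EvalQuery {
  query: string;
  expectedChunkIds: string[];
  retrievedChunkIds: string[]; // ranked, best first; assumed free of duplicates
}

// Fraction of the top-k retrieved chunks that are expected chunks.
function precisionAtK(q: EvalQuery, k: number): number {
  const topK = q.retrievedChunkIds.slice(0, k);
  const relevant = topK.filter((id) => q.expectedChunkIds.indexOf(id) !== -1).length;
  return relevant / k;
}

// Fraction of queries with at least one expected chunk in the top k.
function hitRate(queries: EvalQuery[], k: number): number {
  if (queries.length === 0) {
    return 0;
  }
  const hits = queries.filter((q) => precisionAtK(q, k) > 0).length;
  return hits / queries.length;
}

// Names of the queries that missed, for listing in the CI failure output.
function failingQueries(queries: EvalQuery[], k: number): string[] {
  return queries.filter((q) => precisionAtK(q, k) === 0).map((q) => q.query);
}

// CI gate from the criteria: pass only when the hit rate is strictly above the threshold.
function ciPasses(queries: EvalQuery[], k = 5, threshold = 0.8): boolean {
  return hitRate(queries, k) > threshold;
}
```

With 20 queries, 16 hits give exactly 80% and therefore fail the gate, while 17 hits pass. That boundary is worth covering in a test because the criterion uses a strict inequality.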
Sprints 9-10 — Launch — T5.1–T5.MK1 (18 tasks)
Mateo — #7 Guard · #2 Orchestrator · #4 Personality — 3 tasks
- ✓Haiku classifier identifies injection attempt that passes regex pattern matching
- ✓Off-scope query (e.g., weather forecast) → Haiku returns out-of-scope classification
- ✓Haiku API error → input passes through with warning logged (fail-open)
- ✓Adds < 500ms p95 latency to request processing
- ✓All P1/P2 bugs from beta report closed in Linear
- ✓Empty tool result → LLM receives ‘no data available’ message, no crash
- ✓LLM refuses tool → AgentLoop re-prompts once then provides partial answer
- ✓Concurrent WRITE: second WRITE blocked until first confirmation resolved
- ✓Token expired mid-conversation → ITokenManager auto-refreshes; conversation continues
- ✓Tone issues from beta feedback addressed (verified by Pablo review + re-eval)
- ✓Tool selection pattern issues fixed (tested with 5 edge case prompts from beta)
- ✓Eval score with v3 prompt ≥ v2 score
- ✓v3 deployed to staging and tested by at least 2 beta users
Andrés — #14 DevOps · #10 Data Sync — 4 tasks
- ✓cdk deploy --all succeeds on production AWS account
- ✓terraform apply succeeds on production GCP project
- ✓GET https://api.shopilot.ai/health → 200 OK
- ✓SSL certificate valid; HTTP → HTTPS redirect active
- ✓DynamoDB PITR enabled with 35-day recovery window
- ✓Lambda concurrency limits configured per function
- ✓IAM roles follow least-privilege principle (reviewed by Mateo)
- ✓Redis backup enabled; PostgreSQL daily backup confirmed in RDS console
- ✓Lambda version rollback via aws lambda update-alias completes in < 60s
- ✓Cloud Run revision rollback via gcloud run services update-traffic completes in < 60s
- ✓Runbook documented with step-by-step commands; tested successfully in staging
- ✓Amazon and Fast Data FQNs visible in OpenMetadata UI
- ✓embed_fast_dag runs without errors (Bronze → Cerebro KB embedding)
- ✓embed_health_dag runs without errors (Gold → KB embedding)
- ✓Data lineage visible: source → transformation → KB in OpenMetadata
Sergio — #1 Native Shell · #13 Billing — 4 tasks
- ✓Apple Developer certificate installed and valid
- ✓electron-builder produces signed .dmg; notarization via notarytool passes
- ✓App installs and runs on clean Mac without developer tools installed
- ✓Auto-updater fetches update from S3 releases.shopilot.ai bucket and applies successfully
- ✓CSP headers verified in devtools (no unsafe-inline, no unsafe-eval)
- ✓nodeIntegration: false and sandbox: true in all BrowserWindow configs
- ✓webSecurity: true enforced (no local CORS bypass)
- ✓Opt-out telemetry: settings toggle works and persists after restart
- ✓All P1/P2 UI bugs from beta report closed in Linear
- ✓RAM usage < 500MB after 30 minutes of active use (Chrome DevTools Memory profiler)
- ✓All animations complete in < 300ms (no visible jank)
- ✓Loading states visible for all async operations (> 200ms)
- ✓Stripe environment switched from test to live keys (live keys in Secrets Manager)
- ✓End-to-end: real card checkout → credits granted → tool executes successfully
- ✓Webhook signature verification passes on live endpoint
- ✓SSL certificate valid on billing endpoint
- ✓Dashboard view integrates final post-audit design system components
- ✓All views use consistent Figma-derived design tokens
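
The "Webhook signature verification passes on live endpoint" item above can be illustrated with a minimal sketch of Stripe's documented scheme: the `Stripe-Signature` header carries a timestamp `t` and one or more `v1` HMAC-SHA256 signatures computed over `<timestamp>.<raw body>`. The function name, the 300-second tolerance default, and the `now` parameter are assumptions made for illustration; a production service would more likely rely on the official SDK helper (`stripe.Webhook.construct_event`).

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance_s: int = 300,
                            now: Optional[float] = None) -> bool:
    """Validate a header shaped like 't=<unix ts>,v1=<hex>[,v1=<hex>...]'."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale events to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    # The signed payload is the timestamp, a dot, and the raw request body.
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()

    # Constant-time comparison against every v1 signature in the header.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

The raw request bytes must be used as received; re-serializing parsed JSON before hashing will usually change the signature.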
Pablo — #17 Beautonomous · #16 Eval Suite — 6 tasks
- ✓10+ sellers download and install .dmg successfully
- ✓Each seller connects at least 1 marketplace via OAuth
- ✓Each seller executes at least 1 tool in first session (activation metric)
- ✓All sellers attended 1-on-1 30min call
- ✓15min structured call completed with each beta user; notes documented
- ✓Top 5 issues identified; Linear tasks created via Beautonomous
- ✓At least 3 sellers provide NPS score
- ✓Injection: all user inputs sanitized (manual test + linting pass)
- ✓Auth: JWT validation present on all protected endpoints
- ✓Data exposure: seller data not visible to other sellers (cross-account test)
- ✓All P1 OWASP findings fixed before Go/No-Go
- ✓Prompt updated to reflect 10 weeks of real usage patterns
- ✓Governance rules updated for new workflows discovered during sprint
- ✓Technical docs (dev plans) indexed in OpenClaw KB
- ✓Test: ask Beautonomous about Shopilot architecture → returns accurate, up-to-date answer
- ✓60-minute sync call with all 4 engineers completed
- ✓All checklist items green: tools responding, Stripe live, 10+ beta, .dmg signed, OWASP P1s resolved, p95 < 3s, LLM cost guard active, eval ≥ 0.70
- ✓Pablo explicitly signs off with ‘Go’ in #engineering Slack
- ✓Any ‘No’ item → owner and ETA documented in Linear before re-vote
- ✓Pipeline executes 10+ end-to-end scenarios (distinct from LLM Judge quality eval)
- ✓Each scenario: user query → tool selected → tool executes → response generated
- ✓Scenarios cover: READ, WRITE (with confirmation), ANALYSIS, proactive, off-scope rejection
- ✓All scenarios pass before Go/No-Go
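
The Go/No-Go rules in the checklist above (every item green, and any "No" item needs an owner and ETA before a re-vote) can be sketched as a small decision function. The `ChecklistItem` type, the status strings, and the `metric_items` helper are hypothetical names chosen for illustration; only the thresholds (p95 < 3s, eval ≥ 0.70, 10+ beta users) come from the checklist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChecklistItem:
    name: str
    passed: bool
    owner: Optional[str] = None  # expected when passed is False
    eta: Optional[str] = None    # expected when passed is False


def gate_decision(items: List[ChecklistItem]) -> str:
    """Return GO only when every item passed; otherwise explain what blocks."""
    failing = [item for item in items if not item.passed]
    if not failing:
        return "GO"
    undocumented = [item.name for item in failing if not (item.owner and item.eta)]
    if undocumented:
        return "BLOCKED: missing owner/ETA for " + ", ".join(undocumented)
    return "NO-GO: re-vote after " + ", ".join(item.name for item in failing)


def metric_items(p95_s: float, eval_score: float, beta_users: int) -> List[ChecklistItem]:
    """Turn the three numeric launch criteria into checklist items."""
    return [
        ChecklistItem("p95 < 3s", p95_s < 3.0),
        ChecklistItem("eval >= 0.70", eval_score >= 0.70),
        ChecklistItem("10+ beta users", beta_users >= 10),
    ]
```

For example, `gate_decision(metric_items(2.4, 0.74, 12))` returns `"GO"`, while a failing item without an owner and ETA yields a `BLOCKED` result rather than a re-vote.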
9.9 Daily Execution Blueprint — Day-by-Day Schedule per Engineer
6 sprint cycles × 10 days (Pre-Sprint = 5 days). Every task from 70-EXEC-BACKLOG-CORREGIDO v2.0. Color = engineer lane.
Pre-Sprint — W0 (5 days) — CORE Setup
Gate: Beautonomous operational. All 4 members validated.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T0.2 OAuth GitHub (30m); T0.4 OAuth Slack (30m) | — | — | T0.1 Create OpenClaw project (30m); T0.3 OAuth Linear (30m) |
| D2 | — | — | — | T0.5 System Prompt v1 Beautonomous (4h, 1/2) |
| D3 | — | — | — | T0.5 System Prompt v1 (2/2); T0.6 Role mapping (1h); T0.7 Linear structure: 17 projects + 6 cycles (2h) |
| D4 | T0.8 CORE validation (1h) | T0.8 CORE validation (1h) | T0.8 CORE validation (1h) | T0.8 CORE validation: Beautonomous operational |
| D5 | — | — | — | — |
| ★ Beautonomous operational. Linear tracking starts. All 17 projects bootstrapped. | | | | |
Sprint 1-2 — W1-W2 (10 days) — Walking Skeleton
Gate: AgentLoop unblocked (T1.6 starts W3). Electron loads marketplace. KB indexed.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T1.4 ILLMClient: chat() accepts toolDefs + thinkingBudget (1/2) | T1.9 Scaffold Marketplace Provider: DDD + DI + VOs + Errors | T1.16 Scaffold Electron 28+ + builder + hot reload + preload | T1.21 KB duplicate fix: TRUNCATE, embedded_at, CI Go 1.24 (1/2) |
| D2 | T1.4 ILLMClient update (2/2) | T1.10 IMarketplaceAdapter 23 methods + ISKUResolver (4h); T1.15c IOAuth2Flow interface: generic OAuth2 port (4h) | T1.17 MainWindow + WebContentsView, NOT BrowserView (1/2) | T1.21 KB duplicate fix (2/2) |
| D3 | T1.1 DynamoDB: UUID→ULID, Trace SK, GSI fix, dead code (1/3) | T1.11 AES256GCM + ITokenManager + DDB marketplace-credentials (1/2) | T1.17 MainWindow + WebContentsView (2/2) | T1.22 KB Contextual Retrieval: prefix + Markdown chunking (1/2) |
| D4 | T1.1 DynamoDB fix (2/3) | T1.11 AES256GCM + ITokenManager (2/2) | T1.20 Auth Memberstack: JWT in OS-encrypted electron-store | T1.22 KB Contextual Retrieval (2/2) |
| D5 | T1.1 DynamoDB fix (3/3): CDK Stack + KeyBuilders + tests | T1.15a SellerConnection aggregate: 5-state state machine (1d); T1.15b MarketplaceAction entity + IRepository (4h) | T1.19 Tabs + Sidebar container: IPC + Toggle Cmd+B (1/3, dep T0.BB tokens) | T1.24 Eval Phase 0: interfaces + models + 15-20 golden YAML cases (1/3) |
| D6 | T1.2 UserProfile entity + IUserProfileRepository + DDB impl | T1.14 Verify Terraform GCP: GCS, Cloud Run, Airflow, BigQuery; T1.15 Request external deps E1-E5: Amazon SP-API, MeLi, Shopify (4h) | T1.19 Tabs + Sidebar container (2/3) | T1.24 Eval Phase 0 (2/3) |
| D7 | T1.5 SystemPromptComposer L1+L2: cache_control ephemeral (1/2) | T1.12 MeLiOAuth2Flow + MeLiAdapter: REST API + errors (1/3) | T1.18 MarketplaceDetector: URL patterns MeLi/Amazon/Shopify; T1.19 Sidebar token setup T0.BB (3/3, 0.5d) | T1.24 Eval Phase 0 (3/3): 15 golden YAML cases |
| D8 | T1.5 SystemPromptComposer L1+L2 (2/2) | T1.12 MeLiAdapter (2/3) | T1.MK1 Mockup shell container (0.5d) | T1.25 10 READ tool specs: name, LLM description, JSON schema, risk, credits (1/2) |
| D9 | T1.3 History in prompt: findWindowForPrompt, 200K budget (1/2) | T1.12 MeLiOAuth2Flow + MeLiAdapter (3/3): reuses context/ | — | T1.25 10 READ tool specs (2/2) |
| D10 | T1.3 History in prompt (2/2) | T1.13 Amazon LWA scaffold: SP-API SDK + rate limit (1/2, dep E1) | — | T1.23 KB content: 15-20 docs MeLi/Amazon/Shopify (starts, cont. W3) |
| ★ S1-2: T1.3+T1.4+T1.5 done → T1.6 unblocked for W3. MeLiAdapter live. KB populated. | | | | |
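
Task T1.3 in the table above names a `findWindowForPrompt` step with a 200K budget. One plausible reading, sketched below, is "keep the most recent messages that fit within a token budget". The Python function name, the 4-characters-per-token heuristic, and the policy of stopping at the first message that does not fit are all assumptions; the blueprint does not specify them.

```python
from typing import Callable, List, Sequence


def approx_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def find_window_for_prompt(messages: Sequence[str], budget: int = 200_000,
                           count: Callable[[str], int] = approx_tokens) -> List[str]:
    """Return the longest suffix of `messages` whose token cost fits `budget`."""
    window: List[str] = []
    used = 0
    # Walk from newest to oldest so recent context is kept first.
    for message in reversed(messages):
        cost = count(message)
        if used + cost > budget:
            break
        window.append(message)
        used += cost
    window.reverse()  # restore chronological order
    return window
```

Stopping at the first oversized message keeps the window contiguous, which avoids handing the model a conversation with gaps in the middle.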
Sprint 3-4 — W3-W4 (10 days) — Core Engines
Gate: ToolRegistry + 10 READ stubs. Shell chat basic. KB in BigQuery. Eval runs 15 cases.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T1.6 AgentLoopOrchestrator: ReAct loop, MAX_ROUNDS=10, cost guard (1/3) | T1.13 Amazon LWA scaffold (2/2): full SP-API rate limit families | T2.17 Chat UI: bubbles + markdown + thinking/tool indicators (1/3, +T1.BB integration) | T1.23 KB content 15-20 docs (1/4, cont. from S1-2) |
| D2 | T1.6 AgentLoop (2/3) | T2.10 ShopifyOAuth2Flow + ShopifyAdapter: GraphQL Admin API (1/3) | T2.17 Chat UI (2/3) | T1.23 KB content (2/4) |
| D3 | T1.6 AgentLoop (3/3): retry + is_error tool_result; T1.7 RestResponseEventEmitter: REST mode without streaming (4h) | T2.10 ShopifyAdapter (2/3) | T2.18 CoachWebSocketService: WS client in main process + backoff (dep T1.7) | T1.23 KB content (3/4) |
| D4 | T1.8 Verify Observability with ReAct: multi-step traces compatible | T2.10 ShopifyAdapter (3/3): Shopify cost-points throttling | T2.19 Context injection URL→metadata via MarketplaceDetector | T1.23 KB content (4/4): index pipeline |
| D5 | T2.1 ToolRegistry + ToolDefinition: register/getDefinitions/getHandler (1/2) | T2.13 DataSync Phase 0.5: Clean Arch API (IDataReader, VOs, DI) (1/2) | T2.20 Navigation: /chat /profile /billing /enrollment /onboarding | T2.22 KB incremental processing: SHA-256 hash + is_current flag (1/2) |
| D6 | T2.1 ToolRegistry (2/2) | T2.13 DataSync Clean Arch (2/2) | T2.21 OnboardingWizard 5 steps (+T1.BB integration): Welcome + OAuth + Profile + Query + Success (1/3) | T2.22 KB incremental (2/2) |
| D7 | T2.2 IToolExecutor + ToolExecutor: execute(name, args, ctx)→ToolResult | T2.11 Full AmazonAdapter (if E1 ok): SP-API Reports+Catalog (1/3) | T2.17 Chat UI (3/3, T1.BB tokens 0.5d); T2.21 OnboardingWizard (2/3) | T2.23 KB batch embeddings: 250 texts/call Vertex AI (1/2) |
| D8 | T2.3 ToolPolicyFilter: risk gate + marketplace gate (1d); T2.5a ToolResult domain model: immutable value (4h); T2.9 Tool result caching in-memory: per-session Map for READ/ANALYSIS (4h) | T2.11 AmazonAdapter (2/3) | T2.21 OnboardingWizard (3/3, T1.BB 0.5d) | T2.23 KB batch (2/2): goroutine pool + retry 429/5xx |
| D9 | T2.4 HookLifecycle: before_tool → execute → after_tool (1d) | T2.11 AmazonAdapter (3/3): backoff 5 req/s | T2.MK1 Mockup ChatView (1d) | T2.24 Eval Phase 1: AnthropicLLMJudge + EvalRunner + CLI (1/3) |
| D10 | T2.5 10 READ tool stubs: mock HTTP handlers in handlers/read/ (1/2) | T2.14 Existing DAGs verified: MeLi+Shopify @hourly + Bronze schemas | T2.MK2 Mockup Onboarding (0.5d) | T2.24 Eval LLM Judge (2/3) |
| ★ S3-4: ToolRegistry + 10 READ stubs. Shopify+Amazon adapters. KB BigQuery. Eval runner. | | | | |
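
The table above pairs a tool registry (T2.1: register/getDefinitions/getHandler) with a ReAct loop capped at `MAX_ROUNDS=10`, a cost guard, and `is_error` tool results (T1.6). The sketch below shows how those pieces could fit together. The step dictionary shape, the `llm` callable, the `max_cost_usd` default, and the stop messages are hypothetical; the real orchestrator is not described at this level of detail.

```python
from typing import Callable, Dict, List, Tuple

MAX_ROUNDS = 10  # cap named in T1.6


class ToolRegistry:
    """Maps a tool name to its definition and handler."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[dict, Callable[[dict], str]]] = {}

    def register(self, definition: dict, handler: Callable[[dict], str]) -> None:
        self._tools[definition["name"]] = (definition, handler)

    def get_definitions(self) -> List[dict]:
        return [definition for definition, _ in self._tools.values()]

    def get_handler(self, name: str) -> Callable[[dict], str]:
        return self._tools[name][1]


def run_agent_loop(llm: Callable[[List[dict], List[dict]], dict],
                   registry: ToolRegistry, user_query: str,
                   max_cost_usd: float = 0.50) -> str:
    """Alternate LLM steps and tool calls until a final answer or a guard trips."""
    history: List[dict] = [{"role": "user", "content": user_query}]
    spent = 0.0
    for _ in range(MAX_ROUNDS):
        step = llm(history, registry.get_definitions())
        spent += step.get("cost_usd", 0.0)
        if spent > max_cost_usd:
            return "Stopped: cost guard tripped."
        if step["type"] == "final":
            return step["text"]
        try:
            result = registry.get_handler(step["tool"])(step["args"])
            is_error = False
        except Exception as exc:  # unknown tool or handler failure
            result, is_error = str(exc), True
        # Feed the outcome back so the model can retry or change course.
        history.append({"role": "tool", "name": step["tool"],
                        "content": result, "is_error": is_error})
    return "Stopped: MAX_ROUNDS reached."
```

Returning errors to the model as tool results, rather than raising, lets the loop recover from a bad tool call within the same round budget.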
Sprint 5-6 — W5-W6 (10 days) — WRITE Tools + Billing + Enrichment
Gate: WRITE tool executes in marketplace. Billing charges credits. Enrichment returns data.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T2.5 10 READ stubs (2/2): handlers/read/ structure; T2.5b update_user_profile SYSTEM tool: LLM invokes it on detecting seller info (4h) | T2.12 TokenRefreshCron: EventBridge 5min + DDB mutex + alert after 3 failures (1d); T2.15 CDK base AWS: DDB conv-api, Lambda+APIGW v2, VPC, DDB credentials (1/2) | T3.19 BillingView: plan + credits + stats + Stripe buttons (+T2.BB integration, 1/3) | T2.24 Eval LLM Judge (3/3): minimum 20 golden cases |
| D2 | T2.5c contextSummary: automatic summary when history exceeds the token threshold (1d); T2.5d 17 WRITE tool stubs: register in ToolRegistry with ConfirmationRequired (4h) | T2.15 CDK base AWS (2/2) | T3.19 BillingView (2/3) | T2.25 E2E Playground testing: real Sellerfy flows → Linear issues; T2.26 Bootstrap 147 tasks in Linear via Beautonomous (4h) |
| D3 | T2.6 IContextAssembler: KB + BrandHealth RAG in parallel, graceful degradation (1/2) | T2.16 GitHub Actions CI multi-repo: lint+typecheck+tests on PR (1d); T2.16a marketplace-actions DDB table in CDK: pk sellerId, sk actionId (4h) | T3.19 BillingView (3/3, T2.BB tokens 0.5d); T3.20 WRITE confirmation dialogs: red/green diff + 35min timeout (dep T3.2) | T2.26a 5-step Beautonomous quality gate: structure→lint→tests→arch→conv (1d); T3.25 KB BigQuery indexing: verify top-5 semantic results for 5 queries |
| D4 | T2.6 IContextAssembler (2/2) | T2.16b AmazonAds dual OAuth2: flow separate from LWA (1d); T2.16c ISKUResolver: bidirectional MeLi/Amazon/Shopify mapping (1d) | T3.21 Suggestion cards + tool progress (+T2.BB 0.5d): spinner + contextualized click (1/2) | T3.26 Eval CI: eval-on-pr.yml, coach staging→LLM Judge→PR block (1/2) |
| D5 | T2.7 BrandHealthContextService.getHealthSummary: always in system prompt (1d); T2.8 Anthropic prompt caching: SystemPromptBlock[] cache_control ephemeral (1d) | T3.13 Fast Data Layer: 11 FastAPI endpoints GET /data/{uid}/fast/{tool} (1/3) | T3.21 Suggestion cards (2/2, T2.BB integration); T3.22 ProfileView: marketplaces + stats + preferences + useProfile hook | T3.26 Eval CI (2/2): <10 min for 20-30 cases |
| D6 | T3.1 10 real READ handlers: Zod→HTTP→ToolResult (1/3) | T3.13 Fast Data Layer (2/3) | T3.23 Stripe Checkout Pro $49/month + self-service Customer Portal (1/3) | T3.27 Golden dataset 50 cases: 15 product + 10 pricing + 8 WRITE + 17 edge (1/3) |
| D7 | T3.1 10 real READ handlers (2/3) | T3.13 Fast Data Layer (3/3): pyarrow GCS Parquet <500ms | T3.23 Stripe Checkout (2/3) | T3.27 Golden dataset 50 cases (2/3) |
| D8 | T3.1 10 real READ handlers (3/3) | T3.14 GCS snapshots for ConfirmationFlow: pre-write state + cleanup DAG (1d) | T3.23 Stripe Checkout (3/3): webhook checkout.session.completed | T3.27 Golden dataset (3/3) |
| D9 | T3.2 ConfirmationFlow: pause→diff→approve/reject→execute, TTL 35min (1/2) | T3.15 DAG Amazon: IExtractor+ILoader+AmazonAuthMgr+AmazonExtractor (1/3) | T3.24 ICreditsGate: POST /internal/gate + conditional DDB write (1/2); T3.MK1 Mockup BillingView (0.5d) | T3.28 QA of conversation flows, 3 marketplaces, Sellerfy data (1/2) |
| D10 | T3.2 ConfirmationFlow (2/2): OrchestrationSession DDB | T3.15 DAG Amazon (2/3) | T3.24 ICreditsGate (2/2): Free 50cr, Pro 500cr, packs; T3.24a Billing schema migration: idempotent + credit_transactions (1d); T3.MK2 Mockup ProfileView (0.5d); T3.MK3 Mockup ConfirmDialog (0.5d) | T3.28 QA flows (2/2): issues → Linear via Beautonomous |
| ★ S5-6: WRITE handlers unblocked (T3.3 starts W7). ICreditsGate live. Fast Data 11 endpoints. | | | | |
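
The ConfirmationFlow in the table above (T3.2: pause, diff, approve or reject, execute, with a 35-minute TTL) can be modeled as a small state holder. This is a sketch under assumptions: the `PendingWrite` class, its status names, and passing `now` explicitly are illustrative choices, and the blueprint's DynamoDB-backed session storage is replaced by an in-memory object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

CONFIRMATION_TTL_S = 35 * 60  # 35-minute window from T3.2


@dataclass
class PendingWrite:
    tool: str
    before: dict        # snapshot of the current marketplace state
    after: dict         # state the WRITE tool would produce
    created_at: float   # epoch seconds when the write was paused
    status: str = "PENDING"

    def diff(self) -> Dict[str, Tuple[object, object]]:
        """Fields that change, as (old, new) pairs for a red/green dialog."""
        return {key: (self.before.get(key), value)
                for key, value in self.after.items()
                if self.before.get(key) != value}

    def resolve(self, approved: bool, now: float,
                execute: Callable[[dict], None]) -> str:
        """Apply the user's decision once; late or repeated decisions do nothing."""
        if self.status != "PENDING":
            return self.status
        if now - self.created_at > CONFIRMATION_TTL_S:
            self.status = "EXPIRED"
        elif approved:
            execute(self.after)
            self.status = "EXECUTED"
        else:
            self.status = "REJECTED"
        return self.status
```

Checking the TTL before the approval flag means an approval that arrives after 35 minutes never reaches the marketplace, which matches the timeout shown in the confirmation dialog task (T3.20).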
Sprint 7-8 — W7-W8 (10 days) — Hardening + Staging
Gate (G2): Staging full stack. Load test 50 users. WS streaming. Proactive. Eval ≥0.70.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T3.3 4 WRITE handlers: update_product/price/pause/activate (1/3) | T3.15 DAG Amazon (3/3): MeLi+Shopify Bronze schemas verified | T3.24b SubscriptionLifecycleService: activate/cancel grace/upgrade (1d) | T4.16 KB batch v2 + v3: pipeline >5min → enable incremental (1/2) |
| D2 | T3.3 4 WRITE handlers (2/3) | T4.9a API GW v2 WebSocket in CDK: $connect/$disconnect/$default + DDB conn-ids | T3.24c Monthly credit reset cron: EventBridge 1st of month + pack credits 12m | T4.16 KB batch v2 (2/2): target >80% retrieval hit rate |
| D3 | T3.3 4 WRITE handlers (3/3): pre-write snapshot + verify | T3.16 IRateLimiter x3: MeLi token bucket + Amazon burst + Shopify leaky | T4.10 Progressive WebSocket client (+T3.BB 0.5d): 8 server→client events + backoff (1/3) | T4.17 Eval auto CI: 50 golden cases on push to main, fails if <0.70 (1/2) |
| D4 | T3.4 ProactiveSuggestionService: afterTool hook + 7d dedup + max 2/turn (1/2) | T3.17 Onboarding trigger: first sync after marketplace connection (1d); T3.18 CI/CD multi-repo: automatic staging deploy on merge to main + Org Secrets (2d start) | T4.10 Progressive WS client (2/3): dep T4.1 (infra ready D6) | T4.17 Eval auto CI (2/2): results → #engineering Slack |
| D5 | T3.4 ProactiveSuggestion (2/2): no hardcoded rules, the LLM evaluates | T3.18 CI/CD multi-repo (2/2): 11 repos | T4.10 WS client (3/3, T3.BB tokens 0.5d); T4.11 Standalone EnrollmentView: BrowserWindow OAuth redirect per marketplace (1d) | T4.18 Proactive testing with real Sellerfy data: triggers + dedup + prompt (1/2) |
| D6 | T3.5 IGuardService + InputGuard: injection patterns + out-of-scope (1d); T3.5a HttpCreditGate: POST /internal/gate pre-tool, fail-open (1d, dep T3.24) | T4.6 Staging deploy full stack AWS+GCP: CDK + Terraform + health checks (1/3) | T4.11 Standalone EnrollmentView: complete + tests | T4.18 Proactive testing (2/2): iterate on prompt |
| D7 | T4.1 WebSocket streaming: replace REST, 8 server→client events (1/2) | T4.6 Staging deploy (2/3) | T4.12 Sentry crash reporting: source maps + error grouping (4h); T4.MK1 Mockup EnrollmentView (0.5d) | T4.19 Select 10-15 Sellerfy beta users + 2min video walkthrough (1/2) |
| D8 | T4.1 WS streaming (2/2): restore session on reconnect | T4.6 Staging deploy (3/3): api-staging.shopilot.ai green | T4.13 Feedback Loop scaffold: interfaces + FeedbackEntry models + CDK (1d); T4.MK2 Mockup full WRITE flow (1d start) | T4.19 Beta users prep (2/2): feedback form + 30min calls |
| D9 | T4.2 SystemPromptComposer L3: writeCapable write block, cap 1200 tok; T4.3 OutputGuard: data-leak prevention + dangerous-output filtering (1d); T4.5a FeedbackCapture hook: after_tool WRITE → POST /feedback/capture (1d) | T4.7 Load testing 50 users: Artillery/k6, p95 <2s (1/2) | T4.14 calculateImpactScore + DynamoFeedbackRepository (1/2) | T4.19a Eval contract testing pipeline: Tool Registry→DataSync/MP/Enrichment (1/2) |
| D10 | T4.5 Performance optimization: p95 <3s, cache, parallelization (1/2); T4.5b ActionLog entity + DynamoActionLogRepository: GSI1 Conv#{convId} (1d) | T4.7 Load testing (2/2); T4.8 CloudWatch dashboard + PagerDuty/Slack alerts on p95+error+LLM cost (1/2) | T4.14 calculateImpactScore + repo (2/2) | T4.19a Contract testing (2/2); T4.19b KB quality eval: precision@5 + recall + hit rate, fails CI <80% (1d) |
| ★ G2: Staging full stack. Load test p95 <2s. WS streaming live. Eval ≥0.70. Pablo approves. | | | | |
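
The ProactiveSuggestionService row above (T3.4) names two throttles: a 7-day dedup window and a maximum of 2 suggestions per turn. A minimal sketch of just those two rules follows. The function name, the use of string keys to identify a suggestion, and the in-memory `last_shown` map are assumptions; how candidates are generated (the LLM evaluates, with no hardcoded rules) is outside this sketch.

```python
from typing import Dict, List

DEDUP_WINDOW_S = 7 * 24 * 3600  # 7-day dedup from T3.4
MAX_PER_TURN = 2                # max 2 suggestions per turn from T3.4


def select_suggestions(candidates: List[str], last_shown: Dict[str, float],
                       now: float) -> List[str]:
    """Pick up to MAX_PER_TURN candidates not shown within the dedup window.

    `last_shown` maps a suggestion key to the epoch second it was last shown
    and is updated in place for every suggestion that is selected.
    """
    chosen: List[str] = []
    for key in candidates:
        if len(chosen) == MAX_PER_TURN:
            break
        shown_at = last_shown.get(key)
        if shown_at is not None and now - shown_at < DEDUP_WINDOW_S:
            continue  # still inside the 7-day window, skip it
        chosen.append(key)
        last_shown[key] = now
    return chosen
```

Because selected keys are stamped with `now`, a candidate repeated within the same turn is also filtered out by the same window check.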
Sprint 9-10 — W9-W10 (10 days) — Launch
Gate (G3 Go/No-Go): 10+ beta users. .dmg signed. Billing live. Eval ≥0.70. p95 <3s. 0 P0 bugs.
| Day | Mateo | Andrés | Sergio | Pablo |
|---|---|---|---|---|
| D1 | T4.5 Performance optimization (2/2): bottleneck profiling | T4.8 CloudWatch dashboard (2/2): Slack alerts when LLM cost/day >$50 | T4.15 FeedbackMeasurerService + Lambdas: processEntries >7d + EventBridge (1/2) | T5.10 Billing Stripe live: switch test→live + verify checkout+webhooks (dep T3.23 + T5.4) |
| D2 | T5.1 LLMGuardChecker: Haiku classifier for suspicious inputs + fallback (1d) | T5.4 Production deploy: CDK prod + Terraform apply + SSL + health checks (1/3) | T4.15 FeedbackMeasurer (2/2): CDK stack | T5.11 Beta onboarding 10-15 sellers: .dmg → marketplace → query (1/3) |
| D3 | T5.2 Backend bug fixes P1/P2: empty edge cases, LLM refuses tool, concurrent WRITEs (1/4) | T5.4 Production deploy (2/3) | T4.15a FeedbackGate anti-fatigue: max 1 prompt/day + 24h cooldown (4h); T4.15b Explicit feedback endpoint: POST /feedback/:userId/explicit (4h) | T5.11 Beta onboarding (2/3) |
| D4 | T5.2 Backend bug fixes (2/4) | T5.4 Production deploy (3/3): api.shopilot.ai | T4.15c Implicit feedback endpoint: POST /feedback/implicit + edited/rejected (4h); T4.15d 7d billing grace period: webhook subscription.deleted + cron (4h) | T5.11 Beta onboarding (3/3): activation, 1+ tool in first session |
| D5 | T5.2 Backend bug fixes (3/4) | T5.5 Full prod IaC: DDB PITR 35d + Secrets Mgr + IAM + concurrency (1/2) | T5.7 Code signing + .dmg + auto-updater: notarytool + S3 releases (1/2) | T5.12 Feedback calls: 15min/beta user + top 5 issues → Linear (1/2) |
| D6 | T5.2 Backend bug fixes (4/4): tokens expired mid-conversation | T5.5 Prod IaC (2/2): GCS lifecycle + Redis backup + PG backups | T5.7 .dmg + auto-updater (2/2): test on a clean Mac without dev tools | T5.12 Feedback calls (2/2): document what works and what does not |
| D7 | T5.3 System Prompt v3 final: tone tuning + tool selection + edge cases (1d) | T5.6 Rollback testing: Lambda revert <1min + Cloud Run revision <1min (1d) | T5.8 Electron hardening: CSP + sandbox + nodeIntegration=false (1d) | T5.13 OWASP top 10 review: injection/auth/XSS/SSRF + fix P1s (1d) |
| D8 | — | T5.6a Data Sync Phase 4: OpenMetadata FQNs + embed_fast_dag + embed_health_dag (1/2) | T5.9 UI/UX beta bug fixes (+T4.BB post-audit 0.5d): RAM profiling <500MB + animations + loading (1/4) | T5.14 System Prompt v2 Beautonomous: 10 weeks of real usage + index technical docs (1d) |
| D9 | — | T5.6a Data Sync OpenMetadata (2/2): lineage visible | T5.9 UI/UX bug fixes (2/4) | T5.15a E2E eval pipeline: 10+ full-flow scenarios query→tool→response (1/2) |
| D10 | — | — | T5.9 UI/UX bug fixes (3/4 + 4/4, T4.BB post-audit); T5.MK1 Mockup Dashboard view (1d start) | T5.15a E2E eval pipeline (2/2) |
| ★ G3 Go/No-Go (T5.15): Pablo signs off. Checklist: tools OK, Billing live, 10+ beta, .dmg signed, OWASP P1s, p95 <3s, eval ≥0.70. | | | | |
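
The launch gate above depends on a p95 latency under 3s (and the staging gate on p95 under 2s). As a reference for how such a figure might be computed from raw load-test samples, here is a sketch using the nearest-rank percentile definition. The choice of nearest-rank and the helper names are assumptions; load-testing tools may interpolate differently and report slightly different values.

```python
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered: List[float] = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_gate(latencies_s: Sequence[float], limit_s: float = 3.0) -> bool:
    """True when the p95 of the samples is strictly below the limit."""
    return percentile(latencies_s, 95) < limit_s
```

With 20 samples the p95 is the 19th smallest value, so a single slow outlier does not fail the gate but two do.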
S11-12 — Buffer (Weeks 11-12) — Circuit Breaker Scope
Absorbs beta bugs, hardening, and deferred scope cut by circuit breakers in S7-8.
| Engineer | S11: Hardening | S12: Deferred Scope |
|---|---|---|
| Mateo | P1/P2 intelligence bug fixes. p95 optimization if the target was not reached. WRITE tools cut in S7-8 (T4.4 circuit breaker) | Advertising tools Phase 5: 4 WRITE (create/update/pause/activate_campaign). Enrichment Rainforest API adapter (Amazon market intel). ProactiveSuggestions v2. LLMGuardChecker Phase 2 |
| Andrés | Production hardening: alerts, runbooks, rollback drills. Fix Amazon/Shopify adapter bugs | Silver→Gold DAG (if cut at T4.9). Rate limiters with real data. Expanded monitoring |
| Sergio | Pending UI/UX beta bug fixes. RAM profiling. .dmg hotfix if needed | Auto-updater S3. Windows build (if time allows). FeedbackThrottle anti-fatigue refinement. Improved feedback UI |
| Pablo | Eval iteration on real conversations. Golden dataset expansion with edge cases | KB v3: docs for beta questions that v2 did not cover. Eval score target 0.80 |
9.9.2 Cross-Engineer Handoff Schedule — 17 Critical Dependencies
Every task that blocks a different engineer. The day the “From” task completes is the earliest start for “Unblocks”. Extracted from the “Depende” column of doc 70, cross-owner dependencies only.
| # | Handoff (what transfers) | From → To | Done by | Unblocks | Earliest start | Risk |
|---|---|---|---|---|---|---|
| 1 | T1.25: 10 READ tool specs (names, schemas, risk levels) | Pablo → Mateo | C1-D9 | T2.1 ToolRegistry registers definitions from the specs | C2-D1 | MED |
| 2 | T1.1: DynamoDB fix, stable OrchestrationSession schema | Mateo → Andrés | C1-D3 | T2.15 CDK base AWS needs the correct DDB table schema | C2-D1 | MED |
| 3 | T1.7: RestResponseEventEmitter, REST endpoint live | Mateo → Sergio | C1-D7 | T2.18 CoachWebSocket needs the REST fallback endpoint | C2-D3 | HIGH |
| 4 | T1.12: MeLiAdapter, OAuth + typed IMarketplaceAdapter | Andrés → Sergio | C1-D9 | T2.21 OnboardingWizard requires working MeLi OAuth | C2-D5 | HIGH |
| 5 | T1.6 + T2.1: ReAct loop + ToolRegistry, 10 tools operational | Mateo → Pablo | C2-D5 | T2.25 E2E playground needs the loop + tools to test | C2-D6 | HIGH |
| 6 | T2.13: Fast Data Layer, 11 FastAPI endpoints live | Andrés → Mateo | C2-D6 | T3.1 READ handlers connect to Fast Data (contract defined) | C3-D1 | HIGH |
| 7 | T3.2: ConfirmationFlow backend, hold queue + DDB TTL 35min | Mateo → Sergio | C3-D4 | T3.20 Confirmation UI dialogs integrate with the ConfirmationFlow API | C3-D5 | HIGH |
| 8 | T3.1 + T3.3: real READ handlers + operational WRITEs | Mateo → Pablo | C3-D9 | T3.28 End-to-end QA flows over already-operational tools | C3-D10 | HIGH |
| 9 | T3.24: ICreditsGate, POST /internal/gate contract | Sergio → Mateo | C3-D9 | T3.5a HttpCreditGate calls POST /internal/gate before each tool | C3-D10 | HIGH |
| 10 | T3.4: ProactiveSuggestionService, afterTool hook events | Mateo → Pablo | C3-D5 | T4.18 Proactive testing requires the ProactiveSuggestion event format | C4-D8 | MED |
| 11 | T4.1: WS streaming server, 8-event contract defined | Mateo → Sergio | C4-D7 | T4.10 Progressive WS client handles the 8 server event types | C4-D8 | CRIT |
| 12 | T4.5a: FeedbackCapture hook, after_tool writes FeedbackEntry | Mateo → Sergio | C4-D9 | T4.13+ Feedback Loop scaffold: IFeedbackRepository + models | C4-D10 | HIGH |
| 13 | T4.6: Staging full stack, api-staging.shopilot.ai | Andrés → Andrés + Pablo | C4-D8 | T4.7 load test 50 users + T4.17 Eval CI gate on staging | C4-D9 | HIGH |
| 14 | T5.4: Production deployed, api.shopilot.ai live | Andrés → Pablo | C5-D4 | T5.11 Beta onboarding uses the real production URL (not staging) | C5-D5 | CRIT |
| 15 | T5.7: .dmg code-signed + notarized, installable on a clean Mac | Sergio → Pablo | C5-D6 | T5.11 Beta users install the .dmg during 1-on-1 onboarding calls | C5-D7 | CRIT |
| 16 | T5.10: Billing Stripe live, real transactions working | Sergio → Mateo | C5-D8 | T5.3 System Prompt v3 enables monetization in production prompts | C5-D9 | HIGH |
| 17 | T5.1–T5.14: whole team, all S5 deliverables ready | All → Pablo | C5-D9 | T5.15 Go/No-Go final checklist: 11 criteria must be ✓ | C5-D10 | CRIT |
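
The "Done by" and "Earliest start" columns above use labels of the form `C<cycle>-D<day>`. A small sketch like the one below could convert those labels to absolute working-day numbers and report the slack between a handoff and the task it unblocks. It assumes every cycle C1..C5 spans 10 working days, as in the sprint tables; the function names are hypothetical.

```python
import re

DAYS_PER_CYCLE = 10  # assumption: each cycle spans 10 working days


def day_index(label: str) -> int:
    """Convert a label such as 'C2-D5' to an absolute working-day number (C1-D1 = 1)."""
    match = re.fullmatch(r"C(\d+)-D(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    cycle, day = int(match.group(1)), int(match.group(2))
    if cycle < 1 or not 1 <= day <= DAYS_PER_CYCLE:
        raise ValueError(f"out-of-range label: {label!r}")
    return (cycle - 1) * DAYS_PER_CYCLE + day


def slack_days(done_by: str, earliest_start: str) -> int:
    """Working days between a handoff and its dependent task; negative means a conflict."""
    return day_index(earliest_start) - day_index(done_by)
```

For handoff #11, `slack_days("C4-D7", "C4-D8")` gives 1, while handoff #10 (`"C3-D5"` to `"C4-D8"`) gives 13; a negative result would flag a dependent task scheduled before its input exists.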
9.9.3 Task Summary by Engineer
4 engineers • 183 tasks • 50d each
Aggregate view of tasks, estimated days, and projects per engineer across the full 10-sprint MVP. Identifies overloads early and informs circuit-breaker decisions.
Mateo — CTO
53 tasks
Primary owner: 2-INTELLIGENCE layer (projects #2–#8), 3-KNOWLEDGE (#9 Cerebro KB). Secondary: #11 Enrichment, #17 CORE system prompt, #18 Design System token pipeline. Highest cognitive load: ReAct orchestration, all WRITE tool handlers, and the KB architecture (Go 1.24 + Vertex AI).
| Sprint | Key Tasks | Est. |
|---|---|---|
| S1–2 | T1.1 DynamoDB, T1.3 UserProfile, T1.6 ILLMClient, T1.7 REST, T1.8 ReAct scaffold, T1.9–T1.12 system prompt v1, context window, T1.21–T1.23 KB fix+retrieval+docs, T1.25 10 READ specs | 14d |
| S3–4 | T2.1 HookLifecycle, T2.2 ToolRegistry, T2.3–T2.8 10 READ stubs, T2.9 IContextWindowManager, T2.11 system prompt v1, T2.22 KB incremental, T2.23 batch embeddings | 15d |
| S5–6 | T3.1 Fast Data Layer wiring, T3.2–T3.7 READ real handlers, T3.4 ProactiveSuggestionService, T3.8 ConfirmationFlow backend, T3.11–T3.14 WRITE handlers (4 tools), T3.25 KB BigQuery indexing, T3.32 DS token pipeline (dep T0.BB) | 20d |
| S7–8 | T4.1 WebSocket upgrade, T4.2–T4.6 remaining 13 WRITE handlers, T4.3 ActionLog, T4.4 ToolPolicyFilter, T4.5 system prompt v2, T4.16 KB batch v2 | 20d |
| S9–10 | T5.1 system prompt v3, T5.2 Personality Engine, T5.3 ProactiveSuggestions v2, T5.4 Context Aggregator, T5.5 bug fixes P0/P1 | 10d |
Overload mitigation: Mateo (1.58×) is overloaded. Sergio reduced to 1.00× after #18 DS Figma moved to the external UX/UI team. MANDATORY: reassign Enrichment CDK (T3.10) to Andrés in S5–6. If Mateo slips ≥3d, defer remaining WRITE tools to S11–12. Mateo is the highest-risk SPOF: he owns Intelligence + KB + DS pipeline.
Andrés — Data+BE
36 tasks
Primary owner: 4-ACTION (#12 Marketplace Provider) + 5-PLATFORM (#14 DevOps). Secondary: 3-KNOWLEDGE (#10 Data Sync). Float in S1–4, blocker from S5 (Fast Data Layer is the critical dependency).
| Sprint | Key Tasks | Est. |
|---|---|---|
| S1–2 | T1.14 MeLi OAuth2, T1.15 AmazonAdapter scaffold + SP-API request Day 1, T1.16 IMarketplaceAdapter, T1.17 Terraform VPC/CDK base, T1.33 electron-builder CI | 8.5d |
| S3–4 | T2.10 ShopifyOAuth2Flow + AmazonAdapter OAuth, T2.12 TokenRefreshCron, T2.13 DynamoDB credentials, T2.14 CDK Cloud Run, T2.15 Secret Manager | 12d |
| S5–6 | T3.13 Fast Data Layer Cloud Run, T3.15 Redis ElastiCache, T3.16 IRateLimiter, T3.17 CI/CD GitHub Actions, T3.18 Billing DynamoDB, T3.19 ICreditsGate | 14d |
| S7–8 | T4.7 load test k6, T4.8 CloudWatch dashboards, T4.9 PagerDuty, T4.9a WebSocket CDK, Silver+Gold circuit breaker, Staging deploy | 10d |
| S9–10 | T5.6 Production Terraform deploy, T5.6a rollback + OpenMetadata, OWASP scan infra, prod monitoring | 8d |
Float buffer: S1–4 has 2d float per sprint. Critical from S5: Fast Data Layer (T3.13) must be ready before Mateo can wire real handlers. Amazon SP-API approval: requested on Day 1; if not approved by S3–4, scaffold with mocks and defer to S5.
Sergio — Full-stack
45 tasks
Primary owner: 1-PRODUCT (#1 Native Shell). Secondary: 5-PLATFORM (#13 Billing), 6-QUALITY (#15 Feedback Loop), 3-KNOWLEDGE (#11 Enrichment). #18 DS Figma is now owned by the external UX/UI team; Sergio CONSUMES Figma components and creates integration Mockups (T*.MK*). Load reduced from 1.50× to ~1.06×.
| Sprint | Key Tasks | Est. |
|---|---|---|
| S1–2 | T1.16 Scaffold Electron, T1.17 MainWindow, T1.18 MarketplaceDetector, T1.20 Auth Memberstack, T1.32 canary build. W2 with T0.BB: T1.19 sidebar UI (2.5d), T1.MK1 Mockup shell container (0.5d) | 9d |
| S3–4 | T2.17 Chat UI + Markdown (2.5d), T2.18 CoachWebSocketService, T2.19 URL context injection, T2.20 react-router, T2.21 OnboardingWizard (2.5d), T2.MK1 Mockup ChatView (1d), T2.MK2 Mockup Onboarding (0.5d), T2.40 Gate 1 signed build | 10.5d |
| S5–6 | T3.19 BillingView (2.5d), T3.20 IMarketIntelligenceAdapter + MeLi Search, T3.21 Rainforest API (1.5d), T3.9 EnrichmentCache TTL Redis, T3.22 Billing Stripe Checkout, T3.MK1 Mockup BillingView (0.5d), T3.MK2 Mockup ProfileView (0.5d), T3.MK3 Mockup ConfirmDialog (0.5d) | 11.5d |
| S7–8 | T4.10 WebSocket 8 events (2.5d), T4.11 EnrollmentView, T4.12 Sentry, T4.13–T4.15 FeedbackLoop, T4.MK1 Mockup EnrollmentView (0.5d), T4.MK2 Mockup WRITE flow completo (1d), T4.24 Gate 2 signed build | 10.5d |
| S9–10 | T5.7 .dmg code signing + notarization, T5.8 Electron hardening, T5.9 beta bugs + RAM <500MB (3.5d), T5.10 Stripe live, T5.MK1 Mockup Dashboard view (1d) | 8.5d |
#18 DS Figma tasks moved to the external UX/UI team. Sergio now consumes Figma components and creates integration Mockups (T*.MK*). Load reduced from 1.50× to 1.00×. Pablo is the approval gate for all T*.BB deliverables. MANDATORY: T3.10 Enrichment CDK → Andrés.
Pablo — CEO
32 tasks
Primary owner: 6-QUALITY (#16 Eval Suite), 7-INTERNAL (#17 CORE Beautonomous). Gate decision-maker for all 3 Go/No-Go gates. 11d of slack for beta user recruitment, feedback calls, and strategic decisions.
| Sprint | Key Tasks | Est. |
|---|---|---|
| S0 | T0.1–T0.8: OpenClaw project, Linear structure, System Prompt v1 Beautonomous, role mapping, validation. T0.9–T0.10: Apple Developer + Windows cert. T0.11: Brand Book delivery | 4d |
| S1–2 | T1.24 Eval scaffold + golden dataset, T1.26 brand registration, T1.27 store auth (Apple+Windows). #18 UX/UI approves T0.BB + T1.BB | 8d |
| S3–4 | T2.24 LLM Judge + EvalRunner, T2.25 E2E test suite, T2.26 ~150 Linear tasks, T2.26a quality gate. #18 UX/UI approves T2.BB | 7d |
| S5–6 | T3.26 Eval CI automation, T3.27 golden dataset 50 examples, T3.28 QA 3 marketplaces. #18 UX/UI approves T3.BB | 7d |
| S7–8 | T4.17 Eval automated CI (score ≥0.70), T4.18 proactive suggestions testing, T4.19 beta prep, T4.19a contract testing, T4.19b KB quality eval. #18 UX/UI approves T4.BB | 8d |
| S9–10 | T5.11 onboarding beta 10+ users, T5.12 feedback calls, T5.13 OWASP review, T5.14 Beautonomous SP v2, T5.15 Go/No-Go, T5.15a E2E eval pipeline. #18 UX/UI: no BB — pipeline closed, point queries only | 5d |
Gate authority: Pablo is the sole Go/No-Go decision-maker for Gate 1 (S4), Gate 2 (S8), Gate 3 (S10). Eval score is the objective metric: if it is <0.70 at Gate 2, Pablo blocks launch regardless of feature completeness.
UX/UI — External Design Team
#18 Design System Figma. Delivers T*.BB Brand Book milestones. Pablo is the approval gate for all deliverables. Sergio consumes Figma components for integration Mockups.
| Sprint | Deliverables | Est. |
|---|---|---|
| S1–2 W1 | T0.BB (4d): Figma Foundations + Tokens + Iconography + Core Components (partial) | 4d |
| S1–2 W2 | T1.BB (6d): Atoms + base Molecules + Chat Organisms | 6d |
| S3–4 | T2.BB (6d): remaining Molecules + Data/Flow Organisms | 6d |
| S5–6 | T3.BB (5d): Advanced Organisms + [LIB] Pattern Components | 5d |
| S7–8 | T4.BB (3d): Figma Quality Audit + Corrections | 3d |
| S9–10 | No BB: pipeline closed, point queries only | — |
The external team works in parallel. Pablo approves each T*.BB milestone before Sergio can integrate it. T0.BB must be ready by the end of S1-2 W1 so Sergio can use tokens in T1.19.
Capacity Overview — All Engineers
| Engineer | S1–2 | S3–4 | S5–6 | S7–8 | S9–10 | Total | Cap. | Load |
|---|---|---|---|---|---|---|---|---|
| Mateo | 14d | 15d | 20d | 20d | 10d | 79d | 50d | 1.58× |
| Andrés | 8.5d | 12d | 14d | 10d | 8d | 52.5d | 50d | 1.05× |
| Sergio | 9d | 10.5d | 11.5d | 10.5d | 8.5d | 50d | 50d | 1.00× |
| Pablo | 8d | 7d | 7d | 8d | 5d | 39d (35d + 4d in S0) | 50d | 0.78× |
| UX/UI ext. | 10d | 6d | 5d | 3d | — | 24d | ext. | EXT |
Estimates include design + implementation + review. They exclude async standups and ceremonies (<2h/week). Circuit breaker: any task not done at the sprint deadline is cut to S11–12 (Shape Up rule).
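The Load column is the estimated total divided by the 50d capacity. A minimal sketch of that arithmetic, using the figures from the table above; the type and function names are illustrative and not part of the blueprint:

```typescript
// Per-engineer sprint estimates in days (S1-2 through S9-10), copied from the table.
// `extraDays` covers work outside S1-10 (Pablo's 4d in S0).
interface EngineerPlan {
  name: string;
  sprintDays: number[];
  extraDays: number;
}

const CAPACITY_DAYS = 50; // the "Cap." column for every internal engineer

const plans: EngineerPlan[] = [
  { name: "Mateo", sprintDays: [14, 15, 20, 20, 10], extraDays: 0 },
  { name: "Andrés", sprintDays: [8.5, 12, 14, 10, 8], extraDays: 0 },
  { name: "Sergio", sprintDays: [9, 10.5, 11.5, 10.5, 8.5], extraDays: 0 },
  { name: "Pablo", sprintDays: [8, 7, 7, 8, 5], extraDays: 4 },
];

// Sum of all sprint estimates plus any work outside the sprints.
function totalDays(plan: EngineerPlan): number {
  return plan.sprintDays.reduce((sum, d) => sum + d, 0) + plan.extraDays;
}

// Load factor: estimated days divided by available days.
function loadFactor(plan: EngineerPlan, capacity: number = CAPACITY_DAYS): number {
  return totalDays(plan) / capacity;
}

for (const plan of plans) {
  console.log(`${plan.name}: ${totalDays(plan)}d, ${loadFactor(plan).toFixed(2)}×`);
}
```

Any load above 1.00× means the estimate exceeds the engineer's capacity, which is the case the circuit-breaker rule is meant to absorb.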
168 internal tasks (T0.1–T5.MK1) + 5 external UX/UI milestones (T0.BB–T4.BB) across 6 phases, 6 sprint cycles (10+2 weeks), 19 projects. All tracked in Linear via Beautonomous (#17 CORE).
9.9.4 Project #17 CORE — Governance Across All Projects
Every project in the MVP must comply with Beautonomous governance. This is not optional: it is the foundation that enables 4 engineers to operate as 10-15.
| Proj | CORE Governance Requirements |
|---|---|
| #12 | All PRs reviewed via Beautonomous (El Mago). OAuth token changes require 🍠 approval. Adapter code changes logged in Audit Log. |
| #8 | Alert configurations reviewed by El Mago. Dashboard access: read-only for El Artesano, full for El Mago. |
| #13 | Stripe configuration changes: 🔴 irreversible (El Mago approval). Pricing changes: El Capitan proposes, El Mago approves. |
| #10 | DAG configuration changes: 🍠 requires approval. Beautonomous monitors Airflow pipeline health. Data deletion: 🔴 irreversible. |
| #2 | LLM model changes: 🍠 El Mago approval. System prompt updates: tracked in Beautonomous with version history. Cost guard threshold: El Capitan + El Mago. |
| #3 | Adding/removing tools: 🍹 reversible, El Mago reviews. Tool risk level assignments: El Mago only. |
| #9 | KB document updates: 🍹 reversible, all roles. KB schema changes: 🍠 El Mago approval. |
| #15 | Feedback data access: read-only for all. Feedback rules changes: El Mago. |
| #4 | Personality/tone changes: El Capitan proposes (product), El Mago reviews (technical). Prompt template versioning via Beautonomous. |
| #5 | Context source changes: 🍹 reversible. Stale threshold changes: El Mago. |
| #6 | Proactive rule changes: El Capitan defines (product), El Mago implements (code). New rules: 🍹 reversible. |
| #1 | UI text/color changes: 🍹 reversible, all roles can propose via Beautonomous PR. Code signing: 🔴 El Mago only. Feature flags: 🍠 El Mago. |
| #14 | Infrastructure changes: 🍠 all require El Mago approval. Production deploy: 🔴 El Mago explicit confirmation via Beautonomous. Secrets rotation: 🔴 irreversible. |
| #16 | Eval threshold changes: El Capitan + El Mago. Test case additions: 🍹 reversible, all roles. |
| #7 | Guard rule changes: 🍠 El Mago (security critical). Data leak alert: 🔴 immediate notification to entire team. |
| #11 | External API credential changes: 🍠 El Mago. New adapter integration: 🍹 reversible. API cost monitoring via Beautonomous alerts. |
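The three risk markers in the table (🍹 reversible, 🍠 requires approval, 🔴 irreversible) could be encoded as a lookup that tooling consults before a change proceeds. This is only a sketch: the change keys, the fallback to the strictest level, and the rule that 🍠 and 🔴 both wait for explicit approval are assumptions, not documented Beautonomous behavior.

```typescript
// Risk levels from the governance table: 🍹 reversible, 🍠 requires approval, 🔴 irreversible.
type Risk = "reversible" | "requires-approval" | "irreversible";

interface GovernanceRule {
  project: number; // project number, e.g. 14 for #14 DevOps
  change: string;  // hypothetical change key, not a name used by Beautonomous
  risk: Risk;
}

// A few rows transcribed from the table; the keys are illustrative.
const governanceRules: GovernanceRule[] = [
  { project: 14, change: "production-deploy", risk: "irreversible" },
  { project: 14, change: "infrastructure", risk: "requires-approval" },
  { project: 10, change: "data-deletion", risk: "irreversible" },
  { project: 9, change: "kb-document-update", risk: "reversible" },
  { project: 9, change: "kb-schema", risk: "requires-approval" },
];

// Assumption of this sketch: unknown changes fall back to the strictest level.
function riskOf(project: number, change: string): Risk {
  const rule = governanceRules.find(
    (r) => r.project === project && r.change === change,
  );
  return rule ? rule.risk : "irreversible";
}

// Assumption: 🍠 and 🔴 changes wait for explicit approval before they proceed.
function needsExplicitApproval(project: number, change: string): boolean {
  return riskOf(project, change) !== "reversible";
}
```

Defaulting unknown changes to the strictest level keeps a missing table entry from silently bypassing review.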
9.10 Critical Path & Dependencies
Critical Path (blocks everything downstream)
If ANY item on the critical path slips, the launch date slips. The critical path runs through Mateo's work (S1-8) and then Pablo's beta onboarding (S9).
Cross-Track Dependencies
Non-Critical Path: Andrés
Andrés has float in S1-4 (stubs don't depend on real adapters, so Mateo can use mock data). From S5-6 Andrés is a blocker: Fast Data Layer (T3.13) blocks Mateo's real READ handlers (T3.1). No float from S5 onward.
Non-Critical Path: Sergio (partially)
Electron shell and billing have ~1 week of float for S1-6. New dependency: UX/UI delivers Figma components (T0.BB–T4.BB) each sprint; Sergio consumes them and creates Mockups to validate integration. If UX/UI delivery slips, Sergio can continue with placeholder tokens. S7-10 (onboarding + shipping) becomes critical: the .dmg must ship for beta by S9.
What If the Critical Path Slips?
• ReAct Loop slips 1 week: Sergio builds chat UI against REST mock. Pablo tests via API directly. Recoverable.
• Tool Registry slips 1 week: Gate 1 delayed. Use S5-6 buffer. WRITE tools phase must be tighter (1 week instead of 2).
• WRITE Tools slip 1 week: Ship beta with read-only tools. WRITE tools added in S10 as hotfix. Acceptable degradation.
• Multiple things slip: Cut scope: launch with MeLi only (defer Amazon + Shopify), or launch READ + ANALYSIS only (defer WRITE tools). Letting the Shopify adapter (T2.10) slip to S11 is a major delay, so a scope cut is preferable.
9.11 Cross-Project Dependency Map
Three complementary views: blocking table, critical path DAG, and key task chains. Hard deps block execution; soft deps degrade quality if delayed.
A — Project Blocking Table
| Blocking Project | Blocked Projects | Type | Critical? | Earliest Unblock |
|---|---|---|---|---|
| #17 Beautonomous | ALL 19 active projects: governance, task tracking, code review | Hard | Yes (Day 0) | W0 (T0.8) |
| #2 Orchestrator | #3 Tool Registry (ReAct executes tools) • #6 Proactive (afterTool hook) • #7 Guardrails (pre/post-LLM) • #4 Personality (prompt composition) | Hard | Yes | W2 (T1.6) |
| #12 Marketplace | #10 Data Sync (adapters needed for DAGs) • #5 Context Agg. (reads via adapters) • #11 Enrichment (MeLi Search adapter) • #3 Tools (adapter dispatch) | Hard | Yes | W2 (T1.12) |
| #14 DevOps | ALL projects: infra must exist before any service deploys to staging/production | Hard | Yes | W3 (T2.15) |
| #3 Tool Registry | #6 Proactive (evaluates tool results) • #15 Feedback (captures write executions) • #13 Billing (credits gate pre-tool) | Hard | Yes | W3 (T2.1) |
| #1 Shell | #13 Billing UI (BillingView in sidebar) • #6 Proactive (suggestion cards UI) | Soft | No | W2 (T1.19) |
| #8 Observability | #16 Eval Suite (eval metrics sourced from traces) • #14 DevOps (alerts in monitoring) | Soft | No | W2 (T1.8) |
| #10 Data Sync | #9 Cerebro KB (embeddings from Gold data) • #5 Context Agg. (fresh data for context) | Soft | No | W3 (T2.13) |
| #13 Billing | #2 Orchestrator (credits gate blocks tool calls) | Hard | No (fail-open) | W6 (T3.24) |
| #18 Design System | #1 Native Shell (tokens + components consumed by desktop client UI via Mockups) | Soft | No | T0.BB (W1) → T4.BB (W8) |
B — Critical Path DAG (Project Level)
#17 CORE ──────────────────────────────────────────────────────────── governance (all sprints)
│
├──► #2 Orchestrator (W1-2) ──► #3 Tool Registry (W3) ──► #6 Proactive (W5) ──► BETA
│ │ │ │
│ │ #7 Guardrails #15 Feedback
│ │ (W5, pre/post LLM) (W7, write hooks)
│ │
│ └──► #4 Personality (W1, system prompt)
│
├──► #12 Marketplace (W1-2)
│ ├──► #10 Data Sync (W3) ──► #9 Cerebro KB (W3-4) ──► #5 Context Agg. (W3)
│ │ │ │
│ │ #11 Enrichment (W5) RAG quality
│ │
│ └──► #13 Billing (W6) ──► Ship (W10)
│ │
│ #1 Shell (W1-10, parallel UI) ◄── #18 Design System (UX/UI team, T0.BB→T4.BB + Sergio Mockups)
│
├──► #14 DevOps (W3 CDK IaC) ──► Staging (W7-8) ──► Production (W9-10) ──► Ship
│
└──► #8 Observability (W1-2) ──► #16 Eval Suite (W3-8) ──► CI/CD Quality Gate
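A project-level start order can be derived mechanically from the hard, critical rows of the blocking table with a topological sort. The sketch below uses Kahn's algorithm over three of those rows (#2, #12, #3); the function name and the choice to omit the "ALL projects" rows (#17, #14) are simplifications made here, not part of the blueprint.

```typescript
type Project = string;

// Hard, critical blockers from the blocking table: blocker -> blocked projects.
const hardBlocks: Record<Project, Project[]> = {
  "#2 Orchestrator": ["#3 Tool Registry", "#6 Proactive", "#7 Guardrails", "#4 Personality"],
  "#12 Marketplace": ["#10 Data Sync", "#5 Context Agg.", "#11 Enrichment", "#3 Tool Registry"],
  "#3 Tool Registry": ["#6 Proactive", "#15 Feedback", "#13 Billing"],
};

// Kahn's algorithm: repeatedly emit projects that have no unfinished blockers.
function buildOrder(blocks: Record<Project, Project[]>): Project[] {
  const inDegree = new Map<Project, number>();
  for (const blocker of Object.keys(blocks)) {
    if (!inDegree.has(blocker)) inDegree.set(blocker, 0);
    for (const blocked of blocks[blocker] ?? []) {
      inDegree.set(blocked, (inDegree.get(blocked) ?? 0) + 1);
    }
  }

  const ready: Project[] = Array.from(inDegree.keys()).filter(
    (p) => inDegree.get(p) === 0,
  );
  const order: Project[] = [];

  while (ready.length > 0) {
    const next = ready.shift() as Project;
    order.push(next);
    for (const blocked of blocks[next] ?? []) {
      const remaining = (inDegree.get(blocked) ?? 0) - 1;
      inDegree.set(blocked, remaining);
      if (remaining === 0) ready.push(blocked);
    }
  }

  // Any project left unprocessed sits on a dependency cycle.
  if (order.length !== inDegree.size) {
    throw new Error("cycle detected in hard dependencies");
  }
  return order;
}
```

If the #13 Billing → #2 Orchestrator row were added as a scheduling edge, the sort would throw, because #2 → #3 → #13 → #2 forms a cycle. That is consistent with the table marking that row as non-critical (fail-open) rather than as a start-order blocker.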
C — Critical Task Chains (Cross-Engineer)
Main Product Path (blocking: if any step slips, W10 slips)
T0.8 → T1.6 (AgentLoop) → T2.1 (ToolRegistry) → T3.3 (WRITE handlers) → T4.1 (WS Streaming) → T5.7 (.dmg) → T5.10 (Billing live) → T5.15 (Go/No-Go)
Owners: All → Mateo → Mateo → Mateo → Mateo → Sergio → Sergio → Pablo
Shell + WebSocket Integration Path
T0.8 → T1.16 (Electron scaffold) → T1.17 (MainWindow+WCV) → T1.19 (Tabs+Sidebar) → T2.17 (Chat UI) → T2.18 (CoachWebSocket) → T4.10 (WS client) → T5.7 (.dmg)
Owner: Sergio (all steps)
Marketplace Data Path
T1.9 (scaffold) → T1.10 (IAdapter) → T1.12 (MeLiAdapter) → T2.13 (DataSync arch) → T3.13 (Fast Data Layer) → T2.6 (IContextAssembler) → T4.6 (Staging) → T5.4 (Prod)
Owners: Andrés → Andrés → Andrés → Andrés → Andrés → Mateo → Andrés → Andrés
Billing Path
T3.23 (Stripe Checkout) → T3.24 (ICreditsGate backend) → T3.24a (schema migration) → T3.5a (HttpCreditGate in API) → T5.10 (Billing live)
Owners: Sergio → Sergio → Sergio → Mateo → Sergio
UX/UI → Sergio Figma Component Path (every sprint)
T0.BB (W1) → T1.19+T1.MK1 (W2) → T1.BB (W2) → T2.17+T2.MK1+T2.21+T2.MK2 (W3-4) → T2.BB (W4) → T3.19+T3.MK1+T3.22+T3.MK2+T3.20+T3.MK3 (W5-6) → T3.BB (W6) → T4.10+T4.MK2+T4.11+T4.MK1 (W7-8) → T4.BB (W8) → T5.9+T5.MK1 (W9-10)
Owners: UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio. Pablo approves each T*.BB delivery
Cross-Engineer Critical Handoffs
Mateo T1.7 (REST emitter) —→ Sergio T2.18 (CoachWebSocket needs REST fallback endpoint)
Andrés T1.12 (MeLiAdapter) —→ Mateo T2.6 (IContextAssembler reads via adapters)
Sergio T3.24 (ICreditsGate backend) —→ Mateo T3.5a (HttpCreditGate calls POST /internal/gate)
Mateo T4.1 (WS streaming server) —→ Sergio T4.10 (WS client renders 8 event types)
Mateo T3.2 (ConfirmationFlow backend) —→ Sergio T3.20 (confirmation dialogs UI)
Mateo T3.4 (ProactiveSuggestionService) —→ Sergio T3.21 (suggestion cards UI)
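The T3.24 → T3.5a handoff is where the fail-open behavior noted for #13 Billing in the blocking table would live: if the gate cannot be reached, the tool call proceeds. A minimal sketch of that decision, assuming the result of calling POST /internal/gate has already been reduced to one of three outcomes; the `GateOutcome` type and function names are hypothetical:

```typescript
// Possible outcomes of asking the billing service whether a tool call may run.
type GateOutcome =
  | { kind: "allowed" }
  | { kind: "denied"; reason: string }       // e.g. credits exhausted
  | { kind: "unavailable"; error: string };  // timeout, 5xx, network failure

// Fail-open: only an explicit denial blocks the tool call.
function mayExecuteTool(outcome: GateOutcome): boolean {
  return outcome.kind !== "denied";
}

// An unavailable gate lets the call through but should still be surfaced,
// otherwise billing outages go unnoticed while credits are not deducted.
function shouldRaiseAlert(outcome: GateOutcome): boolean {
  return outcome.kind === "unavailable";
}
```

Keeping "denied" and "unavailable" as separate outcomes is the important design choice: collapsing both into `false` would turn a billing outage into a product outage.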
9.12 Integration Milestones & Risk Gates
S4 — Integration Gate 1: "It Talks"
GO / NO-GO
Go Criteria (ALL must pass)
- ✓ ReAct Orchestrator (#2) ↔ Tool Registry (#3) E2E working
- ✓ Native Shell (#1) ↔ REST/WebSocket connection with fallback verified
- ✓ MeLi + Amazon adapters (#12) operational. 10 READ tool stubs returning mock data
- ✓ DevOps IaC scaffold (#14) ready for staging
- ✓ Unit test coverage ≥70% on critical paths
- ✓ Eval runner executes 15+ golden cases (T2.24)
- ✓ Beautonomous (#17) used for all task management
If Gate Fails
- • Decision maker: Pablo (CEO)
- • Option A: Extend S3-4 by 1 week, compress S5-6
- • Option B: Ship Gate 1 with mock data, fix in S5
- • Option C: Reassign work if one track is blocked
Gate 1 Demo Script
1. Open Electron app → MeLi loads in WebContentsView
2. Open sidebar → type "How are my sales this week?"
3. Watch full REST response → mock data from stubs
4. Ask about product MLA123456 → READ stub execution → response with KB context
5. Switch tab to Amazon → context changes automatically
S8 — Integration Gate 2: "It Acts"
GO / NO-GO
Go Criteria (ALL must pass)
- ✓ All 3 marketplaces (#12) + Billing (#13) flow complete
- ✓ Enrichment (#11) ANALYSIS tools returning cached data
- ✓ CI/CD pipeline (#14) auto-deploying to staging
- ✓ Eval Suite (#16) LLM-as-Judge pipeline running in CI
- ✓ WebSocket streaming working (T4.1)
- ✓ Proactive suggestions active (T3.4)
- ✓ Eval score ≥0.70 (T4.17)
- ✓ Load test 50 users passes p95 <2s (T4.7)
- ✓ WRITE tools + confirmation flow tested with real data
- ✓ E2E test count ≥30 passing
If Gate Fails
- • Decision maker: Pablo (CEO)
- • Option A: Use S11-12 buffer for fixing (that's what it's there for)
- • Option B: Ship beta with read-only (no WRITE tools)
- • Option C: Cut Shopify, ship MeLi + Amazon only
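The numeric Gate 2 criteria (eval score ≥0.70, load test p95 <2s, E2E count ≥30) lend themselves to an automated pre-check that runs before the Go/No-Go meeting. The sketch below covers only those three; the remaining criteria need human verification, and the decision itself stays with Pablo. The metric field names are assumptions.

```typescript
// Measured inputs for the numeric Gate 2 criteria (field names are illustrative).
interface Gate2Metrics {
  evalScore: number;          // T4.17, LLM-as-Judge score in [0, 1]
  loadTestP95Seconds: number; // T4.7, 50-user load test
  e2ePassing: number;         // number of passing E2E tests
}

interface GateReport {
  go: boolean;      // true only if every numeric check passed
  failed: string[]; // names of the checks that did not pass
}

function evaluateGate2(m: Gate2Metrics): GateReport {
  const checks = [
    { name: "Eval score >= 0.70", passed: m.evalScore >= 0.7 },
    { name: "Load test p95 < 2s", passed: m.loadTestP95Seconds < 2 },
    { name: "E2E tests >= 30 passing", passed: m.e2ePassing >= 30 },
  ];
  const failed = checks.filter((c) => !c.passed).map((c) => c.name);
  return { go: failed.length === 0, failed };
}
```

Returning the list of failed checks, not just a boolean, gives the decision maker the input needed to choose between Options A, B and C.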
S10 — Launch Gate: "It Ships"
LAUNCH
Go Criteria (ALL must pass)
- ✓ All 4 tracks converge: full E2E with real Sellerfy data
- ✓ Guardrails (#7) input/output filtering active
- ✓ Security review (OWASP top 10) passed
- ✓ Production deploy + monitoring + rollback tested
- ✓ 10+ beta users onboarded and active
- ✓ Stripe billing live and tested
- ✓ .dmg signed, download page live
- ✓ Eval score ≥0.70
- ✓ API p95 <3s
- ✓ 0 P0 bugs
Rollback Plan
- • API: Lambda version revert (<1 min) / Cloud Run revision rollback for Data API
- • App: auto-updater pushes hotfix
- • Data: DynamoDB point-in-time recovery
- • If critical: disable WRITE tools server-side via ToolPolicyFilter
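The last rollback item amounts to a server-side kill switch: with a single policy flag off, WRITE tools are no longer offered to the LLM. The blueprint names ToolPolicyFilter but this section does not show its interface, so the types, the flag, and the tool names below are all hypothetical.

```typescript
// Tool categories used throughout the blueprint.
type ToolCategory = "READ" | "ANALYSIS" | "WRITE";

// Hypothetical minimal shape; the real ToolDefinition likely carries more fields.
interface ToolSummary {
  name: string;
  category: ToolCategory;
}

// Hypothetical policy flag that operations can flip without a client release.
interface ToolPolicy {
  writeToolsEnabled: boolean;
}

// Returns only the tools the current policy allows the LLM to see.
function filterTools(tools: ToolSummary[], policy: ToolPolicy): ToolSummary[] {
  return tools.filter(
    (tool) => tool.category !== "WRITE" || policy.writeToolsEnabled,
  );
}

// Illustrative tool names, not taken from the Tool Registry.
const exampleTools: ToolSummary[] = [
  { name: "get_sales_summary", category: "READ" },
  { name: "analyze_competitors", category: "ANALYSIS" },
  { name: "update_price", category: "WRITE" },
];
```

Filtering before the tool list reaches the model means a disabled WRITE tool can never be selected, which is safer than rejecting the call after the model has already chosen it.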
Quality Bar at Each Gate
Testing
• S4: Unit ≥70%, E2E ≥10
• S7: Unit ≥80%, E2E ≥30
• S10: Unit ≥80%, E2E ≥50
Performance
• API p95: S4 <5s · S7 <3s · S10 <3s
• Electron RAM <500MB
• First token: S4 N/A (REST) · S7+ <1s
Reliability
• Error rate <1%
• OAuth refresh 100% success
• Target SLA: 99.9%
9.13 Risk Register
| Risk | Prob. | Impact | Mitigation | Owner |
|---|---|---|---|---|
| Marketplace API rate limits block functionality | MED | HIGH | Redis cache from S5-6 (T3.16 IRateLimiter per marketplace) + batch sync + incremental queries + EnrichmentCache TTL (T3.9) | Andrés |
| Electron app consumes too much RAM (>500MB) | LOW | MED | Target 400MB, monitor from S3. Optimize WebContentsView if needed | Sergio |
| ReAct loop too slow (>5s per turn) | MED | HIGH | Prompt caching, context pruning, faster model for routing, parallel subtasks | Mateo |
| OAuth2 token refresh failure (any marketplace) | MED | HIGH | Manual fallback, user notification, retry with backoff, Secret Manager | Andrés |
| Proactive suggestions are noisy / low value | HIGH | MED | LLM inference via afterTool hook (no hardcoded rules), max 2/turn, 7-day dedup, iterate with beta feedback | Mateo |
| LLM uses tools incorrectly | MED | HIGH | Precise descriptions in ToolDefinition, MAX_ROUNDS=10, #16 Eval Suite, iterate system prompt v2/v3 | Mateo |
| WebContentsView has unexpected limitations | LOW | HIGH | Validate in Pre-Sprint session. If critical limitations: evaluate webview tag or sandboxed iframe | Sergio |
| MeLi Search API insufficient for competitor analysis | MED | MED | IMarketIntelligenceAdapter with MeLi Search API. Rainforest API for Amazon (Enrichment #11 dev plan). Redis cache with TTL | Mateo |
| #1 Native Shell SPOF (1 engineer, 35% effort) | MED | CRIT | Pablo cross-trains React/Electron by S4. If Sergio is blocked, Pablo covers basic UI fixes | Sergio |
| #11 Enrichment performance too slow for real-time | MED | MED | Redis TTL-based cache. Pre-compute in DAGs. Accept async for heavy analysis | Mateo |
| Multi-marketplace OAuth complexity (3 different flows) | MED | MED | IOAuth2Flow generic interface (T1.15c). Each marketplace implements its own flow (Andrés T2.10). Test each flow independently | Andrés |
| MarketplaceDetector breaks on URL changes | MED | MED | Remote config patterns (JSON in GCS) updatable without re-deploy | Sergio |
| Amazon SP-API approval takes >4 weeks (E1) | MED | HIGH | Request Day 1 (T1.15). AmazonAdapter scaffold with mocks. If not approved by S3-4, defer real Amazon to S5 | Andrés |
| Mateo overloaded (79d estimated in 50d available, 1.58× capacity). Owns Intelligence + KB + DS pipeline | HIGH | CRIT | MANDATORY: reassign Enrichment CDK (T3.10) to Andrés in S5–6. If it slips ≥3d, defer remaining WRITE tools to S11–12 | Mateo |
| UX/UI external team delivery delays block Sergio's Mockup tasks | MED | MED | UX/UI delivers T0.BB–T4.BB each sprint. If delayed, Sergio continues with placeholder tokens and Mockups shift to the next sprint. Pablo manages UX/UI as approval gate. Soft dependency: shell code doesn't block on Figma | UX/UI |
| Apple notarization rejection delays signed .dmg builds | MED | MED | Enroll Apple Developer in S0 (T0.9) to surface issues early. First unsigned canary in S1-2 (T1.32) to catch packaging problems. Signed build at Gate 1 (T2.40) with time to iterate | Pablo |
| Windows SmartScreen blocks .exe without sufficient reputation | MED | LOW | Procure EV code signing certificate in S0 (T0.10); EV certs bypass SmartScreen. If OV only, build reputation via internal installs. Windows is secondary to macOS for MVP | Pablo |
| Stripe integration more complex than expected (webhooks, idempotency) | LOW | MED | Use Stripe Checkout hosted (no custom flow). Sergio starts T3.23 in S5 with time. Fail-open if billing doesn't respond | Sergio |
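The mitigation for noisy proactive suggestions names two concrete limits: at most 2 suggestions per turn and a 7-day dedup window. One way those limits could be enforced after the LLM proposes candidates is sketched below. The `Suggestion` shape, the `key` used for dedup, and the in-memory `lastShownAt` map are assumptions; the blueprint does not specify where this state is stored.

```typescript
const MAX_SUGGESTIONS_PER_TURN = 2;
const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface Suggestion {
  key: string;  // hypothetical stable identifier used for dedup
  text: string;
}

// `lastShownAt` maps a suggestion key to the epoch ms at which it was last shown.
function selectSuggestions(
  candidates: Suggestion[],
  lastShownAt: Map<string, number>,
  nowMs: number,
): Suggestion[] {
  const selected: Suggestion[] = [];
  const seenThisTurn = new Set<string>();

  for (const candidate of candidates) {
    if (selected.length >= MAX_SUGGESTIONS_PER_TURN) break;
    if (seenThisTurn.has(candidate.key)) continue; // duplicate within the turn

    const last = lastShownAt.get(candidate.key);
    if (last !== undefined && nowMs - last < DEDUP_WINDOW_MS) continue; // shown recently

    selected.push(candidate);
    seenThisTurn.add(candidate.key);
  }
  return selected;
}
```

Candidates are taken in the order given, so whatever ranks them upstream decides which two survive the cap.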
9.14 Infrastructure, DevOps & Cost Model
Stack Architecture (Layered)
1-PRODUCT — Native Shell (#1)
Electron 30+ (WebContentsView is available from Electron 30) • React 18+ • TypeScript • WebContentsView • WebSocket client • electron-builder/updater
2-INTELLIGENCE — Coach (#2–#8)
Claude Sonnet 4 (Anthropic API) • ReAct Orchestrator • Tool Registry (tool_use) • Prompt caching • IContextWindowManager • Node.js 22+ TypeScript (Lambda) • AWS API Gateway v2 (HTTP + WebSocket) • DynamoDB (conversations/traces/UserProfile) • PostgreSQL (AgentExecution, credit triggers) • AWS Secrets Manager
3-KNOWLEDGE — Cerebro KB + Data Sync + Enrichment (#9–#11)
Python/FastAPI (GCP Cloud Run) • GCS Parquet (Data Lake) • BigQuery (embeddings) • Vertex AI text-embedding-004 (768 dims) • Airflow (DAGs) • GCP Secret Manager
4-ACTION — Marketplace Provider (#12)
MeLi REST API • Amazon SP-API • Shopify GraphQL • DynamoDB (marketplace-credentials + marketplace-actions) • Redis (ElastiCache, from S5-6) • AWS Secrets Manager
5-PLATFORM — Billing + DevOps (#13–#14)
PostgreSQL/RDS (billing/credits) • Stripe (Checkout + webhooks) • AWS CDK • Terraform (GCP) • GitHub Actions CI/CD
6-QUALITY — Feedback Loop + Eval Suite (#15–#16)
Sentry • PagerDuty • CloudWatch (dashboards, alerts) • LLM-as-Judge (eval pipeline) • DynamoDB (FeedbackEntry + FeedbackThrottle) • Golden datasets (versioned in git)
7-INTERNAL — Beautonomous (#17)
#17 CORE Beautonomous (OpenClaw) • Linear • GitHub
CI/CD Pipeline (GitHub Actions)
Cross-cutting concern owned by Andrés (#14 DevOps IaC + pipeline config) + Pablo (quality gates via Beautonomous). Ready by S7.
Environments
- • dev — npm run dev (Coach) / docker compose up (Data) / mocks for external APIs
- • staging — Lambda (auto-deploy on merge) / Cloud Run Data API
- • production — Lambda + API Gateway (manual promote via CDK)
- • Rollback: Lambda version revert (<1 min)
[CORRECTED] Coach → Lambda/API Gateway (AWS), not Cloud Run
Monitoring
- • CloudWatch dashboards (latency, errors, RPS, LLM cost/conversation, tool executions, credits)
- • PagerDuty: p95 >2s, error rate >1%
- • Slack alerts: LLM cost/day >$50, OAuth failure, DAG failure
- • DynamoDB point-in-time recovery (35 days)
- • Target SLA: 99.9% (~8.76h/year downtime)
Security
- • AWS Secrets Manager (backend: #2,#3,#12) / GCP Secret Manager (data: #9,#10,#11) [CORRECTED]
- • AES-256-GCM for marketplace tokens in DynamoDB
- • Memberstack JWT (safeStorage in Electron) + OAuth2 per marketplace (MeLi, Amazon LWA, Shopify)
- • Encryption at rest + in transit (TLS 1.3)
- • OWASP top 10 checklist at S10
- • Electron: CSP, sandbox, no nodeIntegration
Cloud Cost Estimate (Monthly, at 100 users)
Infrastructure
AI + Services
LLM Cost Model (Per-User Breakdown)
Free Tier
- • 50 credits/month ≈ 50 tool calls
- • ~$0.15-0.25/user/mo (with caching)
- • Read-only tools, lower token avg
- • When credits run out: everything blocked until monthly reset
Pro Tier
- • 500 credits/month ≈ 500 tool calls
- • ~$2.50-4.00/user/mo (with caching)
- • Read+Write + proactive (+20% cost)
- • When credits run out: buy Credit Packs or wait for monthly reset
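The two exhaustion rules above (Free is blocked until reset, Pro can top up with Credit Packs) can be sketched as a small gate function. This is an illustrative sketch only: the names `Plan`, `credit_gate`, and the returned strings are hypothetical and do not come from the Shopilot codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_credits: int
    can_buy_packs: bool  # assumption: only Pro may use Credit Packs


FREE = Plan("free", 50, can_buy_packs=False)
PRO = Plan("pro", 500, can_buy_packs=True)


def credit_gate(plan: Plan, used: int, pack_credits: int, cost: int) -> str:
    """Decide what happens to a request that costs `cost` credits."""
    # Pack credits only count for plans that are allowed to buy packs.
    packs = pack_credits if plan.can_buy_packs else 0
    remaining = plan.monthly_credits - used + packs
    if cost <= remaining:
        return "allow"
    # Out of credits: Pro gets a pack offer, Free waits for the monthly reset.
    return "offer_credit_pack" if plan.can_buy_packs else "blocked_until_reset"
```

For example, `credit_gate(FREE, 50, 0, 1)` returns `"blocked_until_reset"`, while the same situation on Pro returns `"offer_credit_pack"`.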
Assumptions
- • Claude Sonnet 4 @ $3/$15 per 1M tokens
- • ~2K input + ~500 output tokens/call avg
- • Prompt caching reduces input cost 60-80%
- • Margin: $49 - ~$4 = ~$45/Pro user/mo (91%)
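The per-call arithmetic behind these assumptions can be worked through in a few lines. This is a simplified sketch: it treats prompt caching as a flat discount on input cost (the 60-80% figure above) and ignores any cache-write surcharge. The function names are illustrative.

```python
# Prices from the assumptions above: $3 input / $15 output per 1M tokens.
INPUT_PRICE = 3.0 / 1_000_000
OUTPUT_PRICE = 15.0 / 1_000_000


def cost_per_call(input_tokens: int = 2000, output_tokens: int = 500,
                  cache_saving: float = 0.0) -> float:
    """USD cost of one LLM call; cache_saving is the fraction of input cost saved."""
    input_cost = input_tokens * INPUT_PRICE * (1.0 - cache_saving)
    output_cost = output_tokens * OUTPUT_PRICE
    return input_cost + output_cost


def gross_margin(price: float, llm_cost: float) -> float:
    """Fraction of the subscription price left after LLM cost."""
    return (price - llm_cost) / price
```

With the stated averages, an uncached call costs about $0.0135, and between roughly $0.0087 and $0.0099 with 60-80% input savings. `gross_margin(49, 4)` is about 0.918, which matches the ~91% margin quoted above. Monthly per-user cost then depends on how many calls a user actually makes and on their real token averages.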
Infrastructure Provisioning Timeline
9.15 Ops Playbook & Launch Readiness
Runbooks (6 Scenarios)
1. API Latency Spike
• Check Lambda scaling (concurrency, cold starts)
• Check DynamoDB throttling (read/write capacity)
• Check Anthropic API latency (their status page)
• If Anthropic: enable context pruning, reduce tool count
2. OAuth Token Refresh Failure
• Verify AWS Secrets Manager access + token state
• Check adapter logs for specific marketplace error
• Manual re-auth: notify user to reconnect marketplace
• If systemic: check marketplace API status page
3. Credit Deduction Mismatch
• Audit Stripe webhook events vs clients table
• Check credit_transactions logs for double deduction
• Manual credit adjustment via admin API
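The double-deduction check in runbook 3 could be sketched as a scan over exported credit_transactions rows. The row shape used here (`user_id`, `execution_id`, `amount`, with negative amounts meaning deductions) is an assumption for illustration; the real table schema is not described in this section.

```python
from collections import Counter


def find_double_deductions(transactions: list[dict]) -> list[tuple[str, str]]:
    """Return (user_id, execution_id) pairs that were debited more than once."""
    counts = Counter(
        (t["user_id"], t["execution_id"])
        for t in transactions
        if t["amount"] < 0  # assumption: deductions are stored as negative amounts
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample rows, for illustration only.
sample = [
    {"user_id": "u1", "execution_id": "e1", "amount": -3},
    {"user_id": "u1", "execution_id": "e1", "amount": -3},  # duplicate debit
    {"user_id": "u1", "execution_id": "e2", "amount": -1},
    {"user_id": "u2", "execution_id": "e3", "amount": 500},  # monthly grant
]
suspects = find_double_deductions(sample)
```

Each pair in `suspects` is a candidate for the manual credit adjustment step.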
4. LLM Hallucination / Wrong Tool
• Pull trace from ConversationTrace (DynamoDB) + AgentExecution (PostgreSQL)
• Analyze: which tool was called, what params, what context
• Fix: adjust tool description or add few-shot example
• Log in #16 Eval Suite as regression test
5. Data Sync DAG Failure
• Check Airflow logs for specific DAG
• Verify marketplace API accessibility
• Manual DAG re-run via Airflow UI
• Beautonomous alerts #deploys channel automatically
6. Electron App Crash
• Check Sentry for crash report + stack trace
• Check memory usage at crash time
• If memory: optimize WebContentsView, limit tab count
• Push hotfix via auto-updater (GitHub Releases)
Alert Configuration & On-Call
Alert Thresholds
• p95 latency > 2s → PagerDuty (Mateo)
• Error rate > 1% → PagerDuty (Andrés)
• LLM cost/day > $50 → Slack (Pablo)
• OAuth refresh failure → Slack (Andrés)
• DAG failure → Beautonomous → Slack #deploys
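The threshold list above maps directly onto a routing function. The sketch below is illustrative: the metric key names are hypothetical, and in practice these rules would live in CloudWatch alarms and PagerDuty/Slack integrations rather than in application code.

```python
def route_alerts(metrics: dict) -> list[tuple[str, str]]:
    """Return (channel, owner) pairs for every threshold that is breached."""
    alerts = []
    if metrics.get("p95_latency_s", 0.0) > 2.0:
        alerts.append(("PagerDuty", "Mateo"))
    if metrics.get("error_rate", 0.0) > 0.01:
        alerts.append(("PagerDuty", "Andrés"))
    if metrics.get("llm_cost_usd_day", 0.0) > 50.0:
        alerts.append(("Slack", "Pablo"))
    if metrics.get("oauth_refresh_failed", False):
        alerts.append(("Slack", "Andrés"))
    if metrics.get("dag_failed", False):
        # DAG failures are relayed by Beautonomous to the #deploys channel.
        alerts.append(("Slack #deploys", "Beautonomous"))
    return alerts
```

A reading of `{"p95_latency_s": 2.4, "llm_cost_usd_day": 61}` would page Mateo and notify Pablo on Slack; a reading exactly at a threshold raises nothing, since every rule is a strict "greater than".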
On-Call Rotation
• Week A: Mateo (BE + orchestrator + performance)
• Week B: Andrés (data + APIs + infrastructure)
• Sergio/Pablo: secondary (Electron + product)
• Escalation: 15min ack → 1h resolution target
Launch Readiness Checklist (Week 10)
Technical
☐ All unit tests passing (coverage ≥80%)
☐ All E2E tests passing (≥50 tests)
☐ API p95 latency <3s under load
☐ Electron RAM <500MB with 3 tabs
☐ OAuth2 refresh working for all 3 marketplaces
☐ CI/CD pipeline green on main branch
☐ Lambda production deployed and healthy (CDK deploy)
☐ Cloud Run Data API production deployed (Terraform)
☐ Rollback tested and verified (<1 min recovery)
☐ #16 Eval Suite running on every PR
Product
☐ Beta feedback incorporated (top 5 issues fixed)
☐ Onboarding flow tested with 3+ non-technical users
☐ System prompt iterated based on real conversations
☐ ProactiveSuggestionService (after_tool hook) delivering value — verified via test cases
☐ 28+ tools working (10 READ, 8 ANALYSIS operational; 4+ real WRITE in 3 marketplaces). 17 WRITE tools registered, implemented as the circuit breaker allows
☐ #7 Guardrails active (input/output filtering)
Business
☐ Stripe billing live (Free + Pro + Credit Packs)
☐ Download page with .dmg link live
☐ Support channel ready (email or Slack community)
☐ 10+ beta users onboarded and active
Legal & Security
☐ Privacy policy published
☐ Terms of service published
☐ OWASP top 10 security review passed
☐ Marketplace API compliance verified (MeLi, Amazon, Shopify TOS)
☐ Apple code signing + notarization
Beta User Selection & Onboarding
Selection Criteria
• Current Sellerfy users (existing relationship)
• Active on MeLi (primary marketplace)
• Mix: 5 small sellers + 5 medium + 5 large
• Willing to give feedback (15-min calls)
• Mac users (Electron is Mac-only for MVP)
Onboarding Plan
• 2-min video walkthrough + setup document
• 1-on-1 Zoom call for first 5 users (Pablo)
• Async Slack channel for beta group
• Feedback form after 48h of usage
• Follow-up calls at day 3, 7, 14
Week 10 Deliverable — The Complete Picture
A seller on MercadoLibre, Amazon, or Shopify downloads Shopilot.app, connects their account, and can:
1. Talk to a copilot that knows their business (sales, inventory, competitors)
2. Ask smart questions ("How do I improve this product?", "Who is my competition?")
3. Execute real actions (edit title, change price, toggle listing) — with confirmation
4. Receive proactive suggestions during conversation ("Your competitor dropped prices — want to reprice?", "5 products need attention")
5. Do all of this from a native app where they also browse their marketplace with automatic context detection
That is Shopilot MVP. 4 engineers, 10 weeks, AI-augmented. From 200 paying Sellerfy users to a new product with real sellers using it.
9.16 Workflow & Meetings
Philosophy
• Async-first. Code speaks louder than meetings.
• Daily communication: Linear task updates + Slack messages. No standups.
• If it can be a Slack message, it IS a Slack message.
• Meetings (called “Coffee” internally) only when they add value.
Sprint Coffee
• 1 Coffee at the start of each sprint cycle
6 total: S1-2, S3-4, S5-6, S7-8, S9-10, S11-12
• Plus 1 Kick-off Coffee in S0 (Pre-Sprint alignment)
• 30 minutes max
Agenda
• What you’ll do this cycle
• What you need from others
• What blocks you
• Format: Google Meet
Total: 7 Sprint Coffees
Gate Coffee
• 1 Coffee at each Go/No-Go gate
3 total: Gate 1 (end of S4), Gate 2 (end of S8), Launch Gate (end of S10)
• Pablo is the decision maker
Format
• Demo + Go/No-Go vote
• 45 minutes max
• Gate 2 Coffee can overlap with S9-10 Sprint Coffee
Ad-hoc Coffee
• Any engineer can request one at any time
• No rules, but a reason is expected
Typical Reasons
• Blocked >4h
• Cross-project dependency needs coordination
• Scope change
• Architectural decision that affects another engineer
No bureaucracy — just post in #engineering Slack and schedule.
Coffee Calendar — Complete Timeline
| Week | Coffee | Type | Participants | Duration |
|---|---|---|---|---|
| W0 | Kick-off | Sprint | All 4 | 30min |
| W1 | S1-2 Start | Sprint | All 4 | 30min |
| W3 | S3-4 Start | Sprint | All 4 | 30min |
| W4 | Gate 1: “It Talks” | Gate | All 4 (Pablo decides) | 45min |
| W5 | S5-6 Start | Sprint | All 4 | 30min |
| W7 | S7-8 Start | Sprint | All 4 | 30min |
| W8 | Gate 2: “It Acts” | Gate | All 4 (Pablo decides) | 45min |
| W9 | S9-10 Start | Sprint | All 4 | 30min |
| W10 | Launch Gate: “It Ships” | Gate | All 4 (Pablo decides) | 45min |
Total scheduled Coffees: ~9 (7 Sprint + 3 Gate, minus 1 overlap Gate 2 / S9-10). Ad-hoc Coffees are additional.
Golden Rule
“If a Coffee can be a Slack message, it’s a Slack message. Respect everyone’s deep work time.”
9.17 Linear Workspace — Project Management
What Changed: Specs → Linear
• Before (v6): Tasks lived as T-codes in markdown files (sprints.md, engineers.md). No real-time tracking, no dependency visualization, no automated workflow.
• Now (v7): All 192 tasks are Linear issues (AUT-22 to AUT-213). Linear is the single source of truth for execution tracking. This blueprint remains the architectural reference.
• 19 architectural projects remain in this blueprint for technical reference. In Linear, execution is organized into 6 time-bound Projects that group tasks by sprint phase.
Workspace Structure
Organization
• Workspace: beautonomous
• Team: Shopilot (AUT)
• Issues: 192 (AUT-22 → AUT-213)
• Initiative: MVP Shopilot — Cursor for eCommerce
Team Members
• Mateo — CTO (#2-#9, #11, #18)
• Andrés — Data+BE (#10, #12, #14)
• Sergio — Full-Stack (#1, #13, #15)
• Pablo — CEO (#16, #17, #18, #19)
6 Time-Bound Projects
Each Project groups all tasks from a sprint pair. The 19 architectural modules are tracked via Layer/ labels.
| # | Project | Sprints | Dates | Issues | Pts | Lead | Gate |
|---|---|---|---|---|---|---|---|
| P1 | Walking Skeleton | S0 + S1-2 | Mar 11 → Mar 28 | 48 | 124 | Pablo | Gate 0 |
| P2 | Core Engines | S3-4 | Mar 31 → Apr 11 | 38 | 94 | Mateo | Gate 1 |
| P3 | WRITE + Billing + Design | S5-6 | Apr 14 → Apr 25 | 42 | 116 | Mateo | WRITE+Billing |
| P4 | Integration + Polish | S7-8 | Apr 28 → May 9 | 37 | 93 | Mateo | Gate 2 |
| P5 | Production + Launch | S9-10 | May 12 → May 23 | 18 | 52 | Pablo | Go/No-Go |
| P6 | Buffer | S11-12 | May 26 → Jun 6 | 9 | 34 | Pablo | Buffer |
| Total | | | | 192 | 513 | | |
How 19 Architectural Modules Map to 6 Projects
The 19 modules in this blueprint (Section 8) are architectural — they define what each component does. The 6 Linear projects are temporal — they define when work gets done. A single architectural module contributes tasks to multiple projects across sprints. The bridge is the Layer/ label on each issue.
P1: Walking Skeleton (48 issues, 124 pts)
All 7 layers bootstrapped. Electron scaffold (#1), AgentLoop ReAct (#2), KB fix (#9), MeLi/Amazon adapters (#12), OAuth2 + Redis (#14), Beautonomous bootstrap (#17), Figma Foundations (#18), Eval setup (#16). First canary build .dmg+.exe.
Team: Pablo 14, Mateo 14, Andrés 13, Sergio 7
P2: Core Engines (38 issues, 94 pts)
Intelligence layer deepens. ToolRegistry + ToolPolicyFilter (#3), ContextAggregator (#5), 10 READ handlers, ConfirmationFlow (#2), KB incremental + batch (#9), ShopifyAdapter (#12), Data Sync DAGs (#10), Chat UI + OnboardingWizard (#1), Gate 1 signed build.
Team: Mateo 15, Andrés 10, Sergio 8, Pablo 5
P3: WRITE + Billing + Design (42 issues, 116 pts)
Write capabilities + monetization. WRITE tools phase 1 (#3), InputGuard (#7), Enrichment scaffold (#11), Fast Data Layer + Amazon DAG (#10), BillingView + Stripe + credits (#13), Token pipeline Style Dictionary (#18), Figma Quality Eval extension (#16), KB BigQuery (#9).
Team: Mateo 15, Sergio 12, Pablo 9, Andrés 6
P4: Integration + Polish (37 issues, 93 pts)
Full integration + quality. WebSocket streaming (#2), OutputGuard (#7), Feedback Loop scaffold (#15), WRITE tools remaining (#3), Staging deploy (#14), Load testing 50 users (#14), WS Electron client (#1), Desktop Build Eval extension (#16), Figma audit (#18), Gate 2 signed build.
Team: Sergio 13, Pablo 11, Mateo 8, Andrés 5
P5: Production + Launch (18 issues, 52 pts)
Ship it. Production deploy AWS+GCP (#14), code signing .dmg + auto-updater (#1), security hardening CSP (#1), LLMGuardChecker + system prompt v3 (#7), Stripe live (#13), beta onboarding 10-15 sellers, OWASP review, Go/No-Go decision.
Team: Pablo 6, Sergio 5, Andrés 4, Mateo 3
P6: Buffer (9 issues, 34 pts — all Circuit-Breaker scope)
Deferred + hardening. WRITE tools deferred (#3), p95 optimization (#2), prod hardening + monitoring (#14), adapter bug fixes (#12), beta UI bug fixes (#1), auto-updater S3 pipeline (#1), Windows build (#1), eval expansion (#16), postmortem (#17).
Team: Sergio 3, Pablo 2, Andrés 2, Mateo 2
Total scope: 513 estimated pts across 192 issues. At ~80 pts/cycle theoretical capacity (4 engineers × 80h), the load-to-capacity ratio is ~1.5×, so the Circuit-Breaker label identifies tasks that can be deferred to the P6 Buffer.
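The totals and the per-project load can be checked against the project table. The sketch below uses only the issue and point counts listed above plus the ~80 pts/cycle capacity. Reading the ~1.5× ratio as the load of the heaviest projects is an assumption, since the blueprint does not spell out how that figure was derived.

```python
# (issues, points) per project, copied from the project table.
PROJECTS = {
    "P1": (48, 124),
    "P2": (38, 94),
    "P3": (42, 116),
    "P4": (37, 93),
    "P5": (18, 52),
    "P6": (9, 34),
}
CAPACITY_PTS = 80  # theoretical points per cycle, from the text above

total_issues = sum(issues for issues, _ in PROJECTS.values())
total_pts = sum(pts for _, pts in PROJECTS.values())

# Load ratio per project: planned points divided by theoretical capacity.
load = {name: pts / CAPACITY_PTS for name, (_, pts) in PROJECTS.items()}
heaviest = max(load, key=load.get)
```

Under these numbers the totals come to 192 issues and 513 points, and the heaviest project is P1 at 1.55× capacity, with P3 close behind at 1.45×. Note that P1 also covers S0, so it spans more than one two-week cycle.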
6 Cycles (2 Weeks Each)
Each Cycle maps 1:1 to a Project with identical dates. Issues are assigned to the cycle matching their sprint.
| Cycle | Dates | Sprints |
|---|---|---|
| C1 | Mar 11-28 | S0+S1-2 |
| C2 | Mar 31-Apr 11 | S3-4 |
| C3 | Apr 14-25 | S5-6 |
| C4 | Apr 28-May 9 | S7-8 |
| C5 | May 12-23 | S9-10 |
| Cooldown | May 26-Jun 6 | S11-12 |
Config: 2-week cadence • Auto-create OFF • Started/Completed auto-add OFF • Cooldown for deferred tasks only
5 Milestones (Gates)
| Milestone | Target | Criteria |
|---|---|---|
| Gate 0: APIs Connected | Mar 28 | MeLi + Amazon OAuth2 working, /context endpoint returns data |
| Gate 1: “It Reads” | Apr 11 | /conversation returns analysis with real MeLi+Amazon data. Signed build: .dmg notarized + .exe signed |
| WRITE + Billing Functional | Apr 25 | WRITE tools phase 1 working with confirmation flow. Stripe Checkout integrated. Credits gate enforcing |
| Gate 2: “It Acts” | May 9 | WRITE action executed with confirmation + rollback. Full .dmg+.exe with all S8 features |
| Go/No-Go | May 23 | 0 P0 bugs, uptime 99.9%, <50ms p95 adapters. Production build ready for distribution |
Label System
Layer/ (7 labels)
• Layer/1-Product
• Layer/2-Intelligence
• Layer/3-Knowledge
• Layer/4-Action
• Layer/5-Platform
• Layer/6-Quality
• Layer/7-Internal
Scope/ (3 labels)
• MVP-Critical — must ship for Gate 2
• Important — should ship, can defer 1 sprint
• Circuit-Breaker — only if capacity allows
Type/ (7 labels)
• Feature • Build • Gate • Eval
• Mockup • Design-Delivery • Setup
Special (3 labels)
• figma-dependency — blocked until Figma delivery
• cross-team — needs coordination between engineers
• needs-ac — acceptance criteria pending
Workflow & Automations
Issue Workflow
• Won’t Do — cancelled/out of scope
• Deferred — moved to S11-12 buffer
GitHub Automations
• PR opened with AUT-XX in branch → issue moves to In Progress
• Review requested → issue moves to In Review
• PR merged → issue moves to Done
Branch naming: AUT-XX-short-description
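The branch naming convention and the three PR automations can be modeled as a lookup from (branch, event) to (issue, new state). This is only a sketch of the rule as described here: the event names are hypothetical, and the real behavior is configured in the Linear GitHub integration, not in project code.

```python
import re

# Branch names are expected to start with the issue key, e.g. AUT-42-fix-oauth-refresh.
ISSUE_RE = re.compile(r"^(AUT-\d+)(?:-|$)")

# Hypothetical event names mapped to the workflow states listed above.
EVENT_TO_STATE = {
    "pr_opened": "In Progress",
    "review_requested": "In Review",
    "pr_merged": "Done",
}


def transition(branch: str, event: str):
    """Return (issue_id, new_state), or None if the branch or event is not recognized."""
    match = ISSUE_RE.match(branch)
    state = EVENT_TO_STATE.get(event)
    if match is None or state is None:
        return None
    return match.group(1), state
```

For instance, `transition("AUT-42-fix-oauth-refresh", "pr_merged")` yields `("AUT-42", "Done")`, while a branch without the AUT-XX prefix yields `None`, which is why the naming convention matters for the automations.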
Fibonacci Estimates
Scale
| Points | Effort |
|---|---|
| 1 | 4h |
| 2 | 1d |
| 3 | 2d |
| 5 | 3d |
| 8 | 5d |
Capacity
• 4 engineers × 80h/cycle = 320h available
• ~80 pts/cycle theoretical capacity
• Ratio: ~1.5× capacity → circuit breaker needed
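The scale can be turned into an hours estimate for a set of issues. The sketch assumes an 8-hour working day to convert the "d" entries. That convention is not stated in the blueprint, though it is consistent with 80h per engineer per two-week cycle.

```python
HOURS_PER_DAY = 8  # assumption: one working day = 8h

# Points -> hours, following the Fibonacci scale above.
POINT_HOURS = {
    1: 4,
    2: 1 * HOURS_PER_DAY,
    3: 2 * HOURS_PER_DAY,
    5: 3 * HOURS_PER_DAY,
    8: 5 * HOURS_PER_DAY,
}

CYCLE_HOURS = 4 * 80  # 4 engineers x 80h/cycle


def estimated_hours(points: list[int]) -> int:
    """Sum the hour estimates for a list of issue point values."""
    return sum(POINT_HOURS[p] for p in points)
```

The ~80 pts/cycle figure corresponds to 320h at the 1-point rate of 4h per point. Larger issues cost slightly more per point on this scale (an 8-point issue is 40h, or 5h per point), so a cycle made up of big issues fits fewer than 80 points.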
How Engineers Use Linear
1. Open Linear → My Issues to see your assigned tasks for the current cycle.
2. Move issue to In Progress when you start working (or just open a PR with AUT-XX in the branch name).
3. Check blocked/blocking relations before starting — don’t start a task if its blocker isn’t done.
4. When a PR is merged, the issue auto-moves to Done.
5. Use Cmd+K to navigate quickly. Use Cycles view to see your sprint’s scope.
Architecture (Blueprint) vs Execution (Linear)
This Blueprint (19 Projects)
• Architecture reference — what each module does, its components, APIs, data models
• Technical specs, deep dives, acceptance criteria
• Organized by architectural layer (7 layers, 19 projects)
Linear (6 Projects)
• Execution tracking — who does what, when, status, blockers
• Real-time progress, GitHub PR integration, Slack notifications
• Organized by time phase (6 projects, 6 cycles, 5 gates)
Bridge: Every Linear issue has a Layer/ label mapping it back to its architectural project. Filter by label to see all issues for a specific module.
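The label bridge amounts to a filter over issues. The sketch below works on plain dictionaries, for example an export of the workspace. The sample issues and their label assignments are invented for illustration and do not reflect the real AUT issues.

```python
def issues_for_layer(issues: list[dict], layer: str) -> list[str]:
    """Return the ids of all issues carrying the given Layer/ label."""
    return [issue["id"] for issue in issues if layer in issue["labels"]]


# Hypothetical sample data: ids fall in the AUT range, labels are made up.
sample_issues = [
    {"id": "AUT-30", "labels": ["Layer/2-Intelligence", "MVP-Critical"]},
    {"id": "AUT-31", "labels": ["Layer/1-Product", "Important"]},
    {"id": "AUT-95", "labels": ["Layer/2-Intelligence", "Circuit-Breaker"]},
]

intelligence = issues_for_layer(sample_issues, "Layer/2-Intelligence")
```

Because a module's issues are spread across several time-bound projects, filtering by label rather than by project is what reassembles the architectural view.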
Critical Handoffs
These are modeled as blocked/blocking relations in Linear. The blocking issue must be Done before the blocked issue can start.
Engineer → Engineer
• Mateo T1.5 (REST) → Sergio T2.18 (WS client)
• Andrés T1.10 (/context) → Mateo T2.7 (ContextAgg)
• Mateo T3.2 (ConfirmationFlow) → Sergio T3.20 (confirmation UI)
• Sergio T3.24 (credits BE) → Mateo T3.5a (HttpCreditGate)
• Mateo T3.4 (ProactiveSuggestion) → Sergio T3.21 (suggestion cards)
• Mateo T4.1 (WS upgrade) → Sergio T4.10 (WS client)
UX/UI → SergioUX/UI → Sergio
• T0.BB (W1) → T1.19 Tabs+Sidebar + T1.MK1
• T1.BB (W2) → T2.17 Chat UI + T2.MK1/MK2
• T2.BB (S4) → T3.19 BillingView + T3.MK1-3
• T3.BB (S6) → T4.10 WS client + T4.MK1/MK2
• T4.BB (S8) → T5.9 Bug fixes + T5.MK1
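The blocked/blocking rule can be expressed as a small dependency check. The sketch encodes only the engineer-to-engineer handoffs listed above, plus the T3.BB design delivery that also gates T4.10. The function name and the dictionary structure are illustrative.

```python
# blocked task -> tasks that must be Done first (T-codes from the handoff lists).
BLOCKED_BY = {
    "T2.18": ["T1.5"],
    "T2.7": ["T1.10"],
    "T3.20": ["T3.2"],
    "T3.5a": ["T3.24"],
    "T3.21": ["T3.4"],
    "T4.10": ["T4.1", "T3.BB"],
}


def can_start(task: str, done: set[str]) -> bool:
    """A task may start only when every one of its blockers is Done."""
    return all(blocker in done for blocker in BLOCKED_BY.get(task, []))
```

For example, Sergio's T4.10 (WS client) cannot start while only T4.1 is Done. It also needs the T3.BB delivery.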
10. Full Product Roadmap
Weeks 1-10 — Core Product
MercadoLibre + Amazon + Shopify. ~25 primitive tools + autonomous agent. Proactive intelligence. Native shell (Mac). Freemium (Free + Pro $49/mo) + Credit Packs. Beta with 10-15 Sellerfy users.
Weeks 11-16 — More Tools + Billing + Feedback
15+ additional tools (campaigns, inventory reports, image generation). Feedback loop (#15). Business plan ($149/mo, 5K credits). More marketplace regions. 50+ active users.
Weeks 17-22 — AI-Powered Features
24/7 trend agents. Multi-agent parallel execution. Advanced proactive strategies. Windows build. Public API. 200+ users.
Week 23+ — Platform
Multi-language. Marketplace Hub (WooCommerce, eBay, Alibaba). Tool marketplace (third-party tools). Enterprise features. WhatsApp channel. 1,000+ users.
11. Monetization & Pricing
The Credit Model: Agent Interaction Costs
Every autonomous interaction has a cost based on tools used + tokens consumed + human-hours equivalent. The agent composes primitive tools into workflows of varying complexity. The goal: users perceive enough value that, when credits run out, they want to buy more.
Example: Agent Interaction Costs
| Agent Interaction | Credits |
|---|---|
| Quick question (get_metrics, simple lookup) | 1 |
| Autonomous action (write operation with confirmation) | 3 |
| Product audit (analyze, compare competitors, propose improvements) | 12 |
| Competitor deep-dive (search, compare 10+ competitors, generate report) | 10 |
| Full store optimization (audit all products, prioritize, propose changes) | 25 |
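The example costs in the table can be expressed as a small lookup, which makes it easy to price a composed workflow or check how far a credit budget stretches. This is a minimal sketch: the identifier names (`InteractionKind`, `CREDIT_COST`, `totalCredits`, `interactionsPerBudget`) are hypothetical and not taken from the Shopilot codebase; only the credit values come from the table.

```typescript
// Credit cost per interaction type, copied from the example table.
// The key names are illustrative assumptions, not real Shopilot identifiers.
type InteractionKind =
  | "quick_question"
  | "autonomous_action"
  | "product_audit"
  | "competitor_deep_dive"
  | "full_store_optimization";

const CREDIT_COST: Record<InteractionKind, number> = {
  quick_question: 1,
  autonomous_action: 3,
  product_audit: 12,
  competitor_deep_dive: 10,
  full_store_optimization: 25,
};

// Total credits consumed by a sequence of interactions.
function totalCredits(interactions: InteractionKind[]): number {
  return interactions.reduce((sum, kind) => sum + CREDIT_COST[kind], 0);
}

// How many whole interactions of one kind fit in a credit budget.
function interactionsPerBudget(kind: InteractionKind, budget: number): number {
  return Math.floor(budget / CREDIT_COST[kind]);
}
```

For example, `interactionsPerBudget("product_audit", 500)` returns 41 and `interactionsPerBudget("full_store_optimization", 500)` returns 20.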
Value Perception
$49/mo = ~500 credits = hundreds of autonomous agent interactions, equivalent to $2,000-7,000 of manual analyst work. At the example costs above, 500 credits cover:
- 500 quick questions, or
- 41 product audits (12 credits each), or
- 20 full store optimizations (25 credits each)
$49 input → $2,000-$7,000 value output = 40-140x ROI
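The ROI range is plain division of estimated value by plan price. The sketch below reproduces that arithmetic and also derives the effective price per credit on Pro; the function names are hypothetical, and the $2,000-$7,000 value estimate is the document's own figure, not something the code validates.

```typescript
// ROI multiple = estimated value delivered / price paid.
function roiMultiple(valueUsd: number, priceUsd: number): number {
  return valueUsd / priceUsd;
}

// Effective price of one credit when bought through a monthly plan.
function planPricePerCredit(priceUsd: number, credits: number): number {
  return priceUsd / credits;
}

const PRO_PRICE_USD = 49;
const PRO_CREDITS = 500;

const low = roiMultiple(2000, PRO_PRICE_USD);  // roughly 40.8
const high = roiMultiple(7000, PRO_PRICE_USD); // roughly 142.9
const perCredit = planPricePerCredit(PRO_PRICE_USD, PRO_CREDITS); // 0.098 USD
```

Rounded down to the nearest ten, the two multiples give the 40-140x range quoted in the text.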
Pricing Tiers (v2.1)
Freemium model: low-friction onboarding, value-driven upgrade, credit packs for expansion revenue.
Free: $0/mo
- 50 credits/month
- 3 marketplaces (MeLi, Amazon, Shopify)
- 5 skills (read-only)
- No proactive alerts
- No credit packs
At 100%: everything blocked → upgrade CTA
Pro: $49/mo
- 500 credits/month
- 3 marketplaces (MeLi, Amazon, Shopify)
- 8 tools (read + write)
- Proactive suggestions (LLM-based)
- Credit packs available
- Chat support
At 80%: alert. At 100%: writes blocked, reads continue → buy pack CTA
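The usage thresholds for Free and Pro amount to a small gating rule. The following is a sketch of that rule under stated assumptions: the types, the `checkQuota` name, and the shape of the returned decision are all hypothetical, and the Business tier is left out because the document gives no thresholds for it.

```typescript
type Plan = "free" | "pro";
type Operation = "read" | "write";

interface QuotaDecision {
  allowed: boolean;                    // may the operation run?
  alert: boolean;                      // show the 80% usage warning (Pro only)
  cta: "upgrade" | "buy_pack" | null;  // call to action once credits are exhausted
}

function checkQuota(plan: Plan, op: Operation, used: number, limit: number): QuotaDecision {
  const exhausted = used >= limit;
  if (plan === "free") {
    // Free: at 100% everything is blocked and the upgrade CTA is shown.
    return { allowed: !exhausted, alert: false, cta: exhausted ? "upgrade" : null };
  }
  // Pro: alert from 80% usage (integer math avoids float rounding);
  // at 100% writes are blocked while reads continue.
  const alert = used * 5 >= limit * 4;
  const allowed = !exhausted || op === "read";
  return { allowed, alert, cta: exhausted ? "buy_pack" : null };
}
```

Keeping the rule in one pure function makes it straightforward to share between the backend check and the sidebar UI that renders the alert or CTA.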
Business (Phase 2): $149/mo
- 5,000 credits/month
- 3 marketplaces + more regions
- All tools + custom
- Full proactive engine
- Priority support + onboarding
- API access
Credit Packs (Pro only, Stripe one-time payment)
| Pack | Price | Price per credit |
|---|---|---|
| 100 cr | $5 | $0.050/cr |
| 500 cr | $20 | $0.040/cr |
| 1,000 cr | $35 | $0.035/cr |
Pack credits are added to the current billing cycle and do not roll over to the next one.
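A sketch of how the pack pricing and the no-rollover rule could be modeled. The pack sizes and prices are the ones listed above; the `CycleBalance` shape and every function name are assumptions made for illustration, and no Stripe calls are shown.

```typescript
interface CreditPack {
  credits: number;
  priceUsd: number;
}

// Pack sizes and prices as listed in the document.
const PACKS: CreditPack[] = [
  { credits: 100, priceUsd: 5 },
  { credits: 500, priceUsd: 20 },
  { credits: 1000, priceUsd: 35 },
];

function pricePerCredit(pack: CreditPack): number {
  return pack.priceUsd / pack.credits;
}

// Hypothetical per-cycle balance: plan credits plus any packs bought this cycle.
interface CycleBalance {
  planCredits: number;
  packCredits: number;
  used: number;
}

// Buying a pack tops up the current cycle only.
function addPack(balance: CycleBalance, pack: CreditPack): CycleBalance {
  return { ...balance, packCredits: balance.packCredits + pack.credits };
}

// A new cycle starts from the plan allowance; unused credits are not carried over.
function startNewCycle(planCredits: number): CycleBalance {
  return { planCredits, packCredits: 0, used: 0 };
}

function remainingCredits(b: CycleBalance): number {
  return b.planCredits + b.packCredits - b.used;
}
```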
Unit Economics
| Metric | Value |
|---|---|
| Gross margin | 85-92% |
| CAC payback | 2-3 mo |
| Cost per credit | $0.02-0.08 |
| User ROI | 40-140x |
MVP Pricing Strategy (v2.1)
Launch with Free ($0) + Pro ($49/mo). The Free tier lowers acquisition friction: users experience value before paying. Pro unlocks write skills + proactive alerts. Credit packs solve the "what happens when I run out" problem without forcing a plan upgrade. Revenue expansion via packs: est. 15-25% of Pro users buy 1+ pack/month.
12. Team & Execution
4 AI-augmented engineers operating at 4-6x each, effectively 16-24 engineering equivalents. See full team page.
Pablo Estrada, CEO & Product Engineer
15-year ecommerce veteran. $15M+ in sales processed. Ships features end-to-end with Claude. Owns product vision, system prompt design, QA, eval suite, and go-to-market.
Mateo Quintero, CTO
Owns all architecture decisions. ReAct Orchestrator, Tool Registry, intelligence layer, Cerebro Knowledge Base (Go 1.24 + Vertex AI), caching and cost optimization. Uses AI for code review and rapid prototyping.
Andres Leon, Data + Backend
Built the Data Orchestrator from scratch. Owns marketplace adapters (MeLi + Amazon + Shopify), Context Aggregator, TokenManager, Data Sync, and production infrastructure.
Sergio Murillo, Full-Stack
Owns the Native Shell (Electron), React sidebar, all UI/UX, MeLi + Amazon + Shopify detection, WebSocket integrations, billing UI, and app distribution (packaging, signing, auto-updates).
Why this team can ship in 10+2 weeks
13. Version History
v7 — Mar 10, 2026
Final blueprint with Linear workspace fully configured. 192 issues (AUT-22 to AUT-213) created with Fibonacci estimates, labels, dependencies. 6 time-bound Projects (Walking Skeleton through Buffer), 6 Cycles (2 weeks each), 5 Milestones/Gates, 1 Initiative. GitHub org-level integration with workflow automations (PR→In Progress, Review→In Review, Merge→Done). Slack integration. New Section 9.17 documenting the complete Linear methodology. Task count expanded from 147 to 192 with UX/UI pipeline, Mockups, Eval extensions, and code signing builds.
v6 — Mar 4, 2026
Complete rewrite as incremental technical blueprint. 8 deep sections (What is Shopilot, Big Players, What We Have, What We Reuse, Architecture with 7 layers, Project Map, Beautonomous 13-subsection guide, How We Build with 19 project cards). Projects reorganized by architecture layer (1-19), layer-group UX with collapse/filter. Bilingual EN/ES throughout. Dark-mode glass-card design.
v5 — Mar 2-3, 2026
MVP 10-week execution plan with 5 sprint phases, 4 engineer tracks (Mateo/Andres/Sergio/Pablo), week-by-week deliverables matrix, critical path analysis, 9 risk categories, infra cost breakdown, milestones & gates, ops playbook, and cross-project dependency matrix. Monetization model (Free/Pro/Credit Packs). Team & execution page.
v4 — Mar 2, 2026
13 projects rewritten with real implementation specs. 3 projects eliminated. 4 new projects (#14 DevOps, #16 Eval Suite, #7 Guardrails, #11 Enrichment Layer). Project Implementation Map added. 36 primitive tools documented. 4 QA audit rounds. Per-project changelogs. Sprint plan with integration milestones.
v3 — Feb 27-28, 2026
All 16 projects expanded to CORE level. CTO/PM audit: 7 critical fixes applied. Sidebar navigation + collapsible projects. Contradictions eliminated.
v2.1 — Feb 27, 2026
Business model update: Freemium ($0, 50cr) + Pro ($49/mo, 500cr) + Credit Packs. Billing project added. Plan-aware personality engine.
v2 — Feb 27, 2026
CTO technical review. Deferred Feedback Loop. Redis Day 1. GCP Secret Manager. WebContentsView. Simplified Skills Engine (direct tool_use). Added Amazon + Shopify to MVP scope.
v1 — Feb 26, 2026
Initial deep spec. 15 projects. MeLi-only MVP. Complete architecture and data models.
14. Design Guide: Brand Decisions + Build Plan
Shopilot has no brand identity yet
9 design decisions pending → Brand Book does not exist → Technical build blocked until decisions are made
0/9 decisions
Brand Book: –
Components: –
9 Brand Decisions: must be made before anything else
Each decision below has candidate options from the benchmark study. Once made, each answer goes directly into the Brand Book. The research that informs each decision is in the study sections below (§02 Benchmark, §21 Synthesis).
D1. Brand Emotion: what feeling does Shopilot own?
This is the single sentence that drives every visual decision. Every color, every radius, every animation derives from this answer.
D2. Primary Brand Color: the one color that IS Shopilot
Appears on every button, active state, focus ring, and logo. Choose a color no dominant competitor owns in the Latin American e-commerce tools space.
D3. Background Mode: dark-first or light-first?
Study finding: 11/16 power tools are dark-first. Sellers use Shopilot during working hours alongside marketplaces (which are light). This affects eye fatigue and the "feel" of the sidebar next to the marketplace WebView.
D4. Typography Pair: UI font + data font
2 fonts max. Rule: one sans for all text, one monospace for all numbers, prices, percentages, code. The mono font for numbers is functional, not stylistic: it keeps columns stable.
D5. Logo: commission a designer, not AI-generated
Needs to work at 16px (macOS tray icon, favicon) AND at 512px (App Store). Outputs needed: icon.svg, wordmark.svg, logo-dark.svg, logo-light.svg, favicon.ico. Blocked until D1+D2 are decided.
D6. Border Radius Style: sharp, standard, or rounded?
This single decision changes how the product FEELS more than any color. Sharp = technical precision. Rounded = approachable. The mini buttons in §02 Benchmark show the real difference.
D7. Shadow Policy: none, minimal, or soft?
Shadows signal "weight". Linear and Vercel have zero shadows, which creates a flat, fast, technical feel. Stripe and HubSpot use soft shadows, which creates a layered, approachable feel. No middle ground works well.
D8. Semantic Colors: success / warning / error / info
These are functional colors, used in audit log, fraud alerts, status badges, confirmation dialogs. They are NOT brand colors. The standard (green/amber/red/blue) works. The only question is the exact shade, which must contrast well on the decided background.
D9. UI Voice: how does every label, tooltip, and message sound?
Every word in the UI is the brand speaking. "Save" vs "Save changes" vs "Apply". "Error" vs "Something went wrong" vs "Couldn't save, try again". This is not a technical decision; it is a brand voice decision.
Where it lives & what goes in it
Does not exist yet
Created after the 9 decisions above are made
Will contain (6 outputs)
Where to create it
Option A: Figma file → "Shopilot Brand" (recommended: visual, shareable)
Option B: docs/BRAND_BOOK.md in this repo (fast, version-controlled)
Technical Build: blocked until Track A
Cannot start without brand color + typography + background
Building tokens without brand decisions = wasted work
After Track A is done, CTO builds in this order
design-tokens.json → CSS vars → Tailwind config
Study & Reference: read to inform the decisions above
Philosophy: "Warm Precision"
The 6 most trusted software products of the AI era converge on a single unwritten visual language. The proposed candidate for Shopilot, "Warm Precision", is not yet decided: warm neutral backgrounds (not pure white), warm near-blacks (not #000), orange/coral accents, premium custom typography, and systematic spacing based on mathematical base units.
Warm Neutrals
Backgrounds avoid extremes. Not #fff, not #000. Cream whites (#faf9f5) and warm blacks (#141413) reduce eye fatigue and communicate human warmth, which is critical for AI products where "coldness" generates distrust.
Precision Spacing
All spacings derive from 1–2 mathematical base units (Cursor: --g=10px, --v=22px; Anthropic: clamp-based fluid scale). No arbitrary pixels. This creates visual rhythm the user feels but doesn't consciously notice.
Functional Color
Color carries meaning. Only 3–4 semantic colors: success (green), warning (amber), error (red), info (blue). Brand accents are reserved for CTAs and critical emphasis, never decoration. Orange only when action is required.
Trust-First Typography
The best companies invest in custom or premium type. Legibility is non-negotiable. Using generic fonts (Arial, default Inter) communicates lack of care. Text is 70% of the UI in data-centric products, so it must earn trust at every size.
Why it matters for Shopilot
Shopilot is an AI agent that manages sellers' real money. The UI must communicate trust (not a side project), precision (data is clean and readable), and control (the user feels they can trust the agent's actions). A "Warm Precision" design achieves this better than any vibrant palette or extreme minimalism.
Benchmark: 8 Products, What Problem Each Solved With Design
Design decisions are never arbitrary. Every color, font, and layout pattern in a world-class product exists because someone was solving a specific problem. This section documents the real reasoning behind each brand: not just the output (hex codes), but the input (the problem and the why). For each brand: problem → key design decision → result.
Anthropic.com + Claude.ai
#d97757 · r:8px · serif
THE PROBLEM
Anthropic needed to communicate "powerful AI" without triggering the fear response that "cold + blue + robotic" design creates. Every AI competitor (Google, Microsoft, OpenAI) was using blue, the tech-corporate default. The product needed trust, not awe.
KEY DESIGN DECISION
"Clay copper", inspired by unfired clay and the Anthropocene epoch. Explicitly rejected: blue-shifted darks, futuristic aesthetics, corporate sans. Chose organic warmth (#CC785C) over digital precision. The typefaces (Styrene for display, the Tiempos serif for body text) reinforce the "human intellectual" positioning.
RESULT
Claude.ai is perceived as "the thoughtful AI", distinct from GPT's sterile white and Gemini's corporate blue. The warm backgrounds (cream #faf9f5 in light mode, warm near-black #141413 in dark mode) create a reading experience that feels more like a book than a software dashboard. Contrast: 19.9:1 AAA, the highest in the benchmark.
Exact CSS values extracted
Dark primary: #141413 · warm undertone (R>B)
Light primary: #faf9f5 · faintly toasted cream
Brand copper: #CC785C · logo + selection highlight
UI orange: #d97757 · CTAs, interactive elements
Selection bg: rgba(204,120,92,.5)
Fluid type: clamp(3rem → 5rem) display · clamp(1.125 → 1.25rem) body
Fonts: Styrene A/B (display) · Tiempos Text (body) · JetBrains Mono (code)
Nav height: 4.25rem = 68px
Chat max-width: max-w-3xl (768px) · messages 75ch
Motion: menu 400ms · dropdown 200ms · cubic-bezier(0.4,0,0.2,1)
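Contrast figures like the one quoted for Claude.ai, and the requirement that semantic colors "contrast well on the decided background", can be checked with the WCAG 2.x contrast-ratio formula. The sketch below implements that standard formula; the function names are illustrative, and the document does not say which color pair or tool produced its 19.9:1 figure, so the code makes no attempt to reproduce it.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// AAA for normal-size text requires a ratio of at least 7:1.
function meetsAAA(a: string, b: string): boolean {
  return contrastRatio(a, b) >= 7;
}
```

A check such as `meetsAAA("#141413", "#faf9f5")` can be run against each candidate background and semantic color before it goes into the Brand Book.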
Cursor IDE
#f54e00 · r:2px · mono
THE PROBLEM
VS Code is functional but neutral: it has no strong personality. When Cursor launched as "the AI IDE", they needed the UI to communicate "this is the next generation of the editor" without alienating developers who are used to a neutral chrome. Too flashy = distrust. Too similar to VS Code = no differentiation.
KEY DESIGN DECISION
Built a mathematical design system, not a visual one. Everything derives from 2 base units: --g ≈ 10px (grid) and --v ≈ 22px (vertical rhythm). The single bold accent (#f54e00 fire orange) only appears on 3 elements: the active tab, the streaming cursor, and hover states. Pure CSS vars, zero JS for theming. The UI disappears so the code (and AI) can emerge.
RESULT
Cursor feels simultaneously "familiar" (inherits VS Code's neutral density) and "new" (the AI panel integrated into the right side with warm dark #26251e feels like a completely different layer). The mathematical spacing system means everything aligns perfectly even when user content is dynamic. Most important: developers don't feel like they're using a "designed" product.
Exact CSS values (main.css)
Base units: --g: calc(10rem/16) ~10px · --v: 1rem*1.4 ~22px
Opacity system: --fg-01 → --fg-100 (every 5% step as hex)
Duration: --duration: .14s · --duration-slow: .25s
Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)
Text scale: sm: 11px · base: 12px · lg: 13px
Shadows: ultra minimal — 0 0 1rem #00000005 (flyout only)
Breakpoints: 420 · 660 · 768 · 900 · 1140 · 1380px
OS detection: data-os=linux → system font fallback
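The base-unit approach, and the design-tokens.json → CSS vars step named in the build order, can be sketched as a function that serializes a token object into CSS custom properties. This is an illustration only: the `Tokens` shape, the `space-*` naming, and both functions are assumptions, and the four base values merely mirror the ones listed above rather than Cursor's actual stylesheet.

```typescript
// Hypothetical flat token map: name -> CSS value.
interface Tokens {
  [name: string]: string;
}

// Base values mirroring the list above; --v is written as a valid calc() expression.
const baseTokens: Tokens = {
  g: "calc(10rem / 16)",
  v: "calc(1rem * 1.4)",
  duration: ".14s",
  "duration-slow": ".25s",
};

// Derive spacing steps as multiples of a base unit, so no arbitrary pixel values appear.
function spacingScale(unit: string, steps: number[]): Tokens {
  const out: Tokens = {};
  for (const step of steps) {
    out[`space-${unit}-${step}`] = `calc(var(--${unit}) * ${step})`;
  }
  return out;
}

// Serialize tokens into a :root block of CSS custom properties.
function toCssVars(tokens: Tokens): string {
  const lines = Object.entries(tokens).map(([name, value]) => `  --${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

const css = toCssVars({ ...baseTokens, ...spacingScale("g", [1, 2, 4]) });
```

Because the output is plain CSS, theming stays in stylesheets and needs no runtime JavaScript, which matches the "pure CSS vars" point above.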
Linear
#5e6ad2 · r:6px · no shadow
THE PROBLEM
Jira was (and is) the default project management tool, and it feels like bureaucracy. Every action has friction. Every screen has visual noise. Loading spinners everywhere. Linear's founders decided that the product's design IS the product's value proposition: speed and clarity are not features, they are the brand.
KEY DESIGN DECISION
Optimistic UI on every action: no confirmation dialogs for reversible operations, no loading states for local actions. The entire design vocabulary is built on 3 rules: no gradients (gradients = effort = slowness), no decorative shadows (shadows = heavy = bureaucracy), opacity-over-color (derive all grays from the brand color at different opacities, not a separate gray scale). The indigo #5e6ad2 is deliberately calm: not exciting, not urgent, just authoritative.
RESULT
Linear is used as a benchmark of "what great software feels like" in every design community. The product grew primarily through word-of-mouth among developers because the experience is demonstrably different. The lesson: design clarity is a growth strategy. Engineers and PMs show it to colleagues as an example of quality.
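The opacity-over-color rule described for Linear can be illustrated by generating a neutral scale from a single brand hex, varying only the alpha channel. This is a sketch of the idea, not Linear's implementation: the function names and the alpha steps are hypothetical.

```typescript
// Parse "#rrggbb" into its three 8-bit channels.
function hexToRgb(hex: string): [number, number, number] {
  const n = parseInt(hex.replace("#", ""), 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Build a tint scale by varying only the alpha of one brand color,
// instead of maintaining a separate gray palette.
function opacityScale(brandHex: string, alphas: number[]): string[] {
  const [r, g, b] = hexToRgb(brandHex);
  return alphas.map((a) => `rgba(${r}, ${g}, ${b}, ${a})`);
}

// Indigo value from the benchmark; the alpha steps are arbitrary examples.
const indigoScale = opacityScale("#5e6ad2", [0.08, 0.16, 0.4, 0.72]);
```

Every border, hover fill, and muted label then inherits the brand hue automatically, so changing the brand color updates the whole neutral scale.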
Arc Browser
user color · r:20px pill · SF Pro
THE PROBLEM
Chrome's UI is the most used interface in the world, and it has no identity. It's deliberately invisible. Arc wanted to make the browser a "personal space" that feels different for each user. The challenge: how do you build a browser with a strong personality without imposing one personality on everyone?
KEY DESIGN DECISION
User-owned accent color: the product's "brand color" is whatever the user picks. This is the opposite of every other product. The sidebar UI uses macOS system dark (#1c1c1e) as the only fixed base, and everything else derives from the user's color choice. The left sidebar is persistent, not the top bar, inverting 20 years of browser convention.
RESULT
Arc screenshots look different on every user's computer, generating enormous organic social sharing. People screenshot their Arc setup like they screenshot their iPhone homescreen. The product's design became its marketing. Relevance for Shopilot: the sidebar-as-primary-chrome pattern is exactly the 70/30 Shopilot split.
Stripe
#635bff · r:6px · shadow
THE PROBLEM
Stripe handles billions of dollars. Their design problem: financial products are traditionally austere, green (trust/money), and boring, because the assumption is that "serious = colorless". Stripe needed to feel trustworthy AND modern AND developer-friendly at the same time, for three completely different audiences: CTOs, developers, and CFOs.
KEY DESIGN DECISION
Split personality by surface: the marketing site uses deep navy (#0a2540) + gradient aurora effects + large editorial typography to signal "world-class and premium". The product dashboard is pure white + minimal, so CFOs feel "this is clean and trustworthy". The documentation uses monospace + code samples everywhere, so developers feel "this was built for us". Three audiences, three sub-designs, one brand color: purple #635bff.
RESULT
Stripe's landing page is widely considered the benchmark of "premium product marketing"; it redefined what a fintech company's site should look like. Critical insight for Shopilot: handling money requires design that communicates both precision (clean numbers, clear status) AND trust (not too flashy, nothing decorative near financial data). White space is a feature around numbers.
HubSpot Canvas
#FF7A59 · r:6px · sans
THE PROBLEM
HubSpot serves non-technical users: sales reps and marketers who are not designers and do not care about design. The category incumbent (Salesforce) was notorious for dense, overwhelming UIs. The risk: building an "impressive" design system that designers love but sales reps find confusing. The audience is the person who hates software complexity.
KEY DESIGN DECISION
"Sprocket-right" principle: every component is evaluated by whether it helps the user accomplish the task, not by whether it looks good. Custom typography (HubSpot Sans + HubSpot Serif via Typekit), but only because generic fonts signal "we don't care". Orange (#FF7A59) is warm and approachable, the opposite of the cold blue of Salesforce. 6px border radii everywhere: soft enough to not feel corporate, tight enough to not feel "cute".
RESULT / RELEVANCE FOR SHOPILOT
HubSpot's user is the closest analog to Shopilot's seller. Both are non-technical, results-oriented, and using the software during their working day (not in a "power user" session). The key lesson: density is the enemy. Every piece of data a seller sees should be immediately interpretable. No scanning. No decoding. The number should tell the story.
Vercel Geist
#ffffff btn · r:6px · Geist
THE PROBLEM
Vercel competes against AWS, Google Cloud, and Heroku. The problem: every cloud platform feels the same: blue, corporate, dense. Vercel's audience (frontend developers) is design-literate and will immediately judge a product's quality by its visual craftsmanship. The product needed to feel like it was built by people who care about craft, because that's who their users are.
KEY DESIGN DECISION
Maximum constraint = maximum differentiation. Pure #000000 dark + #fafafa light: close to the most extreme contrast possible, no warm undertone, no "personality" color at all. The result is that Vercel looks like a luxury product, not a tech product, like an Apple product page. They built a custom open-source font (Geist) to reinforce the "we invest in craftsmanship" signal. Zero decorative elements. Zero shadows. Negative space as the only visual tool.
RESULT
Vercel's design became a cultural signal in the frontend community. "Vercel-style" is now shorthand for "brutalist minimalism done premium". The open-source Geist font is downloaded thousands of times per week by developers who want to use it in their own products. Important caveat: this approach requires absolute discipline. One wrong design decision and "minimal" becomes "empty".
Shopify Polaris
#008060 · r:4px · neutral
THE PROBLEM
Shopify's admin is used by 2M+ merchants who range from solo entrepreneurs running their first store to enterprise brands managing thousands of SKUs. Their common trait: they are not designers; they are business owners who need to take a specific action (change a price, fulfill an order) in 10 seconds or less. Every second of confusion is a second of lost revenue.
KEY DESIGN DECISION
Every data visualization rule is merchant-first: show the total first (large, bold, row 1), not the chart. One insight per visualization, never two. Always provide multiple formats (number + percentage + delta). The color system has one rule: green #008060 = Shopify = money = growth. It only appears when something positive happened. 100% of design decisions are tested against: "does a first-time merchant understand this in under 5 seconds?"
RESULT / DIRECT LESSON FOR SHOPILOT
Polaris is the most-studied design system for e-commerce tools because it solved the "seller comprehension" problem at scale. Direct lesson for Shopilot: every KPI card, every data table, every chart must pass the "5-second merchant test". If a seller needs to think for more than 5 seconds to understand a data point, the design failed, regardless of how good it looks.
What Actually Differentiates Each Brand
Not the hex codes, but the why behind the hex codes.
| Brand | Unique differentiator | Emotion targeted | What Shopilot can learn |
|---|---|---|---|
| Anthropic | Clay copper that rejected "AI = cold + blue" | Intellectual trust | AI products can be warm. Warmth = trust for agents that handle real stakes. |
| Cursor | Mathematical system (2 base units) that makes the UI disappear | Invisible power | --g and --v base units. Sidebar chrome should never compete with the marketplace. |
| Linear | Design as the product's value proposition, not its wrapper | Speed as feeling | No confirmation dialogs for reversible coach actions. Optimistic UI where safe. |
| Arc | Persistent left sidebar as primary chrome (inverted the tab bar convention) | Personal ownership | The 70/30 split with a fixed right sidebar is the same pattern: marketplace is the "browser", sidebar is the "Arc panel". |
| Stripe | White space as a trust signal around financial data | Premium reliability | Nothing decorative near prices, inventory counts, or revenue figures. Data must breathe. |
| HubSpot | "Sprocket-right": function evaluated from user's task, not designer's taste | Approachable competence | Sellers are not power users. Every screen passes the "new seller in 10 seconds" test. |
| Shopify | The "5-second merchant test" applied to every data visualization | Clarity = profit | Total first. Number big. Delta visible. One chart = one insight. The dashboard is a decision tool, not a data dump. |
Color: Palette & Semantic Tokens
Shopilot Dark Palette (proposed)
| Group | Token | Value or role |
|---|---|---|
| Backgrounds | --bg | #0f0e0d |
| Backgrounds | --bg-2 | #1a1917 |
| Backgrounds | --bg-3 | #242220 |
| Text | --text | #f5f3ef |
| Text | --text-2 | #c8c5be |
| Text | --text-3 | #8b8880 |
| Text | --text-4 | #5a5855 |
| Brand Accent | --orange | #f97316 |
| Brand Accent | --orange-2 | #ea6c0a |
| Semantic | --green | Success |
| Semantic | --amber | Warning |
| Semantic | --red | Error |
| Semantic | --blue | Info/Analysis |
| Semantic | --purple | Ctx/Technical |
Color usage rules
🟠 Orange: CTA only · AI actions · new notifications. Never decoration.
🟢 Green: Confirmations · price rising · OK · buy box winning.
🔴 Red: Errors · blocks · fraud alerts · price vs. floor.
🟡 Amber: Warnings · pending · TTL expiring · intermediate states.
🔵 Blue: ANALYSIS tools (read-only) · contextual info.
🟣 Purple: Technical context · tokens · ctx window · system info.
Key Pattern: Opacity-based color architecture (from Cursor)
Instead of creating a new color for every state, derive all shades from the foreground color at varying opacity: rgba(var(--text-raw), 0.05) → 0.10 → 0.15 → 0.20 → ... Here --text-raw must hold the bare comma-separated RGB channels (for example 245,243,239) rather than a hex value, because rgba() cannot take a hex color as its first argument. This keeps all UI states automatically harmonious and theme-compatible.
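The derivation can be sketched in TypeScript. This is a minimal illustration, not part of the spec: `hexToRgb` and `opacityScale` are hypothetical helper names, and the step list (5, 8, 12, 20, 40) is borrowed from the `--fg-XX` tokens in the CSS variable sheet.

```typescript
// Hypothetical helpers: derive an opacity scale from a single foreground color.
function hexToRgb(hex: string): [number, number, number] {
  const match = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const n = parseInt(match[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

function opacityScale(hex: string, percentSteps: number[]): Record<string, string> {
  const [r, g, b] = hexToRgb(hex);
  const tokens: Record<string, string> = {};
  for (const pct of percentSteps) {
    const suffix = pct < 10 ? `0${pct}` : String(pct);
    // Format 0.05 as ".05" to match the notation used in the token sheet.
    const alpha = (pct / 100).toFixed(2).replace(/^0/, "");
    tokens[`--fg-${suffix}`] = `rgba(${r},${g},${b},${alpha})`;
  }
  return tokens;
}

// --text (#f5f3ef) is the warm near-white foreground from the palette.
const fgScale = opacityScale("#f5f3ef", [5, 8, 12, 20, 40]);
```

Changing the single input color regenerates every step, which is the property the pattern relies on for theming.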
Typography: Scale & Pairing
Display / Headings
Styrene A
Geometric · slightly humanist · "squarish f,j,r,t": technical precision with personality.
Anthropic use: marketing headlines
Body / UI
Styrene B / Inter
The words that matter most.
Narrower · "gentle with words" · readable at 11–14px · handles dense data tables.
Mono / Data
$84.99 → $79.99
update_price B09XYZ
BSR: 654 · BB: 63%
JetBrains Mono: all prices, percentages, tool names, IDs. Tabular figures = scannable data.
Anthropic complete type system (confirmed data)
9 font families loaded: AnthropicSans · AnthropicSerif · AnthropicMono (proprietary, emerging) + Copernicus Book/Medium (Galaxie, Chester Jenkins+Kris Sowersby 2009) + StyreneA Regular/Medium + StyreneB Regular/Medium (Berton Hasebe, Commercial Type) + TiemposText Regular/Medium (Klim Type Foundry) + JetBrainsMono (variable TTF)
Marketing roles: Styrene A (headlines) · Styrene B (subheads, nav) · Tiempos Text (body long-form)
Claude.ai product roles: Galaxie Copernicus Book (UI headings) · Styrene B (input, UI labels, 400/500/700) · Tiempos Text (AI response prose — "the AI speaks in editorial serif, not system sans")
Pairing logic: Copernicus+Tiempos share a Kris Sowersby lineage → optical harmony. Styrene provides geometric contrast. Serif = human knowledge/warmth. Sans = the system speaking.
Chat input height: ~300px (deliberate — invites long-form composition, treats user as writer not command-line typist)
Fluid type (confirmed clamp values):
--font-size--display-xxl: clamp(3rem, 2.388rem + 2.612vw, 5rem) /* 48px → 80px */
--font-size--display-xs: clamp(1.125rem, 1.087rem + 0.163vw, 1.25rem) /* 18px → 20px */
--font-size--monospace: clamp(0.875rem, 0.531rem + 1.469vw, 2rem) /* 14px → 32px */
--site--margin: clamp(2rem, 1.082rem + 3.918vw, 5rem) /* 32px → 80px */
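The arithmetic behind these clamp values is a linear interpolation between a minimum and a maximum size. The sketch below reproduces them under two assumptions that the source does not state: a 16px root font size and a 375px → 1600px viewport range (inferred by solving the published slope and intercept). `fluidClamp` is a hypothetical helper name.

```typescript
// Assumed, not stated in the source: 16px root size, 375px to 1600px viewport range.
const ROOT_PX = 16;

// Round to 3 decimals and drop trailing zeros (3.000 -> "3").
function fmt(n: number): string {
  return parseFloat(n.toFixed(3)).toString();
}

function fluidClamp(
  minPx: number,
  maxPx: number,
  minViewportPx = 375,
  maxViewportPx = 1600
): string {
  // Size grows linearly with viewport width between the two breakpoints.
  const slope = (maxPx - minPx) / (maxViewportPx - minViewportPx);
  // Value the line would have at a 0px-wide viewport.
  const interceptPx = minPx - slope * minViewportPx;
  const preferred = `${fmt(interceptPx / ROOT_PX)}rem + ${fmt(slope * 100)}vw`;
  return `clamp(${fmt(minPx / ROOT_PX)}rem, ${preferred}, ${fmt(maxPx / ROOT_PX)}rem)`;
}

const displayXxl = fluidClamp(48, 80);
const monospace = fluidClamp(14, 32);
```

With those assumed breakpoints, the helper returns the same strings as the display-xxl and monospace entries listed above.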
Shopilot Type Scale
Spacing & Grid: Mathematical Base Units
Inspired by Cursor's two-unit base system (--g for grid, --v for vertical rhythm). Every measurement is a multiple of 4px, with no arbitrary values.
Spacing Scale
Border Radii
Spacing Rules
- ✓ Use 4, 8, 12, 16, 24, 32; never 7 or 13
- ✓ More space = more conceptual separation
- ✓ Dense tables: 5–6px row padding. Confirmation panels: 12–16px.
- ✗ Never padding 2px with border-radius 8px
- ✗ No arbitrary pixel values
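The scale rule above lends itself to an automated check. This is a minimal sketch of how a lint step might enforce it. `isOnScale` and `nearestOnScale` are hypothetical names, and the scale array simply restates the values listed in the first rule.

```typescript
// The allowed spacing values, as listed in the rules above.
const SPACING_SCALE = [4, 8, 12, 16, 24, 32] as const;

// True when a pixel value is one of the allowed steps.
function isOnScale(px: number): boolean {
  return SPACING_SCALE.some((step) => step === px);
}

// Suggest the closest allowed step; ties resolve to the smaller step.
function nearestOnScale(px: number): number {
  let best: number = SPACING_SCALE[0];
  for (const step of SPACING_SCALE) {
    if (Math.abs(step - px) < Math.abs(best - px)) best = step;
  }
  return best;
}
```

For example, a stray `padding: 7px` would be flagged by `isOnScale(7)` and `nearestOnScale(7)` would suggest 8.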
UI Components: Live Preview
Buttons
Badges / Tags
Cards
Standard Card
$84.99
Auriculares BT Pro
Success State
Buy Box: 82%
↑ +19pts this week
Alert State
FraudDetector ⚠
Score: 0.73 · Action required
Data Table (Polaris-inspired, Seller-first)
Rules: headers 8px uppercase mono · data 9px mono · "me" row highlighted with brand opacity · winner in green · loser in red.
Motion & Animation
Golden Rule (from Anthropic Frontend Cookbook)
One orchestrated page-load animation > 10 scattered micro-interactions. Staggered reveals driven by animation-delay create more delight than effects sprinkled across individual elements.
Duration Tokens
--dur-instant: .08s
--dur-fast: .14s (Cursor)
--dur-normal: .25s (Cursor slow)
--dur-slow: .40s (Anthropic fade)
Menu open: 400ms (Anthropic nav)
Dropdown: 200ms (Anthropic)
Easing Curves
--ease-spring: cubic-bezier(.25,1,.5,1)
--ease-out: cubic-bezier(.4,0,.2,1) (Material standard · used by Anthropic cookbook)
Key Animations
fadeInUp: Y+4px → Y0 · opacity 0→1 · .2s ease-out
thinking-pulse: opacity .4→1→.4 · 1.2s infinite
stagger: animation-delay: 50ms per item
Always: prefers-reduced-motion support
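The stagger rule reduces to simple arithmetic per list item. This is a minimal sketch with a hypothetical `staggerDelays` helper. How the `reducedMotion` flag is obtained (for instance from a `prefers-reduced-motion` media query) is left to the caller and is an assumption here.

```typescript
// Compute animation-delay values (in ms) for a staggered reveal.
// With reduced motion, every item gets 0 so nothing is sequenced.
function staggerDelays(count: number, stepMs = 50, reducedMotion = false): number[] {
  return Array.from({ length: count }, (_, index) =>
    reducedMotion ? 0 : index * stepMs
  );
}

const delays = staggerDelays(4);
const reducedDelays = staggerDelays(4, 50, true);
```

Each value would be applied as that item's `animation-delay`, so the fadeInUp keyframes stay identical across items and only the start time shifts.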
Accessibility: WCAG 2.2 AA Minimum
Contrast Ratios (Shopilot palette)
Requirements
✓ :focus-visible with 2px orange outline on all interactive elements
✓ Minimum touch target: 44×44px
✓ prefers-reduced-motion → all animations disabled
✓ Semantic HTML: <button> for actions, <a> for navigation
✓ Never convey information via color alone
✓ skip-to-content link (HubSpot pattern)
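Contrast ratios for the palette can be checked with the WCAG relative-luminance formula. The sketch below is an illustration with hypothetical function names. The 4.5:1 threshold is the WCAG AA requirement for normal-size text, and the hex values come from the palette above.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG relative luminance definition).
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Ratio of lighter to darker luminance, each offset by 0.05; ranges from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA for normal-size text requires at least 4.5:1.
function passesAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}

const primaryTextOk = passesAA("#f5f3ef", "#0f0e0d"); // --text on --bg
const disabledTextOk = passesAA("#5a5855", "#0f0e0d"); // --text-4 on --bg
```

Under this formula --text on --bg clears AA comfortably, while --text-4 on --bg does not, which is consistent with reserving --text-4 for placeholder and disabled text.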
Design Tokens: Shopilot CSS Variables
Complete, copy-usable CSS variable system for all Shopilot projects. Dark-mode first.
:root {
/* ── Typography ──────────────────────────────── */
--font-display: 'Styrene A', 'Fraunces', Georgia, serif;
--font-body: 'Inter', 'Space Grotesk', system-ui, sans-serif;
--font-mono: 'JetBrains Mono', 'Fira Code', ui-monospace, monospace;
/* ── Backgrounds (dark-mode first) ──────────── */
--bg: #0f0e0d; /* warm near-black */
--bg-2: #1a1917; /* cards */
--bg-3: #242220; /* hover states */
--bg-4: #2e2c29; /* active states */
/* ── Text ────────────────────────────────────── */
--text: #f5f3ef; /* warm near-white */
--text-2: #c8c5be; /* secondary */
--text-3: #8b8880; /* muted */
--text-4: #5a5855; /* placeholder / disabled */
/* ── Brand accent ────────────────────────────── */
--orange: #f97316; /* primary CTA */
--orange-2: #ea6c0a; /* hover */
--orange-3: rgba(249,115,22,.15); /* tinted bg */
/* ── Semantic ────────────────────────────────── */
--green: #22c55e;
--amber: #f59e0b;
--red: #ef4444;
--blue: #3b82f6;
--purple: #a855f7;
/* ── Borders ─────────────────────────────────── */
--border: rgba(255,255,255,.07);
--border-2: rgba(255,255,255,.12);
--border-3: rgba(255,255,255,.20);
/* ── Opacity system (Cursor-inspired) ───────── */
--fg-05: rgba(245,243,239,.05);
--fg-08: rgba(245,243,239,.08);
--fg-12: rgba(245,243,239,.12);
--fg-20: rgba(245,243,239,.20);
--fg-40: rgba(245,243,239,.40);
/* ── Radii ───────────────────────────────────── */
--r-xs: 2px; --r-sm: 4px; --r: 6px;
--r-lg: 8px; --r-xl: 12px; --r-2xl: 16px;
/* ── Spacing ─────────────────────────────────── */
--sp-1: 4px; --sp-2: 8px; --sp-3: 12px;
--sp-4: 16px; --sp-6: 24px; --sp-8: 32px;
/* ── Motion ──────────────────────────────────── */
--dur: .14s; /* Cursor base */
--dur-slow: .25s; /* Cursor slow */
--ease: cubic-bezier(.25,1,.5,1); /* spring */
--ease-std: cubic-bezier(.4,0,.2,1); /* Material */
/* ── Text scale ──────────────────────────────── */
--text-3xs: 9px; --text-2xs: 10px; --text-xs: 11px;
--text-sm: 12px; --text-base: 13px; --text-md: 14px;
--text-lg: 16px; --text-xl: 20px; --text-2xl: 24px;
}
Shopilot-Specific Patterns
ReAct Pattern (Thought → Action → Observation)
Confirmation Card Hierarchy
1. Header: action type + REVERSIBLE/IRREVERSIBLE badge
2. Diff: from/to with arrows + semantic colors
3. Impact: bulleted consequences
4. Actions: orange confirm + neutral cancel
5. Footer: rollback_token if applicable
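The five-part hierarchy above maps naturally onto a data structure. The following sketch is one possible shape, not the actual Shopilot schema: every type, field, and function name is hypothetical, as are the sample impact text and token value. The price change reuses the $84.99 → $79.99 sample from the typography section.

```typescript
// Hypothetical shape for a confirmation card; field names are illustrative only.
interface FieldDiff {
  field: string;
  from: string;
  to: string;
}

interface ConfirmationCard {
  actionType: string;
  reversible: boolean;
  diff: FieldDiff[];
  impact: string[];
  rollbackToken?: string; // present only when the action can be rolled back
}

// Produce the card's lines in the documented order: header, diff, impact, actions, footer.
function cardLines(card: ConfirmationCard): string[] {
  const lines: string[] = [
    `${card.actionType} [${card.reversible ? "REVERSIBLE" : "IRREVERSIBLE"}]`,
    ...card.diff.map((d) => `${d.field}: ${d.from} → ${d.to}`),
    ...card.impact.map((consequence) => `• ${consequence}`),
    "[Confirm] [Cancel]",
  ];
  if (card.rollbackToken) lines.push(`rollback_token: ${card.rollbackToken}`);
  return lines;
}

const priceCard: ConfirmationCard = {
  actionType: "update_price",
  reversible: true,
  diff: [{ field: "price", from: "$84.99", to: "$79.99" }],
  impact: ["Listing price changes on the marketplace"], // illustrative text
  rollbackToken: "rb_example", // made-up token for the sketch
};
const priceCardLines = cardLines(priceCard);
```

Making `rollbackToken` optional keeps the footer out of cards for irreversible actions without a separate card type.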
Marketplace Data Panel
1. Header: marketplace icon + name + status badge
2. Main metric: large mono number + semantic delta
3. Secondary: 2–3 col grid, smaller text
4. Action: link or button at footer
Status Indicator Language
Do ✓
- Use mono for ALL numeric values (prices, %, BSR, tokens)
- Keep high density in tables, but make each row pass the 5-second merchant test (sellers are not power users)
- Always show the "reason" behind every agent action
- Use opacity-based colors for panel backgrounds
- More critical = more space + more contrast
Don't ✗
- Brand color gradients as UI backgrounds
- Decorative shadows (flyout/modal only)
- More than 3 semantic colors in one panel
- Font size < 9px for any interactive text
- Animations > .4s in workflow flows
- Orange accent on more than 1 element per screen
Design Stack & Toolchain
Shopilot's design stack is intentionally lean. The key insight: this HTML spec WILL BE the foundation of the design system once brand decisions are made. No Figma required until Phase 2.
4-Level Stack
Spec Layer: This HTML + Markdown
The spec is the design. Cursor AI reads this HTML and generates components that match exactly. Validated through code review. Zero ambiguity vs. Figma handoff.
Token Layer: design-tokens.json → Style Dictionary → CSS :root
Single source of truth for all values. Style Dictionary transforms JSON → CSS custom properties + tailwind.config.js. One change propagates everywhere.
Component Layer: React + Tailwind + Figma MCP
All components are defined in Figma (#18 Design System) following Atomic Design. Claude reads the Figma via Figma MCP and implements matching React components. No components are created outside of what is defined in the Figma.
QA Layer: Figma ↔ Code visual consistency + PR gates
Every PR is reviewed against the Figma source of truth. Visual diff between Figma spec and implemented component is verified during code review. Blocks merge if component deviates from Figma definition.
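To make the token layer concrete, the sketch below shows the kind of transform it performs: flattening a nested token object into CSS custom property declarations. This is a hand-rolled illustration, not Style Dictionary's API. `TokenTree`, `flattenTokens`, and `toRootBlock` are hypothetical names, and the nested layout of the sample JSON is an assumption.

```typescript
// A nested token object, loosely modeled on what a design-tokens.json might contain.
type TokenTree = { [key: string]: string | TokenTree };

// Walk the tree and emit one "--path-to-token: value;" declaration per leaf.
function flattenTokens(tree: TokenTree, path: string[] = []): string[] {
  const declarations: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const nextPath = [...path, key];
    if (typeof value === "string") {
      declarations.push(`--${nextPath.join("-")}: ${value};`);
    } else {
      declarations.push(...flattenTokens(value, nextPath));
    }
  }
  return declarations;
}

// Wrap the declarations in a :root block ready to paste into a stylesheet.
function toRootBlock(tree: TokenTree): string {
  const body = flattenTokens(tree).map((line) => `  ${line}`);
  return [":root {", ...body, "}"].join("\n");
}

// Sample values taken from the spacing and radii tokens in this spec.
const sampleTokens: TokenTree = {
  sp: { "1": "4px", "2": "8px" },
  r: { sm: "4px", lg: "8px" },
};
const sampleCss = toRootBlock(sampleTokens);
```

Because every declaration is generated from the same tree, editing one value in the JSON changes every generated output, which is the propagation property the token layer is meant to provide.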
Tool × Purpose × When to Use × Alternative
| Tool | Purpose | When | Alternative |
|---|---|---|---|
| Style Dictionary | Token transform | Phase 1+ | Theo, vanilla-extract |
| Tailwind CSS | Utility styling | Always | UnoCSS, vanilla CSS |
| Figma MCP | Lets Claude read the Figma components it implements | Phase 1+ | (none) |
| Radix UI | Accessible primitives | Modals, dropdowns | Headless UI, Ark |
| Figma | Component source of truth (Atomic Design) | Phase 1+ | Penpot (OSS) |
| This HTML spec | Source of truth v1 | Now → Phase 1 | (none) |
Team Roles in Design System
💡 Key insight: Figma (#18 Design System) is the single source of truth for all visual components, following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in the Figma.
How Cursor Built Its UI: Deep Dive
Cursor's UI is the closest analogue to Shopilot: an Electron desktop app with a split pane (native web view left + React panel right). Every pattern they solved, we inherit. Below are the concrete technical decisions and their direct Shopilot equivalents.
Opacity Color System
Instead of hardcoded hex colors, Cursor uses a single base color and derives the full scale via opacity.
Result: automatic dark/light theme compatibility. One token set, zero manual overrides.
2 Base Units: Everything Derives From Here
No arbitrary pixel values anywhere. Every spacing value is a multiple or fraction of --g or --v. The visual rhythm emerges automatically.
WebContentsView Split Pane Architecture
Cursor renders the code editor via WebContentsView (left 70%), a native Chromium view embedded in Electron. The AI panel (right 30%) is standard React. Communication via ipcMain/ipcRenderer.
→ Shopilot identical: marketplace WebContentsView 70% + React sidebar 30%
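The 70/30 layout comes down to computing two rectangles from the window size. The sketch below keeps that geometry as a pure function so it can be tested without Electron. `splitBounds` and `Rect` are hypothetical names, and reserving 24px at the bottom for the status bar is an assumption drawn from the status bar height given elsewhere in this spec.

```typescript
// Same shape as the { x, y, width, height } rectangles Electron views accept.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split the window into a marketplace pane (left) and sidebar (right),
// leaving room at the bottom for a fixed-height status bar (assumed 24px).
function splitBounds(
  windowWidth: number,
  windowHeight: number,
  marketplaceRatio = 0.7,
  statusBarPx = 24
): { marketplace: Rect; sidebar: Rect } {
  const contentHeight = Math.max(0, windowHeight - statusBarPx);
  const leftWidth = Math.round(windowWidth * marketplaceRatio);
  return {
    marketplace: { x: 0, y: 0, width: leftWidth, height: contentHeight },
    sidebar: { x: leftWidth, y: 0, width: windowWidth - leftWidth, height: contentHeight },
  };
}

const layout = splitBounds(1000, 800);
```

In the main process, the marketplace rectangle would be passed to the view's `setBounds` on every window resize; deriving the sidebar width by subtraction avoids a one-pixel gap from rounding.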
Pure CSS Theming: Zero JS
Theme switch is a single setAttribute('data-theme', theme) call on document.body. No React state, no re-renders, no flash. Instant.
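A minimal sketch of that switch follows. `applyTheme`, `Theme`, and `ThemeTarget` are hypothetical names. The function accepts any object with a `setAttribute` method so it can be exercised outside a browser, and the `light` variant is included only for illustration since the MVP is dark-only.

```typescript
type Theme = "dark" | "light";

// Anything with a DOM-style setAttribute; document.body satisfies this in a renderer.
interface ThemeTarget {
  setAttribute(name: string, value: string): void;
}

// Flip the data-theme attribute; CSS rules keyed on [data-theme="..."] do the rest.
function applyTheme(target: ThemeTarget, theme: Theme): void {
  target.setAttribute("data-theme", theme);
}

// In the renderer process this would be called as:
//   applyTheme(document.body, "dark");
```

Since the stylesheet reads the attribute directly, no component needs to subscribe to theme state.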
Status Bar Anatomy: 24px Fixed Height
Cursor Pattern → Shopilot Equivalent (12 patterns)
| Cursor Pattern | Shopilot Equivalent |
|---|---|
| Shadow workspaces (background indexing) | Marketplace context loaded in background on app start |
| Streaming ghost text (word-by-word) | Sidebar word-by-word with fadeIn .08s per token |
| Tab bar (files) | Tab bar (MeLi · Amazon · Shopify) with marketplace icons |
| Status bar 24px | Status bar 24px (agent state · credits · model version) |
| Apply/Reject diff blocks | Confirmation card with Confirm/Cancel + diff view |
| Agent loop step indicators | ReAct stream (Thought → Action → Observation) |
| Composer context pills | Context bar (SELLER_PROFILE · MARKETPLACE · ASIN loaded) |
| --fg-XX opacity token scale | --sp-fg-XX same pattern with sp- (shopilot) prefix |
| --g + --v base unit system | --sp-g: 10px + --sp-v: 22px (identical values) |
| data-theme CSS architecture | data-theme on body, dark-only in MVP |
| Cmd+K command palette | Cmd+K focus chat input (identical binding) |
| Frameless titlebar + drag region | titleBarStyle:'hidden' + -webkit-app-region: drag |
How Claude Code Built Its UI — Terminal → Desktop Cómo Claude Code Construyó Su UI — Terminal → Desktop
Claude Code started as a pure terminal CLI (React + Ink renderer). Its UX decisions — born from constraints — are some of the best in the AI-native category. Shopilot adapts the same mental model for a visual desktop context. Claude Code comenzó como un CLI de terminal puro (React + Ink renderer). Sus decisiones de UX — nacidas de restricciones — son de las mejores de la categoría AI-native. Shopilot adapta el mismo modelo mental para un contexto visual de desktop.
React + Ink Renderer with Cell-DiffReact + Ink Renderer con Cell-Diff
Ink renders React components as terminal cells. On update, only changed cells are repainted — like a virtual DOM for terminal. Zero flicker even with rapid token streaming. Shopilot equivalent: React with React.memo + virtualized list for chat history.Ink renderiza componentes React como celdas de terminal. En actualización, solo las celdas cambiadas se repintan — como un virtual DOM para terminal. Cero parpadeo incluso con streaming rápido de tokens. Equivalente Shopilot: React con React.memo + lista virtualizada para historial de chat.
No Alternate Screen: Always Scrollable
Claude Code deliberately avoids alternate screen mode. All output stays in the normal scroll buffer so users can scroll back to review previous turns. This is a key trust decision.
→ Shopilot: chat history always scrollable, no pagination, no "clear screen"
Streaming Cursor ▊: Not a Spinner
Claude Code uses a blinking block cursor ▊ to indicate active generation. No spinner, no skeleton, no loading bar. The cursor IS the loading state. Less UI noise = more focus on content.
Tool Call Accordion Pattern
→ Shopilot: identical accordion for all 36 tools
Agent Loop Visual: 3 Phases
Each phase is visually distinct in the chat. Context gathering = blue, Actions = orange, Verification = green. The user always knows which phase the agent is in without needing to read the text.
Context Compaction + Memory System (deep spec)
Context Compaction Banner
Appears discreetly when the context window hits 80%. Blue background, auto-dismiss after 4s. Never interrupts workflow. Shopilot equivalent: same banner + credit usage update in status bar.
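The banner rule above can be sketched as a small pure function. The 80% threshold and 4s dismiss delay come from the spec; the function names and the "show once per compaction" flag are assumptions for illustration.

```typescript
const COMPACTION_THRESHOLD = 0.8; // spec: banner appears at 80% of the context window
const BANNER_DISMISS_MS = 4000;   // spec: auto-dismiss after 4s

// Fraction of the context window in use, clamped to [0, 1].
function contextUsage(usedTokens: number, windowTokens: number): number {
  if (windowTokens <= 0) return 0;
  return Math.min(1, Math.max(0, usedTokens / windowTokens));
}

// Hypothetical rule: show the banner once when usage crosses the threshold.
function shouldShowCompactionBanner(
  usedTokens: number,
  windowTokens: number,
  alreadyShown: boolean,
): boolean {
  if (alreadyShown) return false;
  return contextUsage(usedTokens, windowTokens) >= COMPACTION_THRESHOLD;
}
```

A caller would pair a `true` result with a timer of `BANNER_DISMISS_MS` to hide the banner again.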
CLAUDE.md → Shopilot Equivalent
Complete Component Catalog: Atomic Design
Every component Shopilot needs, organized by Atomic Design level. Status: Build = create from scratch · Buy = use library · Done = already in spec.
ATOMS: Indivisible UI Particles
| Component | Description | Status | Week | Owner |
|---|---|---|---|---|
| ColorChip | 12×12px swatch, rounded-sm, border 1px | Build | W1 | Sergio |
| Icon | Lucide-react, 3 sizes: 12/16/20px | Buy | W1 | Sergio |
| Spinner | 16px, border-2, spin 600ms, 3 size variants | Build | W1 | Sergio |
| Divider | 1px, --fg-10, horizontal and vertical | Build | W1 | Sergio |
| StatusDot | 8px circle, 4 semantic colors, optional pulse | Build | W1 | Sergio |
| AvatarInitials | 24/32px circle, 2-letter initials, hashed bg color | Build | W1 | Sergio |
| CreditBadge | Mono font, number + "cr", color by threshold | Build | W2 | Sergio |
AI-NATIVE ATOMS: Unique to AI Products
| Component | Behavior | Status | Owner |
|---|---|---|---|
| StreamingCursor ▊ | Blink 1s infinite, opacity .4→1→.4, no spinner | Build | Sergio |
| ThinkingPulse ··· | 3 dots, opacity .4→1→.4, 1.2s staggered | Build | Sergio |
| ToolBadge | Tool name + state icon + duration. 4 states: queued/running/done/error | Build | Sergio |
| AgentStatusBar | Bottom 24px, state-driven dot animation + text | Build | Sergio |
| RiskBadge | A/B/C risk level, color-coded, uppercase | Build | Sergio |
| TTLCountdown | Remaining time mono, amber below 20%, red below 5% | Build | Andrés |
MOLECULES: Composed Interactive Units
| Component | Variants | Status | Week | Owner |
|---|---|---|---|---|
| Button | primary · secondary · ghost · destructive · icon · loading | Build | W1 | Sergio |
| Input / Search | text · search (with icon) · textarea · readonly | Build | W1 | Sergio |
| Select | single · multi · searchable (Radix Select) | Buy | W2 | Sergio |
| Toggle | on/off, 32px wide, smooth transition 200ms | Build | W2 | Sergio |
| Tooltip | hover delay 200ms, max-width 200px, Radix Tooltip | Buy | W2 | Sergio |
| ProgressBar | linear, 6px height, animated fill, semantic colors | Build | W2 | Andrés |
| TabBar | marketplace tabs, icon + label, active underline 2px orange | Build | W2 | Sergio |
| Dropdown | Radix DropdownMenu, keyboard nav, icons optional | Buy | W2 | Sergio |
| KbdShortcut | <kbd> styled, Cmd/Ctrl adaptive, monospace | Build | W3 | Sergio |
ORGANISMS: Complex, Stateful UI Sections
| Component | Key Behavior | Status | Week | Owner |
|---|---|---|---|---|
| CardStandard | glass, border, hover shadow, padding p-4/p-6 | Build | W1 | Sergio |
| DataTable | sortable, sticky header, mono numbers, row hover | Build | W3 | Andrés |
| ConfirmDialog | REVERSIBLE/IRREVERSIBLE badge + diff + impact bullets | Build | W4 | Sergio |
| ToolAccordion | collapsed: badge+name+duration · expanded: full JSON | Build | W4 | Sergio |
| ReActStream | Thought(purple) → Action(orange) → Observation(green) per turn | Build | W5 | Sergio |
| ProactiveCard | slide-up from bottom, max 2 simultaneous, dismiss swipe | Build | W5 | Sergio |
| ContextBar | stacked context window bar with legend (project 26) | Done | — | Andrés |
| AuditLog | timeline, mono timestamps, expandable rows, action badges | Build | W6 | Andrés |
| RollbackPanel | shows rollback_token, before/after diff, one-click restore | Build | W7 | Andrés |
| FraudAlert | red banner, hard block mode, escalation CTA | Build | W7 | Sergio |
| MarketplaceKPI | large mono metric + delta badge + sparkline + secondary grid | Build | W6 | Andrés |
| CreditEconomy | stacked bar + number + upgrade CTA at thresholds | Build | W6 | Sergio |
| OnboardingStep | step N/M indicator, progress bar, back/next, skip | Build | W8 | Sergio |
| EnrollmentCard | ASIN + marketplace + risk assessment + enroll button | Build | W9 | Andrés |
| ErrorRecovery | A=amber recoverable · B=red unrecoverable · C=blue info | Build | W5 | Sergio |
TEMPLATES: Full Page Layouts
| Template | Week |
|---|---|
| ChatView | W4 |
| Dashboard | W6 |
| Settings | W7 |
| Billing | W8 |
| Enrollment | W9 |
Data Visualization for Sellers: 8 Patterns
Sellers are power users. They read numbers professionally. Every data component must prioritize density, precision, and scanability. Golden rule: ALL numbers use JetBrains Mono. No exceptions.
1 · KPI Metric Card
Large mono number + delta badge (▲green/▼red) + mini sparkline. Zero chart library dependency.
2 · Competitor Table
| Seller | Price | BB% |
|---|---|---|
| You | $24.99 | 72% |
| Seller_A | $24.50 | 18% |
| Seller_B | $25.99 | 10% |
"You" row highlighted orange. Winner rows green, loser rows red/dim. Sortable columns. Dense = good.
3 · Context Window Bar
Stacked horizontal bar. Each segment = one context source. Labeled legend below. CSS-only, zero JS dependencies.
4 · Buy Box % Gauge
Example gauge label: Buy Box · MeLi · 30d
CSS conic-gradient semicircle. No SVG, no chart lib. Color threshold: >60% green · 40-60% amber · <40% red.
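The gauge thresholds and the conic-gradient fill can be sketched as two small helpers. This is one possible reading, assuming 60% itself falls in the amber band and that the gauge sweeps a half circle from left to right; the function names and the color arguments are hypothetical.

```typescript
type GaugeColor = "green" | "amber" | "red";

// Spec thresholds: >60% green, 40-60% amber, <40% red.
function buyBoxColor(pct: number): GaugeColor {
  if (pct > 60) return "green";
  if (pct >= 40) return "amber";
  return "red";
}

// Builds a CSS background value for a semicircle gauge.
// `fill` and `track` are any CSS colors (for example token references).
function gaugeGradient(pct: number, fill: string, track: string): string {
  const clamped = Math.min(100, Math.max(0, pct));
  const deg = (clamped / 100) * 180; // 100% maps to a 180deg sweep
  // Start at 9 o'clock, pivot at the bottom center, hide the lower half.
  return (
    `conic-gradient(from 270deg at 50% 100%, ` +
    `${fill} 0deg ${deg}deg, ${track} ${deg}deg 180deg, transparent 180deg)`
  );
}
```

The returned string would be applied as the `background` of an element twice as wide as it is tall.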
5 · BSR Sparkline (30 points)
Note: a lower BSR is better, so a downward trend in rank is good news. Inverted scale. Pure CSS bar sparkline, no Recharts needed.
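One way to read "inverted scale" is that the best (lowest) rank gets the tallest bar, so an improving product shows rising bars. The sketch below assumes that reading; `bsrBarHeights` and the pixel bounds are hypothetical.

```typescript
// Maps BSR values to bar heights in px. Lower rank = taller bar (assumed reading).
function bsrBarHeights(
  ranks: number[],
  maxHeightPx: number = 24,
  minHeightPx: number = 2,
): number[] {
  if (ranks.length === 0) return [];
  const best = Math.min(...ranks);
  const worst = Math.max(...ranks);
  const span = worst - best;
  return ranks.map((rank) => {
    if (span === 0) return maxHeightPx; // flat series: draw full-height bars
    const t = (worst - rank) / span;    // 1 at the best rank, 0 at the worst
    return Math.round(minHeightPx + t * (maxHeightPx - minHeightPx));
  });
}
```

Each height would be written to an inline `height` style on a plain `div`, which keeps the sparkline free of chart libraries.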
6 · Portfolio Health Grid
| ASIN | Buy Box |
|---|---|
| ASIN-001 | 72% |
| ASIN-002 | 42% |
| ASIN-003 | 12% |
| ASIN-004 | 68% |
2×N semaphore grid. Color = health. Click = drill-down. Scales to 50+ ASINs with virtualization.
7 · Audit Log Timeline
Mono timestamps. Dot-line connector. Expandable JSON on click.
8 · Credit Economy Bar
Green above 20% · Amber at 20% or below · Red at 5% or below · Upgrade modal at 0%. Never hide the number.
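The credit thresholds can be expressed as a single lookup. This sketch assumes the boundaries are inclusive on the lower band (exactly 20% is amber, exactly 5% is red) and uses integer comparisons to avoid floating-point edge cases; the names are hypothetical.

```typescript
type CreditLevel = "green" | "amber" | "red" | "upgrade-modal";

// Classifies remaining credits against the spec thresholds (20%, 5%, 0%).
function creditLevel(remaining: number, total: number): CreditLevel {
  if (total <= 0 || remaining <= 0) return "upgrade-modal";
  if (remaining * 100 <= 5 * total) return "red";
  if (remaining * 100 <= 20 * total) return "amber";
  return "green";
}

// The number is always shown, whatever the level ("Never hide the number").
function creditLabel(remaining: number): string {
  return `${Math.max(0, remaining)} cr`;
}
```

The same function could drive both the CreditBadge color and the status bar credit count so the two never disagree.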
Golden Rule: Every Number → JetBrains Mono
Prices, percentages, BSR rankings, token counts, credit counts, timestamps, ASIN codes, version numbers, durations: ALL use font-family: 'JetBrains Mono'. This makes numbers easy to scan, because the eye picks them out quickly when they share a distinct type treatment.
Electron Desktop Design Patterns
Shopilot is a desktop app, not a web app. This distinction has concrete design implications. Every decision below is specific to Electron and cannot be copied from web-only design systems.
Title Bar: Frameless + Native Buttons
macOS traffic lights (●●●) appear natively. Tab bar acts as drag region. Interactive elements must have -webkit-app-region: no-drag.
Split Pane 70/30: Not Resizable in MVP
70% left = marketplace WebContentsView. 30% right = React sidebar. Fixed split in MVP, no drag-to-resize. This saves an estimated 3–4 weeks of implementation.
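The fixed split reduces to a pure bounds calculation that can be re-run on every window resize. The sketch below returns rectangles in the `{ x, y, width, height }` shape that Electron's `setBounds` expects; the 40px tab bar height and the choice to let the status bar span the full width are assumptions, while the 24px status bar and the 70/30 ratio come from the spec.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const TAB_BAR_PX = 40;    // assumption: not specified in this section
const STATUS_BAR_PX = 24; // spec: status bar bottom 24px
const LEFT_RATIO = 0.7;   // spec: 70% marketplace, 30% sidebar

// Computes integer bounds for the marketplace view and the React sidebar.
function splitBounds(
  windowWidth: number,
  windowHeight: number,
): { marketplace: Rect; sidebar: Rect } {
  const contentHeight = Math.max(0, windowHeight - TAB_BAR_PX - STATUS_BAR_PX);
  const leftWidth = Math.round(windowWidth * LEFT_RATIO);
  return {
    marketplace: { x: 0, y: TAB_BAR_PX, width: leftWidth, height: contentHeight },
    sidebar: {
      x: leftWidth,
      y: TAB_BAR_PX,
      width: windowWidth - leftWidth,
      height: contentHeight,
    },
  };
}
```

Keeping this free of Electron imports makes the layout rule testable without launching a window.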
Tab Bar: 3 Marketplace Tabs
Active tab: orange accent background + border. Inactive: dimmed. Tab switch = WebContentsView bounds swap. Cmd+1/2/3 keyboard shortcuts.
System Tray: 3-Level Notifications
Tray icon: 16×16 mono SVG, template image (macOS adaptive). Context menu: Open · Pause Agent · Quit.
Status Bar Bottom 24px: Anatomy
Left side:
- Animated dot matching agent state (idle/thinking/acting)
- Active marketplace name
- Current agent state label
Right side:
- Credit count (color-coded by threshold)
- Active model version
- App version
Keyboard Map: Core Shortcuts
Minimum window size: 900×600px. Below this, show a friendly "resize window" overlay. Never let UI break at any size above 900px.
AI-Native Interaction Patterns
These patterns don't exist in traditional design systems. They emerge from the AI agent paradigm: streaming, tool execution, multi-step reasoning, confirmation dialogs for real-world actions, and error recovery specific to LLM behavior.
Agent State Machine
Status bar dot color and animation directly reflect the agent state. No extra loading UI is needed: the dot IS the state indicator.
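A state machine like this can be sketched as a transition table. The three states (idle, thinking, acting) come from the status bar anatomy; the event names and the specific transitions are assumptions for illustration only.

```typescript
type AgentState = "idle" | "thinking" | "acting";
// Hypothetical events; the spec does not enumerate them.
type AgentEvent = "USER_MESSAGE" | "TOOL_CALL" | "TOOL_RESULT" | "DONE";

const transitions: Record<AgentState, Partial<Record<AgentEvent, AgentState>>> = {
  idle: { USER_MESSAGE: "thinking" },
  thinking: { TOOL_CALL: "acting", DONE: "idle" },
  acting: { TOOL_RESULT: "thinking" },
};

// Returns the next state, or the current one if the event does not apply.
function nextState(state: AgentState, event: AgentEvent): AgentState {
  return transitions[state][event] ?? state;
}

// Folds a sequence of events into a final state, starting from idle.
function runEvents(events: AgentEvent[]): AgentState {
  return events.reduce<AgentState>((s, e) => nextState(s, e), "idle");
}
```

The status bar dot would subscribe to the current `AgentState` and pick its color and animation from it, so no separate loading flag is needed.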
Word-by-Word Streaming
Each token is appended as a <span class="word">. No skeleton screens, no progressive disclosure, just the text appearing naturally. Mimics human typing.
Thinking Pulse ···: Never Show Elapsed Time
Staggered dots communicate "processing" without anxiety-inducing elapsed time. Never show a counter like "Thinking... 12s". That creates frustration. The wave implies ongoing progress.
Tool Stage Transitions
State transitions animated: scale 0.9→1 + opacity 0→1 in 150ms. Never jump, always transition.
Confirmation Card Animation
Card slides up from below. Backdrop fades in simultaneously. Diff rows appear sequentially (staggered 50ms). This signals a considered action that requires full attention.
Error Hierarchy: A · B · C
A: Recoverable Error
Amber background. "Try again" or alternative action offered. Examples: API timeout, rate limit, price validation failed. Agent can retry autonomously or with user nudge.
B: Unrecoverable Error
Red background. Hard stop. Human escalation required. Examples: fraud detected, marketplace account suspended, rollback failed. Modal, not dismissable without action.
C: Informational Block
Blue background. Non-critical. Context about a limitation. Examples: "This marketplace is in read-only mode", "Feature available in Pro plan". Dismissable, no urgency.
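The three error classes map naturally to a lookup table that the ErrorRecovery component could render from. In this sketch the backgrounds, modal behavior, and examples follow the spec; the error code strings, the dismissability of class A, and the fail-closed default for unknown codes are assumptions.

```typescript
type ErrorClass = "A" | "B" | "C";

interface ErrorPresentation {
  background: "amber" | "red" | "blue";
  modal: boolean;
  dismissable: boolean;
  offersRetry: boolean;
}

const ERROR_PRESENTATION: Record<ErrorClass, ErrorPresentation> = {
  A: { background: "amber", modal: false, dismissable: true, offersRetry: true },
  B: { background: "red", modal: true, dismissable: false, offersRetry: false },
  C: { background: "blue", modal: false, dismissable: true, offersRetry: false },
};

// Hypothetical error codes derived from the examples in the spec.
const ERROR_CODES: Record<string, ErrorClass> = {
  api_timeout: "A",
  rate_limit: "A",
  price_validation_failed: "A",
  fraud_detected: "B",
  account_suspended: "B",
  rollback_failed: "B",
  marketplace_read_only: "C",
  plan_upgrade_required: "C",
};

// Unknown codes are treated as class B (assumption: fail closed).
function classifyError(code: string): ErrorClass {
  return ERROR_CODES[code] ?? "B";
}
```

Usage: `ERROR_PRESENTATION[classifyError("rate_limit")]` yields the amber, retryable presentation.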
Proactive Suggestion Cards
● Appear from bottom via slide-up animation
● Maximum 2 simultaneously, never more
● Dismiss: click X or swipe right
● Auto-dismiss after 30s if no interaction
● One primary action button (orange)
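The card rules above (max 2 visible, 30s auto-dismiss, manual dismiss) can be modeled as pure functions over a list of visible cards. This is a sketch under one assumption the spec leaves open: a third card arriving while two are visible is dropped rather than queued. All names are hypothetical.

```typescript
interface VisibleCard {
  id: string;
  shownAtMs: number; // timestamp when the card slid in
}

const MAX_VISIBLE = 2;         // spec: maximum 2 simultaneously
const AUTO_DISMISS_MS = 30000; // spec: auto-dismiss after 30s

// Removes cards that have been visible for 30s or more.
function expireCards(visible: VisibleCard[], nowMs: number): VisibleCard[] {
  return visible.filter((c) => nowMs - c.shownAtMs < AUTO_DISMISS_MS);
}

// Adds a card if there is room; otherwise returns the list unchanged.
function showCard(visible: VisibleCard[], id: string, nowMs: number): VisibleCard[] {
  const live = expireCards(visible, nowMs);
  if (live.length >= MAX_VISIBLE || live.some((c) => c.id === id)) return live;
  return [...live, { id, shownAtMs: nowMs }];
}

// Manual dismiss (click X or swipe right).
function dismissCard(visible: VisibleCard[], id: string): VisibleCard[] {
  return visible.filter((c) => c.id !== id);
}
```

Because the functions never mutate their input, they drop straight into a React reducer.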
Credit Warning System
Design → Code Pipeline
How design decisions become production code. The pipeline has 3 phases matching the product lifecycle: Design-in-Code (now), Token-driven (Phase 1), and Figma-backed (Phase 2+).
Design-in-Code: This HTML = Source of Truth
Every design decision is documented directly in this HTML spec. Cursor AI reads it and generates matching React components. Design reviews happen via PR diffs on this file. Zero Figma dependency.
design-tokens.json → Style Dictionary → CSS + Tailwind (deep spec)
W3C Design Tokens Format (excerpt)
{
  "color": {
    "brand": {
      "primary": { "$value": "#f97316", "$type": "color" },
      "meli": { "$value": "#f97316", "$type": "color" },
      "amazon": { "$value": "#f97316", "$type": "color" },
      "shopify": { "$value": "#5c6ac4", "$type": "color" }
    }
  },
  "spacing": {
    "g": { "$value": "10px", "$type": "dimension" },
    "v": { "$value": "22px", "$type": "dimension" }
  }
}
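To make the token-to-CSS step concrete, the following is a hand-rolled stand-in for what a tool like Style Dictionary does: it walks the token tree and emits one CSS custom property per leaf. It is not the Style Dictionary API, and the naming it produces (path segments joined with the `sp` prefix) is an assumption; shorter aliases such as `--sp-g` would need an extra name transform.

```typescript
type TokenLeaf = { $value: string; $type: string };
type TokenTree = { [key: string]: TokenLeaf | TokenTree };

// A leaf is any node that carries a string $value.
function isLeaf(node: TokenLeaf | TokenTree): node is TokenLeaf {
  return typeof (node as TokenLeaf).$value === "string";
}

// Flattens a token tree into CSS custom property declarations.
function toCssVariables(
  tree: TokenTree,
  prefix: string = "sp",
  path: string[] = [],
): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(tree)) {
    const node = tree[key];
    if (isLeaf(node)) {
      lines.push(`--${[prefix, ...path, key].join("-")}: ${node.$value};`);
    } else {
      lines.push(...toCssVariables(node, prefix, [...path, key]));
    }
  }
  return lines;
}
```

For the excerpt above, the `color.brand.primary` leaf would become `--sp-color-brand-primary: #f97316;`.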
Token Naming Convention
--sp- prefix prevents collisions with framework tokens. Always 3-4 segments. Kebab-case throughout.
Figma MCP Workflow
// Claude reads Figma (#18 Design System) via Figma MCP
// Atomic Design hierarchy in Figma:
//
// Atoms: Button, Input, Badge, Icon, Label, StatusDot, Avatar
// Molecules: FormField, SearchBar, NavItem, ChatBubble, TabBar
// Organisms: Sidebar, Header, CardLayout, Modal, ToolProgress
// Templates: ChatView, ProfileView, BillingView, EnrollmentView
// Pages: Full-screen compositions for each Shell view
//
// Workflow:
// 1. External design team creates/updates component in Figma
// 2. Claude reads component spec via Figma MCP
// 3. Claude implements matching React component in #1 Native Shell
// 4. Code review verifies fidelity to Figma spec
// Rule: NO React components outside of what Figma defines
Handoff Checklist (5 items)
- □ Component exists in Figma (#18) with all states and variants
- □ All states shown (default, hover, active, disabled, loading, error)
- □ Token values referenced (no hardcoded hex)
- □ a11y: aria-label, focus ring, keyboard nav
- □ Responsive: works at 900px min width
Design debt tracking: When a PR introduces a visual inconsistency (wrong token, missing state, hardcoded value), add the Linear label design-system plus a comment with the exact issue. Never merge and forget. Debt compounds fast in AI products where UI is the trust layer.
Governance & Scalability
How to grow the design system without breaking existing components or creating chaos. Rules are simple enough to remember, strict enough to matter.
Abstraction Rule: 3+
3 or more uses → abstract into a component.
1–2 uses → inline style is fine. Don't create a component for a one-off. Don't create abstraction anxiety by over-componentizing trivial things.
New Marketplace = 3 Files
Adding a 4th marketplace requires only:
1 accent color token + 1 logo SVG + 1 URL pattern.
All components, layouts, and patterns work automatically. This is the test of a real design system: extensibility without modification.
Token Versioning
Each design-tokens.json release is versioned. Non-breaking changes = patch. New tokens = minor. Token renames or value breaks = major. Always deprecate before removing, with 1 sprint of lead time.
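The versioning rule can be sketched as a function that picks the highest required bump for a set of token changes. The change kinds are hypothetical labels for the cases the rule names; treating a removal as major is an assumption consistent with "deprecate before removing".

```typescript
type TokenChange =
  | "value-nonbreaking" // e.g. a tweak that consumers need not react to
  | "added"
  | "renamed"
  | "value-breaking"
  | "removed"; // assumption: removal counts as major

type Bump = "patch" | "minor" | "major";

// Picks the strongest bump required by any change in the release.
function bumpFor(changes: TokenChange[]): Bump {
  if (changes.some((c) => c === "renamed" || c === "value-breaking" || c === "removed")) {
    return "major";
  }
  if (changes.some((c) => c === "added")) return "minor";
  return "patch";
}

// Applies a bump to a "major.minor.patch" version string.
function applyBump(version: string, bump: Bump): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

Usage: `applyBump("1.4.2", bumpFor(["added"]))` gives the next minor version.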
Design Review: 2 Roles
Author: PR description + before/after screenshot + token references listed
Reviewer: checks tokens (no hardcoded values) + a11y (aria, contrast) + responsive (900px min)
Anti-Patterns to Avoid
Maturity Levels L1 → L4
| Level | Includes | When Built |
|---|---|---|
| L1 | Tokens + all Atoms + CSS architecture | Phase 1 Week 1–2 |
| L2 | All Molecules + CardStandard + DataTable | Phase 1 Week 2–4 |
| L3 | All Organisms + AI-native atoms + data viz | Phase 1 Week 4–10 |
| L4 | All Templates + Figma ↔ Code consistency gates + governance docs | Phase 2+ |
🔮 Open source future: Publish @shopilot/design-tokens as an npm package when Shopilot has 3+ white-label clients. The token layer is the most transferable part of a design system, and it is the layer that public systems such as Shopify Polaris and Atlassian Design System make reusable beyond their own product.
Design System Roadmap: What to Build When
Scoped to the actual MVP: Electron desktop 70/30 split, React sidebar, 36 tools. The 80/20 rule applies: tokens + button + card + badge + data table + confirmation dialog = 80% of the UI.
Week 1
- CSS tokens (--sp-*)
- All 7 Atoms
- Button (6 variants)
- Input / Textarea
Week 2
- Card + Divider
- Toggle + Select
- Tab bar (3 tabs)
- Status bar 24px
Week 3
- Tooltip + Dropdown
- Progress bar
- AI-native Atoms (6)
- KbdShortcut
Week 4
- ToolAccordion
- ConfirmDialog
- ChatView template
- Figma MCP integration
Week 5
- ReActStream
- ProactiveCard
- ErrorRecovery A/B/C
- Credit warning
Week 6
- MarketplaceKPI
- CreditEconomy
- AuditLog
- Dashboard template
Week 7–8
- DataTable (full)
- RollbackPanel
- FraudAlert
- Settings template
- Billing template
Week 9–10
- EnrollmentCard
- OnboardingStep
- Enrollment template
- Style Dictionary setup
Figma component library refinement (full Atomic Design hierarchy). Token export pipeline (Token Studio → Style Dictionary → Tailwind). Design system documentation site. Accessibility audit with axe-core. Visual consistency gates in PR review.
Never Build: Use Libraries
80/20: These 6 Cover 80% of UI
Velocity Target & Final Status Table
Target: 1 component/day (Sergio, Weeks 1–6). At this velocity, the full organism library is complete before the first external user demo. Over-engineering a component takes 3 days minimum. Keep it simple until complexity is needed.
| Component | Week | Owner | Visual State |
|---|---|---|---|
| CSS Tokens + Atoms | W1 | Mateo + Sergio | pending |
| Button + Input + Card | W1–2 | Sergio | pending |
| TabBar + StatusBar + AI Atoms | W2–3 | Sergio | pending |
| ToolAccordion + ConfirmDialog | W4 | Sergio | pending |
| ReActStream + ProactiveCard | W5 | Sergio | pending |
| KPI + DataTable + AuditLog | W6–7 | Andrés | pending |
| Enrollment + Onboarding | W9–10 | Sergio + Andrés | pending |
15. Brand Intelligence Lab: 17 Brand Books + Shopilot Recommendation
Deep-dive brand books for the 6 reference products + 10 YC-backed startups with similar contexts. Colors, typography, buttons, spacing, motion, voice: everything. Ends with the Shopilot Recommended Brand Book.
6 Reference Brands
10 YC Startups
Synthesis
Anthropic / Claude.ai
AI Safety Company · San Francisco · 2021
Brand Philosophy
"AI for human flourishing"
The Anthropic visual language is built around the concept of "clay": unfired earth, warm, unfinished, human. The brand consciously rejects the cold blue-shifted AI aesthetic (think IBM, Microsoft Azure, early OpenAI). Instead: warmth, earth, copper, organic. The name "Claude" was deliberately chosen for its French warmth and humanist connotations. Every color decision reflects trustworthy AI that feels human, not robotic.
Color System
| Value | Name | Notes |
|---|---|---|
| #faf9f5 | Background Light | RGB 250/249/245 · toasted cream |
| #141413 | Background Dark | RGB 20/20/19 · warm undertone |
| #CC785C | Brand Copper | Logo · selection · icon |
| #d97757 | UI Orange | CTAs · interactive elements |
| #6a9bcc | Muted Blue | Secondary · info states |
| #788c5d | Muted Green | Success · positive states |
| rgba(204,120,92,.15) | Selection BG | Text selection highlight |
| #1a1915 | Surface Dark | Cards on dark bg |
● Contrast (WCAG 2.x): #141413 on #faf9f5 ≈ 17.5:1 AAA · #CC785C on #faf9f5 ≈ 3.1:1 (passes AA for large text only) · #d97757 on #141413 ≈ 5.9:1 AA
● Rule: Never use pure black (#000) or pure white (#fff). The warmth delta of ~5 RGB units in each neutral makes everything feel premium vs. commodity.
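Contrast ratios for any token pair can be checked with the WCAG 2.x relative-luminance formula rather than copied by hand. The sketch below is a minimal implementation for 6-digit hex colors; the function names are hypothetical, and the level thresholds (7, 4.5, 3) are the WCAG values for normal and large text.

```typescript
// Converts an 8-bit sRGB channel to linear light.
// WCAG 2.x prints the threshold as 0.03928; for 8-bit channels 0.04045 gives the same result.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of a 6-digit hex color such as "#faf9f5".
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`expected a 6-digit hex color, got ${hex}`);
  const n = parseInt(m[1], 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors; order of arguments does not matter.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Conformance level for normal-size text, with the large-text AA fallback.
function wcagLevel(ratio: number): "AAA" | "AA" | "AA-large" | "fail" {
  if (ratio >= 7) return "AAA";
  if (ratio >= 4.5) return "AA";
  if (ratio >= 3) return "AA-large";
  return "fail";
}
```

Running this in a token lint step would catch a failing foreground/background pair before it reaches review.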
Typography System
| Role | Font | Weight | Usage |
|---|---|---|---|
| Display / Headlines | Styrene A / Styrene B | 400–700 | Hero titles, section heads |
| Editorial / Long-form | Tiempos Text | 400 italic | Blog, docs, long reads |
| Product / UI Text | Styrene A | 400–500 | App UI, labels, body |
| Code / Data | JetBrains Mono | 400 | Code blocks, inline code |
| Accent / Quote | Galaxie Copernicus | 300 italic | Pull quotes, feature text |
Type scale: display-xxl: clamp(3rem, 5vw, 5rem) · display-lg: clamp(2rem, 3.5vw, 3.5rem) · display-xs: clamp(1.125rem, 1.5vw, 1.25rem) · body: 1rem/1.6
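To see what the fluid type scale resolves to at a given viewport, the `clamp(min, preferred, max)` arithmetic can be sketched as below. This is an illustration only: the `FluidSize` shape, the `resolveClampPx` name, and the 16px root font size are assumptions, not part of the spec.

```typescript
// Resolve CSS clamp(MIN, VAL, MAX) for sizes whose preferred value is in vw.
interface FluidSize {
  minRem: number;
  preferredVw: number;
  maxRem: number;
}

type ScaleName = "display-xxl" | "display-lg" | "display-xs";

// Values copied from the type scale above.
const typeScale: Record<ScaleName, FluidSize> = {
  "display-xxl": { minRem: 3, preferredVw: 5, maxRem: 5 },
  "display-lg": { minRem: 2, preferredVw: 3.5, maxRem: 3.5 },
  "display-xs": { minRem: 1.125, preferredVw: 1.5, maxRem: 1.25 },
};

function resolveClampPx(
  size: FluidSize,
  viewportWidthPx: number,
  rootFontPx = 16 // assumed browser default
): number {
  const min = size.minRem * rootFontPx;
  const max = size.maxRem * rootFontPx;
  const preferred = (size.preferredVw / 100) * viewportWidthPx;
  // CSS defines clamp(MIN, VAL, MAX) as max(MIN, min(VAL, MAX)).
  return Math.max(min, Math.min(preferred, max));
}
```

With these assumptions, `display-xxl` stays at its 3rem floor on a 375px phone, tracks 5vw on mid-size screens, and is capped at 5rem on wide desktops.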
Button System
● Primary: bg #d97757 · text white · radius 8px · padding 10px 20px · font-weight 600
● Secondary: border 1.5px #CC785C/50 · text #CC785C · bg transparent · same radii/padding
● Hover: filter: brightness(1.1). Never use a fixed darker hex; keep theming dynamic.
Spacing · Shadows · Motion · Voice (deep spec)
Spacing
Site margin: clamp(2rem, 5rem)
Nav height: 68px (4.25rem)
Section gap: 96px–160px
Chat max-w: 768px (3xl)
Message max: 75ch
Shadows
Default: none
Flyout: 0 8px 32px rgba(0,0,0,.12)
Modal: 0 24px 64px rgba(0,0,0,.18)
Focus ring: 0 0 0 3px rgba(204,120,92,.3)
Motion
Menu open: 400ms
Dropdown: 200ms
Tooltip: 150ms
Easing: cubic-bezier(.4,0,.2,1)
Streaming: 0ms delay, instant
Brand Voice
Tone adjectives
Thoughtful · Warm · Honest · Direct · Curious · Humble
Anti-tone
Never: Hype-y · Corporate · Cold · Overpromising · Robotic
Writing style
Conversational but precise. Short sentences. Active voice. Explains "why", not just "what".
Shopilot inherits
Candidate inspiration: warm copper accent · dark backgrounds · trustworthy AI voice
Cursor IDE
AI-Native Code Editor · Anysphere · 2022 · YC S22
Brand Philosophy
"The AI-first code editor built for pair programming with AI"
Cursor's brand philosophy is hyper-functional. There is no decorative layer: every visual decision serves the task of writing code. The orange accent (#f54e00) is used only for the critical hot path, the most important action on screen. The warm off-white/off-black background signals "professional tool" vs. "consumer app." The UI is intentionally dense, because developers are trained to read dense information quickly.
Color System
| Value | Name | Usage |
|---|---|---|
| #f7f7f4 | --color-theme-bg | Warm off-white |
| #26251e | --color-theme-fg | Warm off-black |
| #f54e00 | --color-theme-accent | Hot orange · CTAs only |
| --fg-01 … --fg-100 | Opacity Scale | Every 5% step from bg color |
● Base units: --g: calc(10rem/16) ≈ 10px (grid) · --v: 1.375rem ≈ 22px (vertical rhythm)
● Duration: --duration: .14s · --duration-slow: .25s
● Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)
● Shadows: Ultra-minimal 0 0 1rem #00000005 — shadows only on flyouts, never on cards
● Border radii: 2 · 4 · 8 · 12 · 16px — smallest for inputs, largest for panels
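One way to read the opacity scale above is as the foreground color composited over the background at fixed alpha steps. The sketch below generates such a scale under that reading. It is an assumption-heavy illustration: the `--fg-05` … `--fg-100` key naming, the 5% step, and the helper names are hypothetical and not confirmed by the source.

```typescript
// Hypothetical generator for an opacity-based foreground scale.
const THEME_BG = "#f7f7f4"; // --color-theme-bg from the table above
const THEME_FG = "#26251e"; // --color-theme-fg from the table above

function hexToRgb(hex: string): [number, number, number] {
  const h = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(h)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const n = parseInt(h, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

function rgbToHex(rgb: [number, number, number]): string {
  return "#" + rgb.map((c) => c.toString(16).padStart(2, "0")).join("");
}

// Composite fg at the given alpha over an opaque bg (source-over blending).
function mixOver(fg: string, bg: string, alpha: number): string {
  const f = hexToRgb(fg);
  const b = hexToRgb(bg);
  const blend = (i: 0 | 1 | 2): number =>
    Math.round(f[i] * alpha + b[i] * (1 - alpha));
  return rgbToHex([blend(0), blend(1), blend(2)]);
}

// Build tokens at every `stepPct` percent, e.g. --fg-05, --fg-10, ..., --fg-100.
function buildFgScale(stepPct = 5): Record<string, string> {
  const scale: Record<string, string> = {};
  for (let pct = stepPct; pct <= 100; pct += stepPct) {
    const key = `--fg-${String(pct).padStart(2, "0")}`;
    scale[key] = mixOver(THEME_FG, THEME_BG, pct / 100);
  }
  return scale;
}
```

In a real stylesheet the same effect can come from alpha channels directly; precomputing flat colors like this is mainly useful where transparency is not available, such as exported charts.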
Typography
| Role | Font | Size | Notes |
|---|---|---|---|
| UI Product (sm) | System + custom | 11px (.6875rem) | --text-product-sm · labels, status |
| UI Product (base) | System + custom | 12px (.75rem) | --text-product-base · default text |
| UI Product (lg) | System + custom | 13px (.8125rem) | --text-product-lg · section titles |
| Code / Data | JetBrains Mono | 12–13px | Code, terminal output, numbers |
Note: Cursor uses data-os=linux to switch to the system font stack. This respects the user's OS font preference, a developer-first accessibility decision.
Button System
● Primary: bg #f54e00 · radius 6px · padding 8px 16px · font-weight 600 · no border
● Secondary: bg rgba(255,255,255,.07) · border rgba(255,255,255,.12) · radius 4px · font-weight 400
● Accent text buttons: color #f54e00 · bg transparent · hover underline only
● Rule: orange CTA used ONCE per screen. The second most important action is always ghost.
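The one-CTA rule can be enforced in code rather than by review. The sketch below assigns button variants from a priority ranking; the `ScreenAction` shape, the "lower number means more important" convention, and treating every non-primary action as ghost are assumptions made for illustration.

```typescript
// Hypothetical variant assignment that guarantees a single primary CTA.
type ButtonVariant = "primary" | "ghost";

interface ScreenAction {
  id: string;
  priority: number; // assumption: lower number = more important
}

function assignVariants(actions: ScreenAction[]): Map<string, ButtonVariant> {
  // Copy before sorting so the caller's array is not mutated.
  const ranked = [...actions].sort((a, b) => a.priority - b.priority);
  const variants = new Map<string, ButtonVariant>();
  ranked.forEach((action, index) => {
    // Only the top-ranked action gets the orange primary style.
    variants.set(action.id, index === 0 ? "primary" : "ghost");
  });
  return variants;
}
```

Deriving the variant from rank means a screen cannot accidentally ship with two orange buttons, even when actions are added later.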
What Shopilot Inherits
Split pane 70/30 · WebContentsView architecture · --g/--v base units · opacity token scale · status bar 24px · ultra-minimal shadows · one orange CTA rule
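The 70/30 split and the 24px status bar reduce to simple rectangle arithmetic. The following sketch computes the three regions as plain `{ x, y, width, height }` objects, which is the shape Electron's `setBounds` accepts for views. Which pane gets the 70%, the status bar sitting at the bottom, and all names here are assumptions.

```typescript
// Hypothetical layout helper: 70/30 split above a fixed-height status bar.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface SplitLayout {
  content: Rect; // assumed: the browsing pane (70%)
  copilot: Rect; // assumed: the assistant pane (30%)
  statusBar: Rect;
}

const STATUS_BAR_HEIGHT = 24;

function computeSplitLayout(
  windowWidth: number,
  windowHeight: number,
  contentRatio = 0.7
): SplitLayout {
  const paneHeight = Math.max(0, windowHeight - STATUS_BAR_HEIGHT);
  // Round once so the two panes always add up to the full window width.
  const contentWidth = Math.round(windowWidth * contentRatio);
  return {
    content: { x: 0, y: 0, width: contentWidth, height: paneHeight },
    copilot: {
      x: contentWidth,
      y: 0,
      width: windowWidth - contentWidth,
      height: paneHeight,
    },
    statusBar: {
      x: 0,
      y: paneHeight,
      width: windowWidth,
      height: windowHeight - paneHeight,
    },
  };
}
```

Keeping the math in a pure function makes it easy to recompute bounds on every window resize and to unit test without launching the app.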
HubSpot / Canvas Design System
CRM & Marketing Platform · Cambridge MA · 2006 · Public ($HUBS)
Brand Philosophy
"Sprocket-right: interfaces must work for the user, not impress other designers"
HubSpot's Canvas system represents 20 years of B2B SaaS learning. Their core insight: beautiful design at enterprise scale means designing for efficiency and clarity, not aesthetics. Every component is tested against "does this help the user complete their task faster?" The orange brand color (#ff7a00) was chosen for energy, approachability, and differentiation from blue-dominant CRM competitors (Salesforce). Canvas explicitly codifies the philosophy that function precedes form.
Color System
| Value | Name | Usage |
|---|---|---|
| #ffffff | Base White | Primary background |
| #2D3E50 | Midnight Blue | Primary text · headers |
| #ff7a00 | Calypso Orange | Brand · CTAs |
| #00BDA5 | Teal | Success · secondary CTA |
| #F5C26B | Flax | Warning · alerts |
| #EAF0F6 | Mist Gray | Panel backgrounds |
| #516F90 | Regent Gray | Secondary text |
| #F2545B | Alizarin | Error · destructive |
Typography + Buttons
Fonts
Display: HubSpot Serif (custom, Typekit)
UI: HubSpot Sans (custom, Typekit)
Code: Lucida Console / Courier New (fallback)
Scale: 12 · 14 · 16 · 20 · 24 · 32 · 40 · 48px
Radius: --cl-radius ~6px standard
Icons: SVG fill:currentColor · 2rem default · .cl-icon class
Buttons
What Shopilot Inherits
Merchant-first philosophy · Data table density · Function over aesthetics principle · Multiple semantic colors for different alert types · Sprocket-right thinking
Linear
Project Management Tool · San Francisco · 2019 · YC W20
Brand Philosophy
"Speed is a feature — every interaction must feel instantaneous"
Linear's brand is built on the premise that design debt in productivity tools costs people hours every week. Their aesthetic is extreme minimalism, not because it looks good, but because every unnecessary element steals attention. The indigo brand color (#5e6ad2) was chosen for calm authority: it communicates "serious tool for serious work" without being cold or aggressive. Background Woodsmoke (#1a1a1e) is among the darker backgrounds of the reference brands: near-black, but slightly purple-shifted for warmth.
Color System
| Value | Name | Usage |
|---|---|---|
| #1a1a1e | Woodsmoke | Primary bg · dark |
| #111116 | Sidebar BG | Navigation panel |
| #5e6ad2 | Indigo Brand | Logo · selected · CTAs |
| #8b8fa8 | Oslo Gray | Secondary text |
| #25252a | Surface | Card backgrounds |
| #2e3035 | Hover Surface | Row hover state |
| #4cb782 | Done Green | Completed state |
| #eb5757 | Cancelled Red | Error · blocked state |
● Design rules: No gradients ever · No decorative shadows · Use opacity over new colors · Border: 1px rgba(255,255,255,.06) only
● Keyboard-first: every action reachable without mouse. Speed is communicated through interaction, not animation.
Typography + Buttons
Display: Inter Display · weights 300 (light) + 700 (bold)
UI: Inter · weights 400/500
Code: JetBrains Mono · 12–13px
Scale: 11 · 12 · 13 · 14 · 16 · 20 · 28 · 40px
Line height: 1.4 UI · 1.6 body
What Shopilot Inherits
No gradients / no decorative shadows · Opacity token approach · Keyboard-first mindset · Dark bg with slight warm purple shift · Extreme information density without visual noise
Vercel / Geist Design System
Frontend Cloud Platform · San Francisco · 2015 · YC W16
Brand Philosophy
"Black canvas: dark mode is not a theme, it's the identity"
Vercel's brand is the most radical in this study. Pure black (#000000) is the primary background: not dark navy, not warm dark, pure black. This is intentional. Developers live in dark mode, and Vercel wants to be the platform that feels like the best developer tool they've ever used. Maximum contrast, maximum focus. The Geist typeface (custom, now open source) was designed specifically for developer interfaces: geometric sans for UI, geometric mono for code. There is no accent color, only a pure black/white/gray hierarchy.
Color System: Pure Grayscale + Functional
#000
#111
#333
#444
#666
#888
#eaeaea
#fafafa
Blue · Links · Info
Cyan · Success
Pink · Error/Warning
Typography
Display/UI: Geist Sans (open source, Google Fonts)
Code/Data: Geist Mono (open source, Google Fonts)
Scale: 12 · 14 · 16 · 20 · 24 · 32 · 48 · 64px
Weight: 400 body · 500 medium · 600 semibold · 700 bold
Radius: 6px standard · 8px cards · 12px modal
Buttons
What Shopilot Inherits
Dark-first approach · Pure functional color (no decoration) · High contrast focus ring · Developer-dense information hierarchy · Geist Mono (open source alternative to JetBrains Mono)
Shopify / Polaris Design System
Commerce Platform · Ottawa · 2006 · Public ($SHOP)
Brand Philosophy
"Merchant-first: every decision evaluated from the merchant's perspective"
Polaris is the most mature design system in this study: 7+ years of iteration, thousands of components, and a philosophy that has been consistently proven, namely that clarity beats elegance. Shopify's merchant is not a designer or developer. They're a small business owner who needs to act fast and make money. The design system's entire vocabulary is optimized for task completion speed, not visual delight. The green brand color grew from the Shopify logo and represents growth, money, and success.
Color System
| Value | Name | Usage |
|---|---|---|
| #FAFAFA | Background | Light mode primary |
| #202223 | Ink | Primary text |
| #008060 | Interactive Green | CTAs · brand |
| #95BF47 | Logo Green | Brand logo only |
| #5C5F62 | Subdued | Secondary text |
| #D82C0D | Critical | Error · destructive |
| #FFC453 | Warning | Alert states |
| #AEE9D1 | Success Light | Success bg tint |
Typography
All: Inter (UI) · system-ui fallback
Scale: 12 · 14 · 16 · 20 · 26 · 32px
Radius: 4px inputs · 8px cards · 12px modals
Data Viz Rules:
● Totals in bold and in the first row · Focus: one insight per chart
● Multiple data formats (always table + chart)
Buttons
What Shopilot Inherits
Seller-first decision framework · Data viz rules (totals first, 1 insight) · Semantic color discipline · Clarity > elegance principle · A11y requirements for data tables
Brex
Corporate Fintech · San Francisco · 2017 · YC W17 · $12.3B valuation
Brand Philosophy
"Make money management effortless for ambitious companies"
Brex is the closest contextual analogue to Shopilot in terms of trust architecture. Both handle real money on behalf of businesses, and both require the UI to communicate precision and authority. Brex's design evolved from a startup-y orange era to a mature, premium dark theme. Current palette: near-black backgrounds (#0E0E0E), a warm coral/salmon accent for CTA emphasis, and Söhne as the premium custom typeface. The warm coral (not pure orange) signals "approachable financial authority": slightly warmer than corporate, slightly cooler than consumer fintech.
Color System
| Value | Name | Usage |
|---|---|---|
| #0E0E0E | Background Dark | Near-black · product UI |
| #FFFDF9 | Background Light | Warm off-white |
| #F27B6B | Coral Accent | CTAs · brand emphasis |
| #FF5200 | Hot Orange | High-urgency CTAs |
| #1A1A1A | Surface | Cards, panels |
| #2D2D2D | Border/Stroke | Dividers, outlines |
| #00C278 | Success Green | Positive states |
| #FF4444 | Error Red | Errors · blocks |
Typography
Display: Söhne (Klim Type Foundry) · €€€
UI: Söhne · weights 300/400/600
Data: Söhne Mono (tabular figures)
Scale: 11 · 13 · 15 · 18 · 24 · 36 · 48px
Key: Tabular figures for all financial data (tnum feature)
Buttons
Key Insights for Shopilot
● Trust architecture: Trust-critical data (balances, transactions) gets highest contrast (white-on-black). Secondary info gets progressively less contrast.
● Tabular nums: All financial data uses font-variant-numeric: tabular-nums so numbers align vertically in tables.
● Could inspire Shopilot: Near-black background · Coral warm accent · Tabular nums for prices · Söhne inspiration (use Inter + JetBrains Mono as accessible equivalent)
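Tabular figures give every digit the same width, but a price column only lines up if each value also has the same number of decimals and is right-aligned. The sketch below handles the formatting half with the standard `Intl.NumberFormat` API. The USD currency, the `en-US` locale, and the `formatPriceColumn` name are assumptions for illustration.

```typescript
// Format prices with a fixed number of decimals so a column aligns cleanly
// when rendered right-aligned with font-variant-numeric: tabular-nums.
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD", // assumed currency for the example
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});

function formatPriceColumn(values: number[]): string[] {
  const formatted = values.map((value) => usd.format(value));
  const width = Math.max(0, ...formatted.map((text) => text.length));
  // Left-pad with spaces; useful for monospace contexts such as logs or CSV previews.
  return formatted.map((text) => text.padStart(width, " "));
}
```

In the UI itself the padding is unnecessary, since right alignment plus `tabular-nums` does the work; the padded strings are mainly handy for terminal or plain-text output.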
Mercury
Neobank for Startups · San Francisco · 2019 · YC S19 · $1.62B valuation
Brand Philosophy
"Banking that gets out of your way"
Mercury achieved something extremely rare: making banking software look desirable. Their dark-mode-first interface (a radical choice for financial software in 2019) communicated that they understood their customer: tech founders who live in dark terminals. The Mercury Sans custom typeface has a slight humanist influence that prevents the bank UI from feeling cold and bureaucratic. The teal/blue accent is intentionally understated. Mercury (the element) is subtle, precise, and reflects its environment.
Color System
| Value | Name | Usage |
|---|---|---|
| #0A0A0A | Background | Near-pure black |
| #FAFAF9 | Light BG | Warm off-white |
| #4AA8FF | Mercury Blue | CTAs · links · selected |
| #00BFA5 | Teal | Balance · positive |
| #141414 | Surface | Cards · panels |
| #1E1E1E | Hover Surface | Row hover |
| #FF5F5F | Alert Red | Errors · negative bal. |
| #F5A623 | Warning Amber | Low balance · pending |
Typography
Display/UI: Mercury Sans (custom, humanist geometric)
Numbers: Tabular lining figures (font-variant-numeric)
Code: Fira Code / iA Writer Mono (code blocks)
Weight: 300 light · 400 regular · 500 medium · 600 semibold
Spacing: letter-spacing: -0.01em for display text
Buttons + UI Patterns
Radius: 12px (rounded, approachable) · Borders: ultra-subtle rgba · Balance displayed in large mono at top of every page
Key Insights for Shopilot
Dark-first banking sets the precedent that serious financial tools CAN be dark mode · Balance/KPI always displayed in large mono (same as Shopilot GMV) · 12px radius makes data dense while remaining approachable · Warm off-white light mode for reports/print contexts
Retool
Internal Tools Builder · San Francisco · 2017 · YC S17 · $3.2B valuation
Brand Philosophy
"Build internal tools, 10x faster"
Retool is the master of data-dense UI. Their product is literally a table+form builder, so every design decision serves the goal of making dense grids of data scannable and actionable. Their canvas-style editor is perhaps the most data-rich interface in SaaS. The blue accent (#3B5EE7) was chosen for authority and trust, similar to financial platforms but more "engineering-y" than coral/orange. The dark background (#202124) is slightly warm-gray, similar to VS Code, which their developer audience knows instinctively.
Color System
| Value | Name | Usage |
|---|---|---|
| #202124 | BG Dark | Warm gray (VS Code-ish) |
| #F8F9FA | BG Light | Default canvas |
| #3B5EE7 | Blue Brand | Selected · CTAs |
| #5C7CFA | Blue Light | Hover · focus |
| #2C2D30 | Surface | Panel bg |
| #37383B | Border | Dividers |
| #2ECC71 | Success | OK states |
| #E74C3C | Error | Error states |
Data Table Design (Core Pattern)
● Row height: 32px compact · 40px default · 48px comfortable (user-configurable)
● Header: sticky · sortable · resizable columns · filter per column
● Numbers: Right-aligned in all numeric columns · font-variant-numeric: tabular-nums
● Could inspire Shopilot: Compact table density · Column sorting + filtering · Right-aligned numbers · VS Code-familiar warm gray bg
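The three density settings above translate directly into how many rows fit on screen. The sketch below models the row heights as a typed map and computes the visible row count. The `Density` type, the function name, and the assumption that the sticky header is as tall as one row are all hypothetical.

```typescript
// Row-height presets from the data table pattern above.
type Density = "compact" | "default" | "comfortable";

const ROW_HEIGHT_PX: Record<Density, number> = {
  compact: 32,
  default: 40,
  comfortable: 48,
};

// How many full rows fit below the sticky header in a given viewport.
// Assumption: the header is as tall as one row unless specified otherwise.
function visibleRows(
  viewportHeightPx: number,
  density: Density,
  headerHeightPx: number = ROW_HEIGHT_PX[density]
): number {
  const available = Math.max(0, viewportHeightPx - headerHeightPx);
  return Math.floor(available / ROW_HEIGHT_PX[density]);
}
```

For a 640px-tall table this gives 19, 15 and 12 rows for compact, default and comfortable, which is the kind of trade-off a seller choosing a density would actually feel.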
Supabase
Open Source Firebase Alternative · Singapore · 2020 · YC S20 · $200M+ raised
Brand Philosophy
"Build in a weekend, scale to millions"
Supabase's brand is perhaps the most distinctive in this study: an aggressive, developer-native green (#3ECF8E) on pure dark backgrounds. The green was chosen for its association with databases (terminal text), open source culture (GitHub green), and PostgreSQL. Their brand radiates developer confidence: "we're not trying to be enterprise, we're trying to be the best developer experience." The contrast between the near-black backgrounds and the bright emerald is high (roughly 8.5:1 against #1C1C1C), making every UI element immediately visible.
Color System
| Value | Name | Usage |
|---|---|---|
| #1C1C1C | BG Dark | Primary background |
| #111111 | BG Deeper | Sidebar / nav |
| #3ECF8E | Supabase Green | Brand · CTAs · selected |
| #00C973 | Green Vivid | Running / active states |
| #262626 | Surface | Cards |
| #3F3F3F | Border | Dividers |
| #F97316 | Warning Amber | Attention states |
| #EF4444 | Error Red | Errors · destructive |
Typography + Key Insights for Shopilot
● Display/UI: Inter (all weights) · Code: Fira Code / UI Monospace
● Radius: 6px uniform. Very slightly rounded, feels professional, not playful
● Could inspire Shopilot: Proof that a single strong accent color CAN be green for marketplaces (Shopify marketplace tab) · Dark + bright single accent contrast pattern · Warning using orange (candidate reference for Shopilot)
PostHog
Open Source Product Analytics · London · 2020 · YC W20 · $225M raised
Brand Philosophy
"The only product analytics platform where data stays yours"
PostHog is the most boldly branded in this study. Hedgehog mascot, golden yellow (#F9BD2B) that actually glows, developer-irreverent tone. Their design deliberately breaks "enterprise SaaS" conventions to signal: we're built by developers, for developers, and we refuse to look boring. However, beneath the playfulness, the data visualization is meticulously precise. Their dark UI (#1D1D27, a purple-shifted dark) keeps analytics dashboards readable 8+ hours a day. The yellow is used sparingly for the most important elements.
Color System
| Value | Name | Usage |
|---|---|---|
| #1D1D27 | BG Dark | Purple-shifted dark |
| #FFFEF0 | BG Light | Golden cream |
| #F9BD2B | PostHog Yellow | Brand · emphasis |
| #F54E00 | Hot Orange | CTAs · high-priority |
| #2C2C3A | Surface | Cards · panels |
| #3C3C50 | Border | Dividers |
| #2AC940 | Success | Positive events |
| #F04438 | Error | Error states |
Key Insights for Shopilot
Purple-shifted dark backgrounds feel "deeper" than neutral dark, which works well for analytics views · Data precision underneath playful branding · Yellow used ONLY for the most important metric on screen (same principle that could apply to Shopilot's chosen accent (TBD)) · Chart color palette: 8 distinct hues, all at 60% saturation for harmony
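A palette of "8 distinct hues at 60% saturation" can be generated rather than hand-picked. The sketch below spaces hues evenly around the color wheel and emits CSS `hsl()` strings. The even spacing, the 55% lightness, the starting hue, and the `chartPalette` name are assumptions; the source only fixes the hue count and the saturation.

```typescript
// Hypothetical chart palette: evenly spaced hues at a shared saturation.
function chartPalette(
  count = 8,
  saturationPct = 60,
  lightnessPct = 55, // assumed; not specified in the source
  startHue = 0 // assumed starting point on the color wheel
): string[] {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("count must be a positive integer");
  }
  const stepDeg = 360 / count;
  return Array.from({ length: count }, (_, index) => {
    const hue = Math.round((startHue + index * stepDeg) % 360);
    return `hsl(${hue}, ${saturationPct}%, ${lightnessPct}%)`;
  });
}
```

Evenly spaced hues are only a starting point. Adjacent series may still need manual tuning for color-blind users and for contrast against the dark background.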
Resend
Developer Email Platform · San Francisco · 2022 · YC W23 · $26M raised
Brand Philosophy
"Email for developers, built by developers"
Resend's brand is pure monochromatic minimalism, perhaps the most extreme in this study. Pure black (#000000), pure grays, and one orange accent for the logo and primary CTA only. The philosophy: email infrastructure should be completely invisible, because the developer's code is the product. Their UI is so stripped down it looks like GitHub's settings page elevated to art. This design communicates: we're not trying to impress you with UI, we're trying to not get in your way. Strong influence from Vercel's aesthetic (same investor: Guillermo Rauch's orbit).
Color System: Pure Monochromatic
| Value | Role |
|---|---|
| #000 | BG |
| #0a0a0a | Surface |
| #171717 | Card |
| #262626 | Border |
| #525252 | Muted |
| #a3a3a3 | Secondary |
| #ededed | Primary |
| #fff | Headings |
| #FF5700 | Logo Orange · CTA only |
Typography
All: Geist Sans + Geist Mono (open source)
Scale: 13 · 14 · 16 · 20 · 28 · 40px
Tracking: letter-spacing: -0.02em headings
Radius: 8px standard (slightly rounded)
Buttons
Key Insights for Shopilot
Proof that monochromatic + one accent works at scale · #000 vs #171717 vs #262626: subtle layering creates depth without color · Code + logs = always Geist Mono / JetBrains Mono → reinforces precision
Clerk
Authentication Platform · San Francisco · 2021 · YC W22 · $170M raised
Brand Philosophy
"The most comprehensive User Management Platform"
Clerk's brand sits at the intersection of developer tools and security software. Purple (#6C47FF) was chosen to differentiate from both the "enterprise blue" space (Okta, Auth0) and the "startup orange" space. It communicates "modern, premium, slightly magical": auth happens in the background, and Clerk makes it elegant. Their dark UI (#131316, a warm-shifted very dark) uses glass-morphism for the prebuilt UI components, an unusual choice that works because authentication is a "gateway moment" that benefits from a premium feel.
Color SystemSistema de Color
| Hex | Name | Usage |
|---|---|---|
| #131316 | BG Dark | Warm-shifted dark |
| #FAFAFA | BG Light | Dashboard light mode |
| #6C47FF | Clerk Purple | Brand · CTAs · focus |
| #9B7DFF | Purple Light | Hover · secondary |
| #1C1C21 | Surface | Cards |
| #2C2C35 | Border | Dividers |
| #12B76A | Success | Auth success |
| #F04438 | Error | Auth failure |
Key Insights for Shopilot
Glass-morphism for "gateway moments" (login, confirmation dialogs) · Purple differentiation shows you don't need orange to be distinctive · #131316 warm-shifted dark background, similar to Shopilot's own bg · Onboarding modal design: clean step indicators, focus on one action per step
Deel
Global HR & Payroll · San Francisco · 2019 · YC W19 · $12B valuation
Brand Philosophy
"Hire anyone, anywhere — with compliance built in"
Deel handles international payroll for 35,000+ companies, arguably the most complex, trust-critical SaaS product in this study. Their design reflects that weight: corporate navy blue (#1D2130) backgrounds, conservative button styles, clear error states for compliance failures. Nothing is flashy, because a company trusting you with its global payroll needs you to look like you know what you're doing. The blue palette (#2B6EE4) is authoritative without being aggressive, similar to how a bank presents itself.
Color System
| Hex | Name | Usage |
|---|---|---|
| #1D2130 | BG Dark Navy | Primary dark surface |
| #F4F6FA | BG Light | Blue-tinted white |
| #2B6EE4 | Deel Blue | CTAs · brand |
| #4D8FF0 | Blue Light | Hover · secondary |
| #252A3C | Surface | Cards |
| #2F3547 | Border | Dividers |
| #00C48C | Success Teal | Paid · approved |
| #FF647C | Error Coral | Failed · blocked |
Key Insights for Shopilot
Navy-shifted dark bg (#1D2130) creates more "financial authority" feel than neutral dark · Compliance status rows: clear color coding (approved=teal, pending=amber, failed=red) · Dense multi-level table hierarchy (company > employee > payment), similar to Shopilot's ASIN > marketplace > metric hierarchy
Replit
Browser-based IDE · San Francisco · 2016 · YC W18 · $1.16B valuation
Brand Philosophy
"Code, create, and learn together"
Replit's brand bridges developer-serious and beginner-accessible. Their orange (#F56C2A) is warmer and more playful than Cursor's (#f54e00), which is intentional, as Replit serves both students and professionals. The dark background (#0D1117) is identical to GitHub's dark mode, leveraging existing mental models for developers. Their recent pivot to "Replit AI" accelerated their design maturity: more glass effects, more gradient accents, more AI-native patterns. There is a strong parallel to Shopilot: both are Electron-like experiences where the IDE/marketplace is the primary canvas and AI assistance is the sidebar.
Color System
| Hex | Name | Usage |
|---|---|---|
| #0D1117 | BG Dark | GitHub-identical dark |
| #F6F8FA | BG Light | GitHub-identical light |
| #F56C2A | Replit Orange | Brand · CTAs |
| #FF7B54 | Orange Light | Hover state |
| #161B22 | Surface | Cards · panels |
| #21262D | Border | Dividers |
| #3FB950 | Success | Build success |
| #F85149 | Error | Build error |
Key Insights for Shopilot
Split IDE+AI sidebar = exact Shopilot architecture · GitHub-familiar dark (#0D1117) leverages existing developer trust · Orange on a very dark bg creates a high-contrast CTA that developers actually click · AI sidebar streaming pattern identical to Shopilot's coaching sidebar
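The "high-contrast CTA" claim can be checked numerically. The sketch below computes the WCAG 2.x contrast ratio between two hex colors; the function names are illustrative and are not part of any Shopilot codebase. WCAG AA asks for at least 4.5:1 for normal-size text, which gives a rough yardstick for any accent/background pair from the palettes above.

```typescript
// Parse a 6-digit hex color such as "#F56C2A" into [r, g, b] in 0..255.
function hexToRgb(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`Expected a 6-digit hex color, got: ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// WCAG 2.x relative luminance of an sRGB color (0 = black, 1 = white).
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Replit Orange on the GitHub-style dark background from the table above.
const ctaContrast = contrastRatio("#F56C2A", "#0D1117");
console.log(`CTA contrast: ${ctaContrast.toFixed(2)}:1`);
```

The same check could be run against each of the four primary color candidates once a background tone is chosen.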
Luma
Event Platform · San Francisco · 2020 · YC W21 · $150M raised
Brand Philosophy
"Beautiful event pages that convert"
Luma is the most aesthetically ambitious brand in this study. Where other products in this list use minimalism as a constraint, Luma uses it as a canvas. Their gradient-based identity (iridescent teal-purple-magenta) feels luxurious without being cluttered. The dark background (#09090B, the darkest in this study, almost absolute black) makes the gradients pop like neon lights in a dark room. Luma is included here because it shows what happens when you invest in aesthetic excess as a differentiator: events need to feel exciting, and Luma's brand creates that emotional response. This is relevant to Shopilot's onboarding and marketing pages.
Color System
| Hex | Name | Usage |
|---|---|---|
| #09090B | BG Absolute | Near-perfect black |
| #FAFAFA | BG Light | Clean off-white |
| gradient | Brand Iridescent | Teal→Purple→Pink |
| #A855F7 | Primary Purple | CTAs on dark bg |
| #141416 | Surface | Cards |
| #1C1C1F | Surface 2 | Nested cards |
| #4FACFE | Teal Blue | Info · links |
| #EC4899 | Pink Accent | Featured · special |
Key Insights for Shopilot
Gradient accents for marketing pages only (NOT product UI): this is the lesson · #09090B near-absolute black → glass cards on top create depth with zero shadows · Premium "entrance" moments deserve gradient treatment (Shopilot: first login, marketplace activation) · Inter Display with tight letter-spacing (-0.04em) = expensive look at zero cost
Brand Identity: NOT DEFINED YET
This section is a decision framework: a structured guide to the brand choices that must be made before any design system can be built. Nothing here is decided. The references above are inspiration material only.
Brand Decision Log: Status
| # | Decision | Options | Status |
|---|---|---|---|
| 01 | Brand philosophy / tagline | 3 candidates below | PENDING |
| 02 | Primary color | 4 palette candidates below | PENDING |
| 03 | Typography stack | 3 pairings below | PENDING |
| 04 | Logo direction | Wordmark / Icon+Text / Abstract mark | PENDING |
| 05 | Dark vs Light vs Both | Lean: dark-first · Risk: alienates some | PENDING |
| 06 | Brand voice / personality | Expert coach / Trusted advisor / Efficient tool | PENDING |
Decision 01 · Brand Philosophy
CHOOSE ONE
Based on the 16 brands studied, three directions emerged as viable for Shopilot. Each implies a different visual language, color family, and interaction tone.
A · "Warm Precision"
Warm neutral backgrounds, orange/amber accent, trust through clarity. References: Linear + HubSpot. Best for: sellers who want a tool that feels like a trusted advisor, not a cold dashboard.
B · "Data Intelligence"
Pure dark, electric blue accent, Bloomberg-inspired density. References: Datadog + Bloomberg. Best for: power sellers who see the product as a professional data terminal, prioritizing information density over warmth.
C · "Growth Engine"
Dark with green/teal accent, optimistic tone. References: Shopify + Notion. Best for: growth-focused sellers who associate green with profit and want the tool to feel empowering and action-oriented.
Study recommendation: Direction A ("Warm Precision") differentiates most from Helium 10 (purple/2018), Jungle Scout (green/consumer), and Repricer (corporate blue). It positions Shopilot as the only warm, AI-native seller tool. However, this is a recommendation, not a decision.
Decision 02 · Primary Color Palette
CHOOSE ONE
These 4 candidates were derived from the competitive analysis. Each avoids direct collision with existing tools in the market.
Orange — #F97316
Energy + action. Competitive differentiation from purple (Helium 10), blue (Repricer), green (Jungle Scout). HubSpot owns "CRM orange"; risk: some overlap perception.
Indigo — #6366F1
Intelligence + trust. Used by Linear. Risk: perceived as too similar to Helium 10's purple. Benefit: associates with AI/tech precision.
Sky Blue — #0EA5E9
Clarity + openness. Clean differentiation. Risk: overly generic in SaaS. Benefit: universally accessible, no color blindness issues.
Violet — #8B5CF6
Premium + AI. High association with AI products (Claude, Perplexity). Risk: Helium 10 has purple brand equity. Benefit: strong AI-native signal to tech-savvy sellers.
What the study recommends: Orange (#F97316) for maximum warm contrast. But this requires a final call from the team, specifically: does Shopilot want to feel more like a financial tool (blue/indigo) or more like an action-oriented coach (orange)?
Decision 03 · Typography Stack
CHOOSE ONE
| Option | Display / UI | Numbers / Code | Reference |
|---|---|---|---|
| A | Inter | JetBrains Mono | Linear, Vercel: neutral, modern, safe |
| B | Geist / DM Sans | JetBrains Mono | Vercel, Framer: slightly more personality |
| C | IBM Plex Sans | IBM Plex Mono | IBM, Datadog: technical authority, B2B trust |
All 3 options are free, widely available, and render well in Electron. The mono font for numbers is non-negotiable across all options; see the Section 14 design rationale for why.
Decision 04 · Logo Direction
CHOOSE ONE
Wordmark Only
Just the "Shopilot" name in custom lettering. Simple, flexible. Risk: hard to use at small sizes (tray icon, favicon).
Icon + Wordmark
Symbol that works standalone (tray, favicon, app icon) + name for contexts with space. Most flexible system.
Abstract Mark
Unique geometric shape with no letterform. High memorability ceiling. Risk: requires brand awareness to work, so it is too early for a v1 product.
Recommended for v1: Option B (Icon + Wordmark). Allows a small icon in the macOS tray, a medium icon in the dock, and the full wordmark in the sidebar. But the icon design itself is a separate creative decision; do not ship a placeholder.
Decision 05 · Dark vs Light Mode
CHOOSE ONE
Dark-first (recommended by study)
Cursor, Linear, Arc, Datadog, Claude: all dark-first. Reduces eye strain in long sessions. Numbers pop on dark backgrounds. Most of the reference brands studied use dark mode as the primary experience (HubSpot and Stripe are the light-first exceptions). Competitive differentiation from Helium 10 (light default).
Risk of dark-only
Some sellers work in bright environments (warehouses, offices). If Shopilot is dark-only, it may be hard to read in those contexts. A light mode in Phase 2 is strongly advisable. V1: dark only to reduce scope.
→ How to use this section
- Review the 16 reference brand books above and understand what each brand does and why.
- Make a decision on each of the 6 items in the tracker at the top of this section. Pablo + Mateo + Sergio should be in the room.
- Document the chosen direction back into this spec: replace "PENDING" with the decided value and the rationale.
- Only then build design tokens (§14 · Stack): the CSS custom properties, the Tailwind config, the Style Dictionary pipeline. Building tokens before the brand decisions are made is wasted work.
- Commission a designer for the logo once the color and philosophy direction are locked. Do not use AI-generated or placeholder marks in any public-facing context.
Study Synthesis: Patterns Found Across All 16 Brands
After analyzing 16 world-class products (Anthropic, Cursor, Linear, Arc, Figma, Stripe, Vercel, HubSpot, Shopify, Datadog, Bloomberg, Notion, Intercom, Brex, Mercury, Luma), 7 universal patterns emerged that every top-tier product shares, regardless of industry, color, or audience. These are conclusions, not recommendations for Shopilot.
One strong primary color = brand ownership
Every studied brand owns exactly ONE color. Not two, not a gradient system as their identity: one color that is unmistakably theirs. This color appears on buttons, on the favicon, on the loading state, on the cursor. It becomes the brand.
Anthropic #CC785C copper
Linear #5e6ad2 indigo
HubSpot #FF7A59 orange
Shopify #96BF48 green
What the study shows: Color category ownership is first-come-first-served. Purple → Figma/Anthropic. Green → Shopify/Notion. Blue → almost every generic SaaS. Orange → HubSpot. The strongest move for a new brand is to claim a color that no dominant competitor owns in its specific category.
Power tools are dark-first; light mode is an afterthought
Of the brands studied, Cursor, Linear, Arc, Datadog, Bloomberg, Claude, Vercel, Brex, Mercury, Retool, and PostHog all ship dark as primary. Light mode exists but is not the designed-for experience. The pattern holds across every product category where users are professionals staring at screens for 6+ hours.
| Product | Primary mode | Background |
|---|---|---|
| Cursor | Dark | #1B1B1F — near-black, warm |
| Linear | Dark | #0F0F11 — pure dark |
| Claude / Anthropic | Dark | #1A1A2E — violet-shifted dark |
| Arc Browser | Dark | #1C1C1E — macOS standard dark |
| Datadog | Dark | #14131A — purple-shifted |
| Vercel | Dark | #000000 — pure black |
| HubSpot / Stripe | Light | #FFFFFF — pure white |
Study finding: The dark backgrounds that work best are NOT pure black (#000). They are near-blacks with a hue shift: warm (#1B1B1F), cool (#0F0F11), violet (#14131A), or macOS system (#1C1C1E). Pure black creates harshness; hue-shifted dark creates depth. Also, the darker the background, the more the accent color pops, which is why dark-first products can use a single, lower-saturation accent and still feel branded.
Typography: 2 fonts maximum (one sans, one mono)
Every studied product uses a sans-serif for UI text and a monospace font for all data, code, and numbers. No exceptions. The monospace font for numbers is not a stylistic choice; it is functional: proportional fonts create unstable number columns. Monospace makes data scannable.
Sans-serif findings
- Inter — Linear, Vercel, Notion, PostHog
- Geist — Vercel (custom, based on Inter)
- SF Pro — Arc, Cursor (system default)
- Söhne / Graphik — Anthropic, Figma
- IBM Plex Sans — Datadog, IBM products
Finding: Inter dominates because it is free, ships as a variable-weight font, and is optimized for screens. The system font (SF Pro on Mac) is the "invisible" choice that native apps use for maximum rendering quality.
Mono findings
- JetBrains Mono — Cursor, Linear, Vercel
- Fira Code — developer tools generally
- SF Mono — Arc, macOS native
- IBM Plex Mono — Datadog, Brex
- Geist Mono — Vercel (v2)
Finding: JetBrains Mono is the modern standard for developer-adjacent tools. Its ligatures are readable at 10–12px, which is where data tables live.
Motion is functional, not decorative, and it's invisible when done right
None of the studied products use animation for visual delight. Every transition serves a purpose: orientation (this panel came from the right), state change (this button is now loading), hierarchy (this modal is above the content). The rule: if you can remove the animation and the user still understands what happened, the animation was decorative. Remove it.
| Animation | Duration | Purpose | Seen in |
|---|---|---|---|
| Hover bg change | 100–150ms | Acknowledge interaction | All products |
| Button press scale | 80ms ease-out | Physical click feedback | Linear, Arc, Luma |
| Modal slide-up | 200–250ms spring | Layer hierarchy | Figma, Notion, Linear |
| Streaming text fade | 80ms per word | Show AI is generating | Claude, Cursor |
| Thinking pulse ··· | 1.2s infinite | AI is processing | Claude, Cursor, Copilot |
| Sidebar collapse | 200ms ease-in-out | Preserve spatial orientation | Linear, Arc, Notion |
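As an illustration, the durations in the table above could be captured as motion tokens. This is a minimal sketch: the token names, the `MotionToken` shape, and the `cssTransition` helper are hypothetical, and "spring" has no built-in CSS timing keyword, so the helper falls back to `ease` for it (a real spring would need a custom curve or an animation library).

```typescript
type Easing = "ease-out" | "ease-in-out" | "spring";

interface MotionToken {
  minMs: number;   // lower bound of the observed duration
  maxMs: number;   // upper bound of the observed duration
  easing?: Easing; // omitted where the table lists no easing
  purpose: string; // every animation must name its purpose
}

// Values copied from the table; names are illustrative only.
const motion: Record<string, MotionToken> = {
  hoverBackground: { minMs: 100, maxMs: 150, purpose: "Acknowledge interaction" },
  buttonPress: { minMs: 80, maxMs: 80, easing: "ease-out", purpose: "Physical click feedback" },
  modalSlideUp: { minMs: 200, maxMs: 250, easing: "spring", purpose: "Layer hierarchy" },
  streamingTextFade: { minMs: 80, maxMs: 80, purpose: "Show AI is generating" }, // per word
  thinkingPulse: { minMs: 1200, maxMs: 1200, purpose: "AI is processing" },      // loops
  sidebarCollapse: { minMs: 200, maxMs: 200, easing: "ease-in-out", purpose: "Preserve spatial orientation" },
};

// Build a CSS `transition` value using the upper duration bound.
// "spring" and missing easings fall back to the CSS keyword "ease".
function cssTransition(property: string, token: MotionToken): string {
  const easing = token.easing && token.easing !== "spring" ? token.easing : "ease";
  return `${property} ${token.maxMs}ms ${easing}`;
}

console.log(cssTransition("transform", motion.buttonPress));
console.log(cssTransition("width", motion.sidebarCollapse));
```

Keeping a `purpose` field on every token is one way to enforce the rule above: an animation without a stated purpose has no token to use.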
AI products share a specific visual language for trust and transparency
The study of Anthropic, Cursor, and Claude Code revealed a distinct pattern absent in non-AI products: every AI action is visually accountable. You always see what the AI is doing, what tool it used, how long it took. There are no black boxes in the UI of the best AI products.
AI-native patterns (present in all studied AI products)
- ✓ Streaming first: never show a spinner while generating text
- ✓ Tool transparency: show every tool call with name + duration + result
- ✓ Reversibility signals: visually distinguish reversible from irreversible actions before confirmation
- ✓ Context visibility: always show what the AI knows (context window, memory, recent files)
- ✓ Interrupt capability: stop button always visible during AI generation
Anti-patterns (absent in top AI products)
- ✗ Skeleton loaders for AI output: creates a false expectation of content structure
- ✗ Generic spinners while thinking: no information, builds anxiety
- ✗ Hiding tool execution: users don't know what changed in their systems
- ✗ One-shot confirmation dialogs: no diff, no preview, just "Are you sure?"
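To make "tool transparency" and "reversibility signals" concrete, here is a minimal sketch of the record a UI might keep per tool call. The `ToolCallRecord` shape, the `formatToolCall` helper, and the `update_price` tool name are all hypothetical; the source only requires that name, duration, and result be shown and that irreversible actions be visually distinct.

```typescript
// One entry per tool call, rendered in the sidebar as the AI works.
interface ToolCallRecord {
  tool: string;            // tool name shown to the user
  durationMs: number;      // how long the call took
  result: "ok" | "error";  // outcome of the call
  summary: string;         // short human-readable description of what changed
  reversible: boolean;     // drives the reversible/irreversible visual signal
}

// Render a single-line, always-visible summary of a tool call.
function formatToolCall(call: ToolCallRecord): string {
  const seconds = (call.durationMs / 1000).toFixed(1);
  const badge = call.reversible ? "reversible" : "IRREVERSIBLE";
  return `${call.tool} · ${seconds}s · ${call.result} · ${badge} · ${call.summary}`;
}

// Hypothetical example call.
const example: ToolCallRecord = {
  tool: "update_price",
  durationMs: 1240,
  result: "ok",
  summary: "1 listing updated",
  reversible: true,
};

console.log(formatToolCall(example));
```

In a real confirmation flow the `summary` would be replaced by a diff or preview, which is what separates this pattern from the one-shot "Are you sure?" dialog listed among the anti-patterns.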
Information density is a product decision, not a design afterthought
The studied products cluster into two density philosophies. Both work, but for different users. The choice of density must be made at the product level before any design work begins, because it determines spacing tokens, component heights, font sizes, and the entire information architecture.
High density: expert tools
Bloomberg, Datadog, Retool, Brex. Row height ≈ 32px. Font size: 11–12px. Assume users know what they're looking at. More information per screen = fewer clicks. Used by professionals who stare at it for hours.
Comfortable density: balanced tools
Linear, Notion, Intercom, Luma. Row height ≈ 44px. Font size: 13–14px. Sufficient whitespace to feel premium without hiding data. Works for both new and expert users.
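The trade-off between the two philosophies is easy to quantify. The sketch below encodes the row heights and font ranges quoted above as density tokens and counts how many table rows fit in a given height; the token shape, the `visibleRows` helper, and the 704px viewport are illustrative assumptions.

```typescript
type Density = "high" | "comfortable";

interface DensityToken {
  rowHeightPx: number;              // approximate row height from the study
  fontSizePx: [number, number];     // [min, max] font size range
}

const densityTokens: Record<Density, DensityToken> = {
  high: { rowHeightPx: 32, fontSizePx: [11, 12] },
  comfortable: { rowHeightPx: 44, fontSizePx: [13, 14] },
};

// Number of full table rows that fit in the available height.
function visibleRows(availableHeightPx: number, density: Density): number {
  return Math.floor(availableHeightPx / densityTokens[density].rowHeightPx);
}

// Assumed table area of 704px, chosen only for round numbers.
console.log(visibleRows(704, "high"));        // 22
console.log(visibleRows(704, "comfortable")); // 16
```

With these numbers, high density shows roughly a third more rows in the same space, which is the "more information per screen = fewer clicks" argument in concrete form.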
Brand = how you speak, not just how you look
The strongest brands in the study have a distinct voice in every single word of their UI: button labels, error messages, onboarding copy, empty states, confirmation dialogs. The voice is as distinctive as the color. Stripe writes error messages like a knowledgeable friend. Linear writes UI copy with extreme brevity. Anthropic writes with careful epistemic humility ("I think", "Based on what I know").
Stripe
Error: "Your card was declined. This sometimes happens if the issuing bank suspects fraud. Try a different card or contact your bank."
Linear
Error: "Failed to sync." ← That's it. No explanation. They trust users to understand context. Extreme brevity as brand.
Anthropic / Claude
Response: "I'm not certain, but based on what I know..." (epistemic humility baked into every sentence).
Summary: What all world-class products share
| Dimension | Universal pattern | Applies to Shopilot? |
|---|---|---|
| Color | 1 primary accent, 2 functional (success/error), neutral scale | Yes — must decide |
| Background | Near-black with hue shift (not #000 or #111) | Yes — must decide hue |
| Typography | 1 sans for UI + 1 mono for all numbers/data | Yes — must choose pair |
| Motion | 100–250ms, purposeful only, spring easing | Yes — adopt directly |
| AI states | Streaming text, thinking pulse, tool transparency | Yes — core requirement |
| Density | Choose high or comfortable — don't mix | Yes — must decide |
| Voice | Every word of UI reflects brand personality | Yes — must define |
| Logo | Works at 16px (favicon/tray) AND at 200px | Yes — must commission |
What Shopilot Needs — Design Requirements Analysis
Based on the study synthesis and Shopilot's product definition (AI-native Electron desktop app for e-commerce sellers, 70/30 split, 36 tools, marketplace integration), here is every design element the product needs — independent of brand decisions. These are requirements, not solutions.
The 15 things Shopilot must complete to have a world-class design
Single source of truth. Everything in one place. The detailed breakdown is in the categories below — this is the executive view.
Phase 1 — Brand Identity (before writing a single line of UI code)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 01 | Run brand workshop — choose Brand Philosophy (what emotion does Shopilot own?) | 1-sentence brand position | Pablo | PENDING |
| 02 | Decide primary brand color — pick from candidates (see §Brand Decision Framework) | 1 hex value, named, documented | Pablo + team | PENDING |
| 03 | Choose typography pair — UI sans + data mono (see §24 References for options) | 2 font names, weight scale defined | Pablo + Sergio | PENDING |
| 04 | Build the color system — dark bg scale (4 tones) + text scale (4 levels) + semantic colors | design-tokens.json — color section | Sergio | BLOCKED by 02 |
| 05 | Commission logo — wordmark + icon mark, works at 16px and 512px | SVG files: logo.svg, icon.svg, favicon.svg | Pablo (hire) | BLOCKED by 01+02 |
Phase 2 — UI Foundation (tokens → CSS vars → Tailwind config, weeks 1–2)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 06 | Complete design-tokens.json — spacing (--g / --v system), radii, shadows, duration | tokens.json, W3C DTCG format | Sergio + Mateo | BLOCKED by 04 |
| 07 | Run Style Dictionary pipeline — tokens.json → CSS :root vars + tailwind.config.js | tokens.css, tailwind.config.js | Mateo | BLOCKED by 06 |
| 08 | Build Electron window shell — frameless + drag region + macOS traffic lights + 70/30 split | Running Electron with correct window chrome | Sergio | PENDING |
| 09 | Implement base atoms — Button (6 variants), Badge, Input, Spinner, Tooltip, Divider | 6 React components using tokens | Sergio | BLOCKED by 07 |
Phase 3 — Core Components (weeks 2–6)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 10 | Build Coach screen — streaming text cursor ▊ + thinking pulse ··· + tool accordion (4 states) + chat input | Functional coach view with AI state machine | Sergio | BLOCKED by 09 |
| 11 | Build Confirmation Dialog — reversible (amber) vs irreversible (red) variants + diff display | ConfirmationDialog.tsx, 2 variants | Sergio | BLOCKED by 09 |
| 12 | Build KPI card + data table (sortable) + delta badges — the 80% of the Dashboard screen | Dashboard screen with real data | Sergio + Andrés | BLOCKED by 09 |
| 13 | Build status bar (24px) — agent state dot left + credits + model name right | StatusBar.tsx, always visible | Sergio | BLOCKED by 08 |
| 14 | Build context bar — active ASIN + marketplace dot + context window progress bar | ContextBar.tsx | Sergio | BLOCKED by 09 |
| 15 | Accessibility audit — WCAG AA contrast check on all components, keyboard nav, focus rings | 0 WCAG AA violations | Sergio + Andrés | BLOCKED by 09-14 |
Critical path: 01 (brand workshop) unblocks everything. Nothing else can start until the team aligns on what emotion Shopilot owns. That's the only decision that can't be delegated or automated.
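The "BLOCKED by" entries in the three phase tables form a dependency graph, so a valid build order can be derived mechanically. The sketch below is illustrative only: the graph is transcribed from the Status column (with "09-14" expanded to individual tasks), and the function names `buildOrder` and `chainLength` are hypothetical. Note that the tables encode 01 only as a direct blocker of 05; the team-alignment prerequisite described above is organizational and is not captured in this graph.

```typescript
// Blockers transcribed from the Status column of the Phase 1-3 tables.
// Tasks listed as PENDING have no formal blocker in the tables.
const blockedBy: Record<string, string[]> = {
  "01": [], "02": [], "03": [], "08": [],
  "04": ["02"],
  "05": ["01", "02"],
  "06": ["04"],
  "07": ["06"],
  "09": ["07"],
  "10": ["09"], "11": ["09"], "12": ["09"], "14": ["09"],
  "13": ["08"],
  "15": ["09", "10", "11", "12", "13", "14"],
};

// Kahn-style topological sort: repeatedly emit every task whose blockers are done.
function buildOrder(graph: Record<string, string[]>): string[] {
  const ids = Object.keys(graph).sort();
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < ids.length) {
    const ready = ids.filter(
      (id) => !done.has(id) && graph[id].every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("Dependency cycle detected");
    for (const id of ready) {
      done.add(id);
      order.push(id);
    }
  }
  return order;
}

// Number of tasks on the longest blocker chain that ends at `id`.
function chainLength(graph: Record<string, string[]>, id: string): number {
  return 1 + Math.max(0, ...graph[id].map((dep) => chainLength(graph, dep)));
}

const order = buildOrder(blockedBy);
console.log(order.join(" → "));
console.log("Longest chain ending at task 15:", chainLength(blockedBy, "15"));
```

The longest chain is the sequence that determines the earliest possible date for the accessibility audit, so it is the one worth tracking in Linear.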
Category 1 — Brand Identity Elements (detail)
ALL MISSING
| Element | Why needed | Used where | Status |
|---|---|---|---|
| Logo mark (icon) | Works at 16px — macOS dock, tray, favicon | Electron dock icon, tray, browser tab | MISSING |
| Wordmark (logotype) | Full name, readable at 120px+ | App sidebar header, landing page, screenshots | MISSING |
| Primary brand color | Buttons, links, active states, focus rings | Everywhere interactive — 200+ UI elements | PENDING DECISION |
| Background color scale | Base, surface, card, elevated — 4 dark tones | Every screen, every component | PENDING DECISION |
| Foreground color scale | Primary text, secondary, muted, disabled — 4 levels | All text, labels, placeholders | DERIVES FROM BG |
| Functional colors | Success (green), Warning (amber), Error (red), Info (blue) | Alerts, badges, status indicators, audit log | STANDARD — PICK |
| UI typography (sans) | All text except numbers | Labels, paragraphs, headings, button text | PENDING DECISION |
| Data typography (mono) | All numbers, prices, percentages, code | KPI cards, tables, status bar, audit log | PENDING DECISION |
Category 2 — UI Components Required by the Product
These are derived from Shopilot's 36 tools and 4 core screens (Coach view, Dashboard, Settings, Billing). Not a design choice — a product requirement.
Foundation (week 1)
- Design tokens (CSS vars)
- Button (6 variants)
- Input / Textarea
- Badge / Tag
- Icon system (Lucide)
- Tooltip
- Spinner / Loading
- Divider
Coach screen (week 2-3)
- Chat message (user/AI)
- Streaming text cursor ▊
- Thinking pulse ···
- Tool accordion (4 states)
- Confirmation dialog
- Proactive suggestion card
- Context bar (ASIN + tokens)
- Chat input + send button
Data screens (week 4-6)
- KPI metric card
- Data table (sortable)
- Buy Box indicator
- Price delta bar
- BSR sparkline
- Audit log timeline
- Credit economy bar
- Fraud alert banner
Category 3 — Electron Desktop-Specific Requirements
These have no equivalent in web apps. Required because Shopilot ships as a native macOS/Windows app, not a browser tab.
- Title bar: frameless window with drag region + macOS traffic lights
- Tab bar: marketplace switcher (Amazon / MeLi / Shopify) with colored dots
- Status bar: 24px bottom bar — agent state left, credits + model right
- Tray icon: 16x16 mono SVG + badge count for alerts
- 70/30 split: marketplace WebView (left) + React sidebar (right) — visual seam between them
- Update modal: version info + changelog + progress + restart button
- Notification system: 3 levels: in-app banner → OS push → tray badge
- App icon: 1024×1024px for App Store + 512px for macOS dock
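To make the 70/30 split and the fixed-height bars concrete, here is a minimal layout sketch. It is pure geometry and does not call Electron; the function name `computeShellLayout` and the `Bounds` shape are hypothetical, the 24px status bar comes from the list above, and the 40px tab bar height is taken from the macOS patterns table elsewhere in this document. It assumes the status bar spans the full window width and that the sidebar takes exactly 30% of the width rather than a fixed pixel value.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ShellLayout {
  tabBar: Bounds;
  webview: Bounds;
  sidebar: Bounds;
  statusBar: Bounds;
}

const STATUS_BAR_HEIGHT = 24; // bottom status bar
const TAB_BAR_HEIGHT = 40; // marketplace switcher at the top of the left pane
const LEFT_PANE_RATIO = 0.7; // marketplace WebView share of the window width

// Computes rectangles for the four shell regions from the window's content size.
function computeShellLayout(windowWidth: number, windowHeight: number): ShellLayout {
  const contentHeight = windowHeight - STATUS_BAR_HEIGHT;
  const leftWidth = Math.round(windowWidth * LEFT_PANE_RATIO);
  return {
    tabBar: { x: 0, y: 0, width: leftWidth, height: TAB_BAR_HEIGHT },
    webview: {
      x: 0,
      y: TAB_BAR_HEIGHT,
      width: leftWidth,
      height: contentHeight - TAB_BAR_HEIGHT,
    },
    sidebar: { x: leftWidth, y: 0, width: windowWidth - leftWidth, height: contentHeight },
    statusBar: { x: 0, y: contentHeight, width: windowWidth, height: STATUS_BAR_HEIGHT },
  };
}

console.log(computeShellLayout(1000, 600));
```

In an Electron main process, rectangles like these would typically be recomputed on window resize and applied to whichever view hosts the marketplace page. The frameless chrome itself is configured separately, for example with `titleBarStyle: 'hiddenInset'` on the `BrowserWindow` and `-webkit-app-region: drag` on the drag region.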
Workflow 0 → Complete Brand — The Efficient Path
The most efficient process to go from "no brand" to a production-ready design system that rivals Anthropic, Cursor, or Linear. This is the process — not based on opinion, but on how the reference brands actually built their design systems.
The 5-Phase Process
Brand Workshop
1–2 days · Pablo + Mateo + Sergio
Make the 6 brand decisions from the Decision Framework above. No design tools needed — just a whiteboard or Notion doc. Output: a 1-page brand brief with every decision locked.
Decisions to lock in this phase:
Visual Identity in Figma
3–5 days · Designer (contract) + Pablo review
This is where Figma enters — but only for visual identity exploration, not for UI design. The goal is to validate color, logo, and typography before writing a single line of code. Figma is used here because visual decision-making is faster with a canvas tool than in code.
What goes into Figma in Phase 2:
- Logo mark explorations (6–10 directions)
- Color palette validation (light + dark test)
- Typography specimens (all weights + sizes)
- 3 brand application mockups (app icon, sidebar header, marketing screenshot)
What does NOT go into Figma in Phase 2:
- Full UI screens — premature without tokens
- Component library — built in code, not Figma
- User flows — too early
Tools for Phase 2: Figma (free tier is enough) · fontpair.co for typography pairing · Coolors.co or Realtime Colors for palette generation · Adobe Color for accessibility check · Contrast.app for WCAG validation
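The WCAG validation step can also be scripted so that candidate palettes are checked before they reach Figma. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas; the function names and the sample hex values are illustrative and are not Shopilot's palette. It assumes colors are given as `#RRGGBB`.

```typescript
// Parses "#RRGGBB" into 8-bit channels.
function hexToRgb(hex: string): [number, number, number] {
  const match = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected #RRGGBB, got ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Converts an 8-bit sRGB channel to linear light, per the WCAG 2.x definition.
function channelToLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#FFFFFF", "#000000").toFixed(2));
console.log(meetsAA("#767676", "#FFFFFF"));
```

Running a check like this over every text/background pair in the token file gives an early signal before the full accessibility audit in task 15.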
Design Tokens → Code
2 days · Sergio + Mateo
Once brand decisions are locked from Phase 2, translate them into code immediately. This is where Figma connects to Claude Code: take the approved color values and typography from Figma, encode them as design tokens, and generate the CSS + Tailwind config. Claude Code accelerates this from 2 days to 4 hours.
Figma → Claude Code integration flow:
- Export approved brand values from Figma as JSON (Figma Variables → JSON via plugin "Variables Import Export")
- Paste JSON into Claude Code: "Convert these brand values to a W3C DTCG tokens.json file"
- Claude Code generates: tokens.json + style-dictionary.config.mjs + globals.css + tailwind.config.ts
- Run Style Dictionary → CSS custom properties are live in the app
- Validate: open Electron app, confirm colors match Figma spec
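To show what the token step in the flow above produces, here is a hand-rolled sketch that flattens a DTCG-style token tree into CSS custom properties. It is a stand-in for illustration, not Style Dictionary's API; the function names are hypothetical, the sample values are the example values used elsewhere in this document, and it assumes every token `$value` is a plain string or number.

```typescript
// A token leaf in the W3C DTCG draft format carries its value under "$value".
interface TokenLeaf {
  $value: string | number;
  $type?: string;
}

interface TokenTree {
  [key: string]: TokenTree | TokenLeaf;
}

function isLeaf(node: TokenTree | TokenLeaf): node is TokenLeaf {
  return Object.prototype.hasOwnProperty.call(node, "$value");
}

// Walks the tree and returns [customPropertyName, value] pairs.
function flattenTokens(tree: TokenTree, path: string[] = []): Array<[string, string]> {
  const out: Array<[string, string]> = [];
  for (const [key, node] of Object.entries(tree)) {
    if (isLeaf(node)) {
      out.push([`--${[...path, key].join("-")}`, String(node.$value)]);
    } else {
      out.push(...flattenTokens(node, [...path, key]));
    }
  }
  return out;
}

// Renders the flattened tokens as a :root block.
function toCss(tree: TokenTree): string {
  const lines = flattenTokens(tree).map(([name, value]) => `  ${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

// Sample tokens: illustrative values only, not approved brand decisions.
const sampleTokens: TokenTree = {
  color: { brand: { primary: { $value: "#F97316", $type: "color" } } },
  spacing: { "4": { $value: "16px", $type: "dimension" } },
  radius: { lg: { $value: "8px", $type: "dimension" } },
};

console.log(toCss(sampleTokens));
```

In the real pipeline Style Dictionary would own this transformation; the sketch is mainly useful for reviewing what the generated `globals.css` should contain.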
Component Library with Claude Code
3–6 weeks · Sergio (primary) + Claude Code
This is the main build phase. All components are defined in Figma (#18 Design System) following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads the Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in the Figma.
How Claude Code works in this phase:
- Spec → component: Give Claude Code a description from this spec (e.g., "build ToolAccordion with 4 states: queued/running/success/error, using design tokens from globals.css") → it generates the full TypeScript component
- Figma → React: Claude reads the Figma component via Figma MCP and generates all variants automatically with matching props and states
- Accessibility audit: "Review this component for WCAG AA compliance and fix any issues" — Claude Code runs the audit inline
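The ToolAccordion prompt above names four states. A minimal sketch of the state logic, kept separate from any React rendering, might look like the following. The event names and the allowed transitions (including retry from error back to queued) are assumptions made for illustration; the spec only fixes the four state names.

```typescript
type ToolState = "queued" | "running" | "success" | "error";
type ToolEvent = "start" | "succeed" | "fail" | "retry";

// Allowed transitions per state. Anything not listed is ignored.
const toolTransitions: Record<ToolState, Partial<Record<ToolEvent, ToolState>>> = {
  queued: { start: "running" },
  running: { succeed: "success", fail: "error" },
  success: {},
  error: { retry: "queued" }, // assumption: failed tools can be re-queued
};

// Returns the next state, or the current state when the event is not valid here.
function nextToolState(state: ToolState, event: ToolEvent): ToolState {
  return toolTransitions[state][event] ?? state;
}

// Replays a list of events from the initial "queued" state.
function replay(events: ToolEvent[]): ToolState {
  return events.reduce<ToolState>((state, event) => nextToolState(state, event), "queued");
}

console.log(replay(["start", "fail", "retry", "start", "succeed"]));
```

Keeping the transition table as data makes it easy to render each accordion state from a single source and to unit-test the logic without mounting a component.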
Velocity benchmark: A senior engineer without AI: 1 component/week (design + code + test + docs). With Claude Code: 1 component/day. 25 core components in 5 weeks instead of 25 weeks. This is the 5x leverage.
First Real Screen → Test with Sellers
1 week · Full team
Assemble the Coach View (the 70/30 split screen) using the built components and tokens. Show it to 3 real sellers. At this point the brand is real — not a Figma mockup, not a code spec, but a running Electron application with real brand tokens, real components, and real data. Collect feedback. Iterate.
Figma vs Code — When to Use Each
This is the most common source of wasted effort in early-stage product design. The answer depends on what you're deciding, not on preference.
| Task | Use Figma? | Why |
|---|---|---|
| Logo exploration | Yes — required | Bezier curves, vector editing, proportions — impossible to do well in code |
| Color palette validation | Yes — fast | Seeing colors in context (on dark bg, next to text) is faster in Figma than spinning up code |
| Typography testing | Yes — fast | Font pairing decisions are visual, not technical. Figma + Google Fonts is 10x faster than code for this |
| User flow diagrams | Optional | Can also use FigJam, Miro, or paper. The flow is the output, not the tool |
| Individual component design | Occasionally | Only for complex components (confirmation dialog, onboarding flow). Simple components: just build in code with Claude Code |
| Component library | Yes — source of truth | Figma (#18 Design System) is the single source of truth following Atomic Design. Claude reads via Figma MCP and implements matching React components. No components created outside Figma |
| Design tokens | No — live in tokens.json | Figma Variables exist but are secondary. The tokens.json → CSS pipeline is the real system |
| Full screen prototypes | No — build in Electron | A running Electron app with real data is a better prototype than any Figma mockup. With Claude Code, the delta in effort is small |
Time to World-Class Brand — Realistic Estimate
| Phase | Duration | Focus |
|---|---|---|
| Phase 1 | 2d | Brand workshop |
| Phase 2 | 5d | Visual identity |
| Phase 3 | 2d | Tokens → code |
| Phase 4 | 6w | Component library |
| Phase 5 | 1w | First real screen |
Total: roughly 8 to 9 weeks (9 working days plus 7 weeks at the estimates above) from zero to a brand that rivals Linear or Cursor. The bottleneck is Phase 2 (finding a designer) and Phase 4 (component build). Everything else is decisions + Claude Code automation.
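As a sanity check on the total, the five phase estimates can be summed directly. The sketch below assumes a 5-day working week and that the phases run strictly one after another; both are assumptions, and the names are illustrative.

```typescript
const WORKING_DAYS_PER_WEEK = 5; // assumption: estimates are in working days

interface PhaseEstimate {
  phase: string;
  amount: number;
  unit: "d" | "w";
}

// Durations as listed in the estimate above.
const phaseEstimates: PhaseEstimate[] = [
  { phase: "Brand workshop", amount: 2, unit: "d" },
  { phase: "Visual identity", amount: 5, unit: "d" },
  { phase: "Tokens → code", amount: 2, unit: "d" },
  { phase: "Component library", amount: 6, unit: "w" },
  { phase: "First real screen", amount: 1, unit: "w" },
];

function toWorkingDays(estimate: PhaseEstimate): number {
  return estimate.unit === "w" ? estimate.amount * WORKING_DAYS_PER_WEEK : estimate.amount;
}

function totalWorkingDays(estimates: PhaseEstimate[]): number {
  return estimates.reduce((sum, estimate) => sum + toWorkingDays(estimate), 0);
}

const totalDays = totalWorkingDays(phaseEstimates);
const totalWeeks = totalDays / WORKING_DAYS_PER_WEEK;
console.log(`${totalDays} working days ≈ ${totalWeeks.toFixed(1)} weeks`);
// prints "44 working days ≈ 8.8 weeks"
```

The sequential sum lands slightly above a rounded 8 weeks, so any overlap between phases (for example, starting the window shell during Phase 2) is what brings the schedule down.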
References — Figma, OS Design Systems & Desktop Apps
The authoritative sources every world-class desktop app is built on: Apple's Human Interface Guidelines, Microsoft Fluent Design, how the best companies use Figma, what Figma Community files to download today, and visual references of the exact apps Shopilot should emulate as a macOS Electron product.
Apple Human Interface Guidelines (HIG)
developer.apple.com/design/human-interface-guidelines · The bible for macOS app design
Every app that feels "native" on macOS — Arc, Cursor, Notion, Linear — follows Apple's HIG. Not as rules, but as a foundation. Understanding HIG tells you why certain things feel right on Mac and wrong on Windows, and what Shopilot must do to feel like a first-class macOS citizen.
6 Core HIG Principles — and what they mean for Shopilot
1 · Aesthetic Integrity
The app's visual appearance and behavior must be consistent with its purpose. A data tool (Shopilot) should look precise and professional — not playful. Applies to: spacing consistency, typography alignment, color restraint.
2 · Consistency
Use standard macOS controls and terminology where possible. Users already know what a sidebar, toolbar, and panel are on Mac. Don't reinvent them — use them. Shopilot's window chrome (title bar, traffic lights, resize handle) must behave as users expect.
3 · Direct Manipulation
Users should feel they're directly controlling the content on screen. For Shopilot: clicking an ASIN row should immediately feel responsive. Dragging, hovering, and focusing must have immediate visual feedback (≤100ms).
4 · Feedback
Every action must acknowledge the user. Shopilot specifics: button press = visual depress + sound (optional). Loading = progress indicator, not frozen UI. AI thinking = animated cursor ▊ or pulse ···. Error = banner with next action, not silent failure.
5 · User Control
Users — not the app — initiate actions. The AI coach can suggest, but must not act without confirmation on irreversible actions. HIG says: "people should always be in control." This is the origin of Shopilot's reversibility system.
6 · Metaphors
Use familiar real-world concepts. Shopilot uses the "coach" metaphor — a trusted advisor who sees the same screen you do and gives guidance. This is why the sidebar is positioned like a coach standing next to you: right side, always visible, never blocking the main view.
macOS Patterns that Shopilot must implement correctly
| Pattern | HIG spec | Shopilot implementation |
|---|---|---|
| Traffic lights | Red/Yellow/Green at 12px diameter, 8px gap, 20px from left | Frameless window + titleBarStyle:'hiddenInset' preserves native buttons |
| Sidebar | Min width 220px, vibrancy background, grouped sections with headers | Shopilot right sidebar 320px — deviates intentionally (coach, not nav) |
| Toolbar | Height 52px, icon + label, unified with title bar on macOS 11+ | Tab bar (marketplace switcher) sits at top of left pane, height 40px |
| Menu bar | Every Mac app has native menu bar: File, Edit, View, Window, Help | Electron: Menu.setApplicationMenu() — must exist, even if minimal |
| Keyboard shortcuts | Cmd+W close, Cmd+Q quit, Cmd+, preferences — always expected | Must register all standard Mac shortcuts + Shopilot custom (Cmd+K = chat) |
| System colors | Use NSColor system colors that adapt to dark/light automatically | In Electron: CSS prefers-color-scheme media query plus nativeTheme.shouldUseDarkColors in the main process, or manual token switch |
| Focus ring | Blue ring 3px at system accent color — do NOT remove, required for a11y | Override with brand accent color ring, same shape — never remove entirely |
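The menu bar and keyboard shortcut rows can be covered by a single menu template. The sketch below builds that template as plain data so it can be inspected without launching Electron; `MenuItemSketch` is a simplified stand-in for Electron's `MenuItemConstructorOptions`, the "Coach" menu and its item ids are hypothetical, and placing Settings there instead of the app menu is an assumption. In the main process the result would be passed to `Menu.buildFromTemplate()` and then `Menu.setApplicationMenu()`.

```typescript
// Simplified stand-in for Electron's MenuItemConstructorOptions (only fields used here).
interface MenuItemSketch {
  id?: string;
  label?: string;
  role?: string;
  accelerator?: string;
  submenu?: MenuItemSketch[];
}

// Builds a minimal application menu. Standard roles supply the expected
// defaults (for example Close and Quit); custom items carry Shopilot shortcuts.
function buildMenuTemplate(isMac: boolean): MenuItemSketch[] {
  const coachMenu: MenuItemSketch = {
    label: "Coach", // hypothetical custom menu
    submenu: [
      { id: "open-chat", label: "Open Chat", accelerator: "CmdOrCtrl+K" },
      { id: "open-settings", label: "Settings", accelerator: "CmdOrCtrl+," },
    ],
  };
  const standardMenus: MenuItemSketch[] = [
    { role: "fileMenu" },
    { role: "editMenu" },
    { role: "viewMenu" },
    coachMenu,
    { role: "windowMenu" },
    { role: "help", submenu: [{ id: "docs", label: "Shopilot Help" }] },
  ];
  // On macOS the first menu is the application menu (About, Hide, Quit).
  return isMac ? [{ role: "appMenu" }, ...standardMenus] : standardMenus;
}

console.log(buildMenuTemplate(true).map((item) => item.role ?? item.label));
```

Click handlers are omitted; each custom item would need one before the template is useful in the running app.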
Reference: What Arc Browser takes from HIG
Arc uses native macOS vibrancy for its sidebar, native traffic lights at the exact HIG position, native context menus via NSMenu, native keyboard shortcut conventions, and the native font stack (SF Pro) for all system-level text. Where Arc deviates from HIG is intentional and branded: the tab bar is vertical instead of horizontal, the command bar replaces the URL bar, the sidebar IS the app chrome. Deviation from HIG is a product decision — but you must know the rules before you break them.
Microsoft Fluent Design System 2
fluent2.microsoft.design · Windows 11 design language
Shopilot targets macOS first, but Windows build comes in Sprint 11-12. Fluent Design 2 is the official design system for Windows 11 apps. Understanding it now prevents a costly redesign later — and it informs several patterns (Acrylic material, Mica background) that translate beautifully to dark Electron apps on both platforms.
5 Fluent Design Principles
Light
Light as a design element — Reveal highlight: a subtle glow appears under the cursor on interactive elements. Creates depth without shadows. In Electron: CSS radial-gradient on mousemove.
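A Reveal-style highlight reduces to computing a gradient centered on the pointer. The sketch below builds the CSS value as a string so it can be tested without a DOM; the function name, the 80px radius, and the 0.12 alpha are illustrative assumptions, not Fluent specifications.

```typescript
// Minimal shape of the element's bounding box (matches getBoundingClientRect()).
interface BoxOrigin {
  left: number;
  top: number;
}

// Returns a CSS radial-gradient centered on the pointer, in element-local coordinates.
function revealGradient(
  pointerX: number,
  pointerY: number,
  box: BoxOrigin,
  radiusPx = 80,
  alpha = 0.12,
): string {
  const x = pointerX - box.left;
  const y = pointerY - box.top;
  return `radial-gradient(circle ${radiusPx}px at ${x}px ${y}px, rgba(255, 255, 255, ${alpha}), transparent)`;
}

console.log(revealGradient(110, 70, { left: 100, top: 50 }));

// In the renderer this would be wired up roughly as:
//   el.addEventListener("mousemove", (e) => {
//     el.style.backgroundImage = revealGradient(e.clientX, e.clientY, el.getBoundingClientRect());
//   });
//   el.addEventListener("mouseleave", () => { el.style.backgroundImage = "none"; });
```

Because the handler fires on every pointer move, writing the coordinates to CSS custom properties and keeping the gradient in the stylesheet is a common alternative that avoids rebuilding the string.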
Depth
Layers at different Z-levels with Acrylic (frosted glass) and Mica (wallpaper-blended background) materials. For Shopilot: the glass-card pattern directly adopts this. Inside the renderer, backdrop-filter: blur() is the closest CSS approximation of Acrylic; it blurs page content behind the element, not the desktop behind the window.
Motion
Connected animations — elements travel between states instead of disappearing and reappearing. Fluent easing: cubic-bezier(0.1, 0.9, 0.2, 1). Used by VS Code, Microsoft Edge, Teams.
Material
Acrylic: backdrop-filter: blur(30px) saturate(180%) — used for sidebars, flyouts, menus. Mica: wallpaper color extracted and used as tint in app chrome. Both create a sense of the app being part of the OS.
Scale
Design for multiple device types. In Shopilot's context: design for a minimum 900×600px window and scale gracefully to 2560×1440 (QHD). Touch targets minimum 44×44px even on desktop (for touch-screen Windows laptops).
Fluent Typography — Segoe UI Variable
Windows 11 uses Segoe UI Variable — a variable font that covers all weights and optical sizes. On Windows, Electron apps that use system-ui resolve to the Segoe UI family automatically, so no font action is needed there. A bundled font such as Inter renders as Inter on every platform; it does not map to Segoe UI.
Fluent Type Ramp (Windows 11):
- Caption · 12px · Regular
- Body · 14px · Regular
- Body Strong · 14px · Semibold
- Subtitle · 20px · Semibold
- Title · 28px · Semibold
- Title Large · 40px · Semibold
- Display · 68px · Semibold
Key difference vs Apple HIG:
Apple HIG uses 13pt as the default body size on macOS (17pt is the iOS default). Fluent uses 14px body, so on Windows everything feels slightly larger. If you design for macOS at 13px body text, Windows will look right at 14px. Build a --body-size token to switch per platform.
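A per-platform `--body-size` token can be resolved from the platform identifier at startup. The sketch below uses the 13px (macOS) and 14px (Windows) values from the paragraph above; the Linux value and the function names are assumptions. The platform strings match the values Node exposes as `process.platform`.

```typescript
type DesktopPlatform = "darwin" | "win32" | "linux";

// Body text size per platform, in CSS pixels.
const BODY_SIZE_PX: Record<DesktopPlatform, number> = {
  darwin: 13, // macOS
  win32: 14, // Windows (Fluent body)
  linux: 14, // assumption: follow the Windows value
};

// Returns the CSS declaration for the platform's body size token.
function bodySizeDeclaration(platform: DesktopPlatform): string {
  return `--body-size: ${BODY_SIZE_PX[platform]}px;`;
}

// Wraps the declaration in a :root rule that can be injected into the renderer.
function bodySizeRule(platform: DesktopPlatform): string {
  return `:root { ${bodySizeDeclaration(platform)} }`;
}

console.log(bodySizeRule("darwin"));
console.log(bodySizeRule("win32"));
```

Component styles would then reference `var(--body-size)` and never hard-code a pixel value, which keeps the platform switch in one place.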
How the Best Companies Use Figma
figma.com · The industry standard for design — and how to use it efficiently
Figma is not a drawing tool — it's a design system management platform. Companies like Vercel, Linear, Airbnb, and Shopify use Figma as their source of truth for visual decisions, but NOT for everything. Understanding what they put in Figma vs what they build directly in code is what separates efficient teams from slow ones.
The 5 ways top companies use Figma
Figma Variables = Design Tokens (the right way)
Since Figma introduced Variables in 2023, they can take over from Styles for colors, spacing, radii, and typography values. Variables in Figma map 1:1 to CSS custom properties. The best companies (Vercel, Shopify, Atlassian) define their entire token system in Figma Variables, then export to JSON using the "Variables Import/Export" plugin (free). This JSON becomes the tokens.json that feeds Style Dictionary.
Figma Variable group → CSS output:
color/brand/primary → --color-brand-primary: #F97316
spacing/4 → --spacing-4: 16px
radius/lg → --radius-lg: 8px
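The naming rule in the mapping above can be sketched as a small function. This is an illustration of the convention only; the function names are hypothetical and the actual export is done by the plugin and Style Dictionary:

```typescript
// Converts a Figma Variable path such as "color/brand/primary"
// into a CSS custom property name such as "--color-brand-primary".
function figmaPathToCssVar(path: string): string {
  const segments = path
    .split('/')
    .map((s) => s.trim().toLowerCase().replace(/\s+/g, '-'))
    .filter((s) => s.length > 0);
  return `--${segments.join('-')}`;
}

// Builds one CSS declaration per variable.
function toCssDeclarations(variables: Record<string, string>): string[] {
  return Object.entries(variables).map(
    ([path, value]) => `${figmaPathToCssVar(path)}: ${value};`
  );
}
```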
Auto Layout = Responsive Components that match CSS Flexbox
Figma's Auto Layout closely mirrors CSS Flexbox. When a designer builds a button with Auto Layout (direction, gap, padding, alignment), each setting translates directly to a Tailwind class. This is how Linear, Vercel, and Shopify achieve near-zero friction between design and code: the designer thinks in flex terms, the developer writes flex terms.
In Figma Auto Layout:
Direction: Horizontal
Gap: 8px
Padding: 10px 16px
Align: Center
In Tailwind CSS:
flex
gap-2
px-4 py-2.5
items-center
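The translation above is mechanical enough to sketch in code. The snippet below is a hypothetical helper (not part of Figma or Tailwind) and assumes Tailwind's default spacing scale, where one unit equals 4px:

```typescript
interface AutoLayoutSpec {
  direction: 'horizontal' | 'vertical';
  gapPx: number;
  paddingYPx: number;
  paddingXPx: number;
  align: 'start' | 'center' | 'end';
}

// Tailwind's default spacing scale uses 1 unit = 4px (0.25rem).
function pxToTailwindUnit(px: number): string {
  return String(px / 4);
}

// Maps each Auto Layout setting to its Tailwind utility class.
function autoLayoutToTailwind(spec: AutoLayoutSpec): string {
  const classes = [
    'flex',
    spec.direction === 'vertical' ? 'flex-col' : 'flex-row',
    `gap-${pxToTailwindUnit(spec.gapPx)}`,
    `px-${pxToTailwindUnit(spec.paddingXPx)}`,
    `py-${pxToTailwindUnit(spec.paddingYPx)}`,
    `items-${spec.align}`,
  ];
  return classes.join(' ');
}
```

Only pixel values that sit on the default scale (4, 8, 10, 16, and so on) produce classes Tailwind ships out of the box; other values would need arbitrary-value syntax or a custom scale.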
Component Properties = Variant System
Top companies define every component with Properties (variant=primary/secondary/ghost, size=sm/md/lg, state=default/hover/disabled/loading). This creates a single source of truth for all component states. In Figma, you see all variants in one frame. In code, these map to props. The designer and developer speak the same language.
Button component properties:
variant: primary | secondary | ghost | danger | outline | link
size: sm | md | lg
state: default | hover | focus | disabled | loading
icon: none | left | right | only
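As a sketch of how these properties might map to props, the types below mirror the property list one-to-one. The `ButtonSpec` shape and the data-attribute helper are illustrative assumptions; hover and focus are left to CSS pseudo-classes rather than passed as props:

```typescript
type ButtonVariant = 'primary' | 'secondary' | 'ghost' | 'danger' | 'outline' | 'link';
type ButtonSize = 'sm' | 'md' | 'lg';
type ButtonIcon = 'none' | 'left' | 'right' | 'only';

interface ButtonSpec {
  variant: ButtonVariant;
  size: ButtonSize;
  icon: ButtonIcon;
  disabled?: boolean;
  loading?: boolean;
}

// hover and focus are runtime pseudo-states, so they are not props here;
// only disabled and loading need to be passed in.
function buttonDataAttributes(spec: ButtonSpec): Record<string, string> {
  return {
    'data-variant': spec.variant,
    'data-size': spec.size,
    'data-icon': spec.icon,
    'data-state': spec.loading ? 'loading' : spec.disabled ? 'disabled' : 'default',
  };
}
```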
Dev Mode = the handoff from designer to Claude Code
Figma Dev Mode (free for 1 viewer) lets developers inspect every design decision: exact pixel values, spacing, CSS properties, and exported assets. The workflow for Shopilot: designer finalizes a complex component in Figma → developer opens Dev Mode → copies the exact values into a prompt for Claude Code: "Build this component using these exact specs from Figma Dev Mode: [paste]." Claude Code generates the TypeScript in seconds.
The Claude Code + Figma prompt template:
"Build a React TypeScript component for [ComponentName]. Read the Figma component via Figma MCP for exact specs (dimensions, colors, spacing, states, variants). Use design tokens from globals.css. Include all states defined in the Figma component."
Figma as the single source of truth for all visual components
The Figma file (#18 Design System, core-product-design-system) follows Atomic Design (atoms, molecules, organisms, templates, pages) and is the single source of truth. Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No React components are created outside of what is defined in the Figma file. The external design team maintains Figma; the engineering team consumes it.
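To keep the prompt template consistent across components, it could be generated rather than retyped. A small sketch, where the function name and the optional `tokensFile` parameter are hypothetical:

```typescript
interface ComponentPromptInput {
  componentName: string;
  tokensFile?: string; // defaults to globals.css, as in the template above
}

// Fills the Claude Code + Figma prompt template for one component.
function buildComponentPrompt({ componentName, tokensFile = 'globals.css' }: ComponentPromptInput): string {
  return [
    `Build a React TypeScript component for ${componentName}.`,
    'Read the Figma component via Figma MCP for exact specs (dimensions, colors, spacing, states, variants).',
    `Use design tokens from ${tokensFile}.`,
    'Include all states defined in the Figma component.',
  ].join(' ');
}
```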
Figma Community Files: Download These Today
These are official or highly-used public Figma files from the reference companies. Duplicating them to your Figma account is free. Study how they structure components, Variables, and design systems.
| File | Publisher | What to study | Search in Community |
|---|---|---|---|
| Apple Design Resources | Apple (official) | macOS UI components, SF Symbols, HIG spacing | "Apple Design Resources macOS" |
| Microsoft Fluent 2 | Microsoft (official) | Fluent component library, Acrylic, tokens system | "Microsoft Fluent 2 Web" |
| Vercel Design System | Vercel (official) | Dark-first tokens, Geist font usage, Storybook link | "Vercel Design" |
| Shadcn/ui Figma Kit | Community (official-ish) | How shadcn components map to Figma — the bridge | "shadcn ui" |
| Tailwind CSS UI Kit | Community | Tailwind spacing / color scales in Figma Variables | "Tailwind CSS UI Kit" |
| Linear App Design | Community recreation | Dark sidebar, speed-first interactions, kbd badges | "Linear design system" |
| Electron UI Patterns | Community | Title bar, tray, window chrome patterns for Electron | "Electron desktop UI" |
| Figma Variables Starter | Figma (official) | How to structure Variables for a design system | "Variables starter kit Figma" |
How to use these files: Don't copy components. Study structure. Look at how they name Variables (tokens), how they organize component pages, how they document states, and what their spacing system looks like. These are the patterns to replicate in Shopilot's Figma file once the brand is decided.
Desktop App Visual References: What to Emulate
These are the specific macOS desktop apps, most of them built on Electron, that Shopilot should study in detail as running software: not in Figma, but as installed apps. Each has a specific pattern Shopilot must adopt or consciously decide to deviate from.
Cursor — cursor.sh
MOST RELEVANT: study first. The closest structural reference to Shopilot. Both are Electron, AI-native, dark-first, and split-pane (editor left + chat right). Download and install. Study: how the title bar works, how the chat panel opens/closes, how the AI response streams, how tool calls (terminal runs) are displayed, and how the status bar at the bottom shows AI state. This is the gold standard for Shopilot's interaction model.
Adopt from Cursor:
- Status bar 24px bottom
- Streaming word-by-word
- Tool call accordion
- Thinking indicator
Adapt for Shopilot:
- Split: code→marketplace
- Tabs: files→marketplaces
- Context: project→ASIN
Don't copy:
- Code editor UI
- File tree sidebar
- Diff view
Arc Browser — arc.net
The reference for rethinking desktop chrome. Arc proves that you can break HIG conventions (vertical tabs instead of horizontal, sidebar IS the app, no visible URL bar) and still feel native and premium. Study specifically how Arc handles the title bar: traffic lights, drag region, and custom controls in the same 40px zone. This is exactly what Shopilot's top bar needs to solve.
Key lesson: Arc's sidebar gradient background (multi-color per space) is possible in Electron via CSS linear-gradient on the sidebar container. The space color customization is what makes Arc feel personal, a pattern Shopilot could adopt for marketplace color coding (Amazon=orange, MeLi=yellow, Shopify=green).
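A rough sketch of the marketplace color coding as a CSS `linear-gradient` value. The hex values are placeholders chosen to match the color names above, not brand-approved colors, and the function name is hypothetical:

```typescript
type Marketplace = 'amazon' | 'meli' | 'shopify';

// Placeholder hex values for the color coding named above
// (Amazon=orange, MeLi=yellow, Shopify=green); not brand-approved colors.
const MARKETPLACE_COLOR: Record<Marketplace, string> = {
  amazon: '#F97316',
  meli: '#FACC15',
  shopify: '#22C55E',
};

// Returns a value suitable for the sidebar container's CSS `background`.
function sidebarGradient(marketplace: Marketplace, baseColor = '#0A0A0F'): string {
  return `linear-gradient(180deg, ${MARKETPLACE_COLOR[marketplace]} 0%, ${baseColor} 100%)`;
}
```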
Linear — linear.app
The reference for performance as a design value. Every interaction in Linear aims to complete in under 100ms. Study: the keyboard shortcut system (every action has a shortcut visible in the UI), the command palette (Cmd+K), the sidebar collapse behavior, and most importantly how Linear handles empty states (no data = inspirational, not depressing). Also study the data tables: Linear's issue list is the closest reference to Shopilot's ASIN product list.
Notion — notion.so
The reference for Electron done right at scale (30M+ users). Study: how Notion handles window resizing (the sidebar collapses progressively), how they manage a complex sidebar with nested items without it feeling cluttered, and their hover-reveal interactions (properties appear on hover, not always). Also: Notion's dark mode implementation is one of the cleanest in any Electron app, so study how they handle the transition between surface layers.
VS Code — code.visualstudio.com
THE Electron reference. VS Code is the most used Electron app in the world with 30M+ daily active users. It is the definitive reference for what is possible technically and visually in Electron. Study: the status bar (bottom, 22px, close to Shopilot's 24px), the split pane system, the extension panel (same concept as Shopilot's sidebar), the command palette, and the theming system. VS Code themes are essentially color-token swaps, the same approach Shopilot's design token system will take. The VS Code GitHub repo is public, and its theming architecture is directly applicable.
Action: Install and study these 5 apps this week
- Cursor: cursor.sh
- Arc: arc.net
- Linear: linear.app
- Notion: notion.so
- VS Code: code.visualstudio.com
For each: spend 30 min using it normally, then 30 min inspecting specific patterns (title bar, sidebars, status bar, hover states, loading states, dark mode). Document what you want to adopt, adapt, or avoid. This is the most efficient design research you can do before the brand workshop.
Essential Figma Plugins for this Workflow
| Plugin | What it does | Phase | Cost |
|---|---|---|---|
| Variables Import/Export | Exports Figma Variables to JSON → feeds tokens.json | Phase 2→3 bridge | Free |
| Tokens Studio | Full design token management in Figma (W3C DTCG format) | Phase 2→3 bridge | $20/mo |
| Contrast | WCAG AA/AAA contrast checker on any color pair in canvas | Phase 2 · color decisions | Free |
| Able | Accessibility checker — contrast, focus order, WCAG annotations | Phase 4 · component review | Free |
| Iconify | All Lucide icons available in Figma — same library as the code | Phase 2+ ongoing | Free |
| Figma to Code | Exports Figma frames as HTML/Tailwind/React snippets | Phase 4 · component start | Free |
| Color Blind | Simulates 8 types of color blindness on any frame | Phase 2 · color decisions | Free |
Full-Stack Design Integration
The missing 30%: exact technology stacks, how everything wires together, Claude API integration patterns with real code, what's still undocumented, and 2026 AI-native design methodology. Actionable, not theoretical.
01 · The 6-Layer Stack: How Everything Connects
LAYER 6 · Quality Gates
Figma ↔ Code consistency review · axe-core a11y · Playwright e2e · PR blocked if component deviates from Figma
LAYER 5 · Claude AI Integration
Anthropic SDK v0.30+ · Messages streaming API · Tool use (36 tools) · Prompt caching · Multi-LLM router
LAYER 4 · Electron App Shell
Electron 33+ · WebContentsView (70%) · React 19 sidebar (30%) · IPC contextBridge · Auto-updater
LAYER 3 · React Component Library
shadcn/ui (Radix primitives) · Figma Atomic Design (#18) · Figma MCP · Tailwind 4 · Framer Motion 11
LAYER 2 · Design Token Pipeline
tokens.json (W3C DTCG) → Style Dictionary 4 → CSS custom properties → tailwind.config.ts → CSS vars
LAYER 1 · Design Spec (This File)
shopilot_v6.html · Single source of truth · Pablo approves · Sergio implements · Mateo owns tokens
Complete Package Manifest
| Package | Version | Purpose | Layer | Owner |
|---|---|---|---|---|
| @anthropic-ai/sdk | ^0.30 | Claude API: streaming, tools, caching | 5 | Andrés |
| electron | ^33 | Desktop shell, WebContentsView, IPC | 4 | Mateo |
| react + react-dom | ^19 | UI renderer, concurrent features | 3 | Sergio |
| tailwindcss | ^4 | Utility CSS, token consumption | 3 | Sergio |
| @radix-ui/react-* | latest | Accessible primitives (via shadcn) | 3 | Sergio |
| shadcn/ui | CLI 2.x | Component generator on Radix + Tailwind | 3 | Sergio |
| framer-motion | ^11 | Animations: word-stream, slide-up, spring | 3 | Sergio |
| lucide-react | ^0.43 | Icon library — 1.5px stroke, currentColor | 3 | Sergio |
| recharts | ^2 | Charts only (BSR sparkline, KPI gauge) | 3 | Andrés |
| style-dictionary | ^4 | Token transform: JSON → CSS → Tailwind | 2 | Mateo |
| @axe-core/react | ^4 | Accessibility audit (WCAG AA) | 6 | Sergio |
| zod | ^3 | Tool input/output validation schema | 5 | Andrés |
| zustand | ^5 | Agent state machine store | 3-5 | Sergio |
02 · Design Token Pipeline: tokens.json → Production CSS
tokens.json (W3C DTCG format · source of truth) → style-dictionary build → design-tokens.css + tailwind-tokens.ts (auto-generated, never edit manually)
▶ tokens.json: Full Example (W3C DTCG format)
{
"$schema": "https://design-tokens.org/schema.json",
"sp": {
"color": {
"bg": {
"base": { "$value": "#0A0A0F", "$type": "color", "$description": "App background — near-black warm" },
"01": { "$value": "#0F0F18", "$type": "color" },
"02": { "$value": "#14141F", "$type": "color" },
"03": { "$value": "#1A1A28", "$type": "color" }
},
"orange": {
"50": { "$value": "rgba(249,115,22,0.08)", "$type": "color" },
"500": { "$value": "#F97316", "$type": "color", "$description": "CANDIDATE — replace with decided brand color" },
"600": { "$value": "#EA6005", "$type": "color" }
},
"fg": {
"100": { "$value": "#F4F4F6", "$type": "color", "$description": "Primary text" },
"80": { "$value": "#D4D4E4", "$type": "color" },
"60": { "$value": "#A4A4B8", "$type": "color" },
"40": { "$value": "#7A7A90", "$type": "color" }
},
"success": { "$value": "#22C55E", "$type": "color" },
"warning": { "$value": "#F59E0B", "$type": "color" },
"error": { "$value": "#EF4444", "$type": "color" },
"info": { "$value": "#3B82F6", "$type": "color" }
},
"space": {
"g": { "$value": "10px", "$type": "dimension", "$description": "base grid unit" },
"v": { "$value": "22px", "$type": "dimension", "$description": "vertical rhythm" },
"4": { "$value": "4px", "$type": "dimension" },
"8": { "$value": "8px", "$type": "dimension" },
"12": { "$value": "12px", "$type": "dimension" },
"16": { "$value": "16px", "$type": "dimension" },
"24": { "$value": "24px", "$type": "dimension" },
"32": { "$value": "32px", "$type": "dimension" }
},
"radius": {
"sm": { "$value": "4px", "$type": "dimension" },
"md": { "$value": "6px", "$type": "dimension" },
"lg": { "$value": "8px", "$type": "dimension" },
"xl": { "$value": "12px", "$type": "dimension" },
"2xl": { "$value": "16px", "$type": "dimension" },
"full": { "$value": "9999px", "$type": "dimension" }
},
"duration": {
"instant": { "$value": "80ms", "$type": "duration" },
"fast": { "$value": "150ms", "$type": "duration" },
"normal": { "$value": "200ms", "$type": "duration" },
"slow": { "$value": "350ms", "$type": "duration" },
"scenic": { "$value": "500ms", "$type": "duration" }
}
}
}
▶ style-dictionary.config.mjs: Build Config
// style-dictionary.config.mjs
import StyleDictionary from 'style-dictionary';
export default {
source: ['tokens.json'],
platforms: {
// → CSS custom properties (--sp-color-orange-500)
css: {
transformGroup: 'css',
files: [{
destination: 'src/styles/design-tokens.css',
format: 'css/variables',
options: { selector: ':root', outputReferences: true }
}]
},
// → Tailwind config (for extend.colors, extend.spacing)
tailwind: {
transformGroup: 'js',
files: [{
destination: 'src/styles/tailwind-tokens.ts',
format: 'javascript/esm'
}]
}
}
}
// Run: npx style-dictionary build
// Output:
// src/styles/design-tokens.css ← import in main.tsx
// src/styles/tailwind-tokens.ts ← import in tailwind.config.ts
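To make the build step less of a black box, the sketch below approximates what the `css/variables` output amounts to for the token file above: nested groups are flattened into kebab-case custom property names. It is a simplified illustration, not Style Dictionary's implementation, and the type and function names are hypothetical:

```typescript
interface DtcgToken { $value: string; $type?: string; $description?: string }
type DtcgTree = { [key: string]: DtcgTree | DtcgToken | string };

// A node is a token (leaf) when it carries a $value.
function isToken(node: unknown): node is DtcgToken {
  return typeof node === 'object' && node !== null && '$value' in node;
}

// Walks the tree and joins the path segments with "-" to form the variable name.
function flattenTokens(tree: DtcgTree, prefix: string[] = []): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, node] of Object.entries(tree)) {
    if (key.startsWith('$')) continue; // skip $schema and other metadata
    if (isToken(node)) {
      out[`--${[...prefix, key].join('-')}`] = node.$value;
    } else if (typeof node === 'object' && node !== null) {
      Object.assign(out, flattenTokens(node as DtcgTree, [...prefix, key]));
    }
  }
  return out;
}
```

Given the example tokens.json, this yields names such as `--sp-color-orange-500` and `--sp-radius-lg`, matching the variables that tailwind.config.ts references.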
▶ tailwind.config.ts: Token Consumption
// tailwind.config.ts
import type { Config } from 'tailwindcss'
const config: Config = {
content: ['./src/**/*.{ts,tsx}'],
theme: {
extend: {
colors: {
// Reference CSS custom properties so Tailwind + Style Dictionary stay in sync
'sp-bg-base': 'var(--sp-color-bg-base)',
'sp-orange': 'var(--sp-color-orange-500)',
'sp-fg-100': 'var(--sp-color-fg-100)',
'sp-success': 'var(--sp-color-success)',
'sp-warning': 'var(--sp-color-warning)',
'sp-error': 'var(--sp-color-error)',
},
spacing: {
'sp-g': 'var(--sp-space-g)', // 10px
'sp-v': 'var(--sp-space-v)', // 22px
},
borderRadius: {
'sp-sm': 'var(--sp-radius-sm)',
'sp-lg': 'var(--sp-radius-lg)',
'sp-xl': 'var(--sp-radius-xl)',
},
fontFamily: {
'display': ['Inter Display', 'Inter', 'sans-serif'],
'mono': ['JetBrains Mono', 'Fira Code', 'monospace'],
},
transitionDuration: {
'sp-fast': 'var(--sp-duration-fast)',
'sp-normal': 'var(--sp-duration-normal)',
'sp-slow': 'var(--sp-duration-slow)',
}
}
},
plugins: []
}
export default config
03 · shadcn/ui Integration with Shopilot Tokens
shadcn/ui is NOT a component library; it's a code generator. Components are copied into your repo and are 100% customizable. Use it for accessibility-correct primitives, then override with Shopilot tokens.
▶ Setup Commands + globals.css Override
# 1. Init shadcn (say YES to CSS variables, pick Neutral base)
npx shadcn@latest init
# When prompted:
# ✓ Style: Default
# ✓ Base color: Neutral (we override below)
# ✓ CSS variables: YES (critical — this is how tokens flow in)
# ✓ src directory: YES
# 2. Add the components Shopilot needs (never add all at once)
npx shadcn@latest add button
npx shadcn@latest add dialog
npx shadcn@latest add dropdown-menu
npx shadcn@latest add tooltip
npx shadcn@latest add select
npx shadcn@latest add scroll-area
npx shadcn@latest add collapsible # ← ToolAccordion base
npx shadcn@latest add badge
npx shadcn@latest add separator
npx shadcn@latest add progress # ← ContextWindowBar
# 3. Override src/app/globals.css with Shopilot tokens:
@import 'design-tokens.css'; /* Style Dictionary output */
@layer base {
:root {
/* Map shadcn vars → Shopilot tokens */
--background: 240 20% 5%; /* #0A0A0F */
--foreground: 240 10% 96%; /* #F4F4F6 */
--card: 240 22% 10%; /* #14141F */
--card-foreground: 240 23% 86%; /* #D4D4E4 */
--popover: 240 22% 10%;
--popover-foreground: 240 10% 96%;
--primary: 25 95% 53%; /* CANDIDATE: #F97316 orange — replace once brand color decided */
--primary-foreground: 0 0% 100%;
--secondary: 240 4% 16%; /* #28283C */
--secondary-foreground: 240 6% 87%;
--muted: 240 4% 16%;
--muted-foreground: 240 9% 52%; /* #7A7A90 */
--accent: 25 95% 53%; /* orange accent */
--accent-foreground: 0 0% 100%;
--destructive: 0 84% 60%; /* #EF4444 */
--border: 240 6% 20%; /* rgba(255,255,255,.06) approx */
--input: 240 6% 16%;
--ring: 25 95% 53%; /* orange focus ring */
--radius: 0.5rem; /* 8px = --sp-radius-lg */
}
}
# Result: shadcn components automatically use Shopilot colors.
# Edit src/components/ui/button.tsx to change size tokens to sp-* vars.
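Because shadcn variables take space-separated `H S% L%` triplets while tokens.json stores hex, a small converter helps keep the two in agreement. The helper below is a sketch using the standard RGB-to-HSL formula; its name is hypothetical:

```typescript
// Converts "#RRGGBB" to the space-separated "H S% L%" triplet used by shadcn CSS variables.
function hexToHslTriplet(hex: string): string {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const int = parseInt(match[1], 16);
  const r = ((int >> 16) & 0xff) / 255;
  const g = ((int >> 8) & 0xff) / 255;
  const b = (int & 0xff) / 255;

  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const delta = max - min;
  const l = (max + min) / 2;

  let h = 0;
  let s = 0;
  if (delta !== 0) {
    s = delta / (1 - Math.abs(2 * l - 1));
    if (max === r) h = ((g - b) / delta) % 6;
    else if (max === g) h = (b - r) / delta + 2;
    else h = (r - g) / delta + 4;
    h *= 60;
    if (h < 0) h += 360;
  }
  return `${Math.round(h)} ${Math.round(s * 100)}% ${Math.round(l * 100)}%`;
}
```

For example, `hexToHslTriplet('#F97316')` returns `25 95% 53%`, the value used for `--primary` above.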
Which shadcn components to use vs build custom
| Component | Source | Why |
|---|---|---|
| Button (6 variants) | shadcn base → customize | Radix provides correct focus/disabled states; we override styles |
| Dialog / Confirmation Card | shadcn Dialog → customize | Radix handles focus trap + aria-modal correctly; style from scratch |
| Tooltip | shadcn Tooltip → light override | Positioning engine is complex; only needs color/font token override |
| Select / Dropdown | shadcn → heavy customize | Radix handles keyboard nav; we rebuild visual completely |
| Tool Accordion | BUILD CUSTOM | Streaming state machine, badge states, JSON viewer — too specific |
| ReAct Stream | BUILD CUSTOM | Word-by-word animation, thinking pulse — unique to Shopilot |
| KPI Card | BUILD CUSTOM | JetBrains Mono + delta badge + sparkline — fully custom |
| Context Window Bar | shadcn Progress → customize | Stacked segments on top of Progress primitive |
| Data Table | shadcn Table + TanStack Table | TanStack handles sort/filter; shadcn provides base HTML table |
| Proactive Suggestion Card | BUILD CUSTOM | Animated slide-up, dismiss swipe, max-2-simultaneous logic |
| Date Picker | react-day-picker (NEVER BUILD) | Calendar UI is complex; use library, override tokens only |
| Charts (sparkline, gauge) | recharts (NEVER BUILD) | Math-heavy; only override colors and font |
04 · Claude API Streaming Integration: Real Implementation
The complete chain from user input → Claude API → word-by-word UI animation → tool execution display. Every piece has a specific design pattern.
Agent State Machine
thinking ···: CSS opacity 0.4→1→0.4, 1.2s infinite · NO elapsed time shown · Status bar: animated dot
streaming ▊: Each word fadeIn 80ms ease-out · Cursor blinking 0.6s · NO skeleton, NO spinner
awaiting_confirm: Confirmation card slide-up 250ms spring · Input disabled · Backdrop dims 20%
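The states above can be modeled as a pure transition function, which keeps the UI rules testable outside React. This is a sketch under assumptions: the signal names are hypothetical, and the rule that a confirmed tool moves to `tool_running` is inferred rather than stated in the spec:

```typescript
type AgentPhase =
  | 'idle' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error';

// Hypothetical signals derived from user actions and stream events.
type AgentSignal =
  | { kind: 'request_sent' }
  | { kind: 'text_started' }
  | { kind: 'tool_started'; needsConfirmation: boolean }
  | { kind: 'confirmed' }
  | { kind: 'message_finished' }
  | { kind: 'failed' }
  | { kind: 'aborted' };

function nextAgentPhase(current: AgentPhase, signal: AgentSignal): AgentPhase {
  switch (signal.kind) {
    case 'request_sent': return 'thinking';
    case 'text_started': return 'streaming';
    case 'tool_started': return signal.needsConfirmation ? 'awaiting_confirm' : 'tool_running';
    case 'confirmed': return current === 'awaiting_confirm' ? 'tool_running' : current;
    // A pending confirmation outlives the end of the message.
    case 'message_finished': return current === 'awaiting_confirm' ? current : 'done';
    case 'failed': return 'error';
    case 'aborted': return 'idle';
  }
}

// Input is disabled while a confirmation card is open (per the spec above).
function isInputDisabled(phase: AgentPhase): boolean {
  return phase === 'awaiting_confirm';
}
```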
▶ useStream.ts: Complete React Hook Implementation
// src/hooks/useStream.ts
import { useState, useCallback, useRef } from 'react';
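import Anthropic from '@anthropic-ai/sdk';
import { shopilotTools } from '@/tools/definitions';
import { useAgentStore } from '@/stores/agentStore';
// Assumed location; the source references SHOPILOT_SYSTEM_PROMPT without importing it.
import { SHOPILOT_SYSTEM_PROMPT } from '@/prompts/system';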
type AgentState =
| 'idle' | 'thinking' | 'streaming'
| 'tool_running' | 'awaiting_confirm' | 'done' | 'error';
interface StreamMessage {
role: 'user' | 'assistant';
content: string;
}
export function useStream() {
const [agentState, setAgentState] = useState<AgentState>('idle');
const [words, setWords] = useState<string[]>([]);
const [currentToolCall, setCurrentToolCall] = useState<string | null>(null);
const abortRef = useRef<AbortController | null>(null);
const { addTool, updateTool } = useAgentStore();
const stream = useCallback(async (messages: StreamMessage[]) => {
abortRef.current = new AbortController();
setWords([]);
setAgentState('thinking');
// NOTE: In Electron, the Anthropic SDK runs in the main process, where the API key lives.
// Renderer sends via IPC → main runs SDK → streams events back via IPC.
// The SDK call is shown inline here only to illustrate the event-handling pattern.
try {
const client = new Anthropic(); // main process: reads ANTHROPIC_API_KEY from env
const response = client.messages.stream({
model: 'claude-opus-4-6',
max_tokens: 8192,
// Prompt caching (reduces cost 60-80% on repeated context):
// mark the static system prompt as a cache breakpoint.
system: [{ type: 'text', text: SHOPILOT_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } }],
messages,
tools: shopilotTools,
}, { signal: abortRef.current.signal }); // lets abort() cancel the in-flight request
for await (const event of response) {
switch (event.type) {
case 'content_block_start':
if (event.content_block.type === 'text') {
setAgentState('streaming');
}
if (event.content_block.type === 'tool_use') {
setAgentState('tool_running');
const toolId = event.content_block.id;
const toolName = event.content_block.name;
setCurrentToolCall(toolName);
addTool({ id: toolId, name: toolName, state: 'running', startMs: Date.now() });
}
break;
case 'content_block_delta':
if (event.delta.type === 'text_delta') {
// Word-by-word: split on spaces, animate each word
const newWords = event.delta.text.split(/(?<=\s)/);
setWords(prev => [...prev, ...newWords]);
}
break;
case 'content_block_stop':
setCurrentToolCall(null);
break;
case 'message_stop':
setAgentState('done');
break;
}
}
} catch (err) {
// An aborted request is expected; anything else is a real failure.
if (!abortRef.current?.signal.aborted) {
console.error(err);
setAgentState('error');
}
}
}, [addTool]);
const abort = useCallback(() => {
abortRef.current?.abort();
setAgentState('idle');
setWords([]);
}, []);
return { agentState, words, currentToolCall, stream, abort };
}
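One caveat with splitting each delta on whitespace: text deltas are not guaranteed to end on word boundaries, so a word can arrive in two pieces. A possible refinement, sketched here with a hypothetical `WordBuffer` class, holds back the trailing partial word until the next delta completes it:

```typescript
// Accumulates streamed text deltas and releases only whole words
// (each with its trailing whitespace), holding back any partial word.
class WordBuffer {
  private pending = '';

  push(delta: string): string[] {
    this.pending += delta;
    // Each part is a word plus trailing whitespace, or a trailing bare word.
    const parts: string[] = this.pending.match(/\S*\s+|\S+/g) ?? [];
    const last = parts[parts.length - 1];
    if (last !== undefined && !/\s$/.test(last)) {
      // The final part has no trailing whitespace yet, so it may be incomplete.
      this.pending = last;
      return parts.slice(0, -1);
    }
    this.pending = '';
    return parts;
  }

  // Call once the stream ends to release the final partial word.
  flush(): string[] {
    const rest = this.pending;
    this.pending = '';
    return rest === '' ? [] : [rest];
  }
}
```

In the hook, `push` would be called from the `text_delta` branch and `flush` from `message_stop`, so each animated span corresponds to exactly one word.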
▶ StreamingText.tsx: Word-by-Word Animation Component
// src/components/StreamingText.tsx
import { motion, AnimatePresence } from 'framer-motion';
interface StreamingTextProps {
words: string[];
isStreaming: boolean;
}
// Design rule: each word fades in at 80ms.
// Cursor blinks at 0.6s cycle when streaming.
// No skeleton, no placeholder, no loading bar.
export function StreamingText({ words, isStreaming }: StreamingTextProps) {
return (
<div className="text-sp-fg-100 text-sm leading-relaxed">
{words.map((word, i) => (
<motion.span
key={i}
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
transition={{ duration: 0.08, ease: 'easeOut' }} // 80ms per word
>
{word}
</motion.span>
))}
{/* Blinking cursor — only while streaming */}
<AnimatePresence>
{isStreaming && (
<motion.span
initial={{ opacity: 1 }}
animate={{ opacity: [1, 0, 1] }}
transition={{ duration: 0.6, repeat: Infinity, ease: 'linear' }}
className="inline-block ml-0.5 font-mono text-sp-orange"
style={{ fontFamily: 'JetBrains Mono' }}
>
▊
</motion.span>
)}
</AnimatePresence>
</div>
);
}
// ThinkingPulse — shown when agent is thinking (no tokens yet)
export function ThinkingPulse() {
return (
<motion.span
animate={{ opacity: [0.4, 1, 0.4] }}
transition={{ duration: 1.2, repeat: Infinity, ease: 'easeInOut' }}
className="text-sp-fg-40 font-mono text-sm"
>
···
</motion.span>
);
}
★ Prompt Caching: 60-80% Cost Reduction
Mark the static parts of the context (system prompt, marketplace context, seller profile) with cache_control: {type: 'ephemeral'}. The cache TTL is 5 minutes and is refreshed each time the cached content is used, so subsequent requests within an active session reuse the cached tokens. At 1,000 sellers × 50 requests/day, that takes an estimated $4,800/mo down to $960/mo at the top of the 60-80% range.
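A sketch of how the static context could be assembled, assuming the system prompt is sent as an array of text blocks. A cache breakpoint covers the prefix up to and including the marked block, so only the last static block needs `cache_control`. The helper names are hypothetical, and the cost function simply reproduces the arithmetic above:

```typescript
interface SystemTextBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
}

// Keeps static context in a fixed order and marks the last static block
// as the cache breakpoint, so the whole static prefix is cached.
function buildCachedSystem(staticParts: string[]): SystemTextBlock[] {
  return staticParts.map((text, index) => {
    const block: SystemTextBlock = { type: 'text', text };
    if (index === staticParts.length - 1) block.cache_control = { type: 'ephemeral' };
    return block;
  });
}

// Monthly cost after caching, given a whole-number percentage reduction.
function monthlyCostAfterCaching(baseUsd: number, reductionPercent: number): number {
  return (baseUsd * (100 - reductionPercent)) / 100;
}
```

With the figures above, `monthlyCostAfterCaching(4800, 80)` gives 960 and `monthlyCostAfterCaching(4800, 60)` gives 1920, which brackets the expected monthly spend.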
05 · Tool Call UI: Visual Patterns for 36 Tools
This was the biggest gap identified in the audit: the spec described the tool accordion but never showed the complete visual spec or component code. Fixed here.
Live Tool Accordion States
analyze_buy_box ✓ 847ms
Input
{ "asin": "B08XYZABC",
"marketplace": "amazon_mx" }
Output
{ "buybox_winner": "us",
"our_share": 0.78,
"competitors": 3 }
▶ ToolAccordion.tsx: Complete Component
// src/components/ToolAccordion.tsx
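import { motion } from 'framer-motion';
import { Check, X, AlertTriangle, Loader2 } from 'lucide-react';
// Assumed sibling modules; the source uses these components without showing where they live.
import { ConfirmationCard } from './ConfirmationCard';
import { JsonViewer } from './JsonViewer';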
type ToolState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';
type RiskLevel = 'read_only' | 'reversible' | 'irreversible';
interface ToolAccordionProps {
id: string;
name: string;
state: ToolState;
riskLevel: RiskLevel;
durationMs?: number;
input?: Record<string, unknown>;
output?: Record<string, unknown>;
errorMessage?: string;
onConfirm?: () => void;
onCancel?: () => void;
}
const stateConfig = {
queued: { icon: null, color: '#7A7A90', bg: 'rgba(122,122,144,0.06)', border: 'rgba(122,122,144,0.2)' },
running: { icon: 'spin', color: '#3B82F6', bg: 'rgba(59,130,246,0.05)', border: 'rgba(59,130,246,0.2)' },
success: { icon: 'check', color: '#22C55E', bg: 'rgba(34,197,94,0.05)', border: 'rgba(34,197,94,0.2)' },
error: { icon: 'x', color: '#EF4444', bg: 'rgba(239,68,68,0.05)', border: 'rgba(239,68,68,0.2)' },
awaiting_confirm: { icon: 'warn', color: '#F59E0B', bg: 'rgba(245,158,11,0.05)', border: 'rgba(245,158,11,0.25)' },
};
export function ToolAccordion({ id, name, state, riskLevel, durationMs, input, output, errorMessage, onConfirm, onCancel }: ToolAccordionProps) {
const cfg = stateConfig[state];
const isDestructive = riskLevel === 'irreversible';
return (
<motion.div
layout
initial={{ opacity: 0, y: 4 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.2, ease: [0.16, 1, 0.3, 1] }}
style={{ background: cfg.bg, border: `1px solid ${cfg.border}`, borderRadius: 10 }}
>
<details>
<summary style={{ display: 'flex', alignItems: 'center', gap: 10, padding: '10px 16px', cursor: 'pointer', listStyle: 'none' }}>
{/* State icon */}
{state === 'running' && <Loader2 size={14} color={cfg.color} className="animate-spin" />}
{state === 'success' && <Check size={14} color={cfg.color} strokeWidth={2.5} />}
{state === 'error' && <X size={14} color={cfg.color} strokeWidth={2.5} />}
{state === 'awaiting_confirm' && <AlertTriangle size={14} color={cfg.color} />}
<span style={{ fontSize: 12, fontWeight: 500, color: '#D4D4E4', flex: 1 }}>{name}</span>
{/* Right badges */}
{isDestructive && (
<span style={{ fontSize: 9, fontWeight: 700, color: '#EF4444', textTransform: 'uppercase', letterSpacing: '0.1em' }}>
IRREVERSIBLE
</span>
)}
{state === 'success' && durationMs && (
<span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
✓ {durationMs}ms
</span>
)}
{state === 'error' && (
<span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
✗ Error
</span>
)}
</summary>
{/* Expanded content */}
<div style={{ padding: '0 16px 12px', borderTop: '1px solid rgba(255,255,255,0.05)' }}>
{/* Confirmation card for irreversible actions */}
{state === 'awaiting_confirm' && (
<ConfirmationCard input={input} riskLevel={riskLevel} onConfirm={onConfirm} onCancel={onCancel} />
)}
{/* JSON viewer for success/error */}
{(state === 'success' || state === 'error') && (
<JsonViewer input={input} output={output} error={errorMessage} />
)}
</div>
</details>
</motion.div>
);
}
06 · Previously Undocumented Patterns — Now Complete
Empty States — 8 Variants
No ASINs Yet
First-run. CTA: "Add your first product"
No Search Results
Show query, suggest correction
All Caught Up
No pending actions. Positive reinforcement.
Sync Pending
Data loading from marketplace. Progress bar.
Not Connected
OAuth not done. CTA: "Connect marketplace"
No History
Audit log empty. "Actions will appear here"
Credits Zero
Agent paused. Upgrade CTA dominant.
No Reports
Pro feature gate. "Available in Pro plan"
Empty State Rules:
- ① Icon: 32px, colored by context (orange=action, blue=info, green=success, red=error)
- ② Title: max 4 words, sentence case, no period
- ③ Description: 1 line, explains why + what to do next
- ④ CTA: only if there's a direct action. Never show CTA on "All Caught Up"
- ⑤ Never show empty state while loading — show progress instead
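The rules above are mechanical enough to lint. A minimal sketch, assuming a hypothetical `EmptyStateSpec` shape and an `all_caught_up` id (neither is defined in this spec); the blue hex is borrowed from the tool accordion's running state and is also an assumption:

```typescript
// Hypothetical empty-state definition; field names are illustrative, not from the spec.
type EmptyStateTone = 'action' | 'info' | 'success' | 'error';

interface EmptyStateSpec {
  id: string;           // e.g. 'all_caught_up' (assumed id)
  tone: EmptyStateTone; // drives the 32px icon color (rule 1)
  title: string;
  description: string;
  ctaLabel?: string;
}

// Rule 1 color map: orange=action, blue=info, green=success, red=error.
const TONE_COLOR: Record<EmptyStateTone, string> = {
  action: '#F97316',
  info: '#3B82F6', // assumed: reuses the "running" blue from the tool accordion
  success: '#22C55E',
  error: '#EF4444',
};

// Returns human-readable rule violations; an empty array means the spec passes.
function validateEmptyState(spec: EmptyStateSpec): string[] {
  const problems: string[] = [];
  const words = spec.title.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0 || words.length > 4) problems.push('title must be 1 to 4 words');
  if (spec.title.trim().endsWith('.')) problems.push('title must not end with a period');
  if (spec.description.includes('\n')) problems.push('description must be a single line');
  if (spec.id === 'all_caught_up' && spec.ctaLabel !== undefined) {
    problems.push('"All Caught Up" must not show a CTA');
  }
  return problems;
}
```

A check like this could run in unit tests over the 8 variants, so a fifth word or a stray CTA fails the build rather than a design review.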
Error State Taxonomy — 3 Categories
API timeout, validation error, missing field. Amber border + icon. Show specific message + retry button. Auto-retry after 3s with countdown.
Auth revoked, account suspended, critical DB error. Red banner. Explain what happened, what user must do. No auto-retry. Support link if relevant.
Rate limit, credit exhausted, feature not in plan. Blue info banner. Calm tone. Clear path forward (upgrade, wait, etc). Agent pauses gracefully.
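One way to keep the three categories consistent across the app is a single lookup from error code to presentation. The sketch below is illustrative only: the category names (`recoverable`, `blocking`, `limit`) and the error codes are hypothetical labels, since the category titles are not given here.

```typescript
// Category names are hypothetical labels for the three groups described above.
type ErrorCategory = 'recoverable' | 'blocking' | 'limit';

// Hypothetical error codes, one per example listed in the taxonomy.
type ErrorCode =
  | 'api_timeout' | 'validation_error' | 'missing_field'
  | 'auth_revoked' | 'account_suspended' | 'db_critical'
  | 'rate_limit' | 'credits_exhausted' | 'feature_not_in_plan';

interface ErrorPresentation {
  category: ErrorCategory;
  surface: 'amber_inline' | 'red_banner' | 'blue_banner';
  autoRetryAfterMs: number | null; // null = never retry automatically
}

const CATEGORY_BY_CODE: Record<ErrorCode, ErrorCategory> = {
  api_timeout: 'recoverable',
  validation_error: 'recoverable',
  missing_field: 'recoverable',
  auth_revoked: 'blocking',
  account_suspended: 'blocking',
  db_critical: 'blocking',
  rate_limit: 'limit',
  credits_exhausted: 'limit',
  feature_not_in_plan: 'limit',
};

const PRESENTATION: Record<ErrorCategory, Omit<ErrorPresentation, 'category'>> = {
  recoverable: { surface: 'amber_inline', autoRetryAfterMs: 3000 }, // 3s countdown per the taxonomy
  blocking: { surface: 'red_banner', autoRetryAfterMs: null },      // explicit "no auto-retry"
  limit: { surface: 'blue_banner', autoRetryAfterMs: null },        // assumption: user picks upgrade or wait
};

function presentError(code: ErrorCode): ErrorPresentation {
  const category = CATEGORY_BY_CODE[code];
  return { category, ...PRESENTATION[category] };
}
```

Because both maps are typed as `Record`, adding a new error code without assigning it a category fails at compile time.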
Accessibility — WCAG AA Contrast Ratios
| Text | Background | Ratio | WCAG AA | Use |
|---|---|---|---|---|
| #F4F4F6 | #0A0A0F | 15.8:1 | PASS AAA | Primary text on bg |
| #A4A4B8 | #0A0A0F | 7.1:1 | PASS AA | Secondary text |
| #F97316 | #0A0A0F | 5.8:1 | PASS AA | Orange on bg |
| #FFFFFF | #F97316 | 3.2:1 | PASS (large only) | White on orange btn |
| #7A7A90 | #0A0A0F | 4.2:1 | PASS AA | Tertiary text |
| #54546A | #0A0A0F | 2.8:1 | FAIL — captions only | Placeholder, metadata (decorative) |
| #22C55E | #0A0A0F | 7.0:1 | PASS AA | Success text |
| #EF4444 | #0A0A0F | 4.8:1 | PASS AA | Error text |
⚠ #54546A fails WCAG AA — use only for decorative metadata (timestamps, IDs) where context is clear. Never for interactive or status-critical text.
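Contrast ratios like the ones in this table can be computed rather than eyeballed. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are illustrative. It may be worth re-running the table's pairs through something like this before treating the listed ratios as final.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#RRGGBB" color, in the range 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  const match = /^#?([0-9a-fA-F]{6})$/.exec(hex);
  if (!match) throw new Error(`expected #RRGGBB, got "${hex}"`);
  const h = match[1];
  const r = channelToLinear(parseInt(h.slice(0, 2), 16));
  const g = channelToLinear(parseInt(h.slice(2, 4), 16));
  const b = channelToLinear(parseInt(h.slice(4, 6), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG thresholds: AA is 4.5:1 (3:1 for large text); AAA is 7:1 (4.5:1 for large text).
function wcagLevel(ratio: number, largeText = false): 'AAA' | 'AA' | 'FAIL' {
  const aa = largeText ? 3 : 4.5;
  const aaa = largeText ? 4.5 : 7;
  if (ratio >= aaa) return 'AAA';
  if (ratio >= aa) return 'AA';
  return 'FAIL';
}
```

For example, `wcagLevel(contrastRatio('#F4F4F6', '#0A0A0F'))` classifies the primary text pair, and passing `true` as the second argument applies the large-text thresholds used for button labels.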
07 · 2026 AI-Native Design Trends — Applied to Shopilot
Agentic UI — "Doing" not "Saying"
2025 was chatbots. 2026 is agents that take actions in real systems. The UI must show what the agent DID, not just what it said. Tool call accordion + audit log + rollback panel are the core of agentic UI.
Shopilot: ✓ Already built — tool accordion + audit log + rollback token system
Progressive Disclosure for AI Outputs
Show the answer first, reasoning chain on demand. Users want results, not process. Collapsed tool calls are default; expanded for debugging. This is different from traditional progressive disclosure — AI reasoning is ALWAYS secondary.
Shopilot: ✓ Tool accordion collapsed by default · expandable for transparency
Trust Signals — Provenance & Reversibility
2026 users are sophisticated — they've been burned by AI hallucinations. Every AI action must show: where did this data come from? Can I undo this? This is UX as trust infrastructure. Timestamp + source API + rollback button = trust signal trifecta.
Shopilot: ✓ REVERSIBLE/IRREVERSIBLE labels · rollback tokens · marketplace source badges
Ambient Intelligence — Proactive without Interrupting
The trend away from "ask AI" toward "AI notices things." Proactive suggestion cards that appear from below without stealing focus. Max 2 simultaneous. Swipe to dismiss. The agent watches the marketplace and surfaces insights unprompted.
Shopilot: ✓ Proactive suggestion cards · slide-up 350ms · max-2-simultaneous rule
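The max-2-simultaneous rule implies a small queue: extra cards wait until a visible one is dismissed. A minimal sketch of that state logic, with hypothetical type and function names (the spec does not define how overflow cards are handled, so first-in-first-out promotion is an assumption):

```typescript
interface ProactiveCard { id: string; title: string }

interface CardQueueState {
  visible: ProactiveCard[]; // at most MAX_VISIBLE cards on screen
  waiting: ProactiveCard[]; // overflow, promoted in arrival order (assumed FIFO)
}

const MAX_VISIBLE = 2;

// Add a card: show it if there is room, otherwise park it in the waiting list.
function enqueueCard(state: CardQueueState, card: ProactiveCard): CardQueueState {
  if (state.visible.length < MAX_VISIBLE) {
    return { visible: [...state.visible, card], waiting: state.waiting };
  }
  return { visible: state.visible, waiting: [...state.waiting, card] };
}

// Dismiss a card (e.g. on swipe) and promote waiting cards into the freed slot.
function dismissCard(state: CardQueueState, id: string): CardQueueState {
  const visible = state.visible.filter((c) => c.id !== id);
  const waiting = state.waiting.filter((c) => c.id !== id);
  while (visible.length < MAX_VISIBLE && waiting.length > 0) {
    visible.push(waiting.shift() as ProactiveCard);
  }
  return { visible, waiting };
}
```

Both functions return new state objects, so they drop into a reducer or a state store without mutation concerns.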
Memory Persistence UI — What Does AI Know?
Emerging pattern: showing users what the AI has "learned" about them. CLAUDE.md in Claude Code → SELLER_PROFILE.md in Shopilot. The UI should make this visible (settings panel "Your Coach Profile") and editable. Users trust AI more when they can see and correct its memory.
Shopilot: ⚠ Partially — SELLER_PROFILE injected but not exposed in UI. Add "Coach Memory" settings panel in Phase 2.
Context Window as First-Class UI Element
Power users want to know what the AI "has in mind." The Context Window Bar (already in Section 14 state 26) is now mainstream. Show: system prompt tokens, conversation tokens, marketplace context, available. Compaction banner when 80% full. This is now table stakes for AI products in 2026.
Shopilot: ✓ Context Window Bar in sidebar · compaction banner on 80% full
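The bar's four segments and the 80% compaction banner reduce to a small calculation. A sketch under assumed names (`ContextUsage`, `buildContextBar`, and the segment labels are illustrative; how token counts are obtained is out of scope here):

```typescript
interface ContextUsage {
  systemPromptTokens: number;
  conversationTokens: number;
  marketplaceTokens: number;
  limitTokens: number; // total context window of the active model
}

interface ContextBarModel {
  segments: { label: string; tokens: number; fraction: number }[];
  availableTokens: number;
  usedFraction: number;
  showCompactionBanner: boolean;
}

const COMPACTION_THRESHOLD = 0.8; // banner appears at 80% full

function buildContextBar(usage: ContextUsage): ContextBarModel {
  if (usage.limitTokens <= 0) throw new Error('limitTokens must be positive');
  const parts = [
    { label: 'System prompt', tokens: usage.systemPromptTokens },
    { label: 'Conversation', tokens: usage.conversationTokens },
    { label: 'Marketplace context', tokens: usage.marketplaceTokens },
  ];
  const used = parts.reduce((sum, p) => sum + p.tokens, 0);
  const availableTokens = Math.max(0, usage.limitTokens - used);
  const usedFraction = Math.min(1, used / usage.limitTokens);
  const all = [...parts, { label: 'Available', tokens: availableTokens }];
  return {
    // fraction is each segment's share of the full bar width
    segments: all.map((p) => ({ ...p, fraction: p.tokens / usage.limitTokens })),
    availableTokens,
    usedFraction,
    showCompactionBanner: usedFraction >= COMPACTION_THRESHOLD,
  };
}
```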
Multi-Model Transparency
Pro users in 2026 expect to know which model is running. Status bar bottom-right shows active model: "claude-opus-4-6" or "gpt-4o" based on LLM router decision. This is transparency as a feature, not just a debug tool. Linear shows which integrations are active — same pattern.
Shopilot: ✓ Status bar right: marketplace icon · model name · credit balance
Streaming as the Core Interaction Metaphor
The loading spinner is dead in 2026. Everything streams. Chat responses, tool results, sync status, data updates. The visual language of "tokens arriving" (word fade-in 80ms, blinking cursor) is now the universal AI loading pattern. Shopify just adopted it. HubSpot is adopting it.
Shopilot: ✓ Word-by-word streaming · blinking cursor · NO spinners, NO skeletons (by design)
08 · Development Methodology — How the 4-Person Team Ships
Phase 1 — Weeks 1-3
Design-in-Code · Ship Tokens + Atoms
- → Mateo: tokens.json + Style Dictionary setup
- → Sergio: Electron shell + React sidebar skeleton
- → Sergio: shadcn/ui init + Button + Input + Badge
- → Andrés: Anthropic SDK + IPC bridge + tool router
- → Pablo: this spec + design review on each PR
Deliverable: Electron window opens · sidebar renders · "Hello Claude" works
Phase 2 — Weeks 4-6
AI Agent Loop · Core Organisms
- → Sergio: StreamingText + ThinkingPulse + ToolAccordion
- → Sergio: ConfirmationCard + RollbackPanel
- → Andrés: 10 core tools (price read, competitor, buy box)
- → Mateo: Figma MCP integration + token pipeline setup
- → Pablo: design review of all organisms against Figma specs
Deliverable: Full coach loop working · tool calls visible · confirm/cancel works
Phase 3 — Weeks 7-10
Data + Quality Gates
- → Andrés: DataTable + KPI cards + Context Window Bar
- → Andrés: Audit log + proactive suggestion cards
- → Mateo: axe-core a11y audit + Figma ↔ Code consistency gates
- → Sergio: Empty states + error states all variants
- → Pablo: Beta onboarding + first seller feedback
Deliverable: Beta-ready · full data views · quality gates passing
Design Review Checklist — Every UI PR
Design System Maturity Score — Current State
Next: Sergio starts Week 1 → tokens.json file + shadcn init + Button component. Mateo sets up Style Dictionary. The spec is ready. Now we build.
World-Class Design Strategy
The gap between "good design" and "world-class" is not more components — it's precision at the product level: how screens compose, how the competition fails, what makes sellers trust the UI instantly, and the 20 invisible decisions that separate tier-1 products.
01 · B2B Product UX References — Not Brand Books, Product Patterns
These 8 products are referenced for their UX patterns — specific interaction and layout decisions Shopilot should adopt. Different from Section 15 which analyzed brand identity.
Metric Card Pattern
Big mono number (42px) → small label above → percentage delta below with color · No chart inside the card (chart is separate) · Hover reveals tooltip with exact timestamp
→ Shopilot: KPI cards follow this exact hierarchy
Activity Timeline
Every action logged with: type icon + description + amount + timestamp · Clickable row reveals full detail · Infinite scroll (no pagination) · Timeline = trust
→ Shopilot: Audit Log follows this pattern exactly
Developer-first but accessible
API keys visible in UI · Raw JSON expandable · But non-technical users see clean summaries · Same data, two perspectives on same screen
→ Shopilot: Tool accordion shows summary + expandable JSON
Linear
Keyboard-first B2B product · Speed as primary UX feature
Speed as Marketing
Linear measured and published their p50/p95 load times. "Built for speed" is a design statement. Every interaction under 100ms feels intentional. This is a UX strategy, not just engineering.
→ Shopilot: Measure + display model response time. Make it a feature.
Status as Color Only
No status text ("In Progress", "Done") on lists — just colored dots. Experts read the color map in <0.5s. Power users trained to read color grids in one glance.
→ Shopilot: Buy Box status = orange dot, not "You have buy box: YES"
Kbd Shortcut Badges Everywhere
Every action in dropdown shows keyboard shortcut. This teaches users → makes them faster → makes them dependent → reduces churn. Shortcut visibility = retention feature.
→ Shopilot: All dropdowns show Cmd+K, Cmd+1, Esc shortcuts inline
Figma
Complex tool with zero cognitive friction · Panel architecture
3-Panel Information Architecture
Left: navigation/layers · Center: work surface · Right: contextual properties. This is the master pattern for complex tools. The content always has max space. Panels are tools, not content.
→ Shopilot: Marketplace=center, Sidebar=right panel. Left nav deferred to v2.
Context-Sensitive Right Panel
The right panel changes based on what's selected. Select a component → see its properties. Click away → see general settings. Sidebar in Shopilot should adapt to the marketplace page being viewed.
→ Shopilot Phase 2: sidebar context = active ASIN on marketplace page
Multiplayer Visual Cues
Other users visible as colored cursors. Multi-tab shows who's looking at what. In Shopilot context: the AI "cursor" — the coach's attention indicator (which tool it's running, what data it's looking at right now).
→ Shopilot: "Coach is analyzing ASIN B08XYZ" status in sidebar header
Datadog
The benchmark for monitoring dashboards · Density without chaos
Time as Primary Axis
Every metric in Datadog is a time series. The X axis is always time. This trains users to think in trends, not point-in-time snapshots. For Shopilot: Buy Box % over 30d is more actionable than Buy Box % right now.
→ Shopilot: All KPIs have 7d/30d sparklines. Point values only for current.
Alert Integration in Charts
Threshold lines appear ON charts, not in separate alerts. When a metric crosses a line, the chart background changes color. Alert IS the chart. No separate notification panel for threshold breaches.
→ Shopilot: Price threshold line on competitor chart. Red zone when below margin.
Faceted Filtering
Left sidebar has real-time faceted filters that update counts as you click. Tags/dimensions are first-class citizens. For sellers: filter by marketplace + category + status simultaneously. Update counts in real-time.
→ Shopilot Phase 2: ASIN table with faceted filters (marketplace, category, status)
Arc Browser
A Chromium-based browser (a native app, not Electron) that breaks every browser convention and wins
Sidebar IS the App Chrome
Arc moved ALL chrome (tabs, bookmarks, history) to the left sidebar. The content area is 100% undecorated. This is the insight: in Electron, the sidebar is where your app lives. The WebContentsView is sacred space.
→ Shopilot: sidebar has zero visual decoration except the chat + tool calls + status
Custom Title Bar Done Right
Arc's frameless window with custom controls that feel MORE native than native. The traffic light buttons are in their correct position, drag region is the entire top bar, full-screen transitions are perfect.
→ Shopilot: frameless + native traffic lights + 32px drag region + tab bar after
Command Bar as Primary Navigation
Arc's Cmd+T opens a search-everything command bar. This is the #1 power user feature. Arc trained millions of users to navigate entirely by keyboard. Once users find the command bar, they never use menus again.
→ Shopilot: Cmd+K opens command palette: "analyze B08XYZ", "reprice all", "show alerts"
Bloomberg Terminal
The extreme end of data density done right · Reference for seller data density
Density as Expertise Signal
Bloomberg is deliberately dense. It signals: "this is for professionals." The density IS the marketing — it makes users feel expert just by using it. Shopilot sellers are professionals. They can handle density. Don't dumb it down.
→ Shopilot: Don't simplify competitor tables. Show all 8 columns. Professionals want data.
Color = Directionality Only
Bloomberg uses green/red ONLY for up/down price movement. No other meaning. Nothing else is green or red. This absolute discipline means users process market data at a glance without thinking about color meaning.
→ Shopilot: #22C55E = price up / won Buy Box. #EF4444 = price down / lost Buy Box. Nothing else.
Monospace as Alignment
All Bloomberg data is monospace because financial data must align vertically. The $1,234.56 must be perfectly below $98.76 and $12,300.00. Misalignment breaks scanning. Monospace is structural, not decorative.
→ Shopilot: JetBrains Mono for all numbers is Bloomberg discipline applied to e-commerce.
Notion
Progressive disclosure master · Slash commands as interaction metaphor
Slash Command = AI Interaction
Notion's "/" opens inline commands. Claude Code uses the same pattern. This is now the universal AI interaction metaphor. Shopilot's chat input should support "/" for quick actions: "/reprice", "/analyze", "/report".
→ Shopilot: "/" in chat input opens quick-action palette with 36 tools
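A minimal sketch of how the chat input could recognize "/" quick actions. Only the three commands named above are included; the parser shape and function names are hypothetical, and the full 36-tool palette would presumably be generated from the tool registry instead of a hard-coded list.

```typescript
// The three quick actions named in the text; the real palette would list more.
const QUICK_ACTIONS = ['reprice', 'analyze', 'report'] as const;
type QuickAction = (typeof QUICK_ACTIONS)[number];

interface ParsedCommand {
  action: QuickAction;
  args: string[];
}

// Parse "/analyze B08XYZ" into an action plus arguments.
// Returns null for plain chat messages and for unknown commands.
function parseSlashCommand(input: string): ParsedCommand | null {
  const trimmed = input.trim();
  if (!trimmed.startsWith('/')) return null;
  const [name = '', ...args] = trimmed.slice(1).split(/\s+/);
  const action = QUICK_ACTIONS.find((a) => a === name.toLowerCase());
  return action ? { action, args } : null;
}

// Prefix match used to filter the palette while the user is still typing.
function suggestActions(partial: string): QuickAction[] {
  const prefix = partial.replace(/^\//, '').toLowerCase();
  return QUICK_ACTIONS.filter((a) => a.startsWith(prefix));
}
```

Returning `null` for unknown commands lets the input fall back to sending the text as a normal chat message.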
Properties Reveal on Hover
Notion rows show only essential data by default. Hover reveals additional properties. This keeps lists clean while preserving data access. For Shopilot: ASIN rows show Name + Price + Buy Box. Hover reveals: SKU, inventory, last sync.
→ Shopilot: ASIN row hover reveals secondary metrics (expandable hover card)
Everything is a Block
Notion's single abstraction ("a block") unifies all content types. For Shopilot: every item in the sidebar is "a message" — user message, assistant message, tool call, confirmation card, proactive suggestion. Same base type, different renders.
→ Shopilot: MessageBlock type with discriminated union: text | tool | confirm | proactive
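The discriminated union mentioned above might look like the following sketch. The four `kind` values come from the text; the per-variant fields are assumptions chosen for illustration.

```typescript
type ToolState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';
type RiskLevel = 'read_only' | 'reversible' | 'irreversible';

interface BlockBase { id: string }

// One base type, four renders. The `kind` field is the discriminant.
type MessageBlock =
  | (BlockBase & { kind: 'text'; role: 'user' | 'assistant'; text: string })
  | (BlockBase & { kind: 'tool'; toolName: string; state: ToolState })
  | (BlockBase & { kind: 'confirm'; toolName: string; riskLevel: RiskLevel })
  | (BlockBase & { kind: 'proactive'; title: string });

// Exhaustive switch: adding a fifth kind without handling it here is a compile error.
function summarizeBlock(block: MessageBlock): string {
  switch (block.kind) {
    case 'text':
      return `${block.role}: ${block.text}`;
    case 'tool':
      return `tool ${block.toolName} [${block.state}]`;
    case 'confirm':
      return `confirm ${block.toolName} (${block.riskLevel})`;
    case 'proactive':
      return `suggestion: ${block.title}`;
    default: {
      const unreachable: never = block;
      return unreachable;
    }
  }
}
```

The same switch shape works for choosing which React component renders each block in the sidebar list.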
Intercom
Fin AI + human handoff · The original AI product with trust signals
AI vs Human Indicator
Intercom shows whether Fin AI or a human is responding. The AI has a bot icon; human has a photo. For Shopilot: the coach always shows "Powered by Claude Opus 4.6" + current model. Users trust labeled AI more than unlabeled AI.
→ Shopilot: sidebar header shows model name + version. Always visible, never hidden.
Proactive + Reactive in Same UI
Intercom shows proactive campaigns AND reactive inbox in same interface. Two modes: outbound (AI initiates) and inbound (user initiates). For Shopilot: the coach can initiate conversations ("I noticed X") and respond to queries.
→ Shopilot: proactive suggestion cards (coach-initiated) + chat input (user-initiated) in same sidebar
Context Panel Always Visible
Intercom inbox shows customer context alongside every conversation — purchase history, previous tickets, plan level. The agent never has to "look it up." For Shopilot: the coach always has seller profile + marketplace data visible in context.
→ Shopilot: context bar top of sidebar shows active marketplace + seller plan + top ASIN count
02 · Screen Compositions — What Each Main Screen Actually Looks Like
The biggest gap in the spec before this section. Components are defined; screens are not. These CSS mockups show exact proportions, component placement, and information hierarchy.
Screen 01 · Coach View — Main Application Screen (70/30)
Title bar
32px · frameless · traffic lights · tab bar after buttons · drag region
Marketplace 70%
WebContentsView · URL bar 28px · content scrolls natively · no interference
Sidebar 30%
React · header 36px · context bar · chat scroll · input sticky · status 20px
Status bar
20px · left: marketplace status · right: credit balance (JetBrains Mono)
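The 70/30 split and fixed bar heights can be turned into pixel rectangles for any window size. This is a sketch under one stated assumption: the 20px status bar spans the full window width at the bottom (the composition above could also be read as a status strip inside the sidebar). Names are illustrative; the resulting rectangles use integer pixels, which is the form a view-bounds API typically expects.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

interface CoachLayout {
  titleBar: Rect;
  marketplace: Rect; // embedded marketplace view, left 70%
  sidebar: Rect;     // React sidebar, right 30%
  statusBar: Rect;
}

const TITLE_BAR_HEIGHT = 32;
const STATUS_BAR_HEIGHT = 20;
const MARKETPLACE_RATIO = 0.7;

function computeCoachLayout(windowWidth: number, windowHeight: number): CoachLayout {
  const contentY = TITLE_BAR_HEIGHT;
  const contentHeight = Math.max(0, windowHeight - TITLE_BAR_HEIGHT - STATUS_BAR_HEIGHT);
  // Round once, then give the remainder to the sidebar so the widths always sum to the window.
  const marketplaceWidth = Math.round(windowWidth * MARKETPLACE_RATIO);
  const sidebarWidth = windowWidth - marketplaceWidth;
  return {
    titleBar: { x: 0, y: 0, width: windowWidth, height: TITLE_BAR_HEIGHT },
    marketplace: { x: 0, y: contentY, width: marketplaceWidth, height: contentHeight },
    sidebar: { x: marketplaceWidth, y: contentY, width: sidebarWidth, height: contentHeight },
    statusBar: { x: 0, y: windowHeight - STATUS_BAR_HEIGHT, width: windowWidth, height: STATUS_BAR_HEIGHT },
  };
}
```

Recomputing this on every window resize keeps the marketplace view and sidebar aligned without sub-pixel gaps.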
Screen 02 · Dashboard View — Sidebar in "Overview" Mode
Dashboard mode: sidebar replaces chat history with KPI summary + opportunity list when agent is idle. Chat input always present. Click any opportunity → coach activates and analyzes it.
03 · Competitive Design Matrix — Why Shopilot Looks Different
The existing seller tools (Helium 10, SellerBoard, Jungle Scout, Repricer.com) were designed in 2012-2018. They solve the right problems with completely wrong design language for 2026. This is Shopilot's visual competitive moat.
| Dimension | Helium 10 | SellerBoard | Jungle Scout | Repricer.com | Shopilot ★ |
|---|---|---|---|---|---|
| Design Era | 2018 · SaaS purple | 2015 · Excel aesthetic | 2017 · Consumer green | 2013 · Corporate blue | 2026 · AI-native dark |
| Primary BG | #6B4FBB purple | #FFF white | #1D6F42 green | #1B4F8A navy | #0A0A0F near-black |
| AI Integration | Bolt-on chatbot (2024) | None | AI keywords only | Rule-based only | AI-first · agent loop · 36 tools |
| Number Display | Default browser font | Arial/Helvetica | Proxima Nova regular | System serif | JetBrains Mono always |
| Dark Mode | ✗ Light only | ✗ Light only | ⚠ Toggle (half done) | ✗ Light only | ✓ Dark-first · identity |
| Desktop App | ✗ Web only | ✗ Web only | ✗ Web only | ✗ Web only | ✓ Electron · native feel |
| Reversibility | ✗ Not labeled | ✗ Not labeled | ✗ Not labeled | ⚠ Confirm dialog only | ✓ REVERSIBLE/IRREVERSIBLE · rollback tokens |
| Typography system | 1-2 fonts, no scale | System fonts | Proxima Nova only | System fonts | Inter Display + JetBrains Mono · full scale |
| Context awareness | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✓ Coach sees active marketplace page |
| Perceived quality | Tool (functional) | Spreadsheet | Consumer app | Legacy SaaS | Precision instrument · Bloomberg meets Claude |
★ The Core Design Insight
Every competitor was designed by engineers for engineers. Shopilot is designed by a seller who has used all of these tools and knows exactly what they get wrong. The dark + professional + monospace + AI-native aesthetic isn't a trend — it's the natural design language of a serious professional tool for 2026. This is the same design evolution that happened in finance (Bloomberg → Robinhood), in code (Eclipse → VS Code → Cursor), and in project management (JIRA → Linear).
04 · Emotional Design Map — From First Install to Power User
0s · First Impression
"This looks serious"
Dark canvas opens. Orange accent. Shopilot logo. No splash screen, no loading animation. App IS the window.
Design: near-black bg · frameless · logo mark visible · zero clutter
30s · Onboarding
"This is fast"
5-step wizard. Step 1: value prop. Step 2: OAuth in 30s. Step 3: language/category. Skip from step 3.
Design: progress dots · one action per step · CTA dominant · NO form fields until step 3
2min · First Tool Call
"The AI knows my data"
Coach runs first analysis unprompted. Tool accordion shows real API calls to their real store. This is the trust moment.
Design: tool accordion opens · real ASIN names · JetBrains Mono numbers · "From Amazon API"
5min · Aha Moment
"I didn't know this"
Coach surfaces an insight the seller didn't have: "You lost Buy Box on 8 ASINs in the last 24h. Here's why." This is the aha moment.
Design: proactive card slides up · specific numbers · one-click action · orange CTA
Day 1 · First Win
"It actually worked"
Price was changed. Buy Box % goes up. Confirmation with actual before/after. The coach says "Buy Box recovered to 91%."
Design: success state · green + orange celebrate · audit log entry · rollback still visible
Week 1 · Habit
"I check this every morning"
Dashboard view shows overnight changes. 3 opportunities queued. Seller opens app and acts on them before coffee is done.
Design: dashboard mode · opportunities sorted by $$$ impact · 1-click actions · <60s daily ritual
Month 1 · Expert
"I can't operate without this"
Power user. Knows Cmd+K, "/" commands. Audit log is their source of truth. Coach Memory has learned their preferences.
Design: keyboard shortcuts visible · command palette muscle memory · history as data
Designed Delight Moments — The Details That Stick
First Buy Box Win Celebration
When buy box goes from ✗ to ✓, the status dot pulses green 3x with scale(1.4). Subtle. Not a confetti explosion. Professional delight.
Typing Indicator Before Coach Responds
The ··· thinking pulse with "Shopilot is analyzing your store" appears immediately when user sends message. Never a blank moment.
Rollback Success State
When rollback completes, the audit log entry shows "↩ Reversed · 2.3s ago" in green. The system communicates "you're safe, it worked."
Coach Memory Acknowledgment
When coach uses seller's stored preference, it says "(using your saved preference: always protect margins >30%)". Shows it's paying attention.
Competitor Detected Alert
When a new seller lists on one of your ASINs, the proactive card appears with their name, price, and rating. Feels like having eyes everywhere.
Credit Milestone
When seller uses their 100th credit, a discreet banner: "100 actions taken · Avg response: 1.2s · $847 in revenue impacts attributed." Numbers build pride.
05 · E-Commerce Domain Visual Patterns — What No Other Design System Has
Generic design systems cover buttons and inputs. Shopilot needs patterns specific to e-commerce seller intelligence. These are the domain-specific visual components that make the product feel built BY a seller.
Buy Box Indicator — 4 States
Rule: Buy Box % is ALWAYS JetBrains Mono. Color = status only. No text labels on list view (dot only). Labels on detail view.
Price Delta Display — Competitor Comparison
You row = orange bar. Winner row = highlighted. Relative bar shows price position visually. Delta shown as absolute + direction. Never percentage-only.
BSR Trend Sparkline — Inline in Table
BSR: LOWER = BETTER (rank #1 = bestseller). Sparkline: green slope = improving (going toward #1). ALWAYS show direction word, not just number. Color shadow band adds weight without legend.
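Because a lower BSR is better, the direction logic is easy to invert by accident. A small sketch with hypothetical function names that encodes the rule and always emits the direction word alongside the number:

```typescript
type BsrDirection = 'improving' | 'declining' | 'flat';

// Lower rank is better: moving from #1,500 to #1,200 is an improvement.
function bsrDirection(previousRank: number, currentRank: number): BsrDirection {
  if (currentRank < previousRank) return 'improving';
  if (currentRank > previousRank) return 'declining';
  return 'flat';
}

// Always pair the rank with its direction word, never the number alone.
function bsrLabel(previousRank: number, currentRank: number): string {
  const rank = currentRank.toLocaleString('en-US');
  return `#${rank} ${bsrDirection(previousRank, currentRank)}`;
}
```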
Inventory Health Grid — Portfolio View
Each cell = one ASIN. Color = stock health (green >60d / amber 15-60d / red <15d). Number = days remaining. Glanceable portfolio status. No labels needed — color + number is sufficient.
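The grid's color rule maps directly to a threshold function. In this sketch the function and type names are illustrative; the boundaries follow the ranges above, so exactly 15 and exactly 60 days both count as amber.

```typescript
type StockHealth = 'green' | 'amber' | 'red';

// green: more than 60 days of stock; amber: 15 to 60 days; red: fewer than 15 days.
function stockHealth(daysRemaining: number): StockHealth {
  if (daysRemaining > 60) return 'green';
  if (daysRemaining >= 15) return 'amber';
  return 'red';
}

interface InventoryCell { asin: string; daysRemaining: number; health: StockHealth }

// Build the grid cells for a portfolio: one cell per ASIN, color plus number.
function buildInventoryGrid(stock: Record<string, number>): InventoryCell[] {
  return Object.entries(stock).map(([asin, daysRemaining]) => ({
    asin,
    daysRemaining,
    health: stockHealth(daysRemaining),
  }));
}
```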
06 · Color Blindness Safety — Accessible for All Sellers
~8% of men and ~0.5% of women have red-green color blindness. For Shopilot, this means Buy Box won (green) vs lost (red) may be indistinguishable to ~1 in 12 male sellers. The fix: never use color alone for meaning. Always pair with icon, text, or shape.
Deuteranopia (Red-Green Blind)
Most common: green-blind. Reds appear brownish-yellow. Greens appear similar to orange.
Problem: green and red dots look identical to deuteranopes. Users can't distinguish Buy Box won vs lost by color alone.
Fix: Shape + Color (WCAG 1.4.1)
Never use color alone. Always pair color with shape, icon, or text pattern.
Solution: ✓/✗/— icons work even without color. Color still helps non-colorblind users scan faster.
Safe Color Pairs (Accessible)
These color combinations are distinguishable under all common color blindness types:
Testing tools: Figma "Color Blind" plugin · Chrome DevTools accessibility panel · coblis.de online simulator
07 · The 20 Invisible Decisions That Make Products World-Class
Users can't name these details. But they feel them. A user who says "this just feels premium" is responding to some combination of these 20 decisions. None of them take more than a few hours to implement. All of them matter.
① Letter-spacing on headings
-0.03em on h2 makes text look designed, not default. Default tracking = amateur.
② Consistent 4px grid
Every spacing value divisible by 4. Not "16px here, 18px there." Inconsistency is invisible but users sense the chaos as "roughness."
③ Inset shadow on cards
inset 0 1px 0 rgba(255,255,255,.06) adds glass depth. Without it, dark cards look flat and dead.
④ Transition on color changes
transition: background 150ms ease, color 150ms ease on all interactive elements. Instant color changes feel abrupt and cheap.
⑤ Border on focus, not outline
Browser default outline is ugly. Replace with box-shadow: 0 0 0 2px rgba(249,115,22,.5). Same a11y benefit, premium look.
⑥ Disabled ≠ invisible
Disabled elements at 50% opacity tell users "this exists but you can't use it yet." Not display:none. Visibility + opacity = correct pattern.
⑦ Line-height on body text = 1.5
Dense data UIs are tempting to set to 1.2. Don't. AI-generated text needs 1.5 minimum for readability. Chat messages need 1.6.
⑧ Cursor: pointer on interactive divs
If it's clickable, it needs cursor: pointer. Forgetting this on tool accordions or proactive cards breaks the interaction expectation.
⑨ Tabular nums on ALL numbers
font-variant-numeric: tabular-nums makes numbers align in columns. Without it, a table of prices is unreadable.
⑩ Scrollbars styled or hidden
Default scrollbars look terrible on dark UIs. Either hide with ::-webkit-scrollbar or make them thin + dark. Visible ugly scrollbars = unfinished product.
⑪ No horizontal scroll at narrow window widths
Electron windows can be resized smaller than expected. overflow-x:hidden on body, overflow-x:auto on tables only.
⑫ Semantic HTML elements
Use <button> not <div onclick>. <time> for timestamps. <output> for live AI output. Semantic = better a11y + better dev experience.
⑬ Will-change on animated elements
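will-change: transform, opacity on sliding cards and streaming text hints the browser to promote those elements to their own compositor layer, which reduces jank at 60fps. Apply it only to elements that actually animate, since every promoted layer costs memory.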
⑭ Error messages explain what to DO
"Error 403" = terrible. "Your Amazon credentials expired. Click Reconnect to re-authorize in 30 seconds." = world-class. Every error has a next step.
⑮ Timestamps in user timezone
Never show UTC. Use Intl.DateTimeFormat(locale, {timeZone}). "2:34 PM" not "19:34 UTC". Sellers check timestamps constantly.
⑯ Number formatting by locale
MeLi sellers in Mexico: $1,847.50 not 1847,50 MXN. Use Intl.NumberFormat. Wrong number format breaks trust immediately.
⑰ Empty inputs have placeholder text
Chat input: "Ask your coach about any ASIN, competitor, or pricing decision..." Not "Type here" or blank. Placeholder teaches the product's power.
⑱ Correct text cursor in inputs
Input fields: cursor: text. Buttons: cursor: pointer. Disabled: cursor: not-allowed. Every cursor state must be right.
⑲ Data source attribution
Below every KPI: "From Amazon Seller Central API · Synced 4 min ago" in 10px gray. This is the invisible trust builder. Users who see source attribution trust the numbers more.
⑳ Reduce motion for vestibular
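@media (prefers-reduced-motion: reduce) { * { animation-duration: 0.01ms !important; transition-duration: 0.01ms !important; } } respects the OS accessibility setting. The !important flags let the rule win over component-level durations. Supports WCAG 2.3.3 (Animation from Interactions, Level AAA).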
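Items ⑮ and ⑯ can be sketched with the built-in Intl APIs. The helper names, the sample timestamp, and the sample time zone below are illustrative assumptions, not part of the blueprint; in the app the locale and time zone would come from the user's settings or the OS.

```typescript
// Sketch only: formatLocalTime and formatMoney are hypothetical helper names.

/** Formats a timestamp in the user's time zone instead of UTC. */
export function formatLocalTime(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    timeZone,
    hour: 'numeric',
    minute: '2-digit',
  }).format(date);
}

/** Formats a monetary amount with the locale's grouping, decimals and symbol. */
export function formatMoney(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

// Example: a sync time stored as UTC, shown to a seller in US Central time,
// and a MercadoLibre price shown to a seller in Mexico.
const syncedAt = new Date('2026-03-10T19:34:00Z');
console.log(formatLocalTime(syncedAt, 'en-US', 'America/Chicago'));
console.log(formatMoney(1847.5, 'es-MX', 'MXN'));
```

Some runtimes insert a non-breaking space between the time and the AM/PM marker, so compare formatted strings after normalizing whitespace rather than byte for byte.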
Production Readiness: Critical Gaps
30-point audit results · 14 gaps identified · All HIGH/MEDIUM severity specs
This section was generated from a systematic 30-point codebase audit. Each sub-section contains actionable implementation specs. Address HIGH items before public beta and MEDIUM items before v1.0 GA.
HIGH 01 · Update Notification System
MISSING · electron-updater is configured for auto-download, but the user-facing update experience is completely unspecified. Silent updates break trust: users need to know when and why the app changed.
Update State Machine
Update Available Modal — Live Spec
Shopilot 1.3.0 available
You have version 1.2.4. Download is ready.
What's new
- Coach: 3x faster tool execution with parallel calls
- MercadoLibre: new competitor tracking for MX sellers
- Fixed: Rollback confirmation not dismissing on success
- Fixed: Credit balance not updating after top-up
Implementation — main process (click to expand)
// main/updater.ts
import { autoUpdater } from 'electron-updater';
import { BrowserWindow, ipcMain } from 'electron';
// Call once from main/index.ts after the main window has been created
export function initUpdater(mainWindow: BrowserWindow) {
  autoUpdater.autoDownload = true;
  autoUpdater.autoInstallOnAppQuit = true;
  autoUpdater.on('update-available', (info) => {
    mainWindow.webContents.send('update:available', {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
  });
  autoUpdater.on('download-progress', (progress) => {
    mainWindow.webContents.send('update:progress', {
      percent: Math.round(progress.percent),
      bytesPerSecond: progress.bytesPerSecond,
    });
  });
  autoUpdater.on('update-downloaded', () => {
    mainWindow.webContents.send('update:ready');
  });
  // Update errors stay silent in the UI (see the state table); log them instead
  autoUpdater.on('error', (err) => {
    console.error('[updater]', err); // forward to Sentry in production
  });
  // IPC handler: user clicks "Restart & Update"
  ipcMain.handle('update:install', () => {
    autoUpdater.quitAndInstall(false, true); // isSilent=false, isForceRunAfter=true
  });
  // Check interval: on launch + every 4 hours
  autoUpdater.checkForUpdatesAndNotify();
  setInterval(() => autoUpdater.checkForUpdatesAndNotify(), 4 * 60 * 60 * 1000);
}
| State | UI Pattern | Dismissible? |
|---|---|---|
| checking | Status bar dot pulses blue — silent | Auto |
| available | In-app banner: "New version available. View details" | Yes (persists until restart) |
| downloading | Modal with changelog + progress bar (auto-shown) | Yes (download continues) |
| ready | Modal: "Ready to install. Restart now?" with changelog | Yes (installs on quit) |
| error | Silent (logged to Sentry) — do not bother user for update errors | N/A |
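The state table above can be modeled as a small reducer in the renderer, driven by the update:* IPC messages. This is a minimal sketch under assumptions: the type names, the event shape, the extra idle state, and the rule that a periodic re-check never discards a finished download are illustrative choices, not something the blueprint specifies.

```typescript
// Sketch only: UpdateState, UpdateEvent and nextUpdateState are hypothetical names.
export type UpdateState =
  | { kind: 'idle' }
  | { kind: 'checking' }
  | { kind: 'available'; version: string }
  | { kind: 'downloading'; version: string; percent: number }
  | { kind: 'ready'; version: string }
  | { kind: 'error'; message: string };

export type UpdateEvent =
  | { type: 'check' }
  | { type: 'available'; version: string }
  | { type: 'not-available' }
  | { type: 'progress'; percent: number }
  | { type: 'downloaded' }
  | { type: 'error'; message: string };

export function nextUpdateState(state: UpdateState, event: UpdateEvent): UpdateState {
  switch (event.type) {
    case 'check':
      // The 4-hour re-check must not throw away a download in flight or finished.
      return state.kind === 'idle' || state.kind === 'error' ? { kind: 'checking' } : state;
    case 'available':
      return state.kind === 'downloading' || state.kind === 'ready'
        ? state
        : { kind: 'available', version: event.version };
    case 'not-available':
      return state.kind === 'checking' ? { kind: 'idle' } : state;
    case 'progress':
      return state.kind === 'available' || state.kind === 'downloading'
        ? { kind: 'downloading', version: state.version, percent: Math.round(event.percent) }
        : state;
    case 'downloaded':
      return state.kind === 'available' || state.kind === 'downloading'
        ? { kind: 'ready', version: state.version }
        : state;
    case 'error':
      // Shown nowhere in the UI; the caller only logs it.
      return { kind: 'error', message: event.message };
  }
}

// Example: a full happy-path sequence.
const sampleEvents: UpdateEvent[] = [
  { type: 'check' },
  { type: 'available', version: '1.3.0' },
  { type: 'progress', percent: 42.4 },
  { type: 'downloaded' },
];
const initialState: UpdateState = { kind: 'idle' };
console.log(sampleEvents.reduce<UpdateState>(nextUpdateState, initialState));
```

Keeping the transitions in one pure function makes the "Dismissible?" behavior easy to test without launching Electron.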
HIGH 02 · Local Chat Persistence
MISSING · Chat sessions vanish on app restart. No localStorage, IndexedDB, or Zustand persist spec exists anywhere in the codebase. Sellers who close the app lose all context, a critical trust failure.
Data Model — What to Persist
ChatSession (IndexedDB — shopilot-chat store)
interface ChatSession {
id: string; // uuid
marketplaceId: 'amazon' | 'meli' | 'shopify';
asin?: string; // active context when session started
messages: Message[]; // all messages including tool calls
createdAt: number; // unix ms
updatedAt: number;
tokenCount: number; // for context window visualization
title?: string; // auto-generated from first user message (truncated 60 chars)
}
Zustand Store — React State Layer
import { create } from 'zustand';
import { persist, createJSONStorage } from 'zustand/middleware';
import { get as idbGet, set as idbSet, del as idbDel } from 'idb-keyval';
interface SessionMeta {
  id: string;
  title: string;
  updatedAt: number;
  marketplace: ChatSession['marketplaceId'];
}
interface ChatState {
  sessions: SessionMeta[];
  activeSessionId: string | null;
  setActiveSession: (id: string) => void;
  addSession: (meta: SessionMeta) => void;
}
// Lightweight: only persist session index (not full messages)
// Full messages go to IndexedDB via idb-keyval
// create<ChatState>()(...) is the curried form TypeScript needs when middleware is used
const useChatStore = create<ChatState>()(persist(
  (set) => ({
    sessions: [],
    activeSessionId: null,
    setActiveSession: (id) => set({ activeSessionId: id }),
    addSession: (meta) =>
      set((s) => ({ sessions: [meta, ...s.sessions].slice(0, 100) })), // keep last 100
  }),
  {
    name: 'shopilot-chat-store',
    storage: createJSONStorage(() => localStorage), // session index only
  }
));
// Full messages: idb-keyval (structured clone, no JSON serialization step)
export const loadSession = (id: string) => idbGet<ChatSession>(`session:${id}`);
export const saveSession = (s: ChatSession) => idbSet(`session:${s.id}`, s);
export const deleteSession = (id: string) => idbDel(`session:${id}`);
| Storage | What's stored | Retention | Size limit |
|---|---|---|---|
| localStorage | Session index (id, title, timestamp) | 100 sessions | ~20 KB |
| IndexedDB | Full message arrays with tool calls | 90 days, then pruned | ~50 MB soft cap |
| safeStorage | API keys, marketplace credentials | Until user logout | Negligible |
| SQLite (main) | Audit log, price history, snapshots | 180 days | 500 MB max |
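The 90-day IndexedDB retention in the table above implies a pruning pass, for example once per launch. The sketch below shows only the selection step as a pure function; the names SessionStamp and splitByRetention are hypothetical, and the expired ids would then be passed to whatever delete routine the persistence layer exposes.

```typescript
// Sketch only: SessionStamp and splitByRetention are hypothetical names.
interface SessionStamp {
  id: string;
  updatedAt: number; // unix ms, same unit as ChatSession.updatedAt
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Splits sessions into those still inside the retention window and those past it. */
export function splitByRetention(
  sessions: SessionStamp[],
  now: number,
  retentionDays = 90,
): { keep: SessionStamp[]; expired: SessionStamp[] } {
  const cutoff = now - retentionDays * DAY_MS;
  const keep: SessionStamp[] = [];
  const expired: SessionStamp[] = [];
  for (const s of sessions) {
    (s.updatedAt >= cutoff ? keep : expired).push(s);
  }
  return { keep, expired };
}

// Example: one session from yesterday, one from 120 days ago.
const nowMs = Date.UTC(2026, 2, 10);
const result = splitByRetention(
  [
    { id: 'recent', updatedAt: nowMs - 1 * DAY_MS },
    { id: 'stale', updatedAt: nowMs - 120 * DAY_MS },
  ],
  nowMs,
);
console.log(result.expired.map((s) => s.id));
```

Pruning on updatedAt rather than createdAt keeps long-running sessions alive as long as the seller keeps using them.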
Session History UI — Sidebar Panel
When chat input is empty: show last 5 sessions as clickable cards below input. Each card: title (auto) + marketplace icon + relative time. Clicking loads the session and resumes context. Pattern adopted from Claude.ai sidebar.
HIGH 03 · GDPR, Data Export & Account Deletion
MISSING · Zero documentation of user data download, account deletion, or data retention. Required by GDPR (EU) and LGPD (Brazil, critical for MeLi sellers), and expected by Apple App Store Review. Must exist before any public release.
Personal Data Inventory (PII Map)
| Data Type | Where stored | Purpose | Retention | Exportable? |
|---|---|---|---|---|
| Email address | Supabase auth.users | Account identity | Until deletion | Yes |
| Marketplace credentials | Electron safeStorage (local) | API access | Until revoke | No (keys) |
| Chat history | Local IndexedDB | Session continuity | 90 days | Yes (JSON) |
| Audit log | Local SQLite | Rollback & trust | 180 days | Yes (CSV) |
| Usage telemetry | PostHog (cloud) | Product analytics | 24 months | On request |
| Credit transactions | Supabase billing | Billing history | 7 years (legal) | Yes (PDF) |
| Error/crash reports | Sentry (cloud) | Bug fixing | 90 days | No (aggregate) |
Data Export Package — ZIP Structure
shopilot-export-{userId}-{YYYYMMDD}.zip
├── README.txt # What's in this export, data policy link
├── account/
│ ├── profile.json # email, plan, created_at, last_login
│ └── billing_history.csv # date, amount, credits, description
├── chat_history/
│ ├── sessions_index.json # session metadata (title, date, marketplace)
│ └── session_{id}.json × N # full message arrays per session
├── audit_log/
│ └── actions.csv # timestamp, action, asin, old_value, new_value, reversible
└── telemetry_summary.json # aggregate usage stats (no PII included)
Account Deletion Flow (GDPR Article 17 — Right to Erasure)
- User navigates to Settings → Account → "Delete Account"
- Modal: "This will permanently delete your account and all data. Export your data first?" with [Export Data] + [Continue to Delete] buttons
- Type "DELETE" in text field to confirm (same pattern as Vercel, Supabase)
- Server-side: mark account deleted_at → Supabase Edge Function queues hard delete in 30 days (grace period for disputes)
- Local: clear all IndexedDB stores + localStorage + SQLite + safeStorage keys on next launch
- Confirmation email: "Your Shopilot account will be permanently deleted on {date+30d}. Cancel: {link}"
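The 30-day grace period in the flow above reduces to two small date checks: when the hard delete runs, and whether a cancel link is still valid. This is a sketch under assumptions; the function names are hypothetical and the real logic would live in the Supabase Edge Function mentioned above.

```typescript
// Sketch only: hardDeleteAt and canCancelDeletion are hypothetical names.
const GRACE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Date on which the queued hard delete should run ({date+30d} in the email). */
export function hardDeleteAt(deletedAt: Date): Date {
  return new Date(deletedAt.getTime() + GRACE_DAYS * MS_PER_DAY);
}

/** The cancel link works only while the grace period is still running. */
export function canCancelDeletion(deletedAt: Date, now: Date): boolean {
  return now.getTime() < hardDeleteAt(deletedAt).getTime();
}

// Example: account marked deleted on 10 March 2026 (UTC).
const markedAt = new Date('2026-03-10T00:00:00Z');
console.log(hardDeleteAt(markedAt).toISOString());
console.log(canCancelDeletion(markedAt, new Date('2026-03-25T12:00:00Z')));
```

Working in UTC milliseconds avoids off-by-one-day surprises when the user and the server sit in different time zones.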
HIGH 04 · Observability & Error Tracking (Sentry)
PARTIAL · Sentry is mentioned in the stack, but sampling rates, PII filtering, event taxonomy, and performance monitoring thresholds are not specified. Under-instrumented apps fail silently in production.
Sentry Configuration Spec (click to expand)
// renderer/main.tsx: Sentry init
import * as Sentry from '@sentry/electron/renderer';
Sentry.init({
  // Assumes a Vite renderer build, where VITE_* variables live on import.meta.env
  dsn: import.meta.env.VITE_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // release: app.getVersion() exists only in the main process; set release in the main-process Sentry.init
  // Sampling: aggressive in dev, conservative in prod
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  profilesSampleRate: 0.05, // CPU profiling on 5% of transactions
  // PII Scrubbing: NEVER send user content to Sentry
  beforeSend(event) {
    // Strip message content (chat messages may contain business data)
    if (event.extra?.messages) delete event.extra.messages;
    if (event.extra?.prompt) delete event.extra.prompt;
    // Strip marketplace credentials from breadcrumbs (event.breadcrumbs is a plain array)
    event.breadcrumbs?.forEach((crumb) => {
      if (crumb.data?.token) crumb.data.token = '[Filtered]';
      if (crumb.data?.apiKey) crumb.data.apiKey = '[Filtered]';
    });
    return event;
  },
  // Integrations
  integrations: [
    Sentry.browserTracingIntegration(),
    Sentry.replayIntegration({
      maskAllText: true, // block all text from session replay
      blockAllMedia: true,
    }),
  ],
});
Custom Event Taxonomy
| Event Name | Trigger | Severity | Alert? |
|---|---|---|---|
| tool_execution_failed | Tool returns error after 3 retries | Warning | No |
| irreversible_action_taken | Price change / inventory update confirmed | Info | No |
| credit_exhausted | Balance hits 0 | Warning | Yes (Slack) |
| marketplace_auth_expired | API returns 401/403 | Error | Yes (Slack) |
| claude_api_error | Anthropic API returns 5xx | Error | Yes (PagerDuty) |
| ipc_bridge_timeout | IPC call > 5s with no response | Critical | Yes (PagerDuty) |
| rollback_failed | Rollback tool returns error | Critical | Yes (PagerDuty) |
Performance Thresholds (alert if exceeded)
- App cold start: > 3s → warning
- IPC round-trip: > 500ms → warning
- Tool execution: > 10s → log
- First token latency: > 2s → log
- Chat render FPS: < 30fps → log
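The thresholds above can be kept in one lookup table so that the instrumentation code and the spec cannot drift apart. The metric keys and the evaluateMetric helper below are hypothetical names; only the limits and actions come from the list above.

```typescript
// Sketch only: Metric keys and evaluateMetric are hypothetical names.
type Metric = 'coldStartMs' | 'ipcRoundTripMs' | 'toolExecutionMs' | 'firstTokenMs' | 'chatFps';
type Action = 'warning' | 'log';

interface Threshold {
  limit: number;
  breachWhen: 'above' | 'below'; // FPS is breached when it drops, the rest when they rise
  action: Action;
}

const THRESHOLDS: Record<Metric, Threshold> = {
  coldStartMs: { limit: 3000, breachWhen: 'above', action: 'warning' },
  ipcRoundTripMs: { limit: 500, breachWhen: 'above', action: 'warning' },
  toolExecutionMs: { limit: 10000, breachWhen: 'above', action: 'log' },
  firstTokenMs: { limit: 2000, breachWhen: 'above', action: 'log' },
  chatFps: { limit: 30, breachWhen: 'below', action: 'log' },
};

/** Returns the action to take for a measurement, or null when it is within budget. */
export function evaluateMetric(metric: Metric, value: number): Action | null {
  const t = THRESHOLDS[metric];
  const breached = t.breachWhen === 'above' ? value > t.limit : value < t.limit;
  return breached ? t.action : null;
}

// Example measurements.
console.log(evaluateMetric('coldStartMs', 3400));
console.log(evaluateMetric('chatFps', 58));
```

The returned action would then decide whether the measurement is sent to Sentry as a warning or only written to the local log.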
User Context (always attach)
Sentry.setUser({
id: userId, // NOT email
plan: 'pro',
});
Sentry.setContext('marketplace', {
active: 'amazon',
region: 'US',
});
// NEVER set: email, apiKey, sellerId
MEDIUM 05 · In-App Support & Help Center
MISSING · No help center, FAQ panel, or support chat is specified. B2B desktop apps need accessible support without leaving the app. Pattern: ? button in status bar → slide-over panel with search + articles + live chat.
Support Entry Points
- ① Status bar "?" button: always visible, 24px tall, right corner. Opens the help slide-over. Free plan gets async email; Pro gets a live chat widget (Crisp or Intercom).
- ② Error recovery banners: Type A errors include a "Need help?" link that pre-fills the support form with error context.
- ③ Keyboard shortcut Cmd+Shift+?: opens the help slide-over from anywhere in the app.
- ④ First-run onboarding: 3-step coach intro with an inline, expandable "How does this work?" FAQ.
Help Panel Anatomy
Popular Articles
Integration Recommendation: Crisp.chat (not Intercom)
Crisp is $25/mo vs Intercom's $74/mo minimum. Crisp has a WebView embed that works in Electron without SDK conflicts. For v1: embed Crisp chatbox in the help slide-over WebContentsView. For v2: evaluate Intercom when MRR > $10K.
MEDIUM 06 · Demo & Trial Mode
MISSING · No sandbox or mock data strategy exists. New users who haven't connected a marketplace account see a blank app. Every B2B tool that converts well has a demo mode that shows the product's value immediately.
Demo Data Strategy
// demo/fixtures.ts
export const DEMO_SELLER = {
marketplace: 'amazon',
region: 'US',
storeName: 'Acme Electronics',
plan: 'pro',
};
export const DEMO_ASINS = [
{ asin: 'B08N5WRWNW', title: 'Wireless Earbuds Pro',
price: 49.99, buyBox: 78, bsr: 1247, stock: 342 },
{ asin: 'B09G9FPHY6', title: 'USB-C Hub 7-in-1',
price: 34.99, buyBox: 0, bsr: 891, stock: 12 }, // stock warning
{ asin: 'B0BDJ179PH', title: 'Phone Stand Aluminum',
price: 19.99, buyBox: 34, bsr: 3401, stock: 98 },
];
export const DEMO_COMPETITORS = {
'B08N5WRWNW': [
{ seller: 'TechDirect', price: 47.99, bbPercent: 22 },
{ seller: 'ElectroHub', price: 51.99, bbPercent: 0 },
],
};
// Demo coach responses — scripted for max "aha moment"
export const DEMO_CHAT_SCRIPT = [
{
trigger: 'first_message',
response: 'I can see your Buy Box win rate dropped 23% this week on B08N5WRWNW. Your main competitor TechDirect lowered their price to $47.99 two days ago. Want me to analyze if repricing to $46.49 would recover the Buy Box while maintaining your margin?',
},
];
Demo Banner — Persistent Indicator
Demo Mode
Simulated data — no real changes will be made
Demo Mode Rules
- All tool calls return fixture data, never real API
- Confirmation dialogs work but action is a no-op
- Credits don't decrement (infinite demo credits)
- Audit log shows demo actions with 🎭 prefix
- "Connect Account" CTA always visible in sidebar
- Demo mode auto-activates if no marketplace connected
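The demo rules above can be enforced at a single choke point that decides, per tool call, whether it goes to fixtures or to the live API. This is a sketch under assumptions: ToolCall, ToolRoute and routeToolCall are hypothetical names, and throwing on a missing fixture is a design choice made here so that demo mode can never fall through to a real marketplace call.

```typescript
// Sketch only: ToolCall, ToolRoute and routeToolCall are hypothetical names.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolRoute =
  | { target: 'fixture'; data: unknown; chargeCredits: false; auditPrefix: string }
  | { target: 'live'; chargeCredits: true; auditPrefix: string };

export function routeToolCall(
  call: ToolCall,
  connectedAccounts: number,
  fixtures: Record<string, unknown>,
): ToolRoute {
  // Demo mode auto-activates when no marketplace is connected.
  const demoMode = connectedAccounts === 0;
  if (!demoMode) {
    return { target: 'live', chargeCredits: true, auditPrefix: '' };
  }
  if (!Object.prototype.hasOwnProperty.call(fixtures, call.name)) {
    // Fail loudly rather than reaching a real API while in demo mode.
    throw new Error(`No demo fixture registered for tool "${call.name}"`);
  }
  return {
    target: 'fixture',
    data: fixtures[call.name],
    chargeCredits: false, // demo credits never decrement
    auditPrefix: '🎭 ', // marks demo actions in the audit log
  };
}

// Example: a pricing lookup with no connected account resolves to fixture data.
const demoFixtures = { get_competitor_prices: [{ seller: 'TechDirect', price: 47.99 }] };
console.log(routeToolCall({ name: 'get_competitor_prices', args: {} }, 0, demoFixtures));
```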
MEDIUM 07 · Multi-Account Management
MISSING · Power sellers operate 2-5 marketplace accounts (Amazon US + MX, MeLi MX + CO). No account switching UI is specified. This is a v1 blocker for agency users and will be requested in the first week of beta.
Account Data Model
interface MarketplaceAccount {
id: string; // uuid
marketplace: 'amazon' | 'meli' | 'shopify';
region: string; // 'US' | 'MX' | 'CO' | etc.
displayName: string; // "Acme US Store"
avatarInitials: string; // "AC"
avatarColor: string; // auto-assigned from palette
lastSynced: number;
isDefault: boolean;
credentialKey: string; // safeStorage key reference
}
// Max accounts per plan:
// Free: 2 accounts
// Pro: 10 accounts
// (encourages Pro upsell for agencies)
Account Switcher UI
Switch Account
Acme US
Amazon US · ✓ active
Acme MX
Amazon MX
Add account
Account Switch Behavior
- Context isolation: chat history, ASIN lists, and audit logs are scoped per account, so switching loads the other account's data
- Keyboard shortcut: Cmd+Shift+A opens account switcher dropdown
- Status bar: shows active account name truncated to 20 chars + marketplace icon
- Switch is instant: no reload, just a React state swap. Chat input clears, context bar updates, tab bar highlights the appropriate marketplace
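Three of the rules in this section (per-plan account limits, per-account data scoping, and the 20-character status bar label) are simple enough to pin down as pure helpers. The function names and the key format below are assumptions for illustration; only the limits (Free 2, Pro 10) and the 20-character cap come from the spec above.

```typescript
// Sketch only: helper names and the "acct:" key format are hypothetical.
type Plan = 'free' | 'pro';
const MAX_ACCOUNTS: Record<Plan, number> = { free: 2, pro: 10 };

/** True when the plan still has room for one more marketplace account. */
export function canAddAccount(plan: Plan, currentCount: number): boolean {
  return currentCount < MAX_ACCOUNTS[plan];
}

type ScopedKind = 'session' | 'asins' | 'audit';

/** Prefixes every storage key with the account id so data never leaks across accounts. */
export function scopedKey(accountId: string, kind: ScopedKind, id = ''): string {
  return id ? `acct:${accountId}:${kind}:${id}` : `acct:${accountId}:${kind}`;
}

/** Status bar label: at most `max` characters, with an ellipsis when cut. */
export function statusBarLabel(displayName: string, max = 20): string {
  return displayName.length <= max ? displayName : displayName.slice(0, max - 1) + '…';
}

// Example usage.
console.log(canAddAccount('free', 2));
console.log(scopedKey('a1b2', 'session', 's42'));
console.log(statusBarLabel('Acme Electronics Mexico Store'));
```

Scoping at the key level means an instant account switch only changes which prefix the stores read from, with no reload.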
MEDIUM 08 · Desktop OS Integration: Missing Specs
PARTIAL · Several Electron desktop OS integration points are specified at a high level but lack implementation detail: single-instance lock, deep link protocol, right-click context menus, tray badge counts, and drag-and-drop.
Single-Instance Lock (prevents duplicate windows)
// main/index.ts
import { app } from 'electron';
const gotTheLock = app.requestSingleInstanceLock();
if (!gotTheLock) {
  app.quit(); // Second instance: quit immediately
} else {
  // First instance: handle second-instance attempt
  app.on('second-instance', (event, commandLine) => {
    if (mainWindow) {
      if (mainWindow.isMinimized()) mainWindow.restore();
      mainWindow.focus();
      // On Windows/Linux a deep link (e.g., shopilot://auth/callback?code=...) arrives in argv.
      // macOS delivers it through the app 'open-url' event instead.
      const deepLink = commandLine.find(arg => arg.startsWith('shopilot://'));
      if (deepLink) handleDeepLink(deepLink);
    }
  });
}
Deep Link Protocol — shopilot://
// main/index.ts: Protocol registration
import path from 'node:path';
if (process.defaultApp) {
  if (process.argv.length >= 2) {
    app.setAsDefaultProtocolClient('shopilot', process.execPath, [path.resolve(process.argv[1])]);
  }
} else {
  app.setAsDefaultProtocolClient('shopilot');
}
// macOS delivers deep links through 'open-url' instead of argv
app.on('open-url', (event, url) => {
  event.preventDefault();
  handleDeepLink(url);
});
// Supported deep link routes:
// shopilot://auth/callback?code=&state= → OAuth2 callback (Amazon/MeLi)
// shopilot://asin/{asin} → Focus chat on specific ASIN
// shopilot://alert/{alertId} → Open specific fraud/price alert
// shopilot://billing/upgrade → Jump to billing settings
function handleDeepLink(url: string) {
  const parsed = new URL(url);
  // With a custom scheme the first segment ("auth", "asin") is parsed as the host,
  // so route on host + path segments rather than on pathname alone
  const [route, arg] = [parsed.hostname, ...parsed.pathname.split('/').filter(Boolean)];
  switch (route) {
    case 'auth':
      if (arg === 'callback') {
        mainWindow.webContents.send('auth:callback', {
          code: parsed.searchParams.get('code'),
          state: parsed.searchParams.get('state'),
        });
      }
      break;
    case 'asin':
      if (arg) mainWindow.webContents.send('navigate:asin', arg);
      break;
    // alert and billing routes are listed above but not handled yet
  }
}
Right-Click Context Menus
// main/contextMenu.ts
import { BrowserWindow, Menu, MenuItem, clipboard, ipcMain, shell } from 'electron';
ipcMain.on('show-context-menu', (event, context) => {
  const menu = new Menu();
  if (context.type === 'asin') {
    menu.append(new MenuItem({
      label: `Analyze ${context.asin}`,
      click: () => event.sender.send('coach:analyze', context.asin),
    }));
    menu.append(new MenuItem({
      label: 'View on Amazon',
      // amazon.com is hard-coded; non-US accounts need the regional domain
      click: () => shell.openExternal(`https://amazon.com/dp/${context.asin}`),
    }));
    menu.append(new MenuItem({ type: 'separator' }));
    menu.append(new MenuItem({
      label: 'Copy ASIN',
      click: () => clipboard.writeText(context.asin),
    }));
  }
  if (context.type === 'price') {
    // clipboard.writeText expects a string, so coerce numeric prices
    menu.append(new MenuItem({ label: 'Copy price', click: () => clipboard.writeText(String(context.value)) }));
    menu.append(new MenuItem({ label: 'Ask coach about this price', click: () => event.sender.send('coach:ask', `Why is this price ${context.value}?`) }));
  }
  menu.popup({ window: BrowserWindow.fromWebContents(event.sender)! });
});
Tray Menu + Badge Counts
// Update tray badge when alerts arrive
function updateTrayBadge(count: number) {
if (process.platform === 'darwin') {
app.dock.setBadge(count > 0 ? String(count) : '');
}
tray.setToolTip(`Shopilot — ${count > 0 ? `${count} alerts` : 'All clear'}`);
}
// Tray context menu
const trayMenu = Menu.buildFromTemplate([
{ label: 'Open Shopilot', click: () => mainWindow.show() },
{ label: 'Pause Coach', type: 'checkbox', checked: false,
click: (item) => mainWindow.webContents.send('coach:pause', item.checked) },
{ type: 'separator' },
{ label: 'Check for Updates', click: () => autoUpdater.checkForUpdatesAndNotify() },
{ label: 'Quit', click: () => app.quit() },
]);
PARTIAL 09 · E2E Testing Framework
INCOMPLETE SPEC · Unit tests and component tests are implied, but no E2E testing framework is explicitly specified. For an Electron app making real API calls and marketplace mutations, E2E tests are non-negotiable before beta.
Testing Pyramid for Shopilot
Critical E2E Test Cases (must pass before beta)
| Test Case | Why critical | Mode |
|---|---|---|
| App launches, shows demo mode, chat accepts input | Smoke test — must always pass | Demo |
| Connect Amazon account via OAuth → tokens stored in safeStorage | Auth is the first real action | Sandbox |
| Send message → tool executes → confirmation appears → user approves → audit log written | Core happy path | Mock API |
| Approve irreversible action → confirm with typed text → action recorded → rollback available | Trust-critical flow | Mock API |
| Credits hit 0 → coach blocks → credit exhausted modal shows → upgrade flow opens | Revenue-critical guard | Mock API |
| App restart → chat history loads from IndexedDB → last session visible | Persistence correctness | Mock API |
| Update available → modal shows → user clicks Restart → app re-opens at same state | Update UX must not lose work | Mocked updater |
Playwright + Electron Setup (click to expand)
// e2e/setup.ts
import { _electron as electron, type ElectronApplication } from 'playwright';
import { test, expect } from '@playwright/test';
let electronApp: ElectronApplication;
test.beforeAll(async () => {
electronApp = await electron.launch({
args: ['dist/main/index.js'],
env: {
...process.env,
NODE_ENV: 'test',
SHOPILOT_DEMO_MODE: 'true', // use fixture data
},
});
});
test.afterAll(async () => {
await electronApp.close();
});
// Example test: coach chat flow
test('coach responds to ASIN query', async () => {
const window = await electronApp.firstWindow();
await window.fill('[data-testid="chat-input"]', 'What is happening with B08N5WRWNW?');
await window.press('[data-testid="chat-input"]', 'Enter');
await expect(window.locator('[data-testid="coach-response"]')).toBeVisible({ timeout: 10000 });
await expect(window.locator('[data-testid="tool-accordion"]')).toBeVisible();
});
10 · Production Readiness Checklist
GATE CRITERIA · These gates must pass before each release milestone. No gate can be manually overridden without written sign-off from CEO + CTO.
GATE 1 — Private Beta (before any external user)
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | All 7 E2E test cases pass on macOS 14 + macOS 15 | Sergio |
| ☐ | Code signing + notarization working (Apple Developer cert) | Mateo |
| ☐ | Sentry DSN configured, PII filter verified, test event sent | Andrés |
| ☐ | Chat persistence: sessions survive app restart | Sergio |
| ☐ | Single-instance lock prevents duplicate window | Mateo |
| ☐ | Demo mode works without any marketplace credentials | Sergio |
| ☐ | Update notification modal tested with mock version bump | Mateo |
| ☐ | Privacy policy published at shopilot.ai/privacy | Pablo |
GATE 2 — Public Beta (before paid users)
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | GDPR data export (ZIP) working for all users | Andrés |
| ☐ | Account deletion flow tested end-to-end | Andrés |
| ☐ | In-app support (Crisp) embedded and tested | Sergio |
| ☐ | Multi-account: 2+ accounts with correct context isolation | Sergio |
| ☐ | Deep link protocol (shopilot://) working for OAuth callback | Mateo |
| ☐ | Tray menu + badge count for unread alerts | Sergio |
| ☐ | Right-click context menus on ASIN rows and prices | Sergio |
| ☐ | Terms of Service published + accepted on first launch | Pablo |
| ☐ | Stripe webhooks tested for subscription lifecycle | Andrés |
GATE 3 — v1.0 GA
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | Figma Atomic Design library complete (atoms + molecules + organisms) | External Design Team |
| ☐ | Figma MCP integration working (Claude reads components directly) | Mateo |
| ☐ | WCAG AA audit passing (axe-playwright on all screens) | Sergio |
| ☐ | Performance: cold start < 3s on 2019 MBP (8GB RAM) | Mateo |
| ☐ | Windows 11 build passing (secondary target) | Mateo |
| ☐ | SOC 2 Type I audit initiated (required for enterprise) | Pablo |
The 80/20 Rule for Production Readiness
80% of production incidents come from 20% of neglected areas: auth edge cases, update failures, data loss on crash, and silent API errors. This section addresses all four. Ship Gate 1 within the first 3 weeks of dev, Gate 2 before any paid user, and Gate 3 before any press mention.
15. Brand Intelligence Lab: 17 Brand Books + Shopilot Recommendation
Deep-dive brand books for the 6 reference products + 10 YC-backed startups with similar contexts. Colors, typography, buttons, spacing, motion, voice: everything. Ends with the Shopilot Recommended Brand Book.
6 Reference Brands
10 YC Startups
Synthesis
Anthropic / Claude.ai
AI Safety Company · San Francisco · 2021
Brand Philosophy
"AI for human flourishing"
The Anthropic visual language is built around the concept of "clay": unfired earth, warm, unfinished, human. The brand consciously rejects the cold blue-shifted AI aesthetic (think IBM, Microsoft Azure, early OpenAI). Instead: warmth, earth, copper, organic. The name "Claude" was deliberately chosen for its French warmth and humanist connotations. Every color decision reflects trustworthy AI that feels human, not robotic.
Color System
#faf9f5
Background Light
RGB 250/249/245 · toasted cream
#141413
Background Dark
RGB 20/20/19 · warm undertone
#CC785C
Brand Copper
Logo · selection · icon
#d97757
UI Orange
CTAs · interactive elements
#6a9bcc
Muted Blue
Secondary · info states
#788c5d
Muted Green
Success · positive states
rgba(204,120,92,.15)
Selection BG
Text selection highlight
#1a1915
Surface Dark
Cards on dark bg
● Contrast: #141413 on #faf9f5 ≈ 17.5:1 (AAA) · #CC785C on #faf9f5 ≈ 3.1:1 (meets only the 3:1 threshold for large text and UI components, below AA for body text) · #d97757 on #141413 ≈ 5.9:1 (AA)
● Rule: Never use pure black (#000) or pure white (#fff). The warmth delta of ~5 RGB units in each neutral makes everything feel premium vs. commodity.
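Contrast figures like the ones above can be checked with the WCAG 2.x relative-luminance formula. The sketch below is a plain implementation of that formula; the function names are illustrative, and it accepts only 6-digit hex colors.

```typescript
// Sketch only: function names are hypothetical; the math follows the WCAG 2.x definition.

/** Converts one 8-bit sRGB channel to linear light. */
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  // Older WCAG text uses 0.03928 here; both cut-offs agree for 8-bit channels.
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance of a 6-digit hex color such as "#faf9f5". */
export function relativeLuminance(hex: string): number {
  const h = hex.replace('#', '');
  const [r, g, b] = [0, 2, 4].map((i) => channelToLinear(parseInt(h.slice(i, i + 2), 16)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

/** Contrast ratio between two colors, from 1 (identical) to 21 (black on white). */
export function contrastRatio(a: string, b: string): number {
  const [lighter, darker] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: the three pairs quoted in the contrast line.
for (const [fg, bg] of [
  ['#141413', '#faf9f5'],
  ['#CC785C', '#faf9f5'],
  ['#d97757', '#141413'],
]) {
  console.log(fg, 'on', bg, contrastRatio(fg, bg).toFixed(2));
}
```

WCAG AA asks for at least 4.5:1 for body text and 3:1 for large text and UI components; AAA asks for 7:1 for body text.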
Typography System
| Role | Font | Weight | Usage |
|---|---|---|---|
| Display / Headlines | Styrene A / Styrene B | 400–700 | Hero titles, section heads |
| Editorial / Long-form | Tiempos Text | 400 italic | Blog, docs, long reads |
| Product / UI Text | Styrene A | 400–500 | App UI, labels, body |
| Code / Data | JetBrains Mono | 400 | Code blocks, inline code |
| Accent / Quote | Galaxie Copernicus | 300 italic | Pull quotes, feature text |
Type scale: display-xxl: clamp(3rem, 5vw, 5rem) · display-lg: clamp(2rem, 3.5vw, 3.5rem) · display-xs: clamp(1.125rem, 1.5vw, 1.25rem) · body: 1rem/1.6
Button System
● Primary: bg #d97757 · text white · radius 8px · padding 10px 20px · font-weight 600
● Secondary: border 1.5px #CC785C/50 · text #CC785C · bg transparent · same radii/padding
● Hover: filter: brightness(1.1). Never use a fixed darker hex; keep theming dynamic
▶ Spacing · Shadows · Motion · Voice (deep spec)
Spacing
Site margin: clamp(2rem, 5rem)
Nav height: 68px (4.25rem)
Section gap: 96px–160px
Chat max-w: 768px (3xl)
Message max: 75ch
Shadows
Default: none
Flyout: 0 8px 32px rgba(0,0,0,.12)
Modal: 0 24px 64px rgba(0,0,0,.18)
Focus ring: 0 0 0 3px rgba(204,120,92,.3)
Motion
Menu open: 400ms
Dropdown: 200ms
Tooltip: 150ms
Easing: cubic-bezier(.4,0,.2,1)
Streaming: 0ms delay, instant
Brand Voice
Tone adjectives
Thoughtful · Warm · Honest · Direct · Curious · Humble
Anti-tone
Never: Hype-y · Corporate · Cold · Overpromising · Robotic
Writing style
Conversational but precise. Short sentences. Active voice. Explains "why" not just "what".
Shopilot inherits
Candidate inspiration: warm copper accent · dark backgrounds · trustworthy AI voice
Cursor IDE
AI-Native Code Editor · Anysphere · 2022
Brand Philosophy
"The AI-first code editor built for pair programming with AI"
Cursor's brand philosophy is hyper-functional. There is no decorative layer: every visual decision serves the task of writing code. The orange accent (#f54e00) is used only for the critical hot path, the most important action on screen. The warm off-white/off-black background signals "professional tool" vs. "consumer app." The UI is intentionally dense, because developers are trained to read dense information quickly.
Color System
#f7f7f4
--color-theme-bg
Warm off-white
#26251e
--color-theme-fg
Warm off-black
#f54e00
--color-theme-accent
Hot orange · CTAs only
--fg-01 … --fg-100
Opacity Scale
Every 5% step from bg color
● Base units: --g: calc(10rem/16) ≈ 10px (grid) · --v: 1.375rem ≈ 22px (vertical rhythm)
● Duration: --duration: .14s · --duration-slow: .25s
● Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)
● Shadows: Ultra-minimal 0 0 1rem #00000005 — shadows only on flyouts, never on cards
● Border radii: 2 · 4 · 8 · 12 · 16px — smallest for inputs, largest for panels
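The base units above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 16px root font size (the browser default, not stated in the source); `remToPx`, `snapToGrid` and the `cursorTokens` object shape are hypothetical names used only for illustration.

```typescript
// Assumption: root font size is the browser default of 16px.
const ROOT_FONT_PX = 16;

// Convert a rem value to pixels under the assumed root font size.
function remToPx(rem: number, rootPx: number = ROOT_FONT_PX): number {
  return rem * rootPx;
}

// Values copied from the token list above; the object shape is hypothetical.
const cursorTokens = {
  g: 10 / 16,          // --g: calc(10rem/16), expressed in rem
  v: 1.375,            // --v: 1.375rem
  durationMs: 140,     // --duration: .14s
  durationSlowMs: 250, // --duration-slow: .25s
  radiiPx: [2, 4, 8, 12, 16],
} as const;

// Snap an arbitrary pixel length to the nearest multiple of the grid unit.
function snapToGrid(px: number): number {
  const gridPx = remToPx(cursorTokens.g);
  return Math.round(px / gridPx) * gridPx;
}

console.log(remToPx(cursorTokens.g)); // 10
console.log(remToPx(cursorTokens.v)); // 22
console.log(snapToGrid(37));          // 40
```

Under that assumption both units resolve to whole pixels, so the "≈" in the list is exact at the default root size.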
Typography
| Role | Font | Size | Notes |
|---|---|---|---|
| UI Product (sm) | System + custom | 11px (.6875rem) | --text-product-sm · labels, status |
| UI Product (base) | System + custom | 12px (.75rem) | --text-product-base · default text |
| UI Product (lg) | System + custom | 13px (.8125rem) | --text-product-lg · section titles |
| Code / Data | JetBrains Mono | 12–13px | Code, terminal output, numbers |
Note: Cursor uses data-os=linux to switch to the system font stack. This respects the user's OS font preference, a developer-first accessibility decision.
Button System
● Primary: bg #f54e00 · radius 6px · padding 8px 16px · font-weight 600 · no border
● Secondary: bg rgba(fff,.07) · border rgba(fff,.12) · radius 4px · font-weight 400
● Accent text buttons: color #f54e00 · bg transparent · hover underline only
● Rule: orange CTA used ONCE per screen. Second most important action is always ghost.
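The one-CTA rule is easy to enforce mechanically. The sketch below is one possible lint-style check, not something the source prescribes; the `ScreenButton` shape, the variant names and `checkSingleCtaRule` are all hypothetical.

```typescript
// Hypothetical variant names; the source only distinguishes primary, secondary,
// accent text, and ghost buttons.
type ButtonVariant = "primary" | "secondary" | "ghost" | "accent-text";

interface ScreenButton {
  id: string;
  variant: ButtonVariant;
}

// Returns a list of rule violations; an empty list means the screen passes.
function checkSingleCtaRule(buttons: ScreenButton[]): string[] {
  const primaries = buttons.filter((b) => b.variant === "primary");
  if (primaries.length <= 1) return [];
  // Every primary after the first is reported as a candidate for demotion.
  return primaries
    .slice(1)
    .map((b) => `"${b.id}" should be ghost: only one primary CTA per screen`);
}

const violations = checkSingleCtaRule([
  { id: "sync-now", variant: "primary" },
  { id: "export", variant: "primary" },
  { id: "cancel", variant: "ghost" },
]);
console.log(violations.length); // 1
```

A check like this could run in component tests so that a second orange button is caught before review.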
What Shopilot Inherits
Split pane 70/30 · WebContentsView architecture · --g/--v base units · opacity token scale · status bar 24px · ultra-minimal shadows · one orange CTA rule
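As a rough illustration of the 70/30 split and the 24px status bar, the sketch below computes pane rectangles from a window size. It assumes the larger pane is on the left, the AI sidebar on the right, and the status bar at the bottom; none of that placement is stated in the source. `splitPane` and `Rect` are hypothetical names, and no Electron API is called here.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Compute pane bounds for a main/sidebar split above a fixed-height status bar.
// Assumption: main pane left, sidebar right, status bar along the bottom edge.
function splitPane(
  windowWidth: number,
  windowHeight: number,
  mainRatio: number = 0.7,
  statusBarPx: number = 24,
): { main: Rect; sidebar: Rect; statusBar: Rect } {
  const contentHeight = Math.max(0, windowHeight - statusBarPx);
  const mainWidth = Math.round(windowWidth * mainRatio);
  return {
    main: { x: 0, y: 0, width: mainWidth, height: contentHeight },
    sidebar: { x: mainWidth, y: 0, width: windowWidth - mainWidth, height: contentHeight },
    statusBar: {
      x: 0,
      y: contentHeight,
      width: windowWidth,
      height: Math.min(statusBarPx, windowHeight),
    },
  };
}

const layout = splitPane(1440, 900);
console.log(layout.main.width, layout.sidebar.width, layout.main.height); // 1008 432 876
```

Rectangles of this shape could then be handed to whatever view-bounds call the shell uses on window resize.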
HubSpot / Canvas Design System
CRM & Marketing Platform · Cambridge MA · 2006 · Public ($HUBS)
Brand Philosophy
"Sprocket-right: interfaces must work for the user, not impress other designers"
HubSpot's Canvas system represents 20 years of B2B SaaS learning. Their core insight: beautiful design at enterprise scale means designing for efficiency and clarity, not aesthetics. Every component is tested against "does this help the user complete their task faster?" The orange brand color (#ff7a00) was chosen for energy, approachability, and differentiation from blue-dominant CRM competitors (Salesforce). Canvas explicitly codifies the philosophy that function precedes form.
Color System
#ffffff
Base White
Primary background
#2D3E50
Midnight Blue
Primary text · headers
#ff7a00
Calypso Orange
Brand · CTAs
#00BDA5
Teal
Success · secondary CTA
#F5C26B
Flax
Warning · alerts
#EAF0F6
Mist Gray
Panel backgrounds
#516F90
Regent Gray
Secondary text
#F2545B
Alizarin
Error · destructive
Typography + Buttons
Fonts
Display: HubSpot Serif (custom, Typekit)
UI: HubSpot Sans (custom, Typekit)
Code: Lucida Console / Courier New (fallback)
Scale: 12 · 14 · 16 · 20 · 24 · 32 · 40 · 48px
Radius: --cl-radius ~6px standard
Icons: SVG fill:currentColor · 2rem default · .cl-icon class
Buttons
What Shopilot Inherits
Merchant-first philosophy · Data table density · Function over aesthetics principle · Multiple semantic colors for different alert types · Sprocket-right thinking
Linear
Project Management Tool · San Francisco · 2019 · YC W20
Brand Philosophy
"Speed is a feature — every interaction must feel instantaneous"
Linear's brand is built on the premise that design debt in productivity tools costs people hours every week. Their aesthetic is extreme minimalism, not because it looks good, but because every unnecessary element steals attention. The indigo brand color (#5e6ad2) was chosen for calm authority: it communicates "serious tool for serious work" without being cold or aggressive. Background Woodsmoke (#1a1a1e) is one of the darkest among the reference brands: near-black, but slightly purple-shifted for warmth.
Color System
#1a1a1e
Woodsmoke
Primary bg · dark
#111116
Sidebar BG
Navigation panel
#5e6ad2
Indigo Brand
Logo · selected · CTAs
#8b8fa8
Oslo Gray
Secondary text
#25252a
Surface
Card backgrounds
#2e3035
Hover Surface
Row hover state
#4cb782
Done Green
Completed state
#eb5757
Cancelled Red
Error · blocked state
● Design rules: No gradients ever · No decorative shadows · Use opacity over new colors · Border: 1px rgba(255,255,255,.06) only
● Keyboard-first: every action reachable without mouse. Speed is communicated through interaction, not animation.
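"Use opacity over new colors" means a border or hover tone is derived by compositing a translucent white over the existing background rather than by picking a new hex. The sketch below shows the standard source-over blend for the 6% white border on Woodsmoke; the helper names are hypothetical and only 6-digit hex input is handled.

```typescript
type Rgb = [number, number, number];

// Parse a 6-digit hex string such as "#1a1a1e" (3-digit shorthand is not handled).
function parseHex(hex: string): Rgb {
  const h = hex.replace("#", "");
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16)) as Rgb;
}

// Source-over compositing of a translucent foreground on an opaque background.
function composite(fg: Rgb, alpha: number, bg: Rgb): Rgb {
  return fg.map((c, i) => Math.round(c * alpha + bg[i] * (1 - alpha))) as Rgb;
}

function toHex(rgb: Rgb): string {
  return "#" + rgb.map((c) => (c < 16 ? "0" : "") + c.toString(16)).join("");
}

const woodsmoke = parseHex("#1a1a1e");
const white: Rgb = [255, 255, 255];

// Effective opaque color of a 1px rgba(255,255,255,.06) border on Woodsmoke.
const effectiveBorder = toHex(composite(white, 0.06, woodsmoke));
console.log(effectiveBorder);
```

Because the blend follows the background, the same 6% token would still work if the surface color changes, which is the practical benefit of the opacity approach.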
Typography + Buttons
Display: Inter Display · weights 300 (light) + 700 (bold)
UI: Inter · weights 400/500
Code: JetBrains Mono · 12–13px
Scale: 11 · 12 · 13 · 14 · 16 · 20 · 28 · 40px
Line height: 1.4 UI · 1.6 body
What Shopilot Inherits
No gradients / no decorative shadows · Opacity token approach · Keyboard-first mindset · Dark bg with slight warm purple shift · Extreme information density without visual noise
Vercel / Geist Design System
Frontend Cloud Platform · San Francisco · 2015 · YC W16
Brand Philosophy
"Black canvas: dark mode is not a theme, it's the identity"
Vercel's brand is the most radical of the six. Pure black (#000000) is the primary background: not dark navy, not warm dark, pure black. This is intentional: developers live in dark mode, and Vercel wants to be the platform that feels like the best developer tool they've ever used. Maximum contrast, maximum focus. The Geist typeface (custom, now open source) was designed specifically for developer interfaces: geometric sans for UI, geometric mono for code. There is no accent color, only a pure black/white/gray hierarchy.
Color System: Pure Grayscale + Functional
#000
#111
#333
#444
#666
#888
#eaeaea
#fafafa
Blue · Links · Info
Cyan · Success
Pink · Error/Warning
Typography
Display/UI: Geist Sans (open source, Google Fonts)
Code/Data: Geist Mono (open source, Google Fonts)
Scale: 12 · 14 · 16 · 20 · 24 · 32 · 48 · 64px
Weight: 400 body · 500 medium · 600 semibold · 700 bold
Radius: 6px standard · 8px cards · 12px modal
Buttons
What Shopilot Inherits
Dark-first approach · Pure functional color (no decoration) · High contrast focus ring · Developer-dense information hierarchy · Geist Mono (open source alternative to JetBrains Mono)
Shopify / Polaris Design System
Commerce Platform · Ottawa · 2006 · Public ($SHOP)
Brand Philosophy
"Merchant-first: every decision evaluated from the merchant's perspective"
Polaris is the most mature design system in this study: 7+ years of iteration, thousands of components, and a philosophy that has been consistently proven: clarity beats elegance. Shopify's merchant is not a designer or developer; they're a small business owner who needs to act fast and make money. The design system's entire vocabulary is optimized for task completion speed, not visual delight. The green brand color grew from the Shopify logo and represents growth, money, and success.
Color System
#FAFAFA
Background
Light mode primary
#202223
Ink
Primary text
#008060
Interactive Green
CTAs · brand
#95BF47
Logo Green
Brand logo only
#5C5F62
Subdued
Secondary text
#D82C0D
Critical
Error · destructive
#FFC453
Warning
Alert states
#AEE9D1
Success Light
Success bg tint
Typography
All: Inter (UI) · system-ui fallback
Scale: 12 · 14 · 16 · 20 · 26 · 32px
Radius: 4px inputs · 8px cards · 12px modals
Data Viz Rules:
● Totals in bold and in the first row · Focus: one insight per chart
● Multiple data formats (table + chart always)
Buttons
What Shopilot Inherits
Seller-first decision framework · Data viz rules (totals first, 1 insight) · Semantic color discipline · Clarity > elegance principle · A11y requirements for data tables
Brex
Corporate Fintech · San Francisco · 2017 · YC W17 · $12.3B valuation
Brand Philosophy
"Make money management effortless for ambitious companies"
Brex is the closest contextual analogue to Shopilot in terms of trust architecture. Both handle real money on behalf of businesses, and both require the UI to communicate precision and authority. Brex's design evolved from a startup-y orange era to a mature, premium dark theme. Current palette: near-black backgrounds (#0E0E0E), warm coral/salmon accent for CTA emphasis, Söhne as the premium custom typeface. The warm coral (not pure orange) signals "approachable financial authority": slightly warmer than corporate, slightly cooler than consumer fintech.
Color System
#0E0E0E
Background Dark
Near-black · product UI
#FFFDF9
Background Light
Warm off-white
#F27B6B
Coral Accent
CTAs · brand emphasis
#FF5200
Hot Orange
High-urgency CTAs
#1A1A1A
Surface
Cards, panels
#2D2D2D
Border/Stroke
Dividers, outlines
#00C278
Success Green
Positive states
#FF4444
Error Red
Errors · blocks
Typography
Display: Söhne (Klim Type Foundry) · €€€
UI: Söhne · weights 300/400/600
Data: Söhne Mono (tabular figures)
Scale: 11 · 13 · 15 · 18 · 24 · 36 · 48px
Key: Tabular figures for all financial data (tnum feature)
Buttons
Key Insights for Shopilot
● Trust architecture: Trust-critical data (balances, transactions) gets highest contrast (white-on-black). Secondary info gets progressively less contrast.
● Tabular nums: All financial data uses font-variant-numeric: tabular-nums so numbers align vertically in tables.
● Could inspire Shopilot: Near-black background · Coral warm accent · Tabular nums for prices · Söhne inspiration (use Inter + JetBrains Mono as accessible equivalent)
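Tabular figures only line up if every value in a column also carries the same number of decimals. The sketch below pairs the CSS property named above with a fixed-precision formatter built on the standard `Intl.NumberFormat`; `moneyCellStyle` and `formatMoney` are hypothetical names, and the USD / en-US defaults are assumptions for illustration.

```typescript
// Style object for any cell that shows money. The property is the one the
// text names; the right alignment is an added assumption for price columns.
const moneyCellStyle = {
  fontVariantNumeric: "tabular-nums",
  textAlign: "right",
} as const;

// Format an amount with exactly two decimals so digits align column-wise.
// Assumption: USD and en-US defaults; a real app would take these from the marketplace.
function formatMoney(
  amount: number,
  currency: string = "USD",
  locale: string = "en-US",
): string {
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).format(amount);
}

console.log(formatMoney(1234.5)); // "$1,234.50"
console.log(formatMoney(9));      // "$9.00"
```

With fixed decimals and tabular figures, the decimal points in a right-aligned price column fall on the same vertical line.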
Mercury
Neobank for Startups · San Francisco · 2019 · YC S19 · $1.62B valuation
Brand Philosophy
"Banking that gets out of your way"
Mercury achieved something extremely rare: making banking software look desirable. Their dark-mode-first interface (a radical choice for financial software in 2019) communicated that they understood their customer: tech founders who live in dark terminals. The Mercury Sans custom typeface has a slight humanist influence that prevents the bank UI from feeling cold and bureaucratic. The teal/blue accent is intentionally understated: mercury (the element) is subtle, precise, and reflects its environment.
Color System
#0A0A0A
Background
Near-pure black
#FAFAF9
Light BG
Warm off-white
#4AA8FF
Mercury Blue
CTAs · links · selected
#00BFA5
Teal
Balance · positive
#141414
Surface
Cards · panels
#1E1E1E
Hover Surface
Row hover
#FF5F5F
Alert Red
Errors · negative bal.
#F5A623
Warning Amber
Low balance · pending
Typography
Display/UI: Mercury Sans (custom, humanist geometric)
Numbers: Tabular lining figures (font-variant-numeric)
Code: Fira Code / iA Writer Mono (code blocks)
Weight: 300 light · 400 regular · 500 medium · 600 semibold
Spacing: letter-spacing: -0.01em for display text
Buttons + UI Patterns
Radius: 12px (rounded, approachable) · Borders: ultra-subtle rgba · Balance displayed in large mono at top of every page
Key Insights for Shopilot
Dark-first banking sets the precedent that serious financial tools CAN be dark mode · Balance/KPI always displayed in large mono (same as Shopilot GMV) · 12px radius keeps dense data approachable · Warm off-white light mode for reports/print contexts
Retool
Internal Tools Builder · San Francisco · 2017 · YC S17 · $3.2B valuation
Brand Philosophy
"Build internal tools, 10x faster"
Retool is the master of data-dense UI. Their product is literally a table+form builder; every design decision serves the goal of making dense grids of data scannable and actionable. Their canvas-style editor is perhaps the most data-rich interface in SaaS. Blue accent (#3B5EE7) was chosen for authority and trust, similar to financial platforms but more "engineering-y" than coral/orange. The dark background (#202124) is slightly warm-gray, similar to VS Code, which their developer audience knows instinctively.
Color System
#202124
BG Dark
Warm gray (VS Code-ish)
#F8F9FA
BG Light
Default canvas
#3B5EE7
Blue Brand
Selected · CTAs
#5C7CFA
Blue Light
Hover · focus
#2C2D30
Surface
Panel bg
#37383B
Border
Dividers
#2ECC71
Success
OK states
#E74C3C
Error
Error states
Data Table Design (Core Pattern)
● Row height: 32px compact · 40px default · 48px comfortable (user-configurable)
● Header: sticky · sortable · resizable columns · filter per column
● Numbers: Right-aligned in all numeric columns · font-variant-numeric: tabular-nums
● Could inspire Shopilot: Compact table density · Column sorting + filtering · Right-aligned numbers · VS Code-familiar warm gray bg
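The table rules above translate directly into a small configuration. The following sketch encodes the three row heights and the numeric-column styling; the `Density` and `Column` types, `visibleRows` and `cellStyle` are hypothetical, and treating the sticky header as one row tall is an assumption.

```typescript
type Density = "compact" | "default" | "comfortable";

// Row heights taken from the list above.
const ROW_HEIGHT_PX: Record<Density, number> = {
  compact: 32,
  default: 40,
  comfortable: 48,
};

interface Column {
  key: string;
  label: string;
  numeric: boolean;
}

// Number of full rows that fit below a sticky header.
// Assumption: the header is as tall as one row at the chosen density.
function visibleRows(viewportPx: number, density: Density): number {
  const rowPx = ROW_HEIGHT_PX[density];
  return Math.max(0, Math.floor((viewportPx - rowPx) / rowPx));
}

// Numeric columns are right-aligned with tabular figures; others keep defaults.
function cellStyle(col: Column): { textAlign: string; fontVariantNumeric: string } {
  return col.numeric
    ? { textAlign: "right", fontVariantNumeric: "tabular-nums" }
    : { textAlign: "left", fontVariantNumeric: "normal" };
}

console.log(visibleRows(720, "compact"));     // 21
console.log(visibleRows(720, "default"));     // 17
console.log(visibleRows(720, "comfortable")); // 14
```

On a 720px viewport the compact setting shows seven more rows than the comfortable one, which is the concrete payoff of making density user-configurable.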
Supabase
Open Source Firebase Alternative · Singapore · 2020 · YC S20 · $200M+ raised
Brand Philosophy
"Build in a weekend, scale to millions"
Supabase's brand is perhaps the most distinctive in this study: an aggressive, developer-native green (#3ECF8E) on pure dark backgrounds. The green was chosen for its association with databases (terminal text), open source culture (GitHub green), and PostgreSQL. Their brand radiates developer confidence: "we're not trying to be enterprise, we're trying to be the best developer experience." The contrast between near-black backgrounds and the bright emerald is high (7.2:1), making every UI element immediately visible.
Color System
#1C1C1C
BG Dark
Primary background
#111111
BG Deeper
Sidebar / nav
#3ECF8E
Supabase Green
Brand · CTAs · selected
#00C973
Green Vivid
Running / active states
#262626
Surface
Cards
#3F3F3F
Border
Dividers
#F97316
Warning Amber
Attention states
#EF4444
Error Red
Errors · destructive
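Contrast figures such as the one quoted for the green accent can be checked for any pair of swatches in this study. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas; the function names are hypothetical, only 6-digit hex input is handled, and the result depends on which background swatch is used.

```typescript
// Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#3ECF8E".
function relativeLuminance(hex: string): number {
  const h = hex.replace("#", "");
  const [r, g, b] = [0, 2, 4].map((i) =>
    channelToLinear(parseInt(h.slice(i, i + 2), 16)),
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio, from 1 (identical colors) up to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Accent green against the two dark backgrounds listed in the palette above.
console.log(contrastRatio("#3ECF8E", "#1C1C1C").toFixed(2));
console.log(contrastRatio("#3ECF8E", "#111111").toFixed(2));
```

A helper like this could gate any candidate Shopilot accent against the WCAG AA threshold of 4.5:1 for normal text before it enters the token set.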
Typography + Key Insights for Shopilot
● Display/UI: Inter (all weights) · Code: Fira Code / UI Monospace
● Radius: 6px uniform, very slightly rounded, feels professional rather than playful
● Could inspire Shopilot: Proof that a single strong accent color CAN be green for marketplaces (Shopify marketplace tab) · Dark + bright single accent contrast pattern · Warning using orange (candidate reference for Shopilot)
PostHog
Open Source Product Analytics · London · 2020 · YC W20 · $225M raised
Brand Philosophy
"The only product analytics platform where data stays yours"
PostHog is the most boldly branded product in this study. Hedgehog mascot, golden yellow (#F9BD2B) that actually glows, developer-irreverent tone. Their design deliberately breaks "enterprise SaaS" conventions to signal: we're built by developers, for developers, and we refuse to look boring. However, beneath the playfulness, the data visualization is meticulously precise. Their dark UI (#1D1D27 with purple-shifted dark) keeps analytics dashboards readable 8+ hours a day. The yellow is used sparingly for the most important elements.
Color System
#1D1D27
BG Dark
Purple-shifted dark
#FFFEF0
BG Light
Golden cream
#F9BD2B
PostHog Yellow
Brand · emphasis
#F54E00
Hot Orange
CTAs · high-priority
#2C2C3A
Surface
Cards · panels
#3C3C50
Border
Dividers
#2AC940
Success
Positive events
#F04438
Error
Error states
Key Insights for Shopilot
Purple-shifted dark backgrounds feel "deeper" than neutral dark, which suits analytics views · Data precision underneath playful branding · Yellow used ONLY for the most important metric on screen (same principle that could apply to Shopilot's chosen accent (TBD)) · Chart color palette: 8 distinct hues, all at 60% saturation for harmony
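The chart-palette idea (8 hues sharing one saturation) can be generated rather than hand-picked. This is a sketch under assumptions: evenly spaced hues, a fixed 55% lightness, and a start hue of 0 are all illustrative choices not given in the source, and `chartPalette` is a hypothetical name.

```typescript
// Generate `count` HSL colors with evenly spaced hues and shared saturation.
// Assumptions: even hue spacing, lightness of 55%, start hue of 0 degrees.
function chartPalette(
  count: number = 8,
  saturation: number = 60,
  lightness: number = 55,
  startHue: number = 0,
): string[] {
  return Array.from({ length: count }, (_, i) => {
    const hue = Math.round(startHue + (360 / count) * i) % 360;
    return `hsl(${hue}, ${saturation}%, ${lightness}%)`;
  });
}

const palette = chartPalette();
console.log(palette[0]); // "hsl(0, 60%, 55%)"
console.log(palette[1]); // "hsl(45, 60%, 55%)"
console.log(palette.length); // 8
```

Holding saturation constant is what keeps the series visually balanced; only hue distinguishes one series from the next.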
Resend
Developer Email Platform · San Francisco · 2022 · YC W23 · $26M raised
Brand Philosophy
"Email for developers, built by developers"
Resend's brand is pure monochromatic minimalism, perhaps the most extreme in this study. Pure black (#000000), pure grays, one orange accent for the logo and primary CTA only. The philosophy: email infrastructure should be completely invisible, the developer's code is the product. Their UI is so stripped down it looks like GitHub's settings page elevated to art. This design communicates: we're not trying to impress you with UI, we're trying to not get in your way. Strong influence from Vercel's aesthetic (same investor: Guillermo Rauch's orbit).
Color System: Pure Monochromatic
#000
BG
#0a0a
Surface
#171717
Card
#262626
Border
#525252
Muted
#a3a3a3
Secondary
#ededed
Primary
#fff
Headings
#FF5700
Logo Orange · CTA only
Typography
All: Geist Sans + Geist Mono (open source)
Scale: 13 · 14 · 16 · 20 · 28 · 40px
Tracking: letter-spacing: -0.02em headings
Radius: 8px standard (slightly rounded)
Buttons
Key Insights for Shopilot
Proof that monochromatic + one accent works at scale · #000 vs #171717 vs #262626: subtle layering creates depth without color · Code + logs = always Geist Mono / JetBrains Mono → reinforces precision
Clerk
Authentication Platform · San Francisco · 2021 · YC W22 · $170M raised
Brand Philosophy
"The most comprehensive User Management Platform"
Clerk's brand sits at the intersection of developer tools and security software. Purple (#6C47FF) was chosen to differentiate from both the "enterprise blue" space (Okta, Auth0) and the "startup orange" space. It communicates "modern, premium, slightly magical": auth happens in the background, and Clerk makes it elegant. Their dark UI (#131316, a warm-shifted very dark tone) uses glass-morphism for the prebuilt UI components, an unusual choice that works because authentication is a "gateway moment" that benefits from a premium feel.
Color System
#131316
BG Dark
Warm-shifted dark
#FAFAFA
BG Light
Dashboard light mode
#6C47FF
Clerk Purple
Brand · CTAs · focus
#9B7DFF
Purple Light
Hover · secondary
#1C1C21
Surface
Cards
#2C2C35
Border
Dividers
#12B76A
Success
Auth success
#F04438
Error
Auth failure
Key Insights for Shopilot
Glass-morphism for "gateway moments" (login, confirmation dialogs) · Purple differentiation shows you don't need orange to be distinctive · #131316 warm-dark-shifted background similar to Shopilot's own bg · Onboarding modal design: clean step indicators, focus on one action per step
Deel
Global HR & Payroll · San Francisco · 2019 · YC W19 · $12B valuation
Brand Philosophy
"Hire anyone, anywhere — with compliance built in"
Deel handles international payroll for 35,000+ companies, arguably the most complex, trust-critical SaaS product in this study. Their design reflects that weight: corporate navy blue (#1D2130) backgrounds, conservative button styles, clear error states for compliance failures. Nothing flashy: a company trusting you with their global payroll needs you to look like you know what you're doing. The blue palette (#2B6EE4) is authoritative without being aggressive, similar to how a bank presents itself.
Color System
#1D2130
BG Dark Navy
Primary dark surface
#F4F6FA
BG Light
Blue-tinted white
#2B6EE4
Deel Blue
CTAs · brand
#4D8FF0
Blue Light
Hover · secondary
#252A3C
Surface
Cards
#2F3547
Border
Dividers
#00C48C
Success Teal
Paid · approved
#FF647C
Error Coral
Failed · blocked
Key Insights for Shopilot
Navy-shifted dark bg (#1D2130) creates more "financial authority" feel than neutral dark · Compliance status rows: clear color coding (approved=teal, pending=amber, failed=red) · Dense multi-level table hierarchy (company > employee > payment), similar to Shopilot's ASIN > marketplace > metric hierarchy
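One way the status color coding and the multi-level hierarchy could fit together is to roll the worst child status up to its parent row. The sketch below illustrates that idea only; the roll-up rule itself, the `StatusNode` shape, and the amber hex (borrowed from the Mercury palette earlier in this study, since no amber swatch is listed for Deel) are assumptions.

```typescript
type ComplianceStatus = "approved" | "pending" | "failed";

// Teal and coral come from the palette above; the amber value is an assumption.
const STATUS_COLOR: Record<ComplianceStatus, string> = {
  approved: "#00C48C",
  pending: "#F5A623",
  failed: "#FF647C",
};

const SEVERITY: Record<ComplianceStatus, number> = {
  approved: 0,
  pending: 1,
  failed: 2,
};

// Hypothetical tree node, e.g. company > employee > payment
// or ASIN > marketplace > metric.
interface StatusNode {
  label: string;
  status?: ComplianceStatus;
  children?: StatusNode[];
}

// A parent shows the most severe status found anywhere beneath it.
function rollup(node: StatusNode): ComplianceStatus {
  const own: ComplianceStatus = node.status ?? "approved";
  return (node.children ?? [])
    .map(rollup)
    .reduce<ComplianceStatus>(
      (worst, s) => (SEVERITY[s] > SEVERITY[worst] ? s : worst),
      own,
    );
}

const asinRow: StatusNode = {
  label: "ASIN (example)",
  children: [
    { label: "Marketplace A", children: [{ label: "Metric 1", status: "approved" }] },
    { label: "Marketplace B", children: [{ label: "Metric 2", status: "pending" }] },
  ],
};

console.log(rollup(asinRow), STATUS_COLOR[rollup(asinRow)]); // pending #F5A623
```

With a roll-up like this, a collapsed top-level row still signals that something beneath it needs attention.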
Replit
Browser-based IDE · San Francisco · 2016 · YC W18 · $1.16B valuation
Brand Philosophy
"Code, create, and learn together"
Replit's brand bridges developer-serious and beginner-accessible. Their orange (#F56C2A) is warmer and more playful than Cursor's (#f54e00), which is intentional, as Replit serves both students and professionals. The dark background (#0D1117) is identical to GitHub's dark mode, leveraging existing mental models for developers. Their recent pivot to "Replit AI" accelerated their design maturity: more glass effects, more gradient accents, more AI-native patterns. Strong parallel to Shopilot: both are Electron-like experiences where the IDE/marketplace is the primary canvas and AI assistance is the sidebar.
Color System
#0D1117
BG Dark
GitHub-identical dark
#F6F8FA
BG Light
GitHub-identical light
#F56C2A
Replit Orange
Brand · CTAs
#FF7B54
Orange Light
Hover state
#161B22
Surface
Cards · panels
#21262D
Border
Dividers
#3FB950
Success
Build success
#F85149
Error
Build error
Key Insights for Shopilot
Split IDE+AI sidebar = exact Shopilot architecture · GitHub-familiar dark (#0D1117) leverages existing developer trust · Orange on very dark bg creates high contrast CTA that developers actually click · AI sidebar streaming pattern identical to Shopilot's coaching sidebar
Luma
Event Platform · San Francisco · 2020 · YC W21 · $150M raised
Brand Philosophy
"Beautiful event pages that convert"
Luma is the most aesthetically-ambitious brand in this study. Where other products in this list use minimalism as a constraint, Luma uses it as a canvas. Their gradient-based identity (iridescent teal-purple-magenta) feels luxurious without being cluttered. Dark background (#09090B — the darkest in this study, almost absolute black) makes the gradients pop like neon lights in a dark room. Included here because Luma shows what happens when you invest in aesthetic excess as a differentiator — events need to feel exciting, and Luma's brand creates that emotional response. Relevant to Shopilot's onboarding and marketing pages.
Color System
#09090B
BG Absolute
Near-perfect black
#FAFAFA
BG Light
Clean off-white
gradient
Brand Iridescent
Teal→Purple→Pink
#A855F7
Primary Purple
CTAs on dark bg
#141416
Surface
Cards
#1C1C1F
Surface 2
Nested cards
#4FACFE
Teal Blue
Info · links
#EC4899
Pink Accent
Featured · special
Key Insights for Shopilot
Gradient accents for marketing pages only (NOT product UI) — this is the lesson · #09090B absolute black → glass cards on top create incredible depth with zero shadows · Premium "entrance" moments deserve gradient treatment (Shopilot: first-login, marketplace activation) · Inter Display with tight letter-spacing (-0.04em) = expensive look at zero cost
Brand Identity: NOT DEFINED YET
This section is a decision framework — a structured guide to the brand choices that must be made before any design system can be built. Nothing here is decided. The references above are inspiration material only.
Brand Decision Log — Status
| # | Decision | Options | Status |
|---|---|---|---|
| 01 | Brand philosophy / tagline | 3 candidates below | PENDING |
| 02 | Primary color | 4 palette candidates below | PENDING |
| 03 | Typography stack | 3 pairings below | PENDING |
| 04 | Logo direction | Wordmark / Icon+Text / Abstract mark | PENDING |
| 05 | Dark vs Light vs Both | Lean: dark-first · Risk: alienates some | PENDING |
| 06 | Brand voice / personality | Expert coach / Trusted advisor / Efficient tool | PENDING |
Decision 01 · Brand Philosophy
CHOOSE ONE
Based on the 16 brands studied, three directions emerged as viable for Shopilot. Each implies a different visual language, color family, and interaction tone.
A · "Warm Precision"
Warm neutral backgrounds, orange/amber accent, trust through clarity. References: Linear + HubSpot. Best for: sellers who want a tool that feels like a trusted advisor, not a cold dashboard.
B · "Data Intelligence"
Pure dark, electric blue accent, Bloomberg-inspired density. References: Datadog + Bloomberg. Best for: power sellers who see the product as a professional data terminal, prioritizing information density over warmth.
C · "Growth Engine"
Dark with green/teal accent, optimistic tone. References: Shopify + Notion. Best for: growth-focused sellers who associate green with profit and want the tool to feel empowering and action-oriented.
Study recommendation: Direction A ("Warm Precision") differentiates most from Helium 10 (purple/2018), Jungle Scout (green/consumer), and Repricer (corporate blue). It positions Shopilot as the only warm, AI-native seller tool. However — this is a recommendation, not a decision.
Decision 02 · Primary Color Palette
CHOOSE ONE
These 4 candidates were derived from the competitive analysis. Each avoids direct collision with existing tools in the market.
Orange — #F97316
Energy + action. Competitive differentiation from purple (Helium 10), blue (Repricer), green (Jungle Scout). HubSpot owns "CRM orange" — risk: some overlap perception.
Indigo — #6366F1
Intelligence + trust. Used by Linear. Risk: perceived as too similar to Helium 10's purple. Benefit: associates with AI/tech precision.
Sky Blue — #0EA5E9
Clarity + openness. Clean differentiation. Risk: overly generic in SaaS. Benefit: universally accessible, no color blindness issues.
Violet — #8B5CF6
Premium + AI. High association with AI products (Claude, Perplexity). Risk: Helium 10 has purple brand equity. Benefit: strong AI-native signal to tech-savvy sellers.
What the study recommends: Orange (#F97316) for maximum warm contrast. But this requires a final call from the team — specifically: does Shopilot want to feel more like a financial tool (blue/indigo) or more like an action-oriented coach (orange)?
Decision 03 · Typography Stack
CHOOSE ONE
| Option | Display / UI | Numbers / Code | Reference |
|---|---|---|---|
| A | Inter | JetBrains Mono | Linear, Vercel — neutral, modern, safe |
| B | Geist / DM Sans | JetBrains Mono | Vercel, Framer — slightly more personality |
| C | IBM Plex Sans | IBM Plex Mono | IBM, Datadog — technical authority, B2B trust |
All 3 options are free, widely available, and render well in Electron. The mono font for numbers is non-negotiable across all options — see Section 14 design rationale for why.
Decision 04 · Logo Direction
CHOOSE ONE
Wordmark Only
Just the "Shopilot" name in custom lettering. Simple, flexible. Risk: hard to use at small sizes (tray icon, favicon).
Icon + Wordmark
Symbol that works standalone (tray, favicon, app icon) + name for contexts with space. Most flexible system.
Abstract Mark
Unique geometric shape with no letterform. High memorability ceiling. Risk: requires brand awareness to work — too early for a v1 product.
Recommended for v1: Option B (Icon + Wordmark). Allows a small icon in the macOS tray, a medium icon in the dock, and full wordmark in the sidebar. But the icon design itself is a separate creative decision — do not ship a placeholder.
Decision 05 · Dark vs Light Mode
CHOOSE ONE
Dark-first (recommended by study)
Cursor, Linear, Arc, Datadog, Claude — all dark-first. Reduces eye strain in long sessions. Numbers pop on dark backgrounds. Most reference brands studied use dark mode as the primary experience (HubSpot and Stripe are the light-first exceptions). Competitive differentiation from Helium 10 (light default).
Risk of dark-only
Some sellers work in bright environments (warehouses, offices). If Shopilot is dark-only, it may feel hard to read in those contexts. A light mode in Phase 2 is strongly advisable. V1: dark only to reduce scope.
→ How to use this section
- Review the 16 reference brand books above — understand what each brand does and why.
- Make a decision on each of the 6 items in the tracker at the top of this section. Pablo + Mateo + Sergio should be in the room.
- Document the chosen direction back into this spec — replace "PENDING" with the decided value and the rationale.
- Only then build design tokens (§14 · Stack) — the CSS custom properties, the Tailwind config, the Style Dictionary pipeline. Building tokens before the brand decisions are made is wasted work.
- Commission a designer for the logo once the color and philosophy direction are locked. Do not use AI-generated or placeholder marks in any public-facing context.
Study Synthesis — Patterns Found Across All 16 Brands
After analyzing 16 world-class products (Anthropic, Cursor, Linear, Arc, Figma, Stripe, Vercel, HubSpot, Shopify, Datadog, Bloomberg, Notion, Intercom, Brex, Mercury, Luma), 7 universal patterns emerged that every top-tier product shares — regardless of industry, color, or audience. These are conclusions, not recommendations for Shopilot.
One strong primary color = brand ownership
Every studied brand owns exactly ONE color. Not two, not a gradient system as their identity — one color that is unmistakably theirs. This color appears on buttons, on the favicon, on the loading state, on the cursor. It becomes the brand.
Anthropic #CC785C copper
Linear #5e6ad2 indigo
HubSpot #FF7A59 orange
Shopify #96BF48 green
What the study shows: Color category ownership is first-come-first-served. Purple → Figma/Anthropic. Green → Shopify/Notion. Blue → almost every generic SaaS. Orange → HubSpot. The strongest move for a new brand is to claim a color that no dominant competitor owns in its specific category.
Power tools are dark-first — light mode is an afterthought
Among the products reviewed: Cursor, Linear, Arc, Datadog, Bloomberg, Claude, Vercel, Brex, Mercury, Retool, PostHog — all ship dark as primary. Light mode exists but is not the designed-for experience. The pattern holds across every product category where users are professionals staring at screens for 6+ hours.
| Product | Primary mode | Background |
|---|---|---|
| Cursor | Dark | #1B1B1F — near-black, warm |
| Linear | Dark | #0F0F11 — pure dark |
| Claude / Anthropic | Dark | #1A1A2E — violet-shifted dark |
| Arc Browser | Dark | #1C1C1E — macOS standard dark |
| Datadog | Dark | #14131A — purple-shifted |
| Vercel | Dark | #000000 — pure black |
| HubSpot / Stripe | Light | #FFFFFF — pure white |
Study finding: The dark backgrounds that work best are NOT pure black (#000). They are near-blacks with a hue shift — warm (#1B1B1F), cool (#0F0F11), violet (#14131A), or macOS system (#1C1C1E). Pure black creates harshness; hue-shifted dark creates depth. Also: the darker the background, the more the accent color pops — which is why dark-first products can use a single, lower-saturation accent and still feel branded.
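The point that an accent "pops" more on a darker background can be checked numerically with the WCAG 2.x contrast-ratio formula. The sketch below is illustrative only: the function names are invented for this example, and the colors are the candidate orange from Decision 02 plus backgrounds from the table above, not decided Shopilot values.

```typescript
// Convert one 0-255 sRGB channel to linear light (WCAG 2.x relative luminance).
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#RRGGBB" color, from 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Expected #RRGGBB, got ${hex}`);
  }
  const value = parseInt(hex.slice(1), 16);
  const r = linearize((value >> 16) & 0xff);
  const g = linearize((value >> 8) & 0xff);
  const b = linearize(value & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio between two colors, from 1:1 up to 21:1.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Candidate accent (Decision 02) against backgrounds from the study table.
const candidateAccent = "#F97316";
for (const bg of ["#1B1B1F", "#0F0F11", "#000000", "#FFFFFF"]) {
  console.log(bg, contrastRatio(candidateAccent, bg).toFixed(2));
}
```

The same function could back the WCAG AA contrast check listed later in the task plan, where 4.5:1 is the AA threshold for normal-size text.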
Typography: 2 fonts maximum — one sans, one mono
Every studied product uses a sans-serif for UI text and a monospace font for all data, code, and numbers. No exceptions. The monospace font for numbers is not a stylistic choice — it is functional: proportional fonts create unstable number columns. Monospace makes data scannable.
Sans-serif findings
- Inter — Linear, Vercel, Notion, PostHog
- Geist — Vercel (custom, based on Inter)
- SF Pro — Arc, Cursor (system default)
- Söhne / Graphik — Anthropic, Figma
- IBM Plex Sans — Datadog, IBM products
Finding: Inter dominates because it's free, variable weight, and optimized for screens. The system font (SF Pro on Mac) is the "invisible" choice that native apps use for maximum rendering quality.
Mono findings
- JetBrains Mono — Cursor, Linear, Vercel
- Fira Code — developer tools generally
- SF Mono — Arc, macOS native
- IBM Plex Mono — Datadog, Brex
- Geist Mono — Vercel (v2)
Finding: JetBrains Mono is the modern standard for developer-adjacent tools. Its ligatures are readable at 10–12px which is where data tables live.
Motion is functional, not decorative — and it's invisible when done right
None of the studied products use animation for visual delight. Every transition serves a purpose: orientation (this panel came from the right), state change (this button is now loading), hierarchy (this modal is above the content). The rule: if you can remove the animation and the user still understands what happened, the animation was decorative. Remove it.
| Animation | Duration | Purpose | Seen in |
|---|---|---|---|
| Hover bg change | 100–150ms | Acknowledge interaction | All products |
| Button press scale | 80ms ease-out | Physical click feedback | Linear, Arc, Luma |
| Modal slide-up | 200–250ms spring | Layer hierarchy | Figma, Notion, Linear |
| Streaming text fade | 80ms per word | Show AI is generating | Claude, Cursor |
| Thinking pulse ··· | 1.2s infinite | AI is processing | Claude, Cursor, Copilot |
| Sidebar collapse | 200ms ease-in-out | Preserve spatial orientation | Linear, Arc, Notion |
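One way to carry the durations in this table into code is a small typed token map. This is a sketch under assumptions: the `MotionToken` shape, the key names, and the `--duration-*` variable prefix are invented for illustration, and emitting the upper bound of each range is an arbitrary choice, not something the study prescribes.

```typescript
interface MotionToken {
  minMs: number;
  maxMs: number;
  easing?: string; // only set where the table names an easing
  purpose: string;
}

// Values copied from the table above; key names are hypothetical.
const MOTION: Record<string, MotionToken> = {
  hoverBgChange: { minMs: 100, maxMs: 150, purpose: "Acknowledge interaction" },
  buttonPressScale: { minMs: 80, maxMs: 80, easing: "ease-out", purpose: "Physical click feedback" },
  modalSlideUp: { minMs: 200, maxMs: 250, easing: "spring", purpose: "Layer hierarchy" },
  streamingTextFade: { minMs: 80, maxMs: 80, purpose: "Show AI is generating" }, // per word
  thinkingPulse: { minMs: 1200, maxMs: 1200, purpose: "AI is processing" }, // loops
  sidebarCollapse: { minMs: 200, maxMs: 200, easing: "ease-in-out", purpose: "Preserve spatial orientation" },
};

// Emit one CSS custom property per token, using the upper bound of each range.
function motionCssVars(tokens: Record<string, MotionToken>): string[] {
  return Object.keys(tokens).map((name) => {
    const kebab = name.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
    return `--duration-${kebab}: ${tokens[name].maxMs}ms;`;
  });
}

console.log(motionCssVars(MOTION).join("\n"));
```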
AI products share a specific visual language for trust and transparency
The study of Anthropic, Cursor, and Claude Code revealed a distinct pattern absent in non-AI products: every AI action is visually accountable. You always see what the AI is doing, what tool it used, how long it took. There are no black boxes in the UI of the best AI products.
AI-native patterns (present in all studied AI products)
- ✓ Streaming first: never show a spinner while generating text
- ✓ Tool transparency: show every tool call with name + duration + result
- ✓ Reversibility signals: visually distinguish reversible from irreversible actions before confirmation
- ✓ Context visibility: always show what the AI knows (context window, memory, recent files)
- ✓ Interrupt capability: stop button always visible during AI generation
Anti-patterns (absent in top AI products)
- ✗ Skeleton loaders for AI output — creates false expectation of content structure
- ✗ Generic spinners while thinking — no information, builds anxiety
- ✗ Hiding tool execution — users don't know what changed in their systems
- ✗ One-shot confirmation dialogs — no diff, no preview, just "Are you sure?"
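Several of these patterns (streaming first, tool transparency, interrupt capability) can be modeled as a small state machine that the UI renders from. The reducer below is a minimal sketch, not the Shopilot implementation: the phase names, event names, and the `get_sales_summary` tool name are all hypothetical, since this spec does not define the Coach state machine.

```typescript
type Phase = "idle" | "thinking" | "streaming" | "tool" | "done" | "stopped";

interface ToolCall {
  name: string;
  startedAt: number;
  durationMs?: number; // filled in when the tool finishes
  result?: string;
}

interface CoachState {
  phase: Phase;
  text: string; // streamed text so far
  toolCalls: ToolCall[]; // every call stays visible: name + duration + result
}

type CoachEvent =
  | { type: "submit" }
  | { type: "token"; chunk: string }
  | { type: "tool_start"; name: string; at: number }
  | { type: "tool_end"; at: number; result: string }
  | { type: "finish" }
  | { type: "stop" };

const initialState: CoachState = { phase: "idle", text: "", toolCalls: [] };

// The stop button should be visible whenever this returns true.
function isActive(state: CoachState): boolean {
  return state.phase === "thinking" || state.phase === "streaming" || state.phase === "tool";
}

function reduce(state: CoachState, event: CoachEvent): CoachState {
  const generating = state.phase === "thinking" || state.phase === "streaming";
  switch (event.type) {
    case "submit":
      return isActive(state) ? state : { phase: "thinking", text: "", toolCalls: [] };
    case "token":
      return generating ? { ...state, phase: "streaming", text: state.text + event.chunk } : state;
    case "tool_start":
      if (!generating) return state;
      return { ...state, phase: "tool", toolCalls: [...state.toolCalls, { name: event.name, startedAt: event.at }] };
    case "tool_end": {
      if (state.phase !== "tool") return state;
      const calls = state.toolCalls.slice();
      const last = calls[calls.length - 1];
      if (!last) return state;
      calls[calls.length - 1] = { ...last, durationMs: event.at - last.startedAt, result: event.result };
      return { ...state, phase: "thinking", toolCalls: calls };
    }
    case "finish":
      return generating ? { ...state, phase: "done" } : state;
    case "stop":
      return isActive(state) ? { ...state, phase: "stopped" } : state;
    default:
      return state;
  }
}
```

Because text arrives through `token` events and tool calls are recorded with timing, a view built on this state has no phase in which it would need a generic spinner or a hidden tool execution.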
Information density is a product decision, not a design afterthought
The studied products cluster into two density philosophies — and both work, but for different users. The choice of density must be made at the product level before any design work begins, because it determines spacing tokens, component heights, font sizes, and the entire information architecture.
High density — expert tools
Bloomberg, Datadog, Retool, Brex. Row height ≈ 32px. Font size: 11–12px. Assume users know what they're looking at. More information per screen = fewer clicks. Used by professionals who stare at it for hours.
Comfortable density — balanced tools
Linear, Notion, Intercom, Luma. Row height ≈ 44px. Font size: 13–14px. Sufficient whitespace to feel premium without hiding data. Works for both new and expert users.
Brand = how you speak, not just how you look
The strongest brands in the study have a distinct voice in every single word of their UI — button labels, error messages, onboarding copy, empty states, confirmation dialogs. The voice is as distinctive as the color. Stripe writes error messages like a knowledgeable friend. Linear writes UI copy with extreme brevity. Anthropic writes with careful epistemic humility ("I think", "Based on what I know").
Stripe
Error: "Your card was declined. This sometimes happens if the issuing bank suspects fraud. Try a different card or contact your bank."
Linear
Error: "Failed to sync." ← That's it. No explanation. They trust users to understand context. Extreme brevity as brand.
Anthropic / Claude
Response: "I'm not certain, but based on what I know..." — epistemic humility baked into every sentence.
Summary — What all world-class products share
| Dimension | Universal pattern | Applies to Shopilot? |
|---|---|---|
| Color | 1 primary accent, 2 functional (success/error), neutral scale | Yes — must decide |
| Background | Near-black with hue shift (not #000 or #111) | Yes — must decide hue |
| Typography | 1 sans for UI + 1 mono for all numbers/data | Yes — must choose pair |
| Motion | 100–250ms, purposeful only, spring easing | Yes — adopt directly |
| AI states | Streaming text, thinking pulse, tool transparency | Yes — core requirement |
| Density | Choose high or comfortable — don't mix | Yes — must decide |
| Voice | Every word of UI reflects brand personality | Yes — must define |
| Logo | Works at 16px (favicon/tray) AND at 200px | Yes — must commission |
What Shopilot Needs — Design Requirements Analysis
Based on the study synthesis and Shopilot's product definition (AI-native Electron desktop app for e-commerce sellers, 70/30 split, 36 tools, marketplace integration), here is every design element the product needs — independent of brand decisions. These are requirements, not solutions.
The 15 things Shopilot must complete to have a world-class design
Single source of truth. Everything in one place. The detailed breakdown is in the categories below — this is the executive view.
Phase 1 — Brand Identity (before writing a single line of UI code)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 01 | Run brand workshop — choose Brand Philosophy (what emotion does Shopilot own?) | 1-sentence brand position | Pablo | PENDING |
| 02 | Decide primary brand color — pick from candidates (see §Brand Decision Framework) | 1 hex value, named, documented | Pablo + team | PENDING |
| 03 | Choose typography pair — UI sans + data mono (see §24 References for options) | 2 font names, weight scale defined | Pablo + Sergio | PENDING |
| 04 | Build the color system — dark bg scale (4 tones) + text scale (4 levels) + semantic colors | design-tokens.json — color section | Sergio | BLOCKED by 02 |
| 05 | Commission logo — wordmark + icon mark, works at 16px and 512px | SVG files: logo.svg, icon.svg, favicon.svg | Pablo (hire) | BLOCKED by 01+02 |
Phase 2 — UI Foundation (tokens → CSS vars → Tailwind config, weeks 1–2)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 06 | Complete design-tokens.json — spacing (--g / --v system), radii, shadows, duration | tokens.json W3C DTCG format | Sergio + Mateo | BLOCKED by 04 |
| 07 | Run Style Dictionary pipeline — tokens.json → CSS :root vars + tailwind.config.js | tokens.css, tailwind.config.js | Mateo | BLOCKED by 06 |
| 08 | Build Electron window shell — frameless + drag region + macOS traffic lights + 70/30 split | Running Electron with correct window chrome | Sergio | PENDING |
| 09 | Implement base atoms — Button (6 variants), Badge, Input, Spinner, Tooltip, Divider | 6 React components using tokens | Sergio | BLOCKED by 07 |
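To make the tokens.json → CSS `:root` vars step in task 07 concrete, the sketch below flattens a nested token tree (leaves carrying `$value`, as in the W3C DTCG draft format) into CSS custom properties. It is hand-rolled for illustration and does not use Style Dictionary's API; the token names and values are placeholders, since no brand decision has been made.

```typescript
interface TokenLeaf {
  $value: string;
  $type?: string;
}

interface TokenGroup {
  [name: string]: TokenLeaf | TokenGroup;
}

function isLeaf(node: TokenLeaf | TokenGroup): node is TokenLeaf {
  return typeof (node as TokenLeaf).$value === "string";
}

// Walk the tree; each leaf becomes "--<path joined by dashes>: <value>;".
function toCssVars(group: TokenGroup, path: string[] = []): string[] {
  const lines: string[] = [];
  for (const name of Object.keys(group)) {
    const node = group[name];
    const next = [...path, name];
    if (isLeaf(node)) {
      lines.push(`--${next.join("-")}: ${node.$value};`);
    } else {
      lines.push(...toCssVars(node, next));
    }
  }
  return lines;
}

function renderRoot(group: TokenGroup): string {
  return `:root {\n${toCssVars(group).map((line) => "  " + line).join("\n")}\n}`;
}

// Placeholder tokens only: the brand decisions above are all still PENDING.
const sampleTokens: TokenGroup = {
  color: {
    accent: { primary: { $value: "#F97316", $type: "color" } },
    bg: { base: { $value: "#1B1B1F", $type: "color" } },
  },
  duration: { hover: { $value: "150ms", $type: "duration" } },
};

console.log(renderRoot(sampleTokens));
```

A real pipeline would also emit the Tailwind config from the same tree, so that components never reference raw hex values.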
Phase 3 — Core Components (weeks 2–6)
| # | Task | Output | Owner | Status |
|---|---|---|---|---|
| 10 | Build Coach screen: streaming text cursor ▊ + thinking pulse ··· + tool accordion (4 states) + chat input | Functional Coach view with AI state machine | Sergio | BLOCKED by 09 |
| 11 | Build Confirmation Dialog: reversible (amber) vs irreversible (red) variants + diff display | ConfirmationDialog.tsx, 2 variants | Sergio | BLOCKED by 09 |
| 12 | Build KPI card + data table (sortable) + delta badges, which make up 80% of the Dashboard screen | Dashboard screen with real data | Sergio + Andrés | BLOCKED by 09 |
| 13 | Build status bar (24px): agent state dot left + credits + model name right | StatusBar.tsx, always visible | Sergio | BLOCKED by 08 |
| 14 | Build context bar: active ASIN + marketplace dot + context window progress bar | ContextBar.tsx | Sergio | BLOCKED by 09 |
| 15 | Accessibility audit: WCAG AA contrast check on all components, keyboard nav, focus rings | 0 WCAG AA violations | Sergio + Andrés | BLOCKED by 09-14 |
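The contrast part of task 15 can be automated. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula, assuming colors arrive as #RRGGBB strings; the function names (parseHex, relativeLuminance, contrastRatio, passesAA) are hypothetical and not part of the blueprint.

```typescript
/** Parse "#RRGGBB" into [r, g, b] channel values in the 0..255 range. */
function parseHex(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error(`Expected #RRGGBB, got "${hex}"`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

/** Relative luminance per WCAG 2.x: linearize each sRGB channel, then weight. */
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

/** Contrast ratio between two colors, always >= 1 (lighter over darker). */
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const [lighter, darker] = a >= b ? [a, b] : [b, a];
  return (lighter + 0.05) / (darker + 0.05);
}

/** WCAG AA thresholds: 4.5:1 for normal text, 3:1 for large text. */
function passesAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

Running passesAA over every foreground/background token pair would flag failing combinations before the manual keyboard and focus-ring checks.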
Critical path: 01 (brand workshop) unblocks everything. Nothing else can start until the team aligns on what emotion Shopilot owns. That is the only decision that can't be delegated or automated.
Category 1: Brand Identity Elements (detail)
ALL MISSING
| Element | Why needed | Used where | Status |
|---|---|---|---|
| Logo mark (icon) | Works at 16px — macOS dock, tray, favicon | Electron dock icon, tray, browser tab | MISSING |
| Wordmark (logotype) | Full name, readable at 120px+ | App sidebar header, landing page, screenshots | MISSING |
| Primary brand color | Buttons, links, active states, focus rings | Everywhere interactive — 200+ UI elements | PENDING DECISION |
| Background color scale | Base, surface, card, elevated — 4 dark tones | Every screen, every component | PENDING DECISION |
| Foreground color scale | Primary text, secondary, muted, disabled — 4 levels | All text, labels, placeholders | DERIVES FROM BG |
| Functional colors | Success (green), Warning (amber), Error (red), Info (blue) | Alerts, badges, status indicators, audit log | STANDARD — PICK |
| UI typography (sans) | All text except numbers | Labels, paragraphs, headings, button text | PENDING DECISION |
| Data typography (mono) | All numbers, prices, percentages, code | KPI cards, tables, status bar, audit log | PENDING DECISION |
Category 2: UI Components Required by the Product
These are derived from Shopilot's 36 tools and 4 core screens (Coach view, Dashboard, Settings, Billing). This is a product requirement, not a design choice.
Foundation (week 1)
- Design tokens (CSS vars)
- Button (6 variants)
- Input / Textarea
- Badge / Tag
- Icon system (Lucide)
- Tooltip
- Spinner / Loading
- Divider
Coach screen (week 2-3)
- Chat message (user/AI)
- Streaming text cursor ▊
- Thinking pulse ···
- Tool accordion (4 states)
- Confirmation dialog
- Proactive suggestion card
- Context bar (ASIN + tokens)
- Chat input + send button
Data screens (week 4-6)
- KPI metric card
- Data table (sortable)
- Buy Box indicator
- Price delta bar
- BSR sparkline
- Audit log timeline
- Credit economy bar
- Fraud alert banner
Category 3: Electron Desktop-Specific Requirements
These have no equivalent in web apps. They are required because Shopilot ships as a native macOS/Windows app, not a browser tab.
- Title bar: frameless window with drag region + macOS traffic lights
- Tab bar: marketplace switcher (Amazon / MeLi / Shopify) with colored dots
- Status bar: 24px bottom bar, agent state left, credits + model right
- Tray icon: 16x16 mono SVG + badge count for alerts
- 70/30 split: marketplace WebView (left) + React sidebar (right), with a visual seam between them
- Update modal: version info + changelog + progress + restart button
- Notification system: 3 levels (in-app banner → OS push → tray badge)
- App icon: 1024×1024px for App Store + 512px for macOS dock
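The 70/30 split and the 24px status bar reduce to simple rectangle arithmetic. The sketch below shows one way to compute the three regions from the window's content size. It is an illustration only: computeShellLayout and its types are hypothetical, and the title bar and tab bar heights are deliberately left out.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ShellLayout {
  webview: Rect;   // marketplace WebView, left 70%
  sidebar: Rect;   // React sidebar, right 30%
  statusBar: Rect; // full-width bar at the bottom
}

const STATUS_BAR_HEIGHT = 24; // px, from the status bar requirement
const SIDEBAR_RATIO = 0.3;    // right-hand share of the 70/30 split

function computeShellLayout(windowWidth: number, windowHeight: number): ShellLayout {
  // Everything above the status bar is shared by the two panes.
  const contentHeight = Math.max(0, windowHeight - STATUS_BAR_HEIGHT);
  // Round the sidebar and give the remainder to the WebView so the
  // two widths always sum to the window width (no 1px gap or overlap).
  const sidebarWidth = Math.round(windowWidth * SIDEBAR_RATIO);
  const webviewWidth = windowWidth - sidebarWidth;
  return {
    webview: { x: 0, y: 0, width: webviewWidth, height: contentHeight },
    sidebar: { x: webviewWidth, y: 0, width: sidebarWidth, height: contentHeight },
    statusBar: { x: 0, y: contentHeight, width: windowWidth, height: STATUS_BAR_HEIGHT },
  };
}
```

Recomputing this on every window resize keeps the seam between the panes at a stable position.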
Workflow 0 → Complete Brand: The Efficient Path
The most efficient process to go from "no brand" to a production-ready design system that rivals Anthropic, Cursor, or Linear. The process is modeled on how the reference brands built their design systems, not on opinion.
The 5-Phase Process
Phase 1: Brand Workshop
1–2 days · Pablo + Mateo + Sergio
Make the 6 brand decisions from the Decision Framework above. No design tools needed, just a whiteboard or Notion doc. Output: a 1-page brand brief with every decision locked.
Decisions to lock in this phase:
Phase 2: Visual Identity in Figma
3–5 days · Designer (contract) + Pablo review
This is where Figma enters, but only for visual identity exploration, not for UI design. The goal is to validate color, logo, and typography before writing a single line of code. Figma is used here because visual decision-making is faster with a canvas tool than in code.
What goes into Figma in Phase 2:
- Logo mark explorations (6–10 directions)
- Color palette validation (light + dark test)
- Typography specimens (all weights + sizes)
- 3 brand application mockups (app icon, sidebar header, marketing screenshot)
What does NOT go into Figma in Phase 2:
- Full UI screens: premature without tokens
- Component library: built in code, not Figma
- User flows: too early
Tools for Phase 2: Figma (free tier is enough) · fontpair.co for typography pairing · Coolors.co or Realtime Colors for palette generation · Adobe Color for accessibility check · Contrast.app for WCAG validation
Phase 3: Design Tokens → Code
2 days · Sergio + Mateo
Once brand decisions are locked from Phase 2, translate them into code immediately. This is where Figma connects to Claude Code: take the approved color values and typography from Figma, encode them as design tokens, and generate the CSS + Tailwind config. With Claude Code, the estimate drops from 2 days to about 4 hours.
Figma → Claude Code integration flow:
- Export approved brand values from Figma as JSON (Figma Variables → JSON via plugin "Variables Import Export")
- Paste JSON into Claude Code: "Convert these brand values to a W3C DTCG tokens.json file"
- Claude Code generates: tokens.json + style-dictionary.config.mjs + globals.css + tailwind.config.ts
- Run Style Dictionary → CSS custom properties are live in the app
- Validate: open Electron app, confirm colors match Figma spec
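To make the tokens.json → CSS step concrete, here is a minimal sketch of what the transformation does conceptually. It is not Style Dictionary and does not use its API; it is a hand-rolled flattener that assumes DTCG-style leaves (objects carrying a $value key) with primitive values only. The names TokenGroup, flattenTokens, and toCssRoot are hypothetical.

```typescript
// A token tree: groups nest by name, leaves carry "$value" (and optionally "$type").
interface TokenGroup {
  [name: string]: TokenGroup | string | number | undefined;
}

/** Walk the token tree and collect [cssVariableName, value] pairs. */
function flattenTokens(group: TokenGroup, path: string[] = []): Array<[string, string]> {
  const out: Array<[string, string]> = [];
  for (const [key, child] of Object.entries(group)) {
    // Skip metadata keys such as $type / $description and any bare primitives.
    if (key.startsWith("$") || typeof child !== "object") continue;
    const nextPath = [...path, key];
    if ("$value" in child) {
      // Leaf token. Assumption: $value is a primitive, not a composite object.
      out.push([`--${nextPath.join("-")}`, String(child.$value)]);
    } else {
      out.push(...flattenTokens(child, nextPath));
    }
  }
  return out;
}

/** Render the flattened tokens as a :root block of CSS custom properties. */
function toCssRoot(tokens: TokenGroup): string {
  const lines = flattenTokens(tokens).map(([name, value]) => `  ${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}
```

In the real pipeline Style Dictionary also handles references between tokens, unit transforms, and multiple output platforms, which this sketch omits.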
Phase 4: Component Library with Claude Code
3–6 weeks · Sergio (primary) + Claude Code
This is the main build phase. All components are defined in Figma (#18 Design System) following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads the Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in the Figma.
How Claude Code works in this phase:
- Spec → component: Give Claude Code a description from this spec (e.g., "build ToolAccordion with 4 states: queued/running/success/error, using design tokens from globals.css") → it generates the full TypeScript component
- Figma → React: Claude reads the Figma component via Figma MCP and generates all variants automatically with matching props and states
- Accessibility audit: "Review this component for WCAG AA compliance and fix any issues". Claude Code runs the audit inline
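The four ToolAccordion states named in the example prompt (queued, running, success, error) can be modeled as a small state machine before any React is written. The sketch below is one possible shape. The event names and the retry transition from error back to queued are assumptions, not something this spec defines.

```typescript
type ToolState = "queued" | "running" | "success" | "error";
type ToolEvent = "start" | "succeed" | "fail" | "retry";

// Allowed transitions. Anything not listed leaves the state unchanged.
const TRANSITIONS: Record<ToolState, Partial<Record<ToolEvent, ToolState>>> = {
  queued: { start: "running" },
  running: { succeed: "success", fail: "error" },
  success: {},                // terminal
  error: { retry: "queued" }, // assumption: failed tools can be re-queued
};

/** Pure reducer: returns the next state, or the same state for invalid events. */
function nextToolState(state: ToolState, event: ToolEvent): ToolState {
  return TRANSITIONS[state][event] ?? state;
}
```

A component can hold the current ToolState and derive its visual variant from it, which keeps invalid jumps (for example queued → success) out of the UI.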
Velocity benchmark: A senior engineer without AI: 1 component/week (design + code + test + docs). With Claude Code: 1 component/day. At 5 working days per week, 25 core components take 5 weeks instead of 25. This is the 5x leverage.
Phase 5: First Real Screen → Test with Sellers
1 week · Full team
Assemble the Coach View (the 70/30 split screen) using the built components and tokens. Show it to 3 real sellers. At this point the brand is real: not a Figma mockup, not a code spec, but a running Electron application with real brand tokens, real components, and real data. Collect feedback. Iterate.
Figma vs Code: When to Use Each
This is the most common source of wasted effort in early-stage product design. The answer depends on what you're deciding, not on preference.
| Task | Use Figma? | Why |
|---|---|---|
| Logo exploration | Yes, required | Bezier curves, vector editing, proportions: impossible to do well in code |
| Color palette validation | Yes, fast | Seeing colors in context (on dark bg, next to text) is faster in Figma than spinning up code |
| Typography testing | Yes, fast | Font pairing decisions are visual, not technical. Figma + Google Fonts is 10x faster than code for this |
| User flow diagrams | Optional | Can also use FigJam, Miro, or paper. The flow is the output, not the tool |
| Individual component design | Occasionally | Only for complex components (confirmation dialog, onboarding flow). Simple components: just build in code with Claude Code |
| Component library | Yes, source of truth | Figma (#18 Design System) is the single source of truth following Atomic Design. Claude reads via Figma MCP and implements matching React components. No components created outside Figma |
| Design tokens | No, they live in tokens.json | Figma Variables exist but are secondary. The tokens.json → CSS pipeline is the real system |
| Full screen prototypes | No, build in Electron | A running Electron app with real data is a better prototype than any Figma mockup. With Claude Code, the delta in effort is small |
Time to World-Class Brand: Realistic Estimate
| Phase | Duration | Scope |
|---|---|---|
| Phase 1 | 2d | Brand workshop |
| Phase 2 | 5d | Visual identity |
| Phase 3 | 2d | Tokens → code |
| Phase 4 | 6w | Component library |
| Phase 5 | 1w | First real screen |
Total: ~8 weeks from zero to a brand that rivals Linear or Cursor. The bottlenecks are Phase 2 (finding a designer) and Phase 4 (component build). Everything else is decisions + Claude Code automation.
References: Figma, OS Design Systems & Desktop Apps
The authoritative sources every world-class desktop app is built on: Apple's Human Interface Guidelines, Microsoft Fluent Design, how the best companies use Figma, what Figma Community files to download today, and visual references of the exact apps Shopilot should emulate as a macOS Electron product.
Apple Human Interface Guidelines (HIG)
developer.apple.com/design/human-interface-guidelines · The bible for macOS app design
Every app that feels "native" on macOS (Arc, Cursor, Notion, Linear) follows Apple's HIG, not as rules but as a foundation. Understanding the HIG tells you why certain things feel right on Mac and wrong on Windows, and what Shopilot must do to feel like a first-class macOS citizen.
6 Core HIG Principles, and what they mean for Shopilot
1 · Aesthetic Integrity
The app's visual appearance and behavior must be consistent with its purpose. A data tool (Shopilot) should look precise and professional, not playful. Applies to: spacing consistency, typography alignment, color restraint.
2 · Consistency
Use standard macOS controls and terminology where possible. Users already know what a sidebar, toolbar, and panel are on Mac. Don't reinvent them; use them. Shopilot's window chrome (title bar, traffic lights, resize handle) must behave as users expect.
3 · Direct Manipulation
Users should feel they're directly controlling the content on screen. For Shopilot: clicking an ASIN row should immediately feel responsive. Dragging, hovering, and focusing must have immediate visual feedback (≤100ms).
4 · Feedback
Every action must acknowledge the user. Shopilot specifics: button press = visual depress + sound (optional). Loading = progress indicator, not frozen UI. AI thinking = animated cursor ▊ or pulse ···. Error = banner with next action, not silent failure.
5 · User Control
Users, not the app, initiate actions. The AI coach can suggest, but must not act without confirmation on irreversible actions. The HIG's position is that people, not apps, should be in control. This is the origin of Shopilot's reversibility system.
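The confirmation rule can be expressed as a small gate in front of every tool call. This is a minimal sketch under stated assumptions: the ProposedAction shape, the function names, and the autoApproveReversible setting are all hypothetical. The amber/red mapping follows the Confirmation Dialog variants listed in the component tasks.

```typescript
type Reversibility = "reversible" | "irreversible";

interface ProposedAction {
  tool: string;            // hypothetical tool identifier
  summary: string;         // human-readable description shown in the dialog
  reversibility: Reversibility;
}

type DialogVariant = "amber" | "red";

/** Pick the Confirmation Dialog variant: amber for reversible, red for irreversible. */
function dialogVariant(action: ProposedAction): DialogVariant {
  return action.reversibility === "irreversible" ? "red" : "amber";
}

/**
 * Decide whether the agent may execute the action.
 * Irreversible actions always need an explicit user confirmation.
 * Assumption: reversible actions may be auto-approved by a user setting.
 */
function mayExecute(
  action: ProposedAction,
  userConfirmed: boolean,
  autoApproveReversible = false,
): boolean {
  if (action.reversibility === "irreversible") return userConfirmed;
  return userConfirmed || autoApproveReversible;
}
```

Keeping the gate as a pure function makes the rule easy to unit test independently of the dialog UI.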
6 · Metaphors
Use familiar real-world concepts. Shopilot uses the "coach" metaphor: a trusted advisor who sees the same screen you do and gives guidance. This is why the sidebar is positioned like a coach standing next to you: right side, always visible, never blocking the main view.
macOS Patterns that Shopilot must implement correctly
| Pattern | HIG spec | Shopilot implementation |
|---|---|---|
| Traffic lights | Red/Yellow/Green at 12px diameter, 8px gap, 20px from left | Frameless window + titleBarStyle:'hiddenInset' preserves native buttons |
| Sidebar | Min width 220px, vibrancy background, grouped sections with headers | Shopilot right sidebar 320px — deviates intentionally (coach, not nav) |
| Toolbar | Height 52px, icon + label, unified with title bar on macOS 11+ | Tab bar (marketplace switcher) sits at top of left pane, height 40px |
| Menu bar | Every Mac app has native menu bar: File, Edit, View, Window, Help | Electron: Menu.setApplicationMenu() — must exist, even if minimal |
| Keyboard shortcuts | Cmd+W close, Cmd+Q quit, Cmd+, preferences — always expected | Must register all standard Mac shortcuts + Shopilot custom (Cmd+K = chat) |
| System colors | Use NSColor system colors that adapt to dark/light automatically | In Electron: the CSS prefers-color-scheme media query or nativeTheme.shouldUseDarkColors, or a manual token switch |
| Focus ring | Blue ring 3px at system accent color — do NOT remove, required for a11y | Override with brand accent color ring, same shape — never remove entirely |
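Several rows of this table map onto Electron configuration. The sketch below builds that configuration as plain data so it runs without Electron installed. In the main process the objects would be handed to new BrowserWindow(...) and to Menu.buildFromTemplate / Menu.setApplicationMenu. The interface names, the default window size, and the "Coach" menu are hypothetical, the 900×600 minimum is the figure this document gives under the Fluent Scale principle, and the role strings are Electron's standard menu roles.

```typescript
type Platform = "darwin" | "win32" | "linux";

// Subset of BrowserWindow options used here, declared locally for the sketch.
interface WindowOptionsSketch {
  width: number;
  height: number;
  minWidth: number;
  minHeight: number;
  titleBarStyle: "default" | "hidden" | "hiddenInset";
}

function buildWindowOptions(platform: Platform): WindowOptionsSketch {
  return {
    width: 1440, // hypothetical default size
    height: 900,
    minWidth: 900,
    minHeight: 600,
    // hiddenInset hides the title bar but keeps the native traffic lights on macOS.
    titleBarStyle: platform === "darwin" ? "hiddenInset" : "default",
  };
}

// Subset of a menu item template, declared locally for the sketch.
interface MenuItemSketch {
  role?: string;
  label?: string;
  accelerator?: string;
  submenu?: MenuItemSketch[];
}

function buildMenuTemplate(): MenuItemSketch[] {
  return [
    // Role-based menus supply the standard items and their usual shortcuts.
    { role: "appMenu" },
    { role: "fileMenu" },
    { role: "editMenu" },
    { role: "viewMenu" },
    // Shopilot-specific shortcut from the table: Cmd+K focuses the chat.
    { label: "Coach", submenu: [{ label: "Focus Chat", accelerator: "CmdOrCtrl+K" }] },
    { role: "windowMenu" },
    { role: "help", submenu: [] },
  ];
}
```

Building the options through a function keeps the per-platform differences in one place when the Windows build arrives.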
Reference: What Arc Browser takes from HIG
Arc uses native macOS vibrancy for its sidebar, native traffic lights at the exact HIG position, native context menus via NSMenu, native keyboard shortcut conventions, and the native font stack (SF Pro) for all system-level text. Where Arc deviates from HIG is intentional and branded: the tab bar is vertical instead of horizontal, the command bar replaces the URL bar, the sidebar IS the app chrome. Deviation from HIG is a product decision, but you must know the rules before you break them.
Microsoft Fluent Design System 2
fluent2.microsoft.design · Windows 11 design language
Shopilot targets macOS first, but the Windows build comes in Sprint 11-12. Fluent Design 2 is the official design system for Windows 11 apps. Understanding it now prevents a costly redesign later, and it informs several patterns (Acrylic material, Mica background) that translate well to dark Electron apps on both platforms.
5 Fluent Design Principles
Light
Light as a design element. Reveal highlight: a subtle glow appears under the cursor on interactive elements. Creates depth without shadows. In Electron: CSS radial-gradient on mousemove.
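The radial-gradient technique amounts to translating the pointer position into element-local coordinates and centering a gradient there. A minimal sketch follows, with a hypothetical revealGradient helper; the default radius and opacity are placeholder values, not Fluent specifications.

```typescript
interface ElementBox {
  left: number; // element's left edge in viewport coordinates
  top: number;  // element's top edge in viewport coordinates
}

/**
 * Build a CSS background value that places a soft glow under the cursor.
 * clientX / clientY come from the mouse event; box from getBoundingClientRect().
 */
function revealGradient(
  clientX: number,
  clientY: number,
  box: ElementBox,
  radiusPx = 120, // placeholder radius
  alpha = 0.12,   // placeholder glow opacity
): string {
  const x = Math.round(clientX - box.left);
  const y = Math.round(clientY - box.top);
  return `radial-gradient(circle ${radiusPx}px at ${x}px ${y}px, rgba(255, 255, 255, ${alpha}), transparent)`;
}
```

In a mousemove handler the returned string would be assigned to the element's background-image (or written to a CSS custom property that the stylesheet reads) and cleared again on mouseleave.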
Depth
Layers at different Z-levels with Acrylic (frosted glass) and Mica (wallpaper-blended background) materials. For Shopilot: the glass-card pattern directly adopts this. backdrop-filter: blur() is the closest CSS equivalent to Acrylic in Electron.
Motion
Connected animations: elements travel between states instead of disappearing and reappearing. Fluent easing: cubic-bezier(0.1, 0.9, 0.2, 1). Used by VS Code, Microsoft Edge, Teams.
Material
Acrylic: backdrop-filter: blur(30px) saturate(180%), used for sidebars, flyouts, menus. Mica: wallpaper color extracted and used as tint in app chrome. Both create a sense of the app being part of the OS.
Scale
Design for multiple device types. In Shopilot's context: design for a minimum 900×600px window and scale gracefully to 2560×1440 (QHD). Touch targets minimum 44×44px even on desktop (for touch-screen Windows laptops).
Fluent Typography: Segoe UI Variable
Windows 11 uses Segoe UI Variable, a variable font that covers all weights and optical sizes. On Windows, an Electron app whose font stack resolves to system-ui picks up the system UI font (the Segoe UI family) with no extra work. Inter is a separate typeface: if it is bundled and listed first, text renders in Inter on Windows too, so decide per build whether to ship Inter or defer to system-ui.
Fluent Type Ramp (Windows 11):
- Caption · 12px · Regular
- Body · 14px · Regular
- Body Strong · 14px · Semibold
- Subtitle · 20px · Semibold
- Title · 28px · Semibold
- Title Large · 40px · Semibold
- Display · 68px · Semibold
Key difference vs Apple HIG:
Apple's HIG sets body text at 13pt on macOS (17pt is the iOS body size), while Fluent uses 14px body. Text on Windows therefore reads slightly larger. If you design for macOS at 13px body text, Windows will look right at 14px. Build a --body-size token that switches per platform.
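One way to implement the per-platform --body-size token is to resolve the value from the platform string (for example process.platform in Electron) and inject it as a CSS custom property at startup. The sketch below assumes the 13px / 14px values from the paragraph above. The function names and the Linux fallback are hypothetical.

```typescript
type Platform = "darwin" | "win32" | "linux";

/** Body text size per platform: 13px on macOS, 14px on Windows. */
function bodySizeFor(platform: Platform): string {
  switch (platform) {
    case "darwin":
      return "13px";
    case "win32":
      return "14px";
    default:
      return "14px"; // assumption: Linux follows the Windows size
  }
}

/** CSS rule that sets the token; can be injected into the renderer at startup. */
function bodySizeRule(platform: Platform): string {
  return `:root { --body-size: ${bodySizeFor(platform)}; }`;
}
```

Components then read font-size: var(--body-size) and never branch on platform themselves.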
How the Best Companies Use Figma
figma.com · The industry standard for design, and how to use it efficiently
Figma is less a drawing tool than a design system management platform. Companies like Vercel, Linear, Airbnb, and Shopify use Figma as their source of truth for visual decisions, but NOT for everything. Understanding what they put in Figma vs what they build directly in code is what separates efficient teams from slow ones.
The 5 ways top companies use Figma
Figma Variables = Design Tokens (the right way)
Since Figma introduced Variables in 2023, they have been replacing Styles for colors, spacing, radii, and typography. Variables in Figma map closely to CSS custom properties. The best companies (Vercel, Shopify, Atlassian) define their entire token system in Figma Variables, then export to JSON using the "Variables Import/Export" plugin (free). This JSON becomes the tokens.json that feeds Style Dictionary.
Figma Variable group → CSS output:
color/brand/primary → --color-brand-primary: #F97316
spacing/4 → --spacing-4: 16px
radius/lg → --radius-lg: 8px
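The naming convention in these three examples is mechanical: slashes in the Figma Variable path become hyphens and the result gets a leading double hyphen. A minimal sketch of that rule follows, with hypothetical function names. Lowercasing and stripping other characters is an added assumption to keep the output a valid, predictable identifier.

```typescript
/** Convert a Figma Variable path such as "color/brand/primary" to a CSS custom property name. */
function figmaVariableToCssName(variablePath: string): string {
  const segments = variablePath
    .split("/")
    .map((segment) =>
      segment
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-") // spaces, dots, etc. collapse to a hyphen
        .replace(/^-+|-+$/g, ""),    // no leading or trailing hyphens per segment
    )
    .filter((segment) => segment.length > 0);
  if (segments.length === 0) {
    throw new Error(`Cannot derive a CSS name from "${variablePath}"`);
  }
  return `--${segments.join("-")}`;
}

/** Render one CSS declaration for a variable path and its resolved value. */
function toCssDeclaration(variablePath: string, value: string): string {
  return `${figmaVariableToCssName(variablePath)}: ${value};`;
}
```

For example, toCssDeclaration("color/brand/primary", "#F97316") yields the first mapping shown above.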
Auto Layout = Responsive Components that match CSS Flexbox
Figma's Auto Layout closely mirrors CSS Flexbox. When a designer builds a button with Auto Layout (direction, gap, padding, alignment), each setting translates directly to a Tailwind class. This is how Linear, Vercel, and Shopify keep friction between design and code low: the designer thinks in flex terms, the developer writes flex terms.
In Figma Auto Layout:
Direction: Horizontal
Gap: 8px
Padding: 10px 16px
Align: Center
In Tailwind CSS:
flex
gap-2
px-4 py-2.5
items-center
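The translation above follows Tailwind's default spacing scale, where one unit equals 4px. The sketch below encodes that mapping for the four Auto Layout settings shown. The AutoLayout shape and function names are hypothetical, and the code assumes every pixel value lands on a step that exists in the project's Tailwind scale.

```typescript
interface AutoLayout {
  direction: "horizontal" | "vertical";
  gap: number;      // px
  paddingX: number; // px, left/right
  paddingY: number; // px, top/bottom
  align: "start" | "center" | "end";
}

/** Tailwind's default spacing scale: 1 unit = 4px, so 10px becomes "2.5". */
function spacingStep(px: number): string {
  return String(px / 4);
}

function autoLayoutToTailwind(layout: AutoLayout): string {
  const alignClass = {
    start: "items-start",
    center: "items-center",
    end: "items-end",
  }[layout.align];
  return [
    "flex",
    layout.direction === "vertical" ? "flex-col" : "", // row is the flex default
    `gap-${spacingStep(layout.gap)}`,
    `px-${spacingStep(layout.paddingX)}`,
    `py-${spacingStep(layout.paddingY)}`,
    alignClass,
  ]
    .filter((cls) => cls.length > 0)
    .join(" ");
}
```

This is meant for code generation or review tooling. Tailwind's build step only detects complete class names in source files, so class strings should not be assembled like this at runtime.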
Component Properties = Variant System
Top companies define every component with Properties (variant=primary/secondary/ghost, size=sm/md/lg, state=default/hover/disabled/loading). This creates a single source of truth for all component states. In Figma, you see all variants in one frame. In code, this maps to props. The designer and developer speak the same language.
Button component properties:
variant: primary | secondary | ghost | danger | outline | link
size: sm | md | lg
state: default | hover | focus | disabled | loading
icon: none | left | right | only
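In code, these four properties become a props type. The sketch below is one possible translation, not the project's actual Button API. It treats hover and focus as CSS pseudo-states rather than props, exposes disabled and loading as booleans, and uses hypothetical class names (btn, btn--primary, is-loading).

```typescript
type ButtonVariant = "primary" | "secondary" | "ghost" | "danger" | "outline" | "link";
type ButtonSize = "sm" | "md" | "lg";
type ButtonIcon = "none" | "left" | "right" | "only";

interface ButtonProps {
  variant?: ButtonVariant;
  size?: ButtonSize;
  icon?: ButtonIcon;
  disabled?: boolean; // Figma state=disabled
  loading?: boolean;  // Figma state=loading
}

/** Resolve props to a class string. Defaults: primary, md, no icon. */
function buttonClassName(props: ButtonProps): string {
  const { variant = "primary", size = "md", icon = "none", disabled = false, loading = false } = props;
  const classes = ["btn", `btn--${variant}`, `btn--${size}`];
  if (icon !== "none") classes.push(`btn--icon-${icon}`);
  if (disabled) classes.push("is-disabled");
  if (loading) classes.push("is-loading");
  return classes.join(" ");
}

/** A button should ignore clicks while disabled or loading. */
function isInteractive(props: ButtonProps): boolean {
  return !props.disabled && !props.loading;
}
```

Because the unions mirror the Figma property values one to one, a variant added in Figma but missing in code shows up as a type error as soon as it is used.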
Dev Mode = the handoff from designer to Claude Code
Figma Dev Mode (free for 1 viewer) lets developers inspect every design decision: exact pixel values, spacing, CSS properties, and exported assets. The workflow for Shopilot: designer finalizes a complex component in Figma → developer opens Dev Mode → copies the exact values into a prompt for Claude Code: "Build this component using these exact specs from Figma Dev Mode: [paste]." Claude Code generates the TypeScript in seconds.
The Claude Code + Figma prompt template:
"Build a React TypeScript component for [ComponentName]. Read the Figma component via Figma MCP for exact specs (dimensions, colors, spacing, states, variants). Use design tokens from globals.css. Include all states defined in the Figma component."
Figma as the single source of truth for all visual components
The Figma file (#18 Design System, core-product-design-system) follows Atomic Design (atoms, molecules, organisms, templates, pages) and is the single source of truth. Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No React components are created outside of what is defined in the Figma. The external design team maintains Figma; the engineering team consumes it.
Figma Community Files: Download These Today
These are official or highly-used public Figma files from the reference companies. Duplicating them to your Figma account is free. Study how they structure components, Variables, and design systems to see how these teams work.
| File | Publisher | What to study | Search in Community |
|---|---|---|---|
| Apple Design Resources | Apple (official) | macOS UI components, SF Symbols, HIG spacing | "Apple Design Resources macOS" |
| Microsoft Fluent 2 | Microsoft (official) | Fluent component library, Acrylic, tokens system | "Microsoft Fluent 2 Web" |
| Vercel Design System | Vercel (official) | Dark-first tokens, Geist font usage, Storybook link | "Vercel Design" |
| Shadcn/ui Figma Kit | Community (official-ish) | How shadcn components map to Figma — the bridge | "shadcn ui" |
| Tailwind CSS UI Kit | Community | Tailwind spacing / color scales in Figma Variables | "Tailwind CSS UI Kit" |
| Linear App Design | Community recreation | Dark sidebar, speed-first interactions, kbd badges | "Linear design system" |
| Electron UI Patterns | Community | Title bar, tray, window chrome patterns for Electron | "Electron desktop UI" |
| Figma Variables Starter | Figma (official) | How to structure Variables for a design system | "Variables starter kit Figma" |
How to use these files: Don't copy components. Study structure. Look at how they name Variables (tokens), how they organize component pages, how they document states, and what their spacing system looks like. These are the patterns to replicate in Shopilot's Figma file once the brand is decided.
Desktop App Visual References — What to Emulate
These are the specific macOS Electron apps that Shopilot should study in detail as running software: not in Figma, but as installed apps. Each has a specific pattern Shopilot must either adopt or consciously deviate from.
Cursor — cursor.sh
MOST RELEVANT — study first
The closest structural reference to Shopilot. Both are Electron, AI-native, dark-first, and split-pane (editor left + chat right). Download and install it. Study how the title bar works, how the chat panel opens and closes, how the AI response streams, how tool calls (terminal runs) are displayed, and how the status bar at the bottom shows AI state. This is the gold standard for Shopilot's interaction model.
Adopt from Cursor:
- Status bar 24px bottom
- Streaming word-by-word
- Tool call accordion
- Thinking indicator
Adapt for Shopilot:
- Split: code→marketplace
- Tabs: files→marketplaces
- Context: project→ASIN
Don't copy:
- Code editor UI
- File tree sidebar
- Diff view
Arc Browser — arc.net
The reference for rethinking desktop chrome. Arc shows that you can break HIG conventions (vertical tabs instead of horizontal, sidebar IS the app, no visible URL bar) and still feel native and premium. Study specifically how Arc handles the title bar with traffic lights + drag region + custom controls in the same 40px zone. This is exactly what Shopilot's top bar needs to solve.
Key lesson: Arc's sidebar gradient background (multi-color per space) is possible in Electron via a CSS linear-gradient on the sidebar container. The per-space color customization is what makes Arc feel personal, a pattern Shopilot could adopt for marketplace color coding (Amazon=orange, MeLi=yellow, Shopify=green).
Linear — linear.app
The reference for performance as a design value. Linear aims for every interaction to complete in under 100ms. Study the keyboard shortcut system (every action has a shortcut visible in the UI), the command palette (Cmd+K), the sidebar collapse behavior, and most importantly how Linear handles empty states (no data = inspirational, not depressing). Also study the data tables: Linear's issue list is the closest reference to Shopilot's ASIN product list.
Notion — notion.so
The reference for Electron done right at scale (30M+ users). Study how Notion handles window resizing (the sidebar collapses progressively), how it manages a complex sidebar with nested items without feeling cluttered, and its hover-reveal interactions (properties appear on hover, not always). Also, Notion's dark mode implementation is one of the cleanest in any Electron app: study how it handles the transition between surface layers.
VS Code — code.visualstudio.com
THE Electron reference
VS Code is the most used Electron app in the world with 30M+ daily active users. It is the definitive reference for what is possible technically and visually in Electron. Study the status bar (bottom, 22px, close to Shopilot's 24px), the split pane system, the extension panel (same concept as Shopilot's sidebar), the command palette, and the theming system. VS Code themes are CSS token swaps, which is what Shopilot's design token system will do. The VS Code GitHub repo is public, so the theming architecture is directly applicable.
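To make the "theme = token swap" idea concrete, here is a minimal sketch that serializes a theme object into a CSS rule of custom properties. The `ThemeTokens` type, the `themeToCss` helper, and the light-theme values are illustrative assumptions; only the `--sp-*` names and the dark values come from the token spec in this document.

```typescript
// Hypothetical theme shape: CSS custom property name -> value.
type ThemeTokens = Record<string, string>;

// Dark values taken from tokens.json in this blueprint.
const darkTheme: ThemeTokens = {
  '--sp-color-bg-base': '#0A0A0F',
  '--sp-color-fg-100': '#F4F4F6',
};

// Illustrative only: a light theme is not defined anywhere in the spec.
const lightTheme: ThemeTokens = {
  '--sp-color-bg-base': '#FFFFFF',
  '--sp-color-fg-100': '#0A0A0F',
};

// Serialize a theme into one CSS rule. Swapping themes means swapping
// which rule is active; components keep reading the same variable names.
function themeToCss(selector: string, tokens: ThemeTokens): string {
  const lines: string[] = [];
  const names = Object.keys(tokens).sort();
  for (const name of names) {
    lines.push('  ' + name + ': ' + tokens[name] + ';');
  }
  return selector + ' {\n' + lines.join('\n') + '\n}';
}
```

A renderer could inject `themeToCss(':root', darkTheme)` by default and `themeToCss('[data-theme="light"]', lightTheme)` as an override, so no component code changes when the theme does.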
Action: Install and study these 5 apps this week
- Cursor: cursor.sh
- Arc: arc.net
- Linear: linear.app
- Notion: notion.so
- VS Code: code.visualstudio.com
For each: spend 30 min using it normally, then 30 min inspecting specific patterns (title bar, sidebars, status bar, hover states, loading states, dark mode). Document what you want to adopt, adapt, or avoid. This is the most efficient design research you can do before the brand workshop.
Essential Figma Plugins for this Workflow
| Plugin | What it does | Phase | Cost |
|---|---|---|---|
| Variables Import/Export | Exports Figma Variables to JSON → feeds tokens.json | Phase 2→3 bridge | Free |
| Tokens Studio | Full design token management in Figma (W3C DTCG format) | Phase 2→3 bridge | $20/mo |
| Contrast | WCAG AA/AAA contrast checker on any color pair in canvas | Phase 2 · color decisions | Free |
| Able | Accessibility checker — contrast, focus order, WCAG annotations | Phase 4 · component review | Free |
| Iconify | All Lucide icons available in Figma — same library as the code | Phase 2+ ongoing | Free |
| Figma to Code | Exports Figma frames as HTML/Tailwind/React snippets | Phase 4 · component start | Free |
| Color Blind | Simulates 8 types of color blindness on any frame | Phase 2 · color decisions | Free |
Full-Stack Design Integration
The missing 30%: exact technology stacks, how everything wires together, Claude API integration patterns with real code, what is still undocumented, and 2026 AI-native design methodology. Actionable, not theoretical.
01 · The 6-Layer Stack — How Everything Connects
LAYER 6 · Quality Gates
Figma ↔ Code consistency review · axe-core a11y · Playwright e2e · PR blocked if component deviates from Figma
LAYER 5 · Claude AI Integration
Anthropic SDK v0.30+ · Messages streaming API · Tool use (36 tools) · Prompt caching · Multi-LLM router
LAYER 4 · Electron App Shell
Electron 33+ · WebContentsView (70%) · React 19 sidebar (30%) · IPC contextBridge · Auto-updater
LAYER 3 · React Component Library
shadcn/ui (Radix primitives) · Figma Atomic Design (#18) · Figma MCP · Tailwind 4 · Framer Motion 11
LAYER 2 · Design Token Pipeline
tokens.json (W3C DTCG) → Style Dictionary 4 → CSS custom properties → tailwind.config.ts → CSS vars
LAYER 1 · Design Spec (This File)
shopilot_v6.html · Single source of truth · Pablo approves · Sergio implements · Mateo owns tokens
Complete Package Manifest
| Package | Version | Purpose | Layer | Owner |
|---|---|---|---|---|
| @anthropic-ai/sdk | ^0.30 | Claude API: streaming, tools, caching | 5 | Andrés |
| electron | ^33 | Desktop shell, WebContentsView, IPC | 4 | Mateo |
| react + react-dom | ^19 | UI renderer, concurrent features | 3 | Sergio |
| tailwindcss | ^4 | Utility CSS, token consumption | 3 | Sergio |
| @radix-ui/react-* | latest | Accessible primitives (via shadcn) | 3 | Sergio |
| shadcn/ui | CLI 2.x | Component generator on Radix + Tailwind | 3 | Sergio |
| framer-motion | ^11 | Animations: word-stream, slide-up, spring | 3 | Sergio |
| lucide-react | ^0.43 | Icon library — 1.5px stroke, currentColor | 3 | Sergio |
| recharts | ^2 | Charts only (BSR sparkline, KPI gauge) | 3 | Andrés |
| style-dictionary | ^4 | Token transform: JSON → CSS → Tailwind | 2 | Mateo |
| @axe-core/react | ^4 | Accessibility audit (WCAG AA) | 6 | Sergio |
| zod | ^3 | Tool input/output validation schema | 5 | Andrés |
| zustand | ^5 | Agent state machine store | 3-5 | Sergio |
02 · Design Token Pipeline — tokens.json → Production CSS
tokens.json (W3C DTCG format · source of truth) → style-dictionary build → design-tokens.css + tailwind-tokens.ts (auto-generated, never edit manually)
▶ tokens.json — Full Example (W3C DTCG format)
{
"$schema": "https://design-tokens.org/schema.json",
"sp": {
"color": {
"bg": {
"base": { "$value": "#0A0A0F", "$type": "color", "$description": "App background — near-black warm" },
"01": { "$value": "#0F0F18", "$type": "color" },
"02": { "$value": "#14141F", "$type": "color" },
"03": { "$value": "#1A1A28", "$type": "color" }
},
"orange": {
"50": { "$value": "rgba(249,115,22,0.08)", "$type": "color" },
"500": { "$value": "#F97316", "$type": "color", "$description": "CANDIDATE — replace with decided brand color" },
"600": { "$value": "#EA6005", "$type": "color" }
},
"fg": {
"100": { "$value": "#F4F4F6", "$type": "color", "$description": "Primary text" },
"80": { "$value": "#D4D4E4", "$type": "color" },
"60": { "$value": "#A4A4B8", "$type": "color" },
"40": { "$value": "#7A7A90", "$type": "color" }
},
"success": { "$value": "#22C55E", "$type": "color" },
"warning": { "$value": "#F59E0B", "$type": "color" },
"error": { "$value": "#EF4444", "$type": "color" },
"info": { "$value": "#3B82F6", "$type": "color" }
},
"space": {
"g": { "$value": "10px", "$type": "dimension", "$description": "base grid unit" },
"v": { "$value": "22px", "$type": "dimension", "$description": "vertical rhythm" },
"4": { "$value": "4px", "$type": "dimension" },
"8": { "$value": "8px", "$type": "dimension" },
"12": { "$value": "12px", "$type": "dimension" },
"16": { "$value": "16px", "$type": "dimension" },
"24": { "$value": "24px", "$type": "dimension" },
"32": { "$value": "32px", "$type": "dimension" }
},
"radius": {
"sm": { "$value": "4px", "$type": "dimension" },
"md": { "$value": "6px", "$type": "dimension" },
"lg": { "$value": "8px", "$type": "dimension" },
"xl": { "$value": "12px", "$type": "dimension" },
"2xl": { "$value": "16px", "$type": "dimension" },
"full": { "$value": "9999px", "$type": "dimension" }
},
"duration": {
"instant": { "$value": "80ms", "$type": "duration" },
"fast": { "$value": "150ms", "$type": "duration" },
"normal": { "$value": "200ms", "$type": "duration" },
"slow": { "$value": "350ms", "$type": "duration" },
"scenic": { "$value": "500ms", "$type": "duration" }
}
}
}
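As a rough illustration of what the build step does with this file, the sketch below walks a DTCG-style token tree and emits one CSS custom property per leaf, joining the path with hyphens. `flattenTokens` and `TokenNode` are hypothetical names; Style Dictionary performs the real transform, and its naming may differ in edge cases.

```typescript
// A token tree is nested objects; a leaf is any object carrying a string $value.
type TokenNode = { [key: string]: unknown };

// Walk the tree and collect "--path-to-token" -> value pairs.
function flattenTokens(
  node: TokenNode,
  path: string[] = [],
  out: Record<string, string> = {}
): Record<string, string> {
  for (const key of Object.keys(node)) {
    // Keys starting with "$" are metadata ($schema, $type, $description).
    if (key.charAt(0) === '$') continue;
    const child = node[key];
    if (typeof child !== 'object' || child === null) continue;
    const childNode = child as TokenNode;
    const nextPath = path.concat(key);
    if (typeof childNode['$value'] === 'string') {
      out['--' + nextPath.join('-')] = childNode['$value'] as string;
    } else {
      flattenTokens(childNode, nextPath, out);
    }
  }
  return out;
}
```

Applied to the file above, the `sp.color.bg.base` token becomes `--sp-color-bg-base`, which is the name the Tailwind config and shadcn overrides reference.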
▶ style-dictionary.config.mjs — Build Config
// style-dictionary.config.mjs
// The CLI reads the default export; no StyleDictionary import is needed here.
export default {
source: ['tokens.json'],
platforms: {
// → CSS custom properties (--sp-color-orange-500)
css: {
transformGroup: 'css',
files: [{
destination: 'src/styles/design-tokens.css',
format: 'css/variables',
options: { selector: ':root', outputReferences: true }
}]
},
// → Tailwind config (for extend.colors, extend.spacing)
tailwind: {
transformGroup: 'js',
files: [{
destination: 'src/styles/tailwind-tokens.ts',
format: 'javascript/esm'
}]
}
}
}
// Run: npx style-dictionary build --config style-dictionary.config.mjs
// Output:
// src/styles/design-tokens.css ← import in main.tsx
// src/styles/tailwind-tokens.ts ← import in tailwind.config.ts
▶ tailwind.config.ts — Token Consumption
// tailwind.config.ts
// Tailwind 4 no longer auto-detects JS/TS config files: load this one from the main stylesheet with the @config directive.
import type { Config } from 'tailwindcss'
const config: Config = {
content: ['./src/**/*.{ts,tsx}'],
theme: {
extend: {
colors: {
// Reference CSS custom properties so Tailwind + Style Dictionary stay in sync
'sp-bg-base': 'var(--sp-color-bg-base)',
'sp-orange': 'var(--sp-color-orange-500)',
'sp-fg-100': 'var(--sp-color-fg-100)',
'sp-fg-40': 'var(--sp-color-fg-40)', // used by ThinkingPulse (text-sp-fg-40)
'sp-success': 'var(--sp-color-success)',
'sp-warning': 'var(--sp-color-warning)',
'sp-error': 'var(--sp-color-error)',
},
spacing: {
'sp-g': 'var(--sp-space-g)', // 10px
'sp-v': 'var(--sp-space-v)', // 22px
},
borderRadius: {
'sp-sm': 'var(--sp-radius-sm)',
'sp-lg': 'var(--sp-radius-lg)',
'sp-xl': 'var(--sp-radius-xl)',
},
fontFamily: {
'display': ['Inter Display', 'Inter', 'sans-serif'],
'mono': ['JetBrains Mono', 'Fira Code', 'monospace'],
},
transitionDuration: {
'sp-fast': 'var(--sp-duration-fast)',
'sp-normal': 'var(--sp-duration-normal)',
'sp-slow': 'var(--sp-duration-slow)',
}
}
},
plugins: []
}
export default config
03 · shadcn/ui Integration with Shopilot Tokens
shadcn/ui is NOT a component library; it is a code generator. Components are copied into your repo and are 100% customizable. Use it for accessibility-correct primitives, then override with Shopilot tokens.
▶ Setup Commands + globals.css Override
# 1. Init shadcn (say YES to CSS variables, pick Neutral base)
npx shadcn@latest init
# When prompted:
# ✓ Style: Default
# ✓ Base color: Neutral (we override below)
# ✓ CSS variables: YES (critical — this is how tokens flow in)
# ✓ src directory: YES
# 2. Add the components Shopilot needs (never add all at once)
npx shadcn@latest add button
npx shadcn@latest add dialog
npx shadcn@latest add dropdown-menu
npx shadcn@latest add tooltip
npx shadcn@latest add select
npx shadcn@latest add scroll-area
npx shadcn@latest add collapsible # ← ToolAccordion base
npx shadcn@latest add badge
npx shadcn@latest add separator
npx shadcn@latest add progress # ← ContextWindowBar
# 3. Override src/app/globals.css with Shopilot tokens:
@import 'design-tokens.css'; /* Style Dictionary output */
@layer base {
:root {
/* Map shadcn vars → Shopilot tokens */
--background: 240 20% 5%; /* #0A0A0F */
--foreground: 240 10% 96%; /* #F4F4F6 */
--card: 240 22% 10%; /* #14141F */
--card-foreground: 240 23% 86%; /* #D4D4E4 */
--popover: 240 22% 10%; /* #14141F */
--popover-foreground: 240 10% 96%; /* #F4F4F6 */
--primary: 25 95% 53%; /* CANDIDATE: #F97316 orange — replace once brand color decided */
--primary-foreground: 0 0% 100%;
--secondary: 240 20% 20%; /* #28283C */
--secondary-foreground: 240 23% 86%; /* #D4D4E4 */
--muted: 240 20% 20%; /* #28283C */
--muted-foreground: 240 9% 52%; /* #7A7A90 */
--accent: 25 95% 53%; /* orange accent */
--accent-foreground: 0 0% 100%;
--destructive: 0 84% 60%; /* #EF4444 */
--border: 240 6% 20%; /* rgba(255,255,255,.06) approx */
--input: 240 6% 16%;
--ring: 25 95% 53%; /* orange focus ring */
--radius: 0.5rem; /* 8px = --sp-radius-lg */
}
}
# Result: shadcn components automatically use Shopilot colors.
# Edit src/components/ui/button.tsx to change size tokens to sp-* vars.
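Because the shadcn variables above are space-separated HSL triplets while tokens.json stores hex values, it helps to derive the triplets rather than type them by hand. The following is a small sketch of that conversion; `hexToHslTriplet` is a hypothetical helper, it assumes 6-digit `#RRGGBB` input, and it rounds each component to a whole number.

```typescript
// Convert "#RRGGBB" to the "H S% L%" triplet format used by shadcn CSS variables.
function hexToHslTriplet(hex: string): string {
  const r = parseInt(hex.slice(1, 3), 16) / 255;
  const g = parseInt(hex.slice(3, 5), 16) / 255;
  const b = parseInt(hex.slice(5, 7), 16) / 255;
  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const l = (max + min) / 2;
  const d = max - min;
  let h = 0;
  let s = 0;
  if (d !== 0) {
    // Standard RGB -> HSL: saturation depends on which half of the lightness range we are in.
    s = l > 0.5 ? d / (2 - max - min) : d / (max + min);
    if (max === r) h = ((g - b) / d + (g < b ? 6 : 0)) * 60;
    else if (max === g) h = ((b - r) / d + 2) * 60;
    else h = ((r - g) / d + 4) * 60;
  }
  return Math.round(h) + ' ' + Math.round(s * 100) + '% ' + Math.round(l * 100) + '%';
}
```

For example, `hexToHslTriplet('#F97316')` yields the `25 95% 53%` used for `--primary`, and the same call can be run over every hex token before pasting values into globals.css.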
Which shadcn components to use vs build custom
| Component | Source | Why |
|---|---|---|
| Button (6 variants) | shadcn base → customize | Radix provides correct focus/disabled states; we override styles |
| Dialog / Confirmation Card | shadcn Dialog → customize | Radix handles focus trap + aria-modal correctly; style from scratch |
| Tooltip | shadcn Tooltip → light override | Positioning engine is complex; only needs color/font token override |
| Select / Dropdown | shadcn → heavy customize | Radix handles keyboard nav; we rebuild visual completely |
| Tool Accordion | BUILD CUSTOM | Streaming state machine, badge states, JSON viewer — too specific |
| ReAct Stream | BUILD CUSTOM | Word-by-word animation, thinking pulse — unique to Shopilot |
| KPI Card | BUILD CUSTOM | JetBrains Mono + delta badge + sparkline — fully custom |
| Context Window Bar | shadcn Progress → customize | Stacked segments on top of Progress primitive |
| Data Table | shadcn Table + TanStack Table | TanStack handles sort/filter; shadcn provides base HTML table |
| Proactive Suggestion Card | BUILD CUSTOM | Animated slide-up, dismiss swipe, max-2-simultaneous logic |
| Date Picker | react-day-picker (NEVER BUILD) | Calendar UI is complex; use library, override tokens only |
| Charts (sparkline, gauge) | recharts (NEVER BUILD) | Math-heavy; only override colors and font |
04 · Claude API Streaming Integration — Real Implementation
The complete chain from user input → Claude API → word-by-word UI animation → tool execution display. Every piece has a specific design pattern.
Agent State Machine
- thinking ···: CSS opacity 0.4→1→0.4, 1.2s infinite · NO elapsed time shown · Status bar: animated dot
- streaming ▊: each word fadeIn 80ms ease-out · Cursor: blinking 0.6s · NO skeleton, NO spinner
- awaiting_confirm: confirmation card slide-up 250ms spring · Input disabled · Backdrop dims 20%
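The states above can be guarded with an explicit transition table so the UI never jumps to a state that makes no sense (for example, from idle straight to done). This is a sketch only: the state names come from this blueprint, but the allowed edges and the helper names are assumptions.

```typescript
type AgentState =
  | 'idle' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error';

// Assumed edges: the blueprint names the states but does not list transitions.
const transitions: Record<AgentState, AgentState[]> = {
  idle: ['thinking'],
  thinking: ['streaming', 'tool_running', 'error', 'idle'],
  streaming: ['tool_running', 'done', 'error', 'idle'],
  tool_running: ['awaiting_confirm', 'streaming', 'done', 'error', 'idle'],
  awaiting_confirm: ['tool_running', 'idle', 'error'],
  done: ['thinking', 'idle'],
  error: ['thinking', 'idle'],
};

function canTransition(from: AgentState, to: AgentState): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// Returns the requested state if the edge is allowed, otherwise stays put.
function nextState(current: AgentState, requested: AgentState): AgentState {
  return canTransition(current, requested) ? requested : current;
}

// From the spec: the input is disabled while a confirmation card is showing.
function isInputDisabled(state: AgentState): boolean {
  return state === 'awaiting_confirm';
}
```

A store could call `nextState` inside its setter so that stray stream events arriving after an abort are ignored instead of flipping the UI back to streaming.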
▶ useStream.ts — Complete React Hook Implementation
// src/hooks/useStream.ts
import { useState, useCallback, useRef } from 'react';
import Anthropic from '@anthropic-ai/sdk';
import { shopilotTools } from '@/tools/definitions';
import { useAgentStore } from '@/stores/agentStore';
// Assumed location of the system prompt constant; adjust to the real module.
import { SHOPILOT_SYSTEM_PROMPT } from '@/prompts/system';

type AgentState =
  | 'idle' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useStream() {
  const [agentState, setAgentState] = useState<AgentState>('idle');
  const [words, setWords] = useState<string[]>([]);
  const [currentToolCall, setCurrentToolCall] = useState<string | null>(null);
  const abortRef = useRef<AbortController | null>(null);
  const { addTool } = useAgentStore();

  const stream = useCallback(async (messages: StreamMessage[]) => {
    const controller = new AbortController();
    abortRef.current = controller;
    setWords([]);
    setAgentState('thinking');
    // NOTE: In Electron, the Anthropic SDK runs in the main process so the API key
    // never reaches the renderer. Renderer sends via IPC → main runs SDK → streams
    // events back via IPC. The SDK call is shown inline here only to illustrate
    // the event handling the renderer needs.
    try {
      const client = new Anthropic(); // main process: reads ANTHROPIC_API_KEY from env
      const response = client.messages.stream(
        {
          model: 'claude-opus-4-6',
          max_tokens: 8192,
          // Prompt caching: mark the static system prompt as a cacheable prefix.
          // Requires an SDK release in which prompt caching is generally available.
          system: [
            { type: 'text', text: SHOPILOT_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } },
          ],
          messages,
          tools: shopilotTools,
        },
        { signal: controller.signal }, // lets abort() cancel the in-flight request
      );
      for await (const event of response) {
        switch (event.type) {
          case 'content_block_start':
            if (event.content_block.type === 'text') {
              setAgentState('streaming');
            }
            if (event.content_block.type === 'tool_use') {
              setAgentState('tool_running');
              const toolId = event.content_block.id;
              const toolName = event.content_block.name;
              setCurrentToolCall(toolName);
              addTool({ id: toolId, name: toolName, state: 'running', startMs: Date.now() });
            }
            break;
          case 'content_block_delta':
            if (event.delta.type === 'text_delta') {
              // Word-by-word: split after whitespace, animate each piece.
              // A delta can end mid-word, so one word may arrive as two entries.
              const newWords = event.delta.text.split(/(?<=\s)/);
              setWords(prev => [...prev, ...newWords]);
            }
            break;
          case 'content_block_stop':
            setCurrentToolCall(null);
            break;
          case 'message_stop':
            setAgentState('done');
            break;
        }
      }
    } catch (err) {
      // A user-initiated abort is not an error state; abort() already reset the UI.
      if (!controller.signal.aborted) {
        setAgentState('error');
      }
    }
  }, [addTool]);

  const abort = useCallback(() => {
    abortRef.current?.abort();
    setAgentState('idle');
    setWords([]);
  }, []);

  return { agentState, words, currentToolCall, stream, abort };
}
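One detail worth sketching separately: streamed text deltas are not aligned to word boundaries, so a naive split can show "sha" and "re" as two animated words. The helper below is a hypothetical `appendDelta` that glues a new chunk onto the previous entry when that entry did not end in whitespace. It is an illustration of the idea, not code from the Shopilot repo.

```typescript
// Append a streamed text delta to the word list, keeping words whole
// even when a chunk boundary falls in the middle of a word.
function appendDelta(words: string[], delta: string): string[] {
  // Each part is either "word + trailing whitespace" or a run of leading whitespace.
  const parts = delta.match(/\S+\s*|\s+/g);
  if (parts === null) return words; // empty delta: nothing to add
  const next = words.slice();
  const last = next.length > 0 ? next[next.length - 1] : undefined;
  // Previous entry ended mid-word: merge the first new part into it.
  if (last !== undefined && !/\s$/.test(last)) {
    next[next.length - 1] = last + parts[0];
    parts.shift();
  }
  return next.concat(parts);
}
```

Inside the hook this would replace the spread in the `text_delta` branch, for example `setWords(prev => appendDelta(prev, event.delta.text))`. Since entries are keyed by index in StreamingText, a merged word keeps its key and does not replay its fade-in.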
▶ StreamingText.tsx — Word-by-Word Animation Component
// src/components/StreamingText.tsx
import { motion, AnimatePresence } from 'framer-motion';
interface StreamingTextProps {
words: string[];
isStreaming: boolean;
}
// Design rule: each word fades in at 80ms.
// Cursor blinks at 0.6s cycle when streaming.
// No skeleton, no placeholder, no loading bar.
export function StreamingText({ words, isStreaming }: StreamingTextProps) {
return (
<div className="text-sp-fg-100 text-sm leading-relaxed">
{words.map((word, i) => (
<motion.span
key={i}
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
transition={{ duration: 0.08, ease: 'easeOut' }} // 80ms per word
>
{word}
</motion.span>
))}
{/* Blinking cursor — only while streaming */}
<AnimatePresence>
{isStreaming && (
<motion.span
initial={{ opacity: 1 }}
animate={{ opacity: [1, 0, 1] }}
transition={{ duration: 0.6, repeat: Infinity, ease: 'linear' }}
className="inline-block ml-0.5 font-mono text-sp-orange"
style={{ fontFamily: 'JetBrains Mono' }}
>
▊
</motion.span>
)}
</AnimatePresence>
</div>
);
}
// ThinkingPulse — shown when agent is thinking (no tokens yet)
export function ThinkingPulse() {
return (
<motion.span
animate={{ opacity: [0.4, 1, 0.4] }}
transition={{ duration: 1.2, repeat: Infinity, ease: 'easeInOut' }}
className="text-sp-fg-40 font-mono text-sm"
>
···
</motion.span>
);
}
★ Prompt Caching — 60-80% Cost Reduction
Mark the static parts of the context (system prompt, marketplace context, seller profile) with cache_control: {type: 'ephemeral'}. The cache TTL is 5 minutes and is refreshed on every hit, so a request reuses the cached prefix as long as it arrives within 5 minutes of the previous one in the session. Projection at 1,000 sellers × 50 requests/day: $4,800/mo → $960/mo with caching, which is the 80% end of the range and assumes most requests hit a warm cache.
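The savings range can be sanity-checked with a few lines of arithmetic. The sketch below assumes the published multipliers for the 5-minute cache (writes billed at 1.25x the base input price, reads at 0.1x), ignores output tokens, and assumes every request after the first in a session hits a warm cache. `inputCostRatio` and its parameters are hypothetical names.

```typescript
// Multipliers relative to the base input-token price (5-minute cache).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Ratio of cached input cost to uncached input cost for one session.
// cachedFraction: share of each prompt that is a stable, cacheable prefix (0..1).
// requests: number of requests in the session, all assumed to land inside the TTL.
function inputCostRatio(cachedFraction: number, requests: number): number {
  const uncachedPart = (1 - cachedFraction) * requests;          // always full price
  const firstCallWrite = cachedFraction * CACHE_WRITE_MULTIPLIER; // cache is written once
  const laterReads = cachedFraction * CACHE_READ_MULTIPLIER * (requests - 1);
  return (uncachedPart + firstCallWrite + laterReads) / requests;
}
```

With 90% of each prompt cacheable and 50 requests per session, the ratio comes out near 0.21, i.e. roughly a 79% reduction in input cost, consistent with the $4,800 → $960 projection. With only two thirds of the prompt cacheable the reduction drops to about 60%, which is where the lower bound comes from. Sessions with gaps longer than 5 minutes pay the write cost again and save less.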
05 · Tool Call UI — Visual Patterns for 36 Tools
This was the biggest gap identified in the audit: the spec described the tool accordion but never showed the complete visual spec or the component code. Fixed here.
Live Tool Accordion States
analyze_buy_box ✓ 847ms
Input
{ "asin": "B08XYZABC",
"marketplace": "amazon_mx" }
Output
{ "buybox_winner": "us",
"our_share": 0.78,
"competitors": 3 }
▶ ToolAccordion.tsx — Complete Component
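// src/components/ToolAccordion.tsx
import { motion } from 'framer-motion';
import { Check, X, AlertTriangle, Loader2 } from 'lucide-react';
// Assumed sibling components (not shown in this spec); adjust paths to the real modules.
import { ConfirmationCard } from './ConfirmationCard';
import { JsonViewer } from './JsonViewer';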
type ToolState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';
type RiskLevel = 'read_only' | 'reversible' | 'irreversible';
interface ToolAccordionProps {
id: string;
name: string;
state: ToolState;
riskLevel: RiskLevel;
durationMs?: number;
input?: Record<string, unknown>;
output?: Record<string, unknown>;
errorMessage?: string;
onConfirm?: () => void;
onCancel?: () => void;
}
const stateConfig = {
queued: { icon: null, color: '#7A7A90', bg: 'rgba(122,122,144,0.06)', border: 'rgba(122,122,144,0.2)' },
running: { icon: 'spin', color: '#3B82F6', bg: 'rgba(59,130,246,0.05)', border: 'rgba(59,130,246,0.2)' },
success: { icon: 'check', color: '#22C55E', bg: 'rgba(34,197,94,0.05)', border: 'rgba(34,197,94,0.2)' },
error: { icon: 'x', color: '#EF4444', bg: 'rgba(239,68,68,0.05)', border: 'rgba(239,68,68,0.2)' },
awaiting_confirm: { icon: 'warn', color: '#F59E0B', bg: 'rgba(245,158,11,0.05)', border: 'rgba(245,158,11,0.25)' },
};
export function ToolAccordion({ id, name, state, riskLevel, durationMs, input, output, errorMessage, onConfirm, onCancel }: ToolAccordionProps) {
const cfg = stateConfig[state];
const isDestructive = riskLevel === 'irreversible';
return (
<motion.div
layout
initial={{ opacity: 0, y: 4 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.2, ease: [0.16, 1, 0.3, 1] }}
style={{ background: cfg.bg, border: `1px solid ${cfg.border}`, borderRadius: 10 }}
>
<details>
<summary style={{ display: 'flex', alignItems: 'center', gap: 10, padding: '10px 16px', cursor: 'pointer', listStyle: 'none' }}>
{/* State icon */}
{state === 'running' && <Loader2 size={14} color={cfg.color} className="animate-spin" />}
{state === 'success' && <Check size={14} color={cfg.color} strokeWidth={2.5} />}
{state === 'error' && <X size={14} color={cfg.color} strokeWidth={2.5} />}
{state === 'awaiting_confirm' && <AlertTriangle size={14} color={cfg.color} />}
<span style={{ fontSize: 12, fontWeight: 500, color: '#D4D4E4', flex: 1 }}>{name}</span>
{/* Right badges */}
{isDestructive && (
<span style={{ fontSize: 9, fontWeight: 700, color: '#EF4444', textTransform: 'uppercase', letterSpacing: '0.1em' }}>
IRREVERSIBLE
</span>
)}
{state === 'success' && durationMs !== undefined && (
<span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
✓ {durationMs}ms
</span>
)}
{state === 'error' && (
<span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
✗ Error
</span>
)}
</summary>
{/* Expanded content */}
<div style={{ padding: '0 16px 12px', borderTop: '1px solid rgba(255,255,255,0.05)' }}>
{/* Confirmation card for irreversible actions */}
{state === 'awaiting_confirm' && (
<ConfirmationCard input={input} riskLevel={riskLevel} onConfirm={onConfirm} onCancel={onCancel} />
)}
{/* JSON viewer for success/error */}
{(state === 'success' || state === 'error') && (
<JsonViewer input={input} output={output} error={errorMessage} />
)}
</div>
</details>
</motion.div>
);
}
06 · Previously Undocumented Patterns — Now Complete
Empty States — 8 Variants
- No ASINs Yet: First-run. CTA: "Add your first product"
- No Search Results: Show query, suggest correction
- All Caught Up: No pending actions. Positive reinforcement.
- Sync Pending: Data loading from marketplace. Progress bar.
- Not Connected: OAuth not done. CTA: "Connect marketplace"
- No History: Audit log empty. "Actions will appear here"
- Credits Zero: Agent paused. Upgrade CTA dominant.
- No Reports: Pro feature gate. "Available in Pro plan"
Empty State Rules:
- ① Icon: 32px, colored by context (orange=action, blue=info, green=success, red=error)
- ② Title: max 4 words, sentence case, no period
- ③ Description: 1 line, explains why + what to do next
- ④ CTA: only if there's a direct action. Never show CTA on "All Caught Up"
- ⑤ Never show empty state while loading — show progress instead
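Rules ②, ③ and ④ are mechanical enough to lint. The sketch below checks an empty-state definition against them; the `EmptyState` shape, the `all_caught_up` id, and `validateEmptyState` are hypothetical names chosen for illustration.

```typescript
interface EmptyState {
  id: string;          // hypothetical identifier, e.g. 'no_asins', 'all_caught_up'
  title: string;
  description: string;
  cta?: string;        // omitted when there is no direct action
}

// Returns a list of rule violations; an empty list means the state is valid.
function validateEmptyState(state: EmptyState): string[] {
  const problems: string[] = [];
  const title = state.title.trim();
  // Rule 2: max 4 words, no trailing period.
  if (title.split(/\s+/).length > 4) problems.push('title exceeds 4 words');
  if (/\.$/.test(title)) problems.push('title ends with a period');
  // Rule 3: description is a single line.
  if (state.description.indexOf('\n') !== -1) problems.push('description spans more than one line');
  // Rule 4: never show a CTA on "All Caught Up".
  if (state.id === 'all_caught_up' && state.cta !== undefined) {
    problems.push('"All Caught Up" must not have a CTA');
  }
  return problems;
}
```

Running this over the eight variants in a unit test would catch copy changes that quietly break the rules, such as a five-word title added during localization.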
Error State Taxonomy — 3 Categories
API timeout, validation error, missing field. Amber border + icon. Show specific message + retry button. Auto-retry after 3s with countdown.
Auth revoked, account suspended, critical DB error. Red banner. Explain what happened and what the user must do. No auto-retry. Support link if relevant.
Rate limit, credit exhausted, feature not in plan. Blue info banner. Calm tone. Clear path forward (upgrade, wait, etc). Agent pauses gracefully.
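The three categories map cleanly onto a lookup table. In the sketch below the category names (`recoverable`, `blocking`, `limit`) and the error codes are placeholders, since this section does not name them; the colors, the 3-second auto-retry, and the no-retry rules are taken from the descriptions above.

```typescript
// Hypothetical category names for the three groups described above.
type ErrorCategory = 'recoverable' | 'blocking' | 'limit';

interface ErrorPresentation {
  tone: 'amber' | 'red' | 'blue';
  autoRetryMs: number | null;   // null = never retry automatically
  showRetryButton: boolean;
}

const presentation: Record<ErrorCategory, ErrorPresentation> = {
  recoverable: { tone: 'amber', autoRetryMs: 3000, showRetryButton: true },
  blocking: { tone: 'red', autoRetryMs: null, showRetryButton: false },
  limit: { tone: 'blue', autoRetryMs: null, showRetryButton: false },
};

// Hypothetical error codes. Unknown codes fall into 'blocking' so that an
// unrecognized failure is never retried automatically.
function categorize(code: string): ErrorCategory {
  switch (code) {
    case 'api_timeout':
    case 'validation_error':
    case 'missing_field':
      return 'recoverable';
    case 'rate_limited':
    case 'credits_exhausted':
    case 'feature_not_in_plan':
      return 'limit';
    default:
      return 'blocking';
  }
}
```

A banner component can then render from `presentation[categorize(code)]` alone, which keeps the retry policy in one place instead of scattered across call sites.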
Accessibility — WCAG AA Contrast Ratios
| Text | Background | Ratio | WCAG AA | Use |
|---|---|---|---|---|
| #F4F4F6 | #0A0A0F | 18.0:1 | PASS AAA | Primary text on bg |
| #A4A4B8 | #0A0A0F | 8.1:1 | PASS AA | Secondary text |
| #F97316 | #0A0A0F | 7.0:1 | PASS AA | Orange on bg |
| #FFFFFF | #F97316 | 2.8:1 | FAIL (below the 3:1 large-text minimum) | White on orange btn |
| #7A7A90 | #0A0A0F | 4.7:1 | PASS AA | Tertiary text |
| #54546A | #0A0A0F | 2.7:1 | FAIL — captions only | Placeholder, metadata (decorative) |
| #22C55E | #0A0A0F | 8.7:1 | PASS AA | Success text |
| #EF4444 | #0A0A0F | 5.2:1 | PASS AA | Error text |
⚠ #54546A fails WCAG AA — use only for decorative metadata (timestamps, IDs) where context is clear. Never for interactive or status-critical text.
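Contrast ratios are easy to get wrong by eye, so it is worth computing them from the hex values whenever a token changes. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are illustrative and the input is assumed to be 6-digit `#RRGGBB`.

```typescript
// sRGB 8-bit channel -> linear-light value, per the WCAG relative luminance definition.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const r = channelToLinear(parseInt(hex.slice(1, 3), 16));
  const g = channelToLinear(parseInt(hex.slice(3, 5), 16));
  const b = channelToLinear(parseInt(hex.slice(5, 7), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors; argument order does not matter.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA thresholds: 4.5:1 for normal text, 3:1 for large text.
function passesAA(ratio: number, largeText: boolean): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```

Wiring `contrastRatio` into a token test (for example, asserting that every foreground token meets AA against `--sp-color-bg-base`) would keep the table above from drifting when the brand color is finalized.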
07 · 2026 AI-Native Design Trends — Applied to Shopilot07 · Tendencias de Diseño AI-Native 2026 — Aplicadas a Shopilot
Agentic UI — "Doing" not "Saying"
2025 was chatbots. 2026 is agents that take actions in real systems. The UI must show what the agent DID, not just what it said. Tool call accordion + audit log + rollback panel are the core of agentic UI.
Shopilot: ✓ Already built — tool accordion + audit log + rollback token system
Progressive Disclosure for AI Outputs
Show the answer first, reasoning chain on demand. Users want results, not process. Collapsed tool calls are the default; they expand for debugging. This differs from traditional progressive disclosure: AI reasoning is ALWAYS secondary.
Shopilot: ✓ Tool accordion collapsed by default · expandable for transparency
Trust Signals — Provenance & Reversibility
2026 users are sophisticated; they've been burned by AI hallucinations. Every AI action must answer two questions: where did this data come from, and can I undo this? This is UX as trust infrastructure. Timestamp + source API + rollback button = trust signal trifecta.
Shopilot: ✓ REVERSIBLE/IRREVERSIBLE labels · rollback tokens · marketplace source badges
Ambient Intelligence — Proactive without Interrupting
The trend is away from "ask AI" and toward "AI notices things." Proactive suggestion cards appear from below without stealing focus. Max 2 simultaneous. Swipe to dismiss. The agent watches the marketplace and surfaces insights unprompted.
Shopilot: ✓ Proactive suggestion cards · slide-up 350ms · max-2-simultaneous rule
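The max-2-simultaneous rule can be kept as a small pure state transition. A minimal sketch, assuming that extra cards wait in a queue until a visible one is dismissed; the queueing behavior and all names here are assumptions, since the spec only fixes the limit of two:

```typescript
interface SuggestionCard {
  id: string;
  title: string;
}

interface SuggestionTray {
  visible: SuggestionCard[]; // at most MAX_VISIBLE_SUGGESTIONS cards
  queued: SuggestionCard[];  // waiting for a free slot (assumed behavior)
}

const MAX_VISIBLE_SUGGESTIONS = 2;

// Show the card if a slot is free, otherwise queue it.
function addSuggestion(tray: SuggestionTray, card: SuggestionCard): SuggestionTray {
  if (tray.visible.length < MAX_VISIBLE_SUGGESTIONS) {
    return { visible: tray.visible.concat([card]), queued: tray.queued };
  }
  return { visible: tray.visible, queued: tray.queued.concat([card]) };
}

// Remove a visible card and promote queued cards into the freed slots.
function dismissSuggestion(tray: SuggestionTray, id: string): SuggestionTray {
  const remaining = tray.visible.filter((c) => c.id !== id);
  const freeSlots = MAX_VISIBLE_SUGGESTIONS - remaining.length;
  return {
    visible: remaining.concat(tray.queued.slice(0, freeSlots)),
    queued: tray.queued.slice(freeSlots),
  };
}
```

Because both functions return new objects, they drop straight into a reducer or a Zustand `set` call.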
Memory Persistence UI — What Does AI Know?
Emerging pattern: showing users what the AI has "learned" about them. CLAUDE.md in Claude Code → SELLER_PROFILE.md in Shopilot. The UI should make this visible (settings panel "Your Coach Profile") and editable. Users trust AI more when they can see and correct its memory.
Shopilot: ⚠ Partially — SELLER_PROFILE injected but not exposed in UI. Add "Coach Memory" settings panel in Phase 2.
Context Window as First-Class UI Element
Power users want to know what the AI "has in mind." The Context Window Bar (already in Section 14 state 26) is now mainstream. Show: system prompt tokens, conversation tokens, marketplace context tokens, and tokens still available. Show a compaction banner when the window is 80% full. This is table stakes for AI products in 2026.
Shopilot: ✓ Context Window Bar in sidebar · compaction banner on 80% full
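The bar's segments and the 80% banner reduce to simple arithmetic over token counts. A minimal sketch; the interface names and the idea of passing the window size as a parameter are assumptions, and only the three segments and the 80% threshold come from the spec:

```typescript
interface ContextUsage {
  systemPromptTokens: number;
  conversationTokens: number;
  marketplaceTokens: number;
}

interface ContextBarView {
  usedTokens: number;
  availableTokens: number;
  fractionUsed: number;          // 0..1, drives the bar width
  showCompactionBanner: boolean; // true at 80% full or more
}

const COMPACTION_THRESHOLD = 0.8;

function contextBarView(usage: ContextUsage, windowTokens: number): ContextBarView {
  const usedTokens =
    usage.systemPromptTokens + usage.conversationTokens + usage.marketplaceTokens;
  // Guard against a zero-sized window and clamp overflow to a full bar.
  const fractionUsed = windowTokens > 0 ? Math.min(1, usedTokens / windowTokens) : 1;
  return {
    usedTokens,
    availableTokens: Math.max(0, windowTokens - usedTokens),
    fractionUsed,
    showCompactionBanner: fractionUsed >= COMPACTION_THRESHOLD,
  };
}
```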
Multi-Model Transparency
Pro users in 2026 expect to know which model is running. The status bar (bottom-right) shows the active model: "claude-opus-4-6" or "gpt-4o", based on the LLM router decision. This is transparency as a feature, not just a debug tool. Linear shows which integrations are active; the same pattern applies here.
Shopilot: ✓ Status bar right: marketplace icon · model name · credit balance
Streaming as the Core Interaction Metaphor
The loading spinner is dead in 2026. Everything streams: chat responses, tool results, sync status, data updates. The visual language of "tokens arriving" (word fade-in 80ms, blinking cursor) is now the universal AI loading pattern. Shopify just adopted it. HubSpot is adopting it.
Shopilot: ✓ Word-by-word streaming · blinking cursor · NO spinners, NO skeletons (by design)
08 · Development Methodology — How the 4-Person Team Ships
Phase 1 — Weeks 1-3
Design-in-Code · Ship Tokens + Atoms
- → Mateo: tokens.json + Style Dictionary setup
- → Sergio: Electron shell + React sidebar skeleton
- → Sergio: shadcn/ui init + Button + Input + Badge
- → Andrés: Anthropic SDK + IPC bridge + tool router
- → Pablo: this spec + design review on each PR
Deliverable: Electron window opens · sidebar renders · "Hello Claude" works
Phase 2 — Weeks 4-6
AI Agent Loop · Core Organisms
- → Sergio: StreamingText + ThinkingPulse + ToolAccordion
- → Sergio: ConfirmationCard + RollbackPanel
- → Andrés: 10 core tools (price read, competitor, buy box)
- → Mateo: Figma MCP integration + token pipeline setup
- → Pablo: design review of all organisms against Figma specs
Deliverable: Full coach loop working · tool calls visible · confirm/cancel works
Phase 3 — Weeks 7-10
Data + Quality Gates
- → Andrés: DataTable + KPI cards + Context Window Bar
- → Andrés: Audit log + proactive suggestion cards
- → Mateo: axe-core a11y audit + Figma ↔ Code consistency gates
- → Sergio: Empty states + error states all variants
- → Pablo: Beta onboarding + first seller feedback
Deliverable: Beta-ready · full data views · quality gates passing
Design Review Checklist — Every UI PR
Design System Maturity Score — Current State
Next: Sergio starts Week 1 with the Electron shell, shadcn init, and the Button component. Mateo sets up tokens.json and Style Dictionary. The spec is ready. Now we build.
World-Class Design Strategy
The gap between "good design" and "world-class" is not more components; it is precision at the product level: how screens compose, how the competition fails, what makes sellers trust the UI instantly, and the 20 invisible decisions that separate tier-1 products.
01 · B2B Product UX References — Not Brand Books, Product Patterns
These 8 products are referenced for their UX patterns: specific interaction and layout decisions Shopilot should adopt. This differs from Section 15, which analyzed brand identity.
Metric Card Pattern
Big mono number (42px) → small label above → percentage delta below with color · No chart inside the card (chart is separate) · Hover reveals tooltip with exact timestamp
→ Shopilot: KPI cards follow this exact hierarchy
Activity Timeline
Every action logged with: type icon + description + amount + timestamp · Clickable row reveals full detail · Infinite scroll (no pagination) · Timeline = trust
→ Shopilot: Audit Log follows this pattern exactly
Developer-first but accessible
API keys visible in UI · Raw JSON expandable · But non-technical users see clean summaries · Same data, two perspectives on same screen
→ Shopilot: Tool accordion shows summary + expandable JSON
Linear
Keyboard-first B2B product · Speed as primary UX feature
Speed as Marketing
Linear measured and published their p50/p95 load times. "Built for speed" is a design statement. Every interaction under 100ms feels intentional. This is a UX strategy, not just engineering.
→ Shopilot: Measure + display model response time. Make it a feature.
Status as Color Only
No status text ("In Progress", "Done") on lists — just colored dots. Experts read the color map in <0.5s. Power users trained to read color grids in one glance.
→ Shopilot: Buy Box status = orange dot, not "You have buy box: YES"
Kbd Shortcut Badges Everywhere
Every action in dropdown shows keyboard shortcut. This teaches users → makes them faster → makes them dependent → reduces churn. Shortcut visibility = retention feature.
→ Shopilot: All dropdowns show Cmd+K, Cmd+1, Esc shortcuts inline
Figma
Complex tool with zero cognitive friction · Panel architecture
3-Panel Information Architecture
Left: navigation/layers · Center: work surface · Right: contextual properties. This is the master pattern for complex tools. The content always has max space. Panels are tools, not content.
→ Shopilot: Marketplace=center, Sidebar=right panel. Left nav deferred to v2.
Context-Sensitive Right Panel
The right panel changes based on what's selected. Select a component → see its properties. Click away → see general settings. Sidebar in Shopilot should adapt to the marketplace page being viewed.
→ Shopilot Phase 2: sidebar context = active ASIN on marketplace page
Multiplayer Visual Cues
Other users visible as colored cursors. Multi-tab shows who's looking at what. In Shopilot context: the AI "cursor" — the coach's attention indicator (which tool it's running, what data it's looking at right now).
→ Shopilot: "Coach is analyzing ASIN B08XYZ" status in sidebar header
Datadog
The benchmark for monitoring dashboards · Density without chaos
Time as Primary Axis
Every metric in Datadog is a time series. The X axis is always time. This trains users to think in trends, not point-in-time snapshots. For Shopilot: Buy Box % over 30d is more actionable than Buy Box % right now.
→ Shopilot: All KPIs have 7d/30d sparklines. Point values only for current.
Alert Integration in Charts
Threshold lines appear ON charts, not in separate alerts. When a metric crosses a line, the chart background changes color. Alert IS the chart. No separate notification panel for threshold breaches.
→ Shopilot: Price threshold line on competitor chart. Red zone when below margin.
Faceted Filtering
Left sidebar has real-time faceted filters that update counts as you click. Tags/dimensions are first-class citizens. For sellers: filter by marketplace + category + status simultaneously. Update counts in real-time.
→ Shopilot Phase 2: ASIN table with faceted filters (marketplace, category, status)
Arc Browser
A Chromium-based browser with a native shell (not an Electron app), and the reference for sidebar-first app chrome · breaks every browser convention and wins
Sidebar IS the App Chrome
Arc moved ALL chrome (tabs, bookmarks, history) to the left sidebar. The content area is 100% undecorated. This is the insight: in Electron, the sidebar is where your app lives. The WebContentsView is sacred space.
→ Shopilot: sidebar has zero visual decoration except the chat + tool calls + status
Custom Title Bar Done Right
Arc's frameless window with custom controls that feel MORE native than native. The traffic light buttons are in their correct position, drag region is the entire top bar, full-screen transitions are perfect.
→ Shopilot: frameless + native traffic lights + 32px drag region + tab bar after
Command Bar as Primary Navigation
Arc's Cmd+T opens a search-everything command bar. This is the #1 power user feature. Arc trained millions of users to navigate entirely by keyboard. Once users find the command bar, they never use menus again.
→ Shopilot: Cmd+K opens command palette: "analyze B08XYZ", "reprice all", "show alerts"
Bloomberg Terminal
The extreme end of data density done right · Reference for seller data density
Density as Expertise Signal
Bloomberg is deliberately dense. It signals: "this is for professionals." The density IS the marketing — it makes users feel expert just by using it. Shopilot sellers are professionals. They can handle density. Don't dumb it down.
→ Shopilot: Don't simplify competitor tables. Show all 8 columns. Professionals want data.
Color = Directionality Only
Bloomberg uses green/red ONLY for up/down price movement. No other meaning. Nothing else is green or red. This absolute discipline means users process market data at a glance without thinking about color meaning.
→ Shopilot: #22C55E = price up / won Buy Box. #EF4444 = price down / lost Buy Box. Nothing else.
Monospace as Alignment
All Bloomberg data is monospace because financial data must align vertically. The $1,234.56 must be perfectly below $98.76 and $12,300.00. Misalignment breaks scanning. Monospace is structural, not decorative.
→ Shopilot: JetBrains Mono for all numbers is Bloomberg discipline applied to e-commerce.
Notion
Progressive disclosure master · Slash commands as interaction metaphor
Slash Command = AI Interaction
Notion's "/" opens inline commands. Claude Code uses the same pattern. This is now the universal AI interaction metaphor. Shopilot's chat input should support "/" for quick actions: "/reprice", "/analyze", "/report".
→ Shopilot: "/" in chat input opens quick-action palette with 36 tools
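Slash-command handling in the chat input comes down to a small parser plus a prefix filter for the palette. A minimal sketch using the three commands named above; the function names and the rule that unknown commands fall through to normal chat are assumptions:

```typescript
// The three quick actions named in the spec; the full palette would list all tools.
const QUICK_ACTIONS: string[] = ['reprice', 'analyze', 'report'];

interface SlashCommand {
  command: string; // without the leading "/"
  args: string;    // everything after the first space, trimmed
}

// Returns null when the input is ordinary chat text or an unknown command.
function parseSlashCommand(input: string): SlashCommand | null {
  const trimmed = input.trim();
  if (trimmed.charAt(0) !== '/') return null;
  const spaceAt = trimmed.indexOf(' ');
  const rawName = spaceAt === -1 ? trimmed.slice(1) : trimmed.slice(1, spaceAt);
  const command = rawName.toLowerCase();
  if (QUICK_ACTIONS.indexOf(command) === -1) return null;
  const args = spaceAt === -1 ? '' : trimmed.slice(spaceAt + 1).trim();
  return { command, args };
}

// Palette suggestions while the user is still typing, e.g. "/re" -> reprice, report.
function suggestQuickActions(partialInput: string): string[] {
  const trimmed = partialInput.trim().toLowerCase();
  if (trimmed.charAt(0) !== '/') return [];
  const prefix = trimmed.slice(1);
  return QUICK_ACTIONS.filter((name) => name.indexOf(prefix) === 0);
}
```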
Properties Reveal on Hover
Notion rows show only essential data by default. Hover reveals additional properties. This keeps lists clean while preserving data access. For Shopilot: ASIN rows show Name + Price + Buy Box. Hover reveals: SKU, inventory, last sync.
→ Shopilot: ASIN row hover reveals secondary metrics (expandable hover card)
Everything is a Block
Notion's single abstraction ("a block") unifies all content types. For Shopilot: every item in the sidebar is "a message" — user message, assistant message, tool call, confirmation card, proactive suggestion. Same base type, different renders.
→ Shopilot: MessageBlock type with discriminated union: text | tool | confirm | proactive
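A discriminated union along these lines could look as follows. This is a sketch only: the four `kind` values come from the line above, while every field name is a hypothetical placeholder.

```typescript
interface TextBlock {
  kind: 'text';
  role: 'user' | 'assistant';
  text: string;
}

interface ToolBlock {
  kind: 'tool';
  toolName: string;
  summary: string; // clean one-line summary shown when collapsed
  rawJson: string; // expandable detail
}

interface ConfirmBlock {
  kind: 'confirm';
  action: string;
  reversible: boolean;
}

interface ProactiveBlock {
  kind: 'proactive';
  insight: string;
  ctaLabel?: string;
}

type MessageBlock = TextBlock | ToolBlock | ConfirmBlock | ProactiveBlock;

// One switch over `kind` picks the render; the `never` check makes the compiler
// flag any block type that is added later but not handled here.
function describeBlock(block: MessageBlock): string {
  switch (block.kind) {
    case 'text':
      return `${block.role}: ${block.text}`;
    case 'tool':
      return `tool ${block.toolName}: ${block.summary}`;
    case 'confirm':
      return `${block.reversible ? 'REVERSIBLE' : 'IRREVERSIBLE'} · ${block.action}`;
    case 'proactive':
      return `suggestion: ${block.insight}`;
    default: {
      const unreachable: never = block;
      return unreachable;
    }
  }
}
```

In React the same switch would return a component per `kind` instead of a string.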
Intercom
Fin AI + human handoff · The original AI product with trust signals
AI vs Human Indicator
Intercom shows whether Fin AI or a human is responding. The AI has a bot icon; human has a photo. For Shopilot: the coach always shows "Powered by Claude Opus 4.6" + current model. Users trust labeled AI more than unlabeled AI.
→ Shopilot: sidebar header shows model name + version. Always visible, never hidden.
Proactive + Reactive in Same UI
Intercom shows proactive campaigns AND reactive inbox in same interface. Two modes: outbound (AI initiates) and inbound (user initiates). For Shopilot: the coach can initiate conversations ("I noticed X") and respond to queries.
→ Shopilot: proactive suggestion cards (coach-initiated) + chat input (user-initiated) in same sidebar
Context Panel Always Visible
Intercom inbox shows customer context alongside every conversation — purchase history, previous tickets, plan level. The agent never has to "look it up." For Shopilot: the coach always has seller profile + marketplace data visible in context.
→ Shopilot: context bar top of sidebar shows active marketplace + seller plan + top ASIN count
02 · Screen Compositions — What Each Main Screen Actually Looks Like
The biggest gap in the spec before this section. Components are defined; screens are not. These CSS mockups show exact proportions, component placement, and information hierarchy.
Screen 01 · Coach View — Main Application Screen (70/30)
Title bar
32px · frameless · traffic lights · tab bar after buttons · drag region
Marketplace 70%
WebContentsView · URL bar 28px · content scrolls natively · no interference
Sidebar 30%
React · header 36px · context bar · chat scroll · input sticky · status 20px
Status bar
20px · left: marketplace status · right: credit balance (JetBrains Mono)
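The 70/30 split can be computed as plain rectangles whenever the window resizes. A minimal sketch, assuming the status bar spans the full window width at the bottom (the spec leaves that ambiguous) and using hypothetical names; the rectangle shape matches the `{ x, y, width, height }` bounds Electron expects for views:

```typescript
interface PaneRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const TITLE_BAR_PX = 32;
const STATUS_BAR_PX = 20;
const MARKETPLACE_SHARE = 0.7; // sidebar takes the remaining 30%

function coachViewLayout(
  windowWidth: number,
  windowHeight: number
): { marketplace: PaneRect; sidebar: PaneRect } {
  // Height left between the title bar and the status bar.
  const contentHeight = Math.max(0, windowHeight - TITLE_BAR_PX - STATUS_BAR_PX);
  const marketplaceWidth = Math.round(windowWidth * MARKETPLACE_SHARE);
  return {
    marketplace: { x: 0, y: TITLE_BAR_PX, width: marketplaceWidth, height: contentHeight },
    sidebar: {
      x: marketplaceWidth,
      y: TITLE_BAR_PX,
      width: windowWidth - marketplaceWidth, // remainder, so the panes never overlap
      height: contentHeight,
    },
  };
}
```

The main process could call this from a window `resize` handler and pass `marketplace` to the marketplace view's `setBounds`.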
Screen 02 · Dashboard View — Sidebar in "Overview" Mode
Dashboard mode: sidebar replaces chat history with KPI summary + opportunity list when agent is idle. Chat input always present. Click any opportunity → coach activates and analyzes it.
03 · Competitive Design Matrix — Why Shopilot Looks Different
The existing seller tools (Helium 10, SellerBoard, Jungle Scout, Repricer.com) were designed in 2012-2018. They solve the right problems with completely wrong design language for 2026. This is Shopilot's visual competitive moat.
| Dimension | Helium 10 | SellerBoard | Jungle Scout | Repricer.com | Shopilot ★ |
|---|---|---|---|---|---|
| Design Era | 2018 · SaaS purple | 2015 · Excel aesthetic | 2017 · Consumer green | 2013 · Corporate blue | 2026 · AI-native dark |
| Primary BG | #6B4FBB purple | #FFF white | #1D6F42 green | #1B4F8A navy | #0A0A0F near-black |
| AI Integration | Bolt-on chatbot (2024) | None | AI keywords only | Rule-based only | AI-first · agent loop · 36 tools |
| Number Display | Default browser font | Arial/Helvetica | Proxima Nova regular | System serif | JetBrains Mono always |
| Dark Mode | ✗ Light only | ✗ Light only | ⚠ Toggle (half done) | ✗ Light only | ✓ Dark-first · identity |
| Desktop App | ✗ Web only | ✗ Web only | ✗ Web only | ✗ Web only | ✓ Electron · native feel |
| Reversibility | ✗ Not labeled | ✗ Not labeled | ✗ Not labeled | ⚠ Confirm dialog only | ✓ REVERSIBLE/IRREVERSIBLE · rollback tokens |
| Typography system | 1-2 fonts, no scale | System fonts | Proxima Nova only | System fonts | Inter Display + JetBrains Mono · full scale |
| Context awareness | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✓ Coach sees active marketplace page |
| Perceived quality | Tool (functional) | Spreadsheet | Consumer app | Legacy SaaS | Precision instrument · Bloomberg meets Claude |
★ The Core Design Insight
Every competitor was designed by engineers for engineers. Shopilot is designed by a seller who has used all of these tools and knows exactly what they get wrong. The dark + professional + monospace + AI-native aesthetic isn't a trend; it's the natural design language of a serious professional tool for 2026. This is the same design evolution that happened in finance (Bloomberg → Robinhood), in code (Eclipse → VS Code → Cursor), and in project management (JIRA → Linear).
04 · Emotional Design Map — From First Install to Power User
0s · First Impression
"This looks serious"
Dark canvas opens. Orange accent. Shopilot logo. No splash screen, no loading animation. App IS the window.
Design: near-black bg · frameless · logo mark visible · zero clutter
30s · Onboarding
"This is fast"
5-step wizard. Step 1: value prop. Step 2: OAuth in 30s. Step 3: language/category. Skip from step 3.
Design: progress dots · one action per step · CTA dominant · NO form fields until step 3
2min · First Tool Call
"The AI knows my data"
Coach runs first analysis unprompted. Tool accordion shows real API calls to their real store. This is the trust moment.
Design: tool accordion opens · real ASIN names · JetBrains Mono numbers · "From Amazon API"
5min · Aha Moment
"I didn't know this"
Coach surfaces an insight the seller didn't have: "You lost Buy Box on 8 ASINs in the last 24h. Here's why." This is the aha moment.
Design: proactive card slides up · specific numbers · one-click action · orange CTA
Day 1 · First Win
"It actually worked"
Price was changed. Buy Box % goes up. Confirmation with actual before/after. The coach says "Buy Box recovered to 91%."
Design: success state · green + orange celebrate · audit log entry · rollback still visible
Week 1 · Habit
"I check this every morning"
Dashboard view shows overnight changes. 3 opportunities queued. Seller opens app and acts on them before coffee is done.
Design: dashboard mode · opportunities sorted by $$$ impact · 1-click actions · <60s daily ritual
Month 1 · Expert
"I can't operate without this"
Power user. Knows Cmd+K, "/" commands. Audit log is their source of truth. Coach Memory has learned their preferences.
Design: keyboard shortcuts visible · command palette muscle memory · history as data
Designed Delight Moments — The Details That Stick
First Buy Box Win Celebration
When buy box goes from ✗ to ✓, the status dot pulses green 3x with scale(1.4). Subtle. Not a confetti explosion. Professional delight.
Typing Indicator Before Coach Responds
The ··· thinking pulse with "Shopilot is analyzing your store" appears immediately when user sends message. Never a blank moment.
Rollback Success State
When rollback completes, the audit log entry shows "↩ Reversed · 2.3s ago" in green. The system communicates "you're safe, it worked."
Coach Memory Acknowledgment
When coach uses seller's stored preference, it says "(using your saved preference: always protect margins >30%)". Shows it's paying attention.
Competitor Detected Alert
When a new seller lists on one of your ASINs, the proactive card appears with their name, price, and rating. Feels like having eyes everywhere.
Credit Milestone
When seller uses their 100th credit, a discreet banner: "100 actions taken · Avg response: 1.2s · $847 in revenue impacts attributed." Numbers build pride.
05 · E-Commerce Domain Visual Patterns — What No Other Design System Has
Generic design systems cover buttons and inputs. Shopilot needs patterns specific to e-commerce seller intelligence. These are the domain-specific visual components that make the product feel built BY a seller.
Buy Box Indicator — 4 States
Rule: Buy Box % is ALWAYS JetBrains Mono. Color = status only. No text labels on list view (dot only). Labels on detail view.
Price Delta Display — Competitor Comparison
You row = orange bar. Winner row = highlighted. Relative bar shows price position visually. Delta shown as absolute + direction. Never percentage-only.
BSR Trend Sparkline — Inline in Table
BSR: LOWER = BETTER (rank #1 = bestseller). Sparkline: green slope = improving (going toward #1). ALWAYS show direction word, not just number. Color shadow band adds weight without legend.
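Because a falling rank number is good news, the direction logic is easy to invert by accident. A minimal sketch with hypothetical names that encodes "lower is better" once and always emits the direction word:

```typescript
type BsrDirection = 'improving' | 'declining' | 'flat';

// Lower rank is better: moving from #1200 to #800 is an improvement.
function bsrDirection(previousRank: number, currentRank: number): BsrDirection {
  if (currentRank < previousRank) return 'improving';
  if (currentRank > previousRank) return 'declining';
  return 'flat';
}

// Table cell text: the rank plus the direction word, never the number alone.
function bsrLabel(previousRank: number, currentRank: number): string {
  return `#${currentRank} · ${bsrDirection(previousRank, currentRank)}`;
}
```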
Inventory Health Grid — Portfolio View
Each cell = one ASIN. Color = stock health (green >60d / amber 15-60d / red <15d). Number = days remaining. Glanceable portfolio status. No labels needed — color + number is sufficient.
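The grid's color rule is a three-way threshold on days of stock remaining. A minimal sketch; the function name is hypothetical, and the boundary handling (exactly 15 and exactly 60 days count as amber) follows the ranges as written above:

```typescript
type StockHealth = 'green' | 'amber' | 'red';

// green: more than 60 days · amber: 15 to 60 days · red: fewer than 15 days
function inventoryHealth(daysRemaining: number): StockHealth {
  if (daysRemaining > 60) return 'green';
  if (daysRemaining < 15) return 'red';
  return 'amber';
}
```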
06 · Color Blindness Safety — Accessible for All Sellers
~8% of men and ~0.5% of women have red-green color blindness. For Shopilot, this means Buy Box won (green) vs lost (red) may be indistinguishable to ~1 in 12 male sellers. The fix: never use color alone for meaning. Always pair with icon, text, or shape.
Deuteranopia (Red-Green Blind)
Most common: green-blind. Reds appear brownish-yellow. Greens appear similar to orange.
Problem: green and red dots look identical to deuteranopes. Users can't distinguish Buy Box won vs lost by color alone.
Fix: Shape + Color (WCAG 1.4.1)
Never use color alone. Always pair color with shape, icon, or text pattern.
Solution: ✓/✗/— icons work even without color. Color still helps non-colorblind users scan faster.
Safe Color Pairs (Accessible)
These color combinations are distinguishable under all common color blindness types:
Testing tools: Figma "Color Blind" plugin · Chrome DevTools accessibility panel · coblis.de online simulator
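The shape-plus-color rule is easiest to enforce when a single function returns the icon, the color, and the accessible label together. A minimal sketch: the three states and their ✓/✗/— icons follow the solution above, the green and red hex values are the document's tokens, and the gray for the unknown state plus all identifier names are assumptions:

```typescript
type BuyBoxStatus = 'won' | 'lost' | 'unknown';

interface StatusIndicator {
  icon: '✓' | '✗' | '—'; // readable without color
  colorHex: string;      // speeds up scanning for users who do see color
  ariaLabel: string;     // text alternative for screen readers
}

function buyBoxIndicator(buyBox: BuyBoxStatus): StatusIndicator {
  switch (buyBox) {
    case 'won':
      return { icon: '✓', colorHex: '#22C55E', ariaLabel: 'Buy Box won' };
    case 'lost':
      return { icon: '✗', colorHex: '#EF4444', ariaLabel: 'Buy Box lost' };
    default:
      // Assumed neutral gray for "no data"; not specified in the palette rules.
      return { icon: '—', colorHex: '#7A7A90', ariaLabel: 'Buy Box status unknown' };
  }
}
```

Rendering the icon in every view, including dense lists, keeps the indicator legible under deuteranopia.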
07 · The 20 Invisible Decisions That Make Products World-Class
Users can't name these details. But they feel them. A user who says "this just feels premium" is responding to some combination of these 20 decisions. None of them take more than a few hours to implement. All of them matter.
① Letter-spacing on headings
-0.03em on h2 makes text look designed, not default. Default tracking = amateur.
② Consistent 4px grid
Every spacing value divisible by 4. Not "16px here, 18px there." Inconsistency is invisible but users sense the chaos as "roughness."
③ Inset shadow on cards
inset 0 1px 0 rgba(255,255,255,.06) adds glass depth. Without it, dark cards look flat and dead.
④ Transition on color changes
transition: background 150ms ease, color 150ms ease on all interactive elements. Instant color changes feel abrupt and cheap.
⑤ Border on focus, not outline
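Browser default outline is ugly. Replace it with box-shadow: 0 0 0 2px rgba(249,115,22,.5) on :focus-visible, and keep outline: 2px solid transparent so the ring still renders in forced-colors (high contrast) mode, where box-shadow is dropped. Same a11y benefit, premium look.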
⑥ Disabled ≠ invisible
Disabled elements at 50% opacity tell users "this exists but you can't use it yet." Not display:none. Visibility + opacity = correct pattern.
⑦ Line-height on body text = 1.5
Dense data UIs are tempting to set to 1.2. Don't. AI-generated text needs 1.5 minimum for readability. Chat messages need 1.6.
⑧ Cursor: pointer on interactive divs
If it's clickable, it needs cursor: pointer. Forgetting this on tool accordions or proactive cards breaks the interaction expectation.
⑨ Tabular nums on ALL numbers
font-variant-numeric: tabular-nums makes numbers align in columns. Without it, a table of prices is unreadable.
⑩ Scrollbars styled or hidden
Default scrollbars look terrible on dark UIs. Either hide with ::-webkit-scrollbar or make them thin + dark. Visible ugly scrollbars = unfinished product.
⑪ No horizontal scroll in narrow windows
Electron windows can be resized smaller than expected. Set overflow-x: hidden on body and overflow-x: auto on tables only.
⑫ Semantic HTML elements
Use <button> not <div onclick>. <time> for timestamps. <output> for live AI output. Semantic = better a11y + better dev experience.
⑬ Will-change on animated elements
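will-change: transform, opacity on sliding cards and streaming text hints the browser to promote those elements to their own compositor layer, which reduces jank at 60fps. Apply it only to elements that actually animate, since every promoted layer costs memory.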
⑭ Error messages explain what to DO
"Error 403" = terrible. "Your Amazon credentials expired. Click Reconnect to re-authorize in 30 seconds." = world-class. Every error has a next step.
⑮ Timestamps in user timezone
Never show UTC. Use Intl.DateTimeFormat(locale, {timeZone}). "2:34 PM" not "19:34 UTC". Sellers check timestamps constantly.
⑯ Number formatting by locale
MeLi sellers in Mexico: $1,847.50 not 1847,50 MXN. Use Intl.NumberFormat. Wrong number format breaks trust immediately.
⑰ Empty inputs have placeholder text
Chat input: "Ask your coach about any ASIN, competitor, or pricing decision..." Not "Type here" or blank. Placeholder teaches the product's power.
⑱ Correct text cursor in inputs
Input fields: cursor: text. Buttons: cursor: pointer. Disabled: cursor: not-allowed. Every cursor state must be right.
⑲ Data source attribution
Below every KPI: "From Amazon Seller Central API · Synced 4 min ago" in 10px gray. This is the invisible trust builder. Users who see source attribution trust the numbers more.
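⑳ Reduce motion for vestibular disorders
@media (prefers-reduced-motion: reduce) { *, *::before, *::after { animation-duration: 0.01ms !important; transition-duration: 0.01ms !important; } } respects the OS accessibility setting. Supports WCAG 2.3.3 (Animation from Interactions, Level AAA).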
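Decisions ⑮ and ⑯ both rest on the built-in `Intl` API. A minimal sketch with hypothetical helper names; it assumes the seller's locale, time zone, and currency are already known (for example from the seller profile):

```typescript
// Format an instant in the seller's own time zone instead of UTC.
function formatSellerTime(instant: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    timeZone,
    hour: 'numeric',
    minute: '2-digit',
  }).format(instant);
}

// Format a monetary amount with the grouping, decimal mark, and symbol of the locale.
function formatSellerMoney(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

// Example inputs: a UTC timestamp shown to a seller in Mexico City,
// and a price for a MercadoLibre Mexico seller.
const syncedAt = formatSellerTime(
  new Date('2026-03-10T19:34:00Z'),
  'en-US',
  'America/Mexico_City'
);
const listingPrice = formatSellerMoney(1847.5, 'es-MX', 'MXN');
```

Note that `Intl` may separate the time and the AM/PM marker with a non-breaking space, so compare formatted strings with a whitespace-tolerant match in tests.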
Production Readiness — Critical Gaps
30-point audit results · 14 gaps identified · All HIGH/MEDIUM severity specs
This section was generated from a systematic 30-point codebase audit. Each sub-section contains actionable implementation specs. Address HIGH items before public beta and MEDIUM items before v1.0 GA.
HIGH 01 · Update Notification System
MISSING: electron-updater is configured for auto-download, but the user-facing update experience is completely unspecified. Silent updates break trust: users need to know when and why the app changed.
Update State Machine
Update Available Modal — Live Spec
Shopilot 1.3.0 available
You have version 1.2.4. Download is ready.
What's new
- Coach: 3x faster tool execution with parallel calls
- MercadoLibre: new competitor tracking for MX sellers
- Fixed: Rollback confirmation not dismissing on success
- Fixed: Credit balance not updating after top-up
Implementation — main process
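// main/updater.ts
import { BrowserWindow, ipcMain } from 'electron';
import { autoUpdater } from 'electron-updater';

const CHECK_INTERVAL_MS = 4 * 60 * 60 * 1000; // every 4 hours

// Call once from the main process after the main window has been created.
export function initUpdater(mainWindow: BrowserWindow): void {
  autoUpdater.autoDownload = true;
  autoUpdater.autoInstallOnAppQuit = true;

  // Forward updater events to the renderer, which owns the banner and modal UI.
  autoUpdater.on('update-available', (info) => {
    mainWindow.webContents.send('update:available', {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
  });
  autoUpdater.on('download-progress', (progress) => {
    mainWindow.webContents.send('update:progress', {
      percent: Math.round(progress.percent),
      bytesPerSecond: progress.bytesPerSecond,
    });
  });
  autoUpdater.on('update-downloaded', () => {
    mainWindow.webContents.send('update:ready');
  });
  // Update errors stay silent for the user (see the state table below); log only.
  autoUpdater.on('error', (err) => {
    console.error('[updater]', err);
  });

  // IPC handler: user clicks "Restart & Update"
  ipcMain.handle('update:install', () => {
    autoUpdater.quitAndInstall(false, true); // isSilent=false, isForceRunAfter=true
  });

  // Check on launch, then every 4 hours. checkForUpdates() is used instead of
  // checkForUpdatesAndNotify() because the in-app UI replaces the OS notification.
  // Rejections are swallowed here; they are already reported via the 'error' event.
  const check = () => autoUpdater.checkForUpdates().catch(() => undefined);
  check();
  setInterval(check, CHECK_INTERVAL_MS);
}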
| State | UI Pattern | Dismissible? |
|---|---|---|
| checking | Status bar dot pulses blue — silent | Auto |
| available | In-app banner: "New version available. View details" | Yes (persists until restart) |
| downloading | Modal with changelog + progress bar (auto-shown) | Yes (download continues) |
| ready | Modal: "Ready to install. Restart now?" with changelog | Yes (installs on quit) |
| error | Silent (logged to Sentry) — do not bother user for update errors | N/A |
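The state table above can be sketched as a pure reducer that the renderer feeds with the events forwarded over IPC. This is a minimal sketch, not the shipped implementation: the `idle` state, the function names, and the rule that a failed re-check does not discard an already downloaded update are assumptions. The event names mirror the electron-updater events used in `main/updater.ts`.

```typescript
// Hypothetical reducer for the update state machine described in the table above.
type UpdateState = 'idle' | 'checking' | 'available' | 'downloading' | 'ready' | 'error';

type UpdateEventName =
  | 'checking-for-update'
  | 'update-available'
  | 'update-not-available'
  | 'download-progress'
  | 'update-downloaded'
  | 'error';

function nextUpdateState(state: UpdateState, event: UpdateEventName): UpdateState {
  switch (event) {
    case 'checking-for-update':
      // A periodic re-check must not interrupt a download or a ready-to-install update.
      return state === 'downloading' || state === 'ready' ? state : 'checking';
    case 'update-available':
      return 'available';
    case 'update-not-available':
      return state === 'checking' ? 'idle' : state;
    case 'download-progress':
      return state === 'ready' ? 'ready' : 'downloading';
    case 'update-downloaded':
      return 'ready';
    case 'error':
      // Assumption: keep a downloaded update installable even if a later check fails.
      return state === 'ready' ? 'ready' : 'error';
  }
}

type UpdateUiPattern = 'none' | 'status-dot' | 'banner' | 'modal';

// Maps each state to the UI pattern from the table; errors stay silent.
function uiPatternFor(state: UpdateState): UpdateUiPattern {
  switch (state) {
    case 'checking':
      return 'status-dot';
    case 'available':
      return 'banner';
    case 'downloading':
    case 'ready':
      return 'modal';
    case 'idle':
    case 'error':
      return 'none';
  }
}
```

Keeping the transitions in a pure function makes the "Update available" E2E case easier to cover with a mocked updater, because the UI only depends on the reducer output.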
HIGH 02 · Local Chat Persistence
MISSING: Chat sessions vanish on app restart. No localStorage, IndexedDB, or Zustand persist spec exists anywhere in the codebase. Sellers who close the app lose all context, a critical trust failure.
Data Model — What to Persist
ChatSession (IndexedDB — shopilot-chat store)
interface ChatSession {
id: string; // uuid
marketplaceId: 'amazon' | 'meli' | 'shopify';
asin?: string; // active context when session started
messages: Message[]; // all messages including tool calls
createdAt: number; // unix ms
updatedAt: number;
tokenCount: number; // for context window visualization
title?: string; // auto-generated from first user message (truncated 60 chars)
}
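The `title` field is described as auto-generated from the first user message and truncated to 60 characters. One way this could look is sketched below; the function name and the choice to end truncated titles with an ellipsis are assumptions, not part of the spec.

```typescript
// Hypothetical helper: derive a session title from the first user message.
function deriveSessionTitle(firstUserMessage: string, maxLength = 60): string {
  // Collapse newlines and repeated whitespace so the title fits on one line.
  const collapsed = firstUserMessage.replace(/\s+/g, ' ').trim();
  if (collapsed.length <= maxLength) return collapsed;
  // Reserve one character for the ellipsis so the result never exceeds maxLength.
  return collapsed.slice(0, maxLength - 1).trimEnd() + '…';
}
```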
Zustand Store — React State Layer
import { create } from 'zustand';
import { persist, createJSONStorage } from 'zustand/middleware';
// Lightweight: only persist session index (not full messages)
// Full messages go to IndexedDB via idb-keyval
const useChatStore = create(persist(
(set, get) => ({
sessions: [] as SessionMeta[], // { id, title, updatedAt, marketplace }
activeSessionId: null as string | null,
setActiveSession: (id: string) => set({ activeSessionId: id }),
addSession: (meta: SessionMeta) =>
set(s => ({ sessions: [meta, ...s.sessions].slice(0, 100) })), // keep last 100
}),
{
name: 'shopilot-chat-store',
storage: createJSONStorage(() => localStorage), // session index only
}
));
// Full messages: idb-keyval (no serialization overhead)
import { get as idbGet, set as idbSet, del as idbDel } from 'idb-keyval';
export const loadSession = (id: string) => idbGet<ChatSession>(`session:${id}`);
export const saveSession = (s: ChatSession) => idbSet(`session:${s.id}`, s);
export const deleteSession = (id: string) => idbDel(`session:${id}`);
| Storage | What's stored | Retention | Size limit |
|---|---|---|---|
| localStorage | Session index (id, title, timestamp) | 100 sessions | ~20 KB |
| IndexedDB | Full message arrays with tool calls | 90 days, then pruned | ~50 MB soft cap |
| safeStorage | API keys, marketplace credentials | Until user logout | Negligible |
| SQLite (main) | Audit log, price history, snapshots | 180 days | 500 MB max |
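The 90-day retention for IndexedDB sessions implies a pruning pass, for example on app launch. A minimal sketch, assuming pruning is keyed on `updatedAt` and that the returned IDs are then passed to `deleteSession`; the names `SessionIndexEntry` and `findExpiredSessionIds` are hypothetical.

```typescript
// Hypothetical pruning helper for the 90-day chat retention in the table above.
interface SessionIndexEntry {
  id: string;
  updatedAt: number; // unix ms, same unit as ChatSession.updatedAt
}

const CHAT_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the IDs of sessions whose last update is older than the retention window.
function findExpiredSessionIds(
  entries: SessionIndexEntry[],
  now: number,
  retentionDays: number = CHAT_RETENTION_DAYS,
): string[] {
  const cutoff = now - retentionDays * MS_PER_DAY;
  return entries.filter((entry) => entry.updatedAt < cutoff).map((entry) => entry.id);
}
```

Passing `now` as a parameter keeps the function deterministic, which makes the retention rule straightforward to unit test.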
Session History UI — Sidebar Panel
When chat input is empty: show last 5 sessions as clickable cards below input. Each card: title (auto) + marketplace icon + relative time. Clicking loads the session and resumes context. Pattern adopted from Claude.ai sidebar.
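The card list described above needs two small pieces of logic: picking the five most recent sessions and rendering a relative time. The sketch below is illustrative only; the `SessionCard` shape and the label formats ("5m ago", "3h ago") are assumptions.

```typescript
// Hypothetical helpers for the recent-sessions cards shown under the chat input.
interface SessionCard {
  id: string;
  title: string;
  marketplace: string;
  updatedAt: number; // unix ms
}

// Most recently updated sessions first, capped at `limit` (5 in the spec above).
function pickRecentSessions(sessions: SessionCard[], limit = 5): SessionCard[] {
  return [...sessions].sort((a, b) => b.updatedAt - a.updatedAt).slice(0, limit);
}

// Compact relative-time label; the exact wording is an assumption.
function formatRelativeTime(updatedAt: number, now: number): string {
  const minutes = Math.floor(Math.max(0, now - updatedAt) / 60_000);
  if (minutes < 1) return 'just now';
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```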
HIGH 03 · GDPR, Data Export & Account Deletion
MISSING: Zero documentation of user data download, account deletion, or data retention. Required by GDPR (EU), LGPD (Brazil, critical for MeLi sellers), and expected by Apple App Store Review. Must exist before any public release.
Personal Data Inventory (PII Map)
| Data Type | Where stored | Purpose | Retention | Exportable? |
|---|---|---|---|---|
| Email address | Supabase auth.users | Account identity | Until deletion | Yes |
| Marketplace credentials | Electron safeStorage (local) | API access | Until revoke | No (keys) |
| Chat history | Local IndexedDB | Session continuity | 90 days | Yes (JSON) |
| Audit log | Local SQLite | Rollback & trust | 180 days | Yes (CSV) |
| Usage telemetry | PostHog (cloud) | Product analytics | 24 months | On request |
| Credit transactions | Supabase billing | Billing history | 7 years (legal) | Yes (PDF) |
| Error/crash reports | Sentry (cloud) | Bug fixing | 90 days | No (aggregate) |
Data Export Package — ZIP Structure
shopilot-export-{userId}-{YYYYMMDD}.zip
├── README.txt # What's in this export, data policy link
├── account/
│ ├── profile.json # email, plan, created_at, last_login
│ └── billing_history.csv # date, amount, credits, description
├── chat_history/
│ ├── sessions_index.json # session metadata (title, date, marketplace)
│ └── session_{id}.json × N # full message arrays per session
├── audit_log/
│ └── actions.csv # timestamp, action, asin, old_value, new_value, reversible
└── telemetry_summary.json # aggregate usage stats (no PII included)
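The archive layout above can be expressed as a small manifest builder, plus a CSV row encoder for `billing_history.csv` and `actions.csv`. This is a sketch under stated assumptions: the function names are hypothetical, the date in the file name is taken in UTC, and the actual ZIP writing is left out.

```typescript
// Hypothetical helpers mirroring the export ZIP structure above.
function exportArchiveName(userId: string, date: Date): string {
  // Assumption: the YYYYMMDD stamp uses UTC so the name is stable across time zones.
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `shopilot-export-${userId}-${yyyy}${mm}${dd}.zip`;
}

// Lists every path expected inside the archive for the given chat sessions.
function exportManifest(sessionIds: string[]): string[] {
  return [
    'README.txt',
    'account/profile.json',
    'account/billing_history.csv',
    'chat_history/sessions_index.json',
    ...sessionIds.map((id) => `chat_history/session_${id}.json`),
    'audit_log/actions.csv',
    'telemetry_summary.json',
  ];
}

// Encodes one CSV row, quoting fields that contain commas, quotes, or line breaks.
function toCsvRow(fields: Array<string | number | boolean>): string {
  return fields
    .map((field) => {
      const text = String(field);
      return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    })
    .join(',');
}
```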
Account Deletion Flow (GDPR Article 17 — Right to Erasure)
- User navigates to Settings → Account → "Delete Account"
- Modal: "This will permanently delete your account and all data. Export your data first?" with [Export Data] + [Continue to Delete] buttons
- Type "DELETE" in text field to confirm (same pattern as Vercel, Supabase)
- Server-side: mark account deleted_at → Supabase Edge Function queues hard delete in 30 days (grace period for disputes)
- Local: clear all IndexedDB stores + localStorage + SQLite + safeStorage keys on next launch
- Confirmation email: "Your Shopilot account will be permanently deleted on {date+30d}. Cancel: {link}"
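The confirmation and grace-period rules in the flow above reduce to a few pure checks. The sketch below assumes the typed confirmation is an exact, case-sensitive match and that the 30-day window is measured from `deleted_at`; all names are hypothetical.

```typescript
// Hypothetical helpers for the account deletion flow above.
const DELETION_GRACE_DAYS = 30;
const DAY_IN_MS = 24 * 60 * 60 * 1000;

interface DeletionSchedule {
  deletedAt: number; // unix ms when the account was soft-deleted
  hardDeleteAt: number; // unix ms when the hard delete becomes due
}

// Assumption: the typed confirmation must match exactly, including case.
function isDeleteConfirmed(typedText: string): boolean {
  return typedText === 'DELETE';
}

function scheduleDeletion(now: number): DeletionSchedule {
  return { deletedAt: now, hardDeleteAt: now + DELETION_GRACE_DAYS * DAY_IN_MS };
}

// The cancel link in the confirmation email works until the hard delete is due.
function canCancelDeletion(schedule: DeletionSchedule, now: number): boolean {
  return now < schedule.hardDeleteAt;
}
```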
HIGH 04 · Observability & Error Tracking (Sentry)
PARTIAL: Sentry is mentioned in the stack, but sampling rates, PII filtering, event taxonomy, and performance monitoring thresholds are not specified. Under-instrumented apps have silent failures in production.
Sentry Configuration Spec
// renderer/main.tsx — Sentry init
import * as Sentry from '@sentry/electron/renderer';
Sentry.init({
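dsn: import.meta.env.VITE_SENTRY_DSN, // assumes a Vite renderer build, which exposes VITE_* vars on import.meta.env
environment: import.meta.env.MODE,
// release: set it in the main-process Sentry.init; `app.getVersion()` is main-process only and not available here
// Sampling: aggressive in dev, conservative in prod
tracesSampleRate: import.meta.env.PROD ? 0.1 : 1.0,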
profilesSampleRate: 0.05, // CPU profiling — 5% of transactions
// PII Scrubbing — NEVER send user content to Sentry
beforeSend(event) {
// Strip message content (chat messages may contain business data)
if (event.extra?.messages) delete event.extra.messages;
if (event.extra?.prompt) delete event.extra.prompt;
// Strip marketplace credentials from breadcrumbs
event.breadcrumbs?.forEach(crumb => { // breadcrumbs is a plain array on the SDK event object
if (crumb.data?.token) crumb.data.token = '[Filtered]';
if (crumb.data?.apiKey) crumb.data.apiKey = '[Filtered]';
});
return event;
},
// Integrations
integrations: [
Sentry.browserTracingIntegration(),
Sentry.replayIntegration({
maskAllText: true, // block all text from session replay
blockAllMedia: true,
}),
],
});
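Because the Gate 1 checklist asks for the PII filter to be verified, it can help to pull the scrubbing logic out of `beforeSend` into a pure function that is testable without Sentry. The sketch below uses local structural types instead of the SDK's own event type; `SentryEventLike`, `scrubSentryEvent`, and the key lists are hypothetical names.

```typescript
// Hypothetical, SDK-independent version of the beforeSend scrubbing rules.
interface BreadcrumbLike {
  data?: Record<string, unknown>;
}

interface SentryEventLike {
  extra?: Record<string, unknown>;
  breadcrumbs?: BreadcrumbLike[];
}

const CONTENT_KEYS = ['messages', 'prompt']; // user content: removed entirely
const SECRET_KEYS = ['token', 'apiKey']; // credentials: replaced with a marker

// Mutates and returns the event, matching how beforeSend is expected to behave.
function scrubSentryEvent<T extends SentryEventLike>(event: T): T {
  if (event.extra) {
    for (const key of CONTENT_KEYS) delete event.extra[key];
  }
  for (const crumb of event.breadcrumbs ?? []) {
    if (!crumb.data) continue;
    for (const key of SECRET_KEYS) {
      if (key in crumb.data) crumb.data[key] = '[Filtered]';
    }
  }
  return event;
}
```

With this in place, `beforeSend` could simply return `scrubSentryEvent(event)`, and the same function can be asserted on in unit tests.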
Custom Event Taxonomy
| Event Name | Trigger | Severity | Alert? |
|---|---|---|---|
| tool_execution_failed | Tool returns error after 3 retries | Warning | No |
| irreversible_action_taken | Price change / inventory update confirmed | Info | No |
| credit_exhausted | Balance hits 0 | Warning | Yes (Slack) |
| marketplace_auth_expired | API returns 401/403 | Error | Yes (Slack) |
| claude_api_error | Anthropic API returns 5xx | Error | Yes (PagerDuty) |
| ipc_bridge_timeout | IPC call > 5s with no response | Critical | Yes (PagerDuty) |
| rollback_failed | Rollback tool returns error | Critical | Yes (PagerDuty) |
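The taxonomy table can be encoded as a typed lookup so that event names, severities, and alert routing stay in one place. This is a sketch: the type and constant names are hypothetical, and only the seven events and their values come from the table.

```typescript
// Hypothetical typed encoding of the custom event taxonomy table.
type EventSeverity = 'info' | 'warning' | 'error' | 'critical';
type AlertChannel = 'slack' | 'pagerduty' | null; // null means no alert

type ShopilotEventName =
  | 'tool_execution_failed'
  | 'irreversible_action_taken'
  | 'credit_exhausted'
  | 'marketplace_auth_expired'
  | 'claude_api_error'
  | 'ipc_bridge_timeout'
  | 'rollback_failed';

const EVENT_TAXONOMY: Record<ShopilotEventName, { severity: EventSeverity; alert: AlertChannel }> = {
  tool_execution_failed: { severity: 'warning', alert: null },
  irreversible_action_taken: { severity: 'info', alert: null },
  credit_exhausted: { severity: 'warning', alert: 'slack' },
  marketplace_auth_expired: { severity: 'error', alert: 'slack' },
  claude_api_error: { severity: 'error', alert: 'pagerduty' },
  ipc_bridge_timeout: { severity: 'critical', alert: 'pagerduty' },
  rollback_failed: { severity: 'critical', alert: 'pagerduty' },
};

// Returns where an event should be routed, or null when it is log-only.
function alertChannelFor(name: ShopilotEventName): AlertChannel {
  return EVENT_TAXONOMY[name].alert;
}
```

Using a `Record` keyed by the name union means the compiler flags any event that is added to the union without a severity and alert rule.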
Performance Thresholds (alert if exceeded)
- App cold start: > 3s → warning
- IPC round-trip: > 500ms → warning
- Tool execution: > 10s → log
- First token latency: > 2s → log
- Chat render FPS: < 30fps → log
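The thresholds above can be checked with a small evaluator. A minimal sketch, assuming each metric is reported as a single number in milliseconds (or frames per second for rendering); the metric keys and function name are hypothetical.

```typescript
// Hypothetical evaluator for the performance thresholds listed above.
type ThresholdAction = 'warning' | 'log';
type PerfMetric = 'coldStartMs' | 'ipcRoundTripMs' | 'toolExecutionMs' | 'firstTokenMs' | 'chatRenderFps';

interface PerfThreshold {
  limit: number;
  direction: 'above' | 'below'; // which side of the limit counts as a breach
  action: ThresholdAction;
}

const PERF_THRESHOLDS: Record<PerfMetric, PerfThreshold> = {
  coldStartMs: { limit: 3000, direction: 'above', action: 'warning' },
  ipcRoundTripMs: { limit: 500, direction: 'above', action: 'warning' },
  toolExecutionMs: { limit: 10_000, direction: 'above', action: 'log' },
  firstTokenMs: { limit: 2000, direction: 'above', action: 'log' },
  chatRenderFps: { limit: 30, direction: 'below', action: 'log' },
};

// Returns the action to take, or null when the value is within its threshold.
function evaluateMetric(metric: PerfMetric, value: number): ThresholdAction | null {
  const threshold = PERF_THRESHOLDS[metric];
  const breached = threshold.direction === 'above' ? value > threshold.limit : value < threshold.limit;
  return breached ? threshold.action : null;
}
```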
User Context (always attach)
Sentry.setUser({
id: userId, // NOT email
plan: 'pro',
});
Sentry.setContext('marketplace', {
active: 'amazon',
region: 'US',
});
// NEVER set: email, apiKey, sellerId
MEDIUM 05 · In-App Support & Help Center
MISSING: No help center, FAQ panel, or support chat is specified. B2B desktop apps need accessible support without leaving the app. Pattern: ? button in status bar → slide-over panel with search + articles + live chat.
Support Entry Points
- ① Status bar "?" button: always visible, 24px tall, right corner. Opens the help slide-over. Free plan gets async email; Pro gets a live chat widget (Crisp or Intercom).
- ② Error recovery banners: Type A errors include a "Need help?" link that pre-fills the support form with error context.
- ③ Keyboard shortcut Cmd+Shift+?: opens the help slide-over from anywhere in the app.
- ④ First-run onboarding: 3-step coach intro with an inline, expandable "How does this work?" FAQ.
Help Panel Anatomy
Popular Articles
Integration Recommendation: Crisp.chat (not Intercom)
Crisp is $25/mo vs Intercom's $74/mo minimum. Crisp has a WebView embed that works in Electron without SDK conflicts. For v1: embed Crisp chatbox in the help slide-over WebContentsView. For v2: evaluate Intercom when MRR > $10K.
MEDIUM 06 · Demo & Trial Mode
MISSING: No sandbox or mock data strategy exists. New users who haven't connected a marketplace account see a blank app. Every B2B tool that converts well has a demo mode that shows the product's value immediately.
Demo Data Strategy
// demo/fixtures.ts
export const DEMO_SELLER = {
marketplace: 'amazon',
region: 'US',
storeName: 'Acme Electronics',
plan: 'pro',
};
export const DEMO_ASINS = [
{ asin: 'B08N5WRWNW', title: 'Wireless Earbuds Pro',
price: 49.99, buyBox: 78, bsr: 1247, stock: 342 },
{ asin: 'B09G9FPHY6', title: 'USB-C Hub 7-in-1',
price: 34.99, buyBox: 0, bsr: 891, stock: 12 }, // stock warning
{ asin: 'B0BDJ179PH', title: 'Phone Stand Aluminum',
price: 19.99, buyBox: 34, bsr: 3401, stock: 98 },
];
export const DEMO_COMPETITORS = {
'B08N5WRWNW': [
{ seller: 'TechDirect', price: 47.99, bbPercent: 22 },
{ seller: 'ElectroHub', price: 51.99, bbPercent: 0 },
],
};
// Demo coach responses — scripted for max "aha moment"
export const DEMO_CHAT_SCRIPT = [
{
trigger: 'first_message',
response: 'I can see your Buy Box win rate dropped 23% this week on B08N5WRWNW. Your main competitor TechDirect lowered their price to $47.99 two days ago. Want me to analyze if repricing to $46.49 would recover the Buy Box while maintaining your margin?',
},
];
Demo Banner — Persistent Indicator
Demo Mode
Simulated data — no real changes will be made
Demo Mode Rules
- All tool calls return fixture data, never real API
- Confirmation dialogs work but action is a no-op
- Credits don't decrement (infinite demo credits)
- Audit log shows demo actions with 🎭 prefix
- "Connect Account" CTA always visible in sidebar
- Demo mode auto-activates if no marketplace connected
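The demo rules above suggest a thin wrapper around tool execution that swaps in fixture data and never charges credits. The sketch below is one possible shape, not the actual tool runtime: `DemoToolCall`, `runToolInDemo`, and the read/write distinction are assumptions made for illustration.

```typescript
// Hypothetical demo-mode tool runner implementing the rules above.
interface DemoToolCall {
  name: string; // e.g. 'get_competitors' (illustrative tool name)
  kind: 'read' | 'write'; // write = would mutate the marketplace in live mode
}

interface DemoToolResult {
  data: unknown; // fixture data, never a real API response
  creditsCharged: number; // always 0 in demo mode
  auditEntry: string | null; // demo actions are logged with a prefix
}

const DEMO_AUDIT_PREFIX = '🎭';

// Demo mode turns on automatically when no marketplace account is connected.
function isDemoMode(connectedAccountCount: number): boolean {
  return connectedAccountCount === 0;
}

function runToolInDemo(call: DemoToolCall, fixtures: Record<string, unknown>): DemoToolResult {
  return {
    data: fixtures[call.name] ?? null,
    creditsCharged: 0,
    // Writes are a no-op but still appear in the audit log, marked as demo actions.
    auditEntry: call.kind === 'write' ? `${DEMO_AUDIT_PREFIX} ${call.name}` : null,
  };
}
```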
MEDIUM 07 · Multi-Account Management
MISSING: Power sellers operate 2-5 marketplace accounts (Amazon US + MX, MeLi MX + CO). No account switching UI is specified. This is a v1 blocker for agency users and will be requested in the first week of beta.
Account Data Model
interface MarketplaceAccount {
id: string; // uuid
marketplace: 'amazon' | 'meli' | 'shopify';
region: string; // 'US' | 'MX' | 'CO' | etc.
displayName: string; // "Acme US Store"
avatarInitials: string; // "AC"
avatarColor: string; // auto-assigned from palette
lastSynced: number;
isDefault: boolean;
credentialKey: string; // safeStorage key reference
}
// Max accounts per plan:
// Free: 2 accounts
// Pro: 10 accounts
// (encourages Pro upsell for agencies)
Account Switcher UI
Switch Account
Acme US
Amazon US · ✓ active
Acme MX
Amazon MX
Add account
Account Switch Behavior
- Context isolation: chat history, ASIN lists, and audit logs are scoped per account; switching loads the other account's data
- Keyboard shortcut: Cmd+Shift+A opens account switcher dropdown
- Status bar: shows active account name truncated to 20 chars + marketplace icon
- Switch is instant: no reload, React state swap. Chat input clears, context bar updates, tab bar highlights appropriate marketplace
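The plan limits, per-account scoping, and status bar truncation described in this section can be captured in three small helpers. This is a sketch under assumptions: the storage key format and all names are hypothetical, and the ellipsis on truncation is a design choice rather than part of the spec.

```typescript
// Hypothetical helpers for multi-account limits and context isolation.
type PlanTier = 'free' | 'pro';

const MAX_ACCOUNTS: Record<PlanTier, number> = { free: 2, pro: 10 };

function canAddAccount(plan: PlanTier, currentAccountCount: number): boolean {
  return currentAccountCount < MAX_ACCOUNTS[plan];
}

type ScopedStore = 'chat' | 'asins' | 'audit';

// Prefixing every storage key with the account id keeps each account's data separate.
function scopedKey(accountId: string, store: ScopedStore, key: string): string {
  return `${accountId}:${store}:${key}`;
}

// Status bar label: account name limited to 20 characters.
function statusBarLabel(displayName: string, maxLength = 20): string {
  if (displayName.length <= maxLength) return displayName;
  return displayName.slice(0, maxLength - 1) + '…';
}
```

For example, `scopedKey('acct-us', 'chat', 'session:42')` and `scopedKey('acct-mx', 'chat', 'session:42')` never collide, so switching accounts is a matter of reading under a different prefix.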
MEDIUM 08 · Desktop OS Integration: Missing Specs
PARTIAL: Several Electron desktop OS integration points are specified at a high level but lack implementation detail: single-instance lock, deep link protocol, right-click context menus, tray badge counts, and drag-and-drop.
Single-Instance Lock (prevents duplicate windows)
// main/index.ts
const gotTheLock = app.requestSingleInstanceLock();
if (!gotTheLock) {
app.quit(); // Second instance — quit immediately
} else {
// First instance: handle second-instance attempt
app.on('second-instance', (event, commandLine) => {
if (mainWindow) {
if (mainWindow.isMinimized()) mainWindow.restore();
mainWindow.focus();
// If launched with deep link (e.g., shopilot://auth/callback?code=...)
const deepLink = commandLine.find(arg => arg.startsWith('shopilot://'));
if (deepLink) handleDeepLink(deepLink);
}
});
}
Deep Link Protocol — shopilot://
// main/index.ts: protocol registration
import path from 'node:path'; // needed for path.resolve below
if (process.defaultApp) {
if (process.argv.length >= 2) {
app.setAsDefaultProtocolClient('shopilot', process.execPath, [path.resolve(process.argv[1])]);
}
} else {
app.setAsDefaultProtocolClient('shopilot');
}
// Supported deep link routes:
// shopilot://auth/callback?code=&state= → OAuth2 callback (Amazon/MeLi)
// shopilot://asin/{asin} → Focus chat on specific ASIN
// shopilot://alert/{alertId} → Open specific fraud/price alert
// shopilot://billing/upgrade → Jump to billing settings
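function handleDeepLink(url: string) {
  const parsed = new URL(url);
  // WHATWG URL parsing puts the first segment of a custom-scheme URL in `host`:
  // shopilot://auth/callback?code=... → host 'auth', pathname '/callback'
  const [first] = parsed.pathname.split('/').filter(Boolean);
  switch (parsed.host) {
    case 'auth':
      if (first === 'callback') {
        mainWindow.webContents.send('auth:callback', {
          code: parsed.searchParams.get('code'),
          state: parsed.searchParams.get('state'),
        });
      }
      break;
    case 'asin':
      // shopilot://asin/{asin} → host 'asin', pathname '/{asin}'
      if (first) mainWindow.webContents.send('navigate:asin', first);
      break;
    // The alert and billing routes listed above are not handled yet
  }
}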
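The four supported routes can also be parsed by a pure function that returns a typed route, which keeps URL handling testable without a running Electron window. A minimal sketch: `DeepLinkRoute` and `parseDeepLink` are hypothetical names, and the parser joins host and path segments so that it does not depend on how a custom-scheme authority is split.

```typescript
// Hypothetical pure parser for the shopilot:// routes listed above.
type DeepLinkRoute =
  | { kind: 'auth-callback'; code: string | null; state: string | null }
  | { kind: 'asin'; asin: string }
  | { kind: 'alert'; alertId: string }
  | { kind: 'billing-upgrade' }
  | { kind: 'unknown' };

function parseDeepLink(raw: string): DeepLinkRoute {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { kind: 'unknown' }; // not a parseable URL
  }
  if (url.protocol !== 'shopilot:') return { kind: 'unknown' };

  // shopilot://asin/B08... parses as host 'asin' + pathname '/B08...';
  // combining both gives the segments ['asin', 'B08...'].
  const [head, arg] = [url.host, ...url.pathname.split('/')].filter(Boolean);

  switch (head) {
    case 'auth':
      if (arg === 'callback') {
        return {
          kind: 'auth-callback',
          code: url.searchParams.get('code'),
          state: url.searchParams.get('state'),
        };
      }
      break;
    case 'asin':
      if (arg) return { kind: 'asin', asin: arg };
      break;
    case 'alert':
      if (arg) return { kind: 'alert', alertId: arg };
      break;
    case 'billing':
      if (arg === 'upgrade') return { kind: 'billing-upgrade' };
      break;
  }
  return { kind: 'unknown' };
}
```

A handler in the main process could then switch on `route.kind` and send the matching IPC message, with unknown links ignored rather than forwarded to the renderer.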
Right-Click Context Menus
// main/contextMenu.ts
import { BrowserWindow, Menu, MenuItem, clipboard, ipcMain, shell } from 'electron';
ipcMain.on('show-context-menu', (event, context) => {
const menu = new Menu();
if (context.type === 'asin') {
menu.append(new MenuItem({
label: `Analyze ${context.asin}`,
click: () => event.sender.send('coach:analyze', context.asin),
}));
menu.append(new MenuItem({
label: 'View on Amazon',
click: () => shell.openExternal(`https://amazon.com/dp/${context.asin}`),
}));
menu.append(new MenuItem({ type: 'separator' }));
menu.append(new MenuItem({
label: 'Copy ASIN',
click: () => clipboard.writeText(context.asin),
}));
}
if (context.type === 'price') {
menu.append(new MenuItem({ label: 'Copy price', click: () => clipboard.writeText(String(context.value)) })); // writeText expects a string
menu.append(new MenuItem({ label: 'Ask coach about this price', click: () => event.sender.send('coach:ask', `Why is this price ${context.value}?`) }));
}
menu.popup({ window: BrowserWindow.fromWebContents(event.sender)! });
});
Tray Menu + Badge Counts
// Update tray badge when alerts arrive
function updateTrayBadge(count: number) {
if (process.platform === 'darwin') {
app.dock.setBadge(count > 0 ? String(count) : '');
}
tray.setToolTip(`Shopilot — ${count > 0 ? `${count} alerts` : 'All clear'}`);
}
// Tray context menu
const trayMenu = Menu.buildFromTemplate([
{ label: 'Open Shopilot', click: () => mainWindow.show() },
{ label: 'Pause Coach', type: 'checkbox', checked: false,
click: (item) => mainWindow.webContents.send('coach:pause', item.checked) },
{ type: 'separator' },
{ label: 'Check for Updates', click: () => autoUpdater.checkForUpdatesAndNotify() },
{ label: 'Quit', click: () => app.quit() },
]);
PARTIAL 09 · E2E Testing Framework
INCOMPLETE SPEC: Unit tests and component tests are implied, but no E2E testing framework is explicitly specified. For an Electron app making real API calls and marketplace mutations, E2E tests are non-negotiable before beta.
Testing Pyramid for Shopilot
Critical E2E Test Cases (must pass before beta)
| Test Case | Why critical | Mode |
|---|---|---|
| App launches, shows demo mode, chat accepts input | Smoke test — must always pass | Demo |
| Connect Amazon account via OAuth → tokens stored in safeStorage | Auth is the first real action | Sandbox |
| Send message → tool executes → confirmation appears → user approves → audit log written | Core happy path | Mock API |
| Approve irreversible action → confirm with typed text → action recorded → rollback available | Trust-critical flow | Mock API |
| Credits hit 0 → coach blocks → credit exhausted modal shows → upgrade flow opens | Revenue-critical guard | Mock API |
| App restart → chat history loads from IndexedDB → last session visible | Persistence correctness | Mock API |
| Update available → modal shows → user clicks Restart → app re-opens at same state | Update UX must not lose work | Mocked updater |
Playwright + Electron Setup
// e2e/setup.ts
import { _electron as electron, type ElectronApplication } from 'playwright';
import { test, expect } from '@playwright/test';
let electronApp: ElectronApplication;
test.beforeAll(async () => {
electronApp = await electron.launch({
args: ['dist/main/index.js'],
env: {
...process.env,
NODE_ENV: 'test',
SHOPILOT_DEMO_MODE: 'true', // use fixture data
},
});
});
test.afterAll(async () => {
await electronApp.close();
});
// Example test: coach chat flow
test('coach responds to ASIN query', async () => {
const window = await electronApp.firstWindow();
await window.fill('[data-testid="chat-input"]', 'What is happening with B08N5WRWNW?');
await window.press('[data-testid="chat-input"]', 'Enter');
await expect(window.locator('[data-testid="coach-response"]')).toBeVisible({ timeout: 10000 });
await expect(window.locator('[data-testid="tool-accordion"]')).toBeVisible();
});
10 · Production Readiness Checklist
GATE CRITERIA: These gates must pass before each release milestone. No gate can be manually overridden without written sign-off from CEO + CTO.
GATE 1 — Private Beta (before any external user)
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | All 7 E2E test cases pass on macOS 14 + macOS 15 | Sergio |
| ☐ | Code signing + notarization working (Apple Developer cert) | Mateo |
| ☐ | Sentry DSN configured, PII filter verified, test event sent | Andrés |
| ☐ | Chat persistence: sessions survive app restart | Sergio |
| ☐ | Single-instance lock prevents duplicate window | Mateo |
| ☐ | Demo mode works without any marketplace credentials | Sergio |
| ☐ | Update notification modal tested with mock version bump | Mateo |
| ☐ | Privacy policy published at shopilot.ai/privacy | Pablo |
GATE 2 — Public Beta (before paid users)
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | GDPR data export (ZIP) working for all users | Andrés |
| ☐ | Account deletion flow tested end-to-end | Andrés |
| ☐ | In-app support (Crisp) embedded and tested | Sergio |
| ☐ | Multi-account: 2+ accounts with correct context isolation | Sergio |
| ☐ | Deep link protocol (shopilot://) working for OAuth callback | Mateo |
| ☐ | Tray menu + badge count for unread alerts | Sergio |
| ☐ | Right-click context menus on ASIN rows and prices | Sergio |
| ☐ | Terms of Service published + accepted on first launch | Pablo |
| ☐ | Stripe webhooks tested for subscription lifecycle | Andrés |
GATE 3 — v1.0 GA
| ✓ | Requirement | Owner |
|---|---|---|
| ☐ | Figma Atomic Design library complete (atoms + molecules + organisms) | External Design Team |
| ☐ | Figma MCP integration working (Claude reads components directly) | Mateo |
| ☐ | WCAG AA audit passing (axe-playwright on all screens) | Sergio |
| ☐ | Performance: cold start < 3s on 2019 MBP (8GB RAM) | Mateo |
| ☐ | Windows 11 build passing (secondary target) | Mateo |
| ☐ | SOC 2 Type I audit initiated (required for enterprise) | Pablo |
The 80/20 Rule for Production Readiness
80% of production incidents come from 20% of neglected areas: auth edge cases, update failures, data loss on crash, and silent API errors. This section addresses all four. Ship Gate 1 within the first 3 weeks of dev, Gate 2 before any paid user, and Gate 3 before any press mention.